They can tell by the Pixels


Researchers have built statistical models (hidden Markov models) for pixelated and blurred characters and used them to read out supposedly anonymized text. This kind of brute-force approach is not entirely new, but it now works a whole league better: for Gaussian blur, it holds up to a blur radius of 45 pixels, which is very, very, very heavily blurred.

Paper: On the (In)effectiveness of Mosaicing and Blurring as Tools for Document Redaction. Fusion: Pixelating or blurring text doesn’t actually work

The UC-San Diego researchers found that they could use statistical models, “so-called hidden Markov models,” to generate the blurring or pixelation of lots of numbers, letters, and words, to the point that their software program could match a known redaction to an unknown redaction to figure out what it says. The biggest challenge is figuring out the font and size of the underlying text, which the researchers need for their deciphering. They say it works better than a brute-force technique for deciphering pixelated images discussed by Dheera Venkatraman in 2007.
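To make the "match a known redaction to an unknown redaction" idea concrete, here is a minimal sketch of the simpler brute-force baseline, not the hidden Markov model approach from the paper. It assumes the attacker already knows the font, the text length, and the mosaic block size. The 3x5 digit font and the names `render`, `mosaic`, `distance`, and `recover` are invented for this illustration.

```python
from itertools import product

# Hypothetical 3x5 bitmap font for digits (1 = ink), invented for this sketch.
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "010", "010", "010"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "001"],
}


def render(text):
    """Render text as a 5-row grid of 0/1 pixels, with one blank column after each glyph."""
    rows = [[] for _ in range(5)]
    for ch in text:
        for r in range(5):
            rows[r].extend(int(p) for p in FONT[ch][r])
            rows[r].append(0)
    return rows


def mosaic(img, block):
    """Pixelate: replace every block x block tile by its mean intensity."""
    height, width = len(img), len(img[0])
    out = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            tile = [
                img[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out


def distance(a, b):
    """Sum of squared differences between two equally sized mosaics."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def recover(redacted, length, block):
    """Pixelate every candidate digit string the same way and keep the closest match."""
    candidates = ("".join(t) for t in product("0123456789", repeat=length))
    return min(candidates, key=lambda c: distance(mosaic(render(c), block), redacted))


secret = mosaic(render("4071"), block=2)   # what the "redacted" document shows
print(recover(secret, length=4, block=2))  # prints 4071
```

The exhaustive search grows exponentially with the text length, and it only works here because font, size, and alignment are known exactly. That is the gap the quoted passage points at: estimating font and size is the hard part, and a sequence model can decode character by character instead of enumerating whole strings.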

“We conclude that hidden Markov models allow near-perfect recovery of text redacted by mosaicing or blurring for many common fonts and parameter settings, and that mosaicing and blurring are not effective choices for textual document redaction,” the researchers write in their paper.