Ed Felten posted last week about his upcoming publication on fingerprinting techniques for blank paper. This is a great paper with practical applications. It reminded me to discuss some of the fundamental issues with fingerprinting and the potential problems when it comes to real-world forgery.
Long ago, some copy protections for floppy disks used a method known as “weak bits”. The mastering equipment would write a long string of zeros (leaving no flux transitions) to a region of the disk. With no transitions to lock onto, the drive’s read electronics would amplify noise and return a varying string of bits with no consistent pattern. The software would check that this region returned different values each time it was read; a copy made with a standard drive writes definite bits there and reads back consistently, so it fails the check.
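As a rough sketch of how such a check might have worked (the `read_sector` routine and the disk location are hypothetical stand-ins, not any particular loader’s API):

```python
# Hypothetical sketch of a weak-bit copy check. read_sector() stands in for
# whatever low-level routine the loader uses to read raw sector data.

def looks_like_original(read_sector, track, sector, attempts=5):
    """Read the 'weak' region several times. An original returns noise that
    varies between reads; a normal copy reads back the same bits every time."""
    reads = {bytes(read_sector(track, sector)) for _ in range(attempts)}
    # More than one distinct result means the region is unstable (weak bits),
    # which is what the protection expects from the original disk.
    return len(reads) > 1
```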
Similar techniques have also been applied to magstripe media for credit cards. Magtek makes a reader that attempts to measure physical characteristics of the magnetic stripe and submit this as a fingerprint for a given card. The general idea is that while the data on a card can be easily duplicated with a reader, the manufacturing process for the physical media leaves behind certain random “noise” that is difficult to reproduce.
The Felten paper takes a similar approach. It creates a unique fingerprint for a piece of paper based on variations in the fibers that are visible in a high-resolution scan, combining multiple scans taken from different angles.
All of these techniques have several requirements in common. The characteristic being measured must actually have some uniqueness, and there must be a cost-effective way to measure it. There must be a sampling mechanism that chooses which areas to examine. The fingerprint algorithm must combine the samples in a way that is resilient to natural errors (i.e., no false positives, where wear or measurement noise causes a genuine item to be rejected). Yet it also must be difficult for a forger to create a copy that is close enough to the original to be accepted by the verifier (i.e., no false negatives, where a forgery is accepted as genuine).
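To make those requirements concrete, here is a minimal sketch of a verifier along these lines. It is not the algorithm from the paper; the patch sampling, the per-patch statistic, and the correlation threshold are all assumptions chosen for illustration:

```python
import hashlib
import numpy as np

def sample_patches(image, key, num_patches=32, size=16):
    """Choose patch locations pseudo-randomly from a per-document key (bytes)
    so the enroller and verifier examine the same areas."""
    h, w = image.shape
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    ys = rng.integers(0, h - size, num_patches)
    xs = rng.integers(0, w - size, num_patches)
    return [image[y:y + size, x:x + size] for y, x in zip(ys, xs)]

def fingerprint(image, key):
    """Combine per-patch statistics (here, just mean brightness) into a vector."""
    return np.array([p.mean() for p in sample_patches(image, key)])

def verify(enrolled, candidate_image, key, threshold=0.9):
    """Accept if the candidate's fingerprint correlates strongly enough with
    the enrolled one."""
    candidate = fingerprint(candidate_image, key)
    score = np.corrcoef(enrolled, candidate)[0, 1]
    return score >= threshold
```

The threshold is where the tension lives: set it too strict and worn originals get rejected, too loose and clones slip through.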
Both magstripes and paper appear to have enough inherent uniqueness; the manufacturing processes for both do leave a lot of low-level variation. But once this requirement is satisfied, the fingerprint approach itself is still subject to fundamental limitations that no fingerprinting method can avoid. It needs to be resilient not only to regular use (e.g., crumpling the paper) but also to intentional, malicious manipulation. The conflicting requirements of avoiding false positives while also being difficult to clone are always the hardest part of any fingerprinting scheme. This is a fundamental problem with any kind of statistical decision process.
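To see why the tradeoff is unavoidable, here is a hedged sketch of how one might measure both error rates at a given threshold, assuming you have match scores from repeated scans of genuine items and from attempted clones (the scores and threshold are hypothetical):

```python
import numpy as np

def error_rates(genuine_scores, forgery_scores, threshold):
    """False positive rate: genuine items rejected because wear or scanner
    noise pushed their match score below the threshold.
    False negative rate: forgeries accepted because their score cleared it."""
    genuine_scores = np.asarray(genuine_scores)
    forgery_scores = np.asarray(forgery_scores)
    false_positive_rate = np.mean(genuine_scores < threshold)
    false_negative_rate = np.mean(forgery_scores >= threshold)
    return false_positive_rate, false_negative_rate

# Raising the threshold rejects more forgeries but also more legitimate,
# slightly-worn originals; lowering it does the reverse. Neither error goes
# to zero unless the two score distributions don't overlap at all.
```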
There are two kinds of forgery attacks: second pre-image and collision. The former is the more obvious one, where an attacker creates a copy that matches some existing original. The latter is much harder to prevent. To create a collision, the attacker pre-processes two pieces of paper in order to create two documents that the fingerprint algorithm judges as close enough to be identical. For example, the attacker can print a similar pattern of small dots on both pages before printing the text, repeating this with varied patterns until the verifier judges the papers as close enough. Depending on the sampling algorithm and the attacker’s printing capabilities, this may take more or fewer attempts. Section 6 of the paper discusses this kind of attack, but it mostly focuses on preventing a second pre-image attack and leaves most of the analysis for future work.
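A collision search along those lines might look like the following sketch. The callables for preparing, printing, scanning, and accepting sheets are hypothetical stand-ins for the attacker’s physical setup and a local copy of the verification check:

```python
import random

def find_collision(fresh_sheet_pair, print_pattern, scan, verifier_accepts,
                   max_attempts=1000, num_dots=200):
    """Search for two sheets that the fingerprint verifier judges identical.

    Placeholders: fresh_sheet_pair() -> (sheet_a, sheet_b),
    print_pattern(sheet, dots), scan(sheet) -> image,
    verifier_accepts(image_a, image_b) -> bool.
    """
    for _ in range(max_attempts):
        sheet_a, sheet_b = fresh_sheet_pair()
        # Dot positions on a coarse grid; fine dot resolution isn't required,
        # only repeatable placement across both sheets.
        pattern = [(random.randrange(600), random.randrange(800))
                   for _ in range(num_dots)]
        print_pattern(sheet_a, pattern)
        print_pattern(sheet_b, pattern)
        if verifier_accepts(scan(sheet_a), scan(sheet_b)):
            return sheet_a, sheet_b, pattern  # a colliding pair
    return None
```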
The key thing to remember is that the attacker does not need to make the papers actually identical by reproducing the exact pattern of fibers. The attacker doesn’t even need particularly fine dot resolution, as long as the position of the dots can be controlled. The idea is that the printed pattern overwhelms the fine characteristics measured by the scanner, so the verifier judges the two documents to be close enough. It would also be interesting to see how the fingerprint technique fares against darker colored paper.
This attack illustrates the fundamental limitation of this kind of fingerprint method. The verifier has to allow for some variation to prevent false positives. But an attacker can repeatedly exploit that tolerance, creating pairs of documents until one pair falls within it and passes.
All of this is based on a preliminary read of the paper, so I’m interested in what the Felten team plans to do to address this kind of problem.