August 2007 – rdist

Our previous mesh design pattern, hash-and-decrypt, requires the attacker either to run the system to completion or reverse-engineer enough of it to limit the search space. If any bit of the input to the hash function is incorrect, the decryption key is completely wrong. This could be used, for example, in a game to unlock a subsequent level after the user has passed a number of objectives on the previous level. It could also be used with software protection to be sure a set of integrity checks or anti-debugger countermeasures have been running continuously.

Another pattern that is somewhat rare is error correction. An error correcting code uses compressed redundancy to allow data that has been modified to be repaired. It is commonly used in network protocols or hard disks to handle unintentional errors but can also be useful for software protection. In this case, an attacker who patches the software or modifies its data would find that the changes have no effect as they are silently repaired. This can be combined with other techniques (e.g., anti-debugging) to require an attacker to locate all points in the mesh and disable them simultaneously. Turbo codes, LDPC, and Reed-Solomon are some commonly used algorithms.

Hashing and error correction are very similar. A cryptographic hash is analogous to a compressed form of the original data, since by design it is extremely difficult to generate a collision (two sets of data that have the same fingerprint.) Instead of comparing every byte of two data sets, many systems just compare the hash. You can build a crude form of error correction by storing multiple copies of the original data and throwing out any that have an incorrect hash due to a patching attempt or other error. However, this results in bloat, and it’s relatively easy for the reverse engineer to find all copies of the identical data in memory, even if the hash is somewhat hidden.

Turbo codes are an efficient form of error correction. To put it simply, three different chunks of data are stored: the message itself (m bits) and two parity blocks (n/2 bits each). The total storage required is m + n bits, coding data at a rate of m / (m + n). You can think of this as a sort of crossword puzzle where one parity block stores the clues for “across” and the other stores the clues for “down”. Two decoders process the parity blocks and vote on their confidence in the output bits. If the vote is inconclusive, the process iterates.

To use error correction for software protection, take a block of data or instructions that are important to security. Generate an encoded block for it using a turbo code. Now, insert a decoder in the code which calls into or processes this block of data. If an attacker patches the encoded data (say, to insert a breakpoint), the decoder will generate a repaired version of that data before using it.

This has a number of advantages. If the decoding is not done in-place, the attacker will not see the data being repaired, just that the patch had no effect. The parity blocks look nothing like the original data itself so it looks like there is only one copy of the data in memory. The decoder can be obfuscated in various ways and inlined to prevent it from being a single point of failure. The calling code can hash the state of the decoder as part of hash-and-decrypt so that errors are detected as well, allowing the software protection to later degrade the experience rather than immediately failing. This hides the location of the protection check (temporal distance.)

Like all mesh techniques, error correction is best used in ways that are mutually reinforcing. The linker can be adapted to automatically encode data and insert the decoding logic throughout the program, based on control flow analysis. Continually-running integrity checking routines can be encoded with this approach. The more intertwined the software protection, the harder it is to bypass.

Month: August 2007

Mesh design pattern: error correction

Next Baysec: Aug 20 at O’Neills