Memory remanence attack analysis

You have probably heard by now of the memory remanence attack by Halderman et al. They show that it is easy to recover cryptographic keys from RAM after a reset, or after moving the DIMM to another system. This matters for any software that stores keys in RAM; their target was disk encryption. It’s a nice paper with a very polished but responsible publicity campaign, including a video.

As with most good papers, some parts of the attack have been known for a long time and others are creative improvements. Memory remanence has been a known issue ever since the first key had to be zeroed after use. In the PC environment, the trusted computing efforts have been aware of this as well (see “Hardware Attacks”, chapter 13; S3 is suspend-to-RAM and SCLEAN is a module that must be run during power-on to clear RAM). However, the Halderman team is publishing the first concrete results in this area, and it should shake things up.

One outcome I do not want to see from this is a blind movement to closed hardware crypto (e.g., hard disk drives with onboard encryption). Such systems are OK in principle, but in practice they often compromise security in more obvious ways than a warm reboot. For example, a hard drive that stores encryption keys in a special “lock sector” that the drive firmware won’t access without a valid password can be easily circumvented by patching the firmware. Such a system would be less secure in a cold power-on scenario than well-implemented software. The solution here is to ask vendors for documentation on their security implementation before making a purchase, or to only buy hardware that has been reviewed by a third party whose report matches your expectations. (Full disclosure: I perform this kind of review at Root Labs.)

Another observation is that this attack underlines the need to apply software protection techniques to security applications beyond DRM. If an attacker can dump your RAM, you need effective ways to hide the key in memory (e.g., white-box crypto), to obfuscate and tamper-protect the software that uses it, and to randomize each install to prevent “break once, run everywhere” attacks. Yes, this is exactly the threat model DRM has faced for years, but this time you care because you’re the target.

It will be interesting to see how vendors respond to this. Zeroing memory on reboot is an obvious change that addresses some of the attack methods. A more subtle hack is to set up the page mapping and cache configuration such that the key is loaded into a cache line and never evicted (as done for fast IP routing table lookups in this awesome paper). None of this, however, stops attacks that move the DIMM to another system. On standard x86 hardware, there’s no place other than RAM to put keys. However, the VIA C7 processors have AES hardware built into the CPU, and it’s possible more vendors will take this approach to providing secure key storage and crypto acceleration.
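
On the software side, the minimum complementary step is to actually wipe keys from RAM when a volume is detached or the machine suspends, and to do it in a way the compiler can’t optimize out as a dead store. A minimal sketch (the secure_wipe helper and the volatile-pointer idiom are my illustration, not anything from the paper):

    #include <stddef.h>
    #include <string.h>

    /* Wipe key material so a remanence attack after suspend/detach finds
     * only zeros.  The volatile function pointer keeps the compiler from
     * eliminating the memset as a dead store. */
    static void *(*const volatile memset_v)(void *, int, size_t) = memset;

    static void secure_wipe(void *buf, size_t len)
    {
        memset_v(buf, 0, len);
    }

This only shortens the window; it does nothing against an attacker who grabs the machine while the volume is still mounted, which is exactly the scenario the paper demonstrates.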

Whatever the changes, it will probably take a long time before this attack is effectively addressed. Set your encrypted volumes to auto-detach on suspend or after a reasonable idle timeout, and keep an eye on your laptop.

C64 25th anniversary event

Next Monday, December 10th, I will be at the Computer History Museum to hear a panel discussing the 25th anniversary of the C64. It includes Jack Tramiel, founder and CEO of Commodore, Adam Chowaniec (manager of the Amiga project), and some other guy.

There’s a lot that’s been written about retrocomputing, most recently this CNN article. I myself started with a VIC-20 and a 300 baud modem around 1983. I still have a few pages of old homework where I wrote an assembly joystick-decoding routine in the margin. I later got a C64c in 1986. My Commodore era ended when I upgraded to a 486DX-33 in 1991. The 486 was my desktop for years, running DOS, Linux, and finally FreeBSD. It then served up root.org until I replaced it in 1999.

The most fascinating things about the C64 were games, demos, and copy protection. Games and demos made me ask “how do they do that?” It was easy to run a disassembler and see surprising techniques like self-modifying code and tricky raster interrupt timing. Copy protection was also a big eye-opener since it seemed to violate the principle that if bits can be read, they can also be written. (Of course, this principle is still generally true, but the skill of the protection author can greatly affect the difficulty.)

I don’t like to admit defeat, and there were some copy protection schemes I was never able to figure out. Now with the power of emulators and ways to physically connect a floppy drive to my PC, I can dust off those old disks and figure out how they worked. Most crackers didn’t need to understand the media layout or protection scheme in detail since they could often “freeze” and capture the game code from memory and then piece together a loader for it. In the race to get the first release of the latest game out, a lot of interesting details about how the protection worked would be overlooked. I think the protection code is as interesting as the game.

There is something refreshing about using a computer where every signal is 5 volts, opcodes are a single byte, a clock cycle is 1 microsecond, and booting from ROM gives you reset times of a couple of seconds. You just can’t make a mistake that costs you all the time of reinstalling software, as you can with today’s hard drive-based systems. Hopefully, the advent of virtualization and good network backup software will return us to some of that carefree attitude.

As a hobby, I continue to help with the C64 Preservation Project. My next planned project is creating a USB interface to the parallel cable so that I can use nibtools with my computers that no longer have a printer port. Also, I find that loading an image of a protected floppy into an emulator on my laptop and disassembling it makes for a nice travel diversion during the holidays.

I hope you will enjoy the holidays in your own way and have a great 2008!

[Edit: the official video of the event has now been posted here and here]

Trapping access to debug registers

If you’re designing or attacking a software protection scheme, the debug registers are a great resource. Their use is mostly described in the Intel SDM Volume 3B, chapter 18. They can only be accessed by ring 0 software, but their breakpoints can be triggered by execution of unprivileged code.

The debug registers provide hardware support for setting up to four different breakpoints. They have been around since the 386, as this fascinating history describes. Each breakpoint can be set to trigger on an execute, write, read/write, or IO read/write (i.e., in/out instructions). Each monitored address can cover a range of 1, 2, 4, or 8 bytes.

DR0-3 store the addresses to be monitored. DR6 provides status bits that describe which event occurred. DR7 configures the type of event to monitor for each address. DR4-5 are aliases for DR6-7 if the CR4.DE bit is clear; otherwise, accessing these registers yields an undefined-opcode exception (#UD). This behavior might be useful for obfuscation.
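
To make the register layout concrete, here’s roughly what arming one breakpoint looks like from ring 0. This is just a sketch in C with gcc-style inline assembly (the function is mine, not from the SDM); the G0, R/W0, and LEN0 bit positions come from the DR7 description in chapter 18.

    #include <stdint.h>

    /* Arm breakpoint 0 as a 4-byte data-write watchpoint on addr.
     * Must run at ring 0; MOV to/from DRx faults in user mode.
     * Note: GAS spells the debug registers %db0-%db7 in AT&T syntax. */
    static void set_write_watchpoint(uintptr_t addr)
    {
        uintptr_t dr7;

        __asm__ volatile ("mov %0, %%db0" : : "r" (addr));   /* DR0 = addr */
        __asm__ volatile ("mov %%db7, %0" : "=r" (dr7));

        dr7 |= (1u << 1);        /* G0: globally enable breakpoint 0     */
        dr7 |= (1u << 16);       /* R/W0 = 01b: break on data writes     */
        dr7 |= (3u << 18);       /* LEN0 = 11b: monitor a 4-byte range   */

        __asm__ volatile ("mov %0, %%db7" : : "r" (dr7));
    }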

When a condition is met for one of the four breakpoints, INT1 is triggered. This is the same exception as for a single-step trap (EFLAGS.TF = 1). INT3 is for software breakpoints and is useful when setting more than four breakpoints. However, software breakpoints require modifying the code to insert an int3 instruction and can’t monitor reads/writes to memory.

One very useful feature of the debug registers is DR7.GD (bit 13). Setting this bit causes reads or writes to any of the debug registers to generate an INT1. This was originally intended to support ICE (In-Circuit Emulation) since some x86 processors implemented test mode by executing normal instructions. This mode was the same as SMM (System Management Mode), the feature that makes your laptop power management work. SMM has been around since the 386SL and is the original x86 hypervisor.

To analyze a protection scheme that accesses the debug registers, hook INT1 and set DR7.GD. When your handler is called, check DR6.BD (also bit 13). If it is set, the instruction at the faulting EIP was about to read or write a debug register, so you’re probably somewhere near the protection code. Since this is a faulting exception, the MOV DRx instruction has not executed yet and can be skipped by updating the EIP on the stack before executing IRET.
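
Here’s a sketch of what such a handler might look like in C. The trap frame layout and how you hook vector 1 are OS-specific, so treat the struct and names as placeholders; the DR6/DR7 bit positions and the 3-byte length of a MOV DRx instruction come from the SDM. The CPU clears DR7.GD when it delivers this fault (so the handler itself can touch the debug registers), which means the handler must re-arm GD before returning.

    #include <stdint.h>

    #define DR6_BD  (1u << 13)   /* debug-register access detected */
    #define DR7_GD  (1u << 13)   /* general detect enable          */

    struct trap_frame { uintptr_t eip, cs, eflags; };  /* assumed layout */

    void int1_handler(struct trap_frame *tf)
    {
        uintptr_t dr6, dr7;

        __asm__ volatile ("mov %%db6, %0" : "=r" (dr6));

        if (dr6 & DR6_BD) {
            /* tf->eip points at the MOV DRx that faulted; note the
             * location, then skip the 3-byte instruction (0F 21/23 /r)
             * so it never executes. */
            tf->eip += 3;
        }

        /* Clear the sticky status bits, then re-arm general detect. */
        __asm__ volatile ("mov %0, %%db6" : : "r" ((uintptr_t)0));
        __asm__ volatile ("mov %%db7, %0" : "=r" (dr7));
        __asm__ volatile ("mov %0, %%db7" : : "r" (dr7 | DR7_GD));
    }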

If you’re designing software protection, there are some interesting ways to use this feature to prevent attackers from having easy access to the debug registers. I’ll have to leave that for another day.

C64 screen memory and anti-debugging

I think it’s fun to stir your creativity periodically by analyzing old software protection schemes. I prefer the C64 because emulators are widely available, disks are cheap and easy to import, and it’s the system I became most familiar with as a kid.

One interesting anti-debugging trick was to load the protection code into screen memory. Just like on the PC, the data on your screen is just a series of values stored in memory accessible to the main CPU. On the C64, screen memory was typically located at 0x400–0x7FF. Data could be loaded into this region by setting the block addresses in the file’s directory entry (a very simple version of a shared library load address) or by explicitly storing it at that address using the serial load routines.

To keep users from seeing garbage, the foreground and background colors were set to be the same. If you tried to break into a debugger, the prompt would usually overwrite the protection code. This could be worked around by relocating actual screen memory (by reprogramming the VIC-II chip) or by manually loading the code at a different address and disassembling it.
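
To show how little the trick requires, here is the color setup in C (cc65-style; the original loaders did this with a few 6502 store instructions, so take this purely as an illustration):

    /* Make the 0x400-0x7FF screen region invisible: same color for the
     * border, the background, and every character cell. */
    #define VIC_BORDER      (*(volatile unsigned char *)0xD020)
    #define VIC_BACKGROUND  (*(volatile unsigned char *)0xD021)
    #define COLOR_RAM       ((volatile unsigned char *)0xD800)

    static void hide_screen_code(void)
    {
        unsigned int i;

        VIC_BORDER = 6;                 /* blue */
        VIC_BACKGROUND = 6;             /* blue */
        for (i = 0; i < 1000; ++i)      /* 40 x 25 character cells */
            COLOR_RAM[i] = 6;           /* blue text on blue = invisible */
    }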

This is an example of anti-debugging based on consuming a shared resource. The logic is that a debugger needs the screen to run, so if the protection is also using that resource, the attacker disrupts the system just by activating the debugger. Using up a shared resource is usually much more effective than merely checking for signs that a debugger is present, and the approach is still important today.

DRM is passive and active

In a post regarding DRM (based on another post), Alun Jones of Microsoft says:

“Passive DRM protects its content from onlookers who do not have a DRM-enabled client. Encryption is generally used for Passive DRM, so that the content is meaningless garbage unless you have the right bits in your client. I consider this ‘passive’ protection, because the data is inaccessible by default, and only becomes accessible if you have the right kind of client, with the right key.

Active DRM, then, would be a scheme where protection is only provided if the client in use is one that is correctly coded to block access where it has not been specifically granted. This is a scheme in which the data is readily accessible to most normal viewers / players, but has a special code that tells a DRM-enabled viewer/player to hide the content from people who haven’t been approved.”

The whole problem is that his two categories are a false distinction. You can’t arbitrarily draw a line through a system and say “this is passive, this is active.” Take his CSS example: if you consider a given player’s decryption code along with an arbitrary encrypted DVD, you have a system with both active and passive elements. If you leave out either of those elements, you have a disc that won’t play or a player with no disc, the only perfectly secure system (assuming your cryptography is good).

When judging new compression schemes, the size of the decoder is added to the size of the compressed data to get a fair assessment of efficiency. Otherwise you could win contests with a one-byte file and a 10 GB decoder program that simply contains all the actual data.

Whichever way you design a system, complexity is pushed from one party to another but never eliminated. For DVD, where most of the complexity is in the player, there is a huge variety of player implementations, each with its own bugs. Because of that, the author of every disc needs to test against many combinations of players.

Likewise, if you push the complexity onto the disc by including executable code there, the player gets simpler but the disc could be buggy. However, in that case, the content author will get a bad reputation for the buggy disc (see the Sony rootkit fiasco he mentions).

This doesn’t just apply to DRM. While he might consider an MPEG-4 AVC video file “passive” in his terminology, it is really a complex series of instructions to the decoder. Look at the number of different but valid ways to encode video and you’ll see it’s closer to a program than to “passive” data.

Now, his definition of “Active DRM” does not actually describe the general class of software protection techniques. He is describing a system that is poorly designed, often due to an attempt to retrofit DRM onto an existing system that lacked it. Of course, if there are two ways to access the content, one with DRM and the other without, the additional complexity makes no sense to the end user or to mass copiers. It may make economic sense to the content author, but they have to weigh the potential risks to their business as well (annoying users vs. stopping some casual copying).

Even assuming his terminology makes sense, the Windows Media Center system he references is actually a combination of “active” and “passive”. The cable video stream is encrypted (“passive”), and the Windows DRM component is “active”. In particular, it has a “black box” DLL that checks the host environment and hashes various items to derive a key, hence the problem.

All I can distill from what Alun says is “an unprotected system is made more complex by adding DRM.” I agree, but this doesn’t say anything larger about “active” versus “passive” DRM.

Full disclosure: I was previously one of the designers of the Blu-ray protection layer (BD+), a unique approach to disc protection that involves both cryptography and software protection. You can consider me biased, but my analysis should be able to stand on its own.

Mesh design pattern: error correction

Our previous mesh design pattern, hash-and-decrypt, requires the attacker either to run the system to completion or to reverse-engineer enough of it to limit the search space. If any bit of the input to the hash function is incorrect, the decryption key is completely wrong. This could be used, for example, in a game to unlock a subsequent level only after the user has passed a number of objectives on the previous level. It could also be used in software protection to make sure a set of integrity checks or anti-debugger countermeasures has been running continuously.
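
For anyone who missed that post, the shape of the pattern is simple. In the sketch below, FNV-1a and a XOR keystream stand in for a real hash and cipher, and all the names are made up; the point is that the key is never stored anywhere and only emerges if every value fed to the hash was correct.

    #include <stdint.h>
    #include <stddef.h>

    static uint64_t fnv1a(const uint8_t *p, size_t n)
    {
        uint64_t h = 0xcbf29ce484222325ULL;     /* FNV-1a offset basis */
        while (n--) {
            h ^= *p++;
            h *= 0x100000001b3ULL;              /* FNV-1a prime */
        }
        return h;
    }

    /* checkpoints[] is filled in as objectives are met (or as integrity
     * checks run); any wrong or missing value yields a garbage key and
     * the next stage decrypts to junk. */
    static void unlock_next_stage(const uint8_t checkpoints[32],
                                  uint8_t *stage, size_t stage_len)
    {
        uint64_t key = fnv1a(checkpoints, 32);
        size_t i;

        for (i = 0; i < stage_len; i++)
            stage[i] ^= (uint8_t)(key >> (8 * (i % 8)));  /* toy cipher */
    }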

Another pattern that is somewhat rare is error correction. An error correcting code uses compressed redundancy to allow data that has been modified to be repaired. It is commonly used in network protocols or hard disks to handle unintentional errors but can also be useful for software protection. In this case, an attacker who patches the software or modifies its data would find that the changes have no effect as they are silently repaired. This can be combined with other techniques (e.g., anti-debugging) to require an attacker to locate all points in the mesh and disable them simultaneously. Turbo codes, LDPC, and Reed-Solomon are some commonly used algorithms.

Hashing and error correction are very similar. A cryptographic hash is analogous to a compressed form of the original data, since by design it is extremely difficult to generate a collision (two sets of data that have the same fingerprint). Instead of comparing every byte of two data sets, many systems just compare the hashes. You can build a crude form of error correction by storing multiple copies of the original data and throwing out any copy whose hash is incorrect due to a patching attempt or other error. However, this results in bloat, and it’s relatively easy for a reverse engineer to find all copies of the identical data in memory, even if the hash is somewhat hidden.

Turbo codes are an efficient form of error correction. To put it simply, three different chunks of data are stored: the message itself (m bits) and two parity blocks (n/2 bits each). The total storage required is m + n bits, coding data at a rate of m / (m + n). You can think of this as a sort of crossword puzzle where one parity block stores the clues for “across” and the other stores the clues for “down”. Two decoders process the parity blocks and vote on their confidence in the output bits. If the vote is inconclusive, the process iterates.

[Figure: turbocode.png]

To use error correction for software protection, take a block of data or instructions that is important to security. Generate an encoded block for it using a turbo code. Now insert a decoder into the code that calls into or processes this block of data. If an attacker patches the encoded data (say, to insert a breakpoint), the decoder will generate a repaired version of that data before using it.
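
Here’s a sketch of that decode-before-use flow in C. A Hamming(7,4) code stands in for the turbo code because it fits in a few lines; a real implementation would use one of the stronger codes mentioned above, and the function names are mine.

    #include <stdint.h>
    #include <stddef.h>

    /* Encode one nibble into a 7-bit Hamming(7,4) codeword.
     * Codeword bit layout (LSB first): p1 p2 d1 p3 d2 d3 d4. */
    static uint8_t hamming_encode(uint8_t nibble)
    {
        uint8_t d1 = nibble & 1, d2 = (nibble >> 1) & 1,
                d3 = (nibble >> 2) & 1, d4 = (nibble >> 3) & 1;
        uint8_t p1 = d1 ^ d2 ^ d4;
        uint8_t p2 = d1 ^ d3 ^ d4;
        uint8_t p3 = d2 ^ d3 ^ d4;
        return p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) |
               (d2 << 4) | (d3 << 5) | (d4 << 6);
    }

    /* Decode one codeword, silently correcting any single-bit patch. */
    static uint8_t hamming_decode(uint8_t cw)
    {
        uint8_t b[8], syndrome;
        int i;

        for (i = 1; i <= 7; i++)
            b[i] = (cw >> (i - 1)) & 1;
        syndrome = (b[1] ^ b[3] ^ b[5] ^ b[7])
                 | ((b[2] ^ b[3] ^ b[6] ^ b[7]) << 1)
                 | ((b[4] ^ b[5] ^ b[6] ^ b[7]) << 2);
        if (syndrome)
            b[syndrome] ^= 1;   /* syndrome = position of the flipped bit */
        return b[3] | (b[5] << 1) | (b[6] << 2) | (b[7] << 3);
    }

    /* Repair the encoded table into a scratch buffer before each use, so
     * a patch to the stored copy (e.g., an inserted breakpoint byte) is
     * undone without the repair ever being visible in place. */
    static void decode_protected(const uint8_t *encoded, uint8_t *out, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)     /* two codewords per output byte */
            out[i] = hamming_decode(encoded[2 * i])
                   | (hamming_decode(encoded[2 * i + 1]) << 4);
    }

This particular code corrects only a single flipped bit per nibble, but the calling convention is the same no matter which code you plug in: the caller always works from the freshly repaired copy.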

This has a number of advantages. If the decoding is not done in place, the attacker will not see the data being repaired, just that the patch had no effect. The parity blocks look nothing like the original data, so it appears there is only one copy of the data in memory. The decoder can be obfuscated in various ways and inlined to prevent it from being a single point of failure. The calling code can hash the state of the decoder as part of hash-and-decrypt so that errors are detected as well, allowing the software protection to degrade the experience later rather than failing immediately. This hides the location of the protection check (temporal distance).

Like all mesh techniques, error correction is best used in ways that are mutually reinforcing. The linker can be adapted to automatically encode data and insert the decoding logic throughout the program, based on control flow analysis. Continually-running integrity checking routines can be encoded with this approach. The more intertwined the software protection, the harder it is to bypass.