Anti-debugging techniques of the past

During the C64 and Apple II years, a number of interesting protection schemes were developed that still have parallels in today’s systems. The C64/1541 disk drive schemes relied on cleverly exploiting hardware behavior since there was no security processor onboard to use as a root of trust. Nowadays, games or DRM systems for the PC still have the same limitation, while modern video game systems rely more on custom hardware capabilities.

Most targeted anti-debugger techniques rely on exploiting shared resources. For example, a single interrupt vector cannot be used by both the application and the debugger at the same time. If the protection scheme uses that vector both for its checks and for normal application operation, the attacker is forced to fall back on modifying some other shared resource (perhaps by hooking a function prologue) instead.

One interesting anti-debugger technique was to load the protection code into screen memory. On the C64, as with modern integrated video chipsets, this range of RAM is regular system memory that is also mapped to the display chip. Writing values to this region changes the display, and reading it returns the pixel data that is onscreen. Since it’s just RAM, the CPU can execute out of it as well, but the user would see garbage on the screen. The protection code would set the foreground and background colors to the same value, making the data invisible while the code executed.

When an attacker broke into program execution with a machine language monitor (i.e., a debugger), the command prompt displayed by the monitor would overwrite the protection code. If execution was later resumed, the program would just crash because the critical protection code was no longer present in RAM. The video memory was a shared resource used by both the protection and the debugger.

Another technique was to load code into an area of RAM that was cleared during system reset. If an attacker reset the machine without powering it off, the data would ordinarily still be present in RAM. However, if it was within a region that was zeroed by the reset code in ROM, nothing would remain for the attacker to examine.

As protection schemes became stronger in the late 1980s, users resorted to hardware-based attacks when software-only copying was no longer possible. Many protection schemes took advantage of the limited RAM in the 1541 drive (2 KB) by using custom bit encoding on the media and booting a custom loader/protection routine into the drive RAM to read it. The loader would lock out all access to drive memory by the C64 so the code could not easily be dumped and analyzed.

The copy system authors responded in a number of ways. One of them was to provide a RAM expansion board (8 KB) that allowed the custom bit encoding to be read all in one chunk, circumventing the boundary problems that occurred when copying in 2 KB chunks and trying to stitch them back together. It also allowed the protection code in the drive to be copied up to higher memory, saving it from being zeroed when the drive was reset. This way the drive would once again allow memory access by the C64 and the loader could be dumped and analyzed.

Protection authors responded by crashing the drive when expanded RAM was present. Most hardware address decoding exhibits a behavior known as “mirroring”: if the address space is bigger than the physical RAM present, accesses to higher addresses wrap around within the actual memory. For example, accesses to addresses 0, 2 KB, and 4 KB would all map to the same RAM location (0) on a 1541 with stock memory. But on a drive with 8 KB of expanded RAM, they would map to three different locations.

1541ram1.png

One technique was to scramble the latter part of the loader. The first part would descramble the remainder but use addresses above 2 KB to do its work. On a stock 1541, the memory accesses would wrap around, descrambling the proper locations in the loader. On a modified drive, they would just write garbage to upper memory and the loader would crash once it got to the still-scrambled code in the lower 2 KB.
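
Here is a minimal sketch of the trick in C (the real loader was 6502 code; the addresses, the XOR scrambling, and the 2 KB/8 KB sizes below are illustrative assumptions, not the original scheme):

    #include <stdint.h>

    #define STOCK_RAM    2048   /* 2 KB in an unmodified 1541 */
    #define EXPANDED_RAM 8192   /* 8 KB with the expansion board */

    /* Model the drive's address decoding: on stock hardware, addresses above
     * 2 KB wrap around ("mirror") onto the same physical RAM; with 8 KB
     * installed, they reach distinct locations instead. */
    static void write_byte(uint8_t *ram, unsigned ram_size,
                           uint16_t addr, uint8_t value)
    {
        ram[addr % ram_size] = value;
    }

    /* Descrambler stub: read the scrambled half of the loader and write the
     * descrambled bytes 2 KB higher.  On a stock drive the writes wrap back
     * onto the loader itself; on an expanded drive they land in upper memory,
     * the lower 2 KB stays scrambled, and the loader crashes. */
    static void descramble(uint8_t *ram, unsigned ram_size)
    {
        for (uint16_t addr = 0x0400; addr < 0x0800; addr++)
            write_byte(ram, ram_size, addr + 0x0800, ram[addr] ^ 0x5A);
    }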

1541ram2.png

These schemes were later circumvented by adding a switch to the RAM expansion so it could be disabled when not in use, but this did add to the annoyance factor of regularly using a modified drive.

Reverse engineering with a VM

In a previous comment, Tim Newsham mentions reverse engineering an application by running it in a VM. As it happened, I gave a talk on building and breaking systems using VMs a couple of years ago. One very nice approach is ReVirt, which records the state of a VM, allowing debugging to go forwards or backwards. That is, you can actually rewind past interrupts, IO, and other system events to examine the state of the software at any arbitrary point. Obviously, this would be great for reverse engineering, though, as Tim points out, there haven’t been many public instances of people doing this. (If there have been, please point them out to me.)

An idea I had a few years back was to design a VM-based system to assist in developing Linux or FreeBSD drivers when only Windows drivers are available. The VM would be patched to record the data associated with all IO instructions (inb, outb, etc.), PCI config space accesses, and memory-mapped IO (a “wedge” device). It would pass through the data for a single real hardware device. To the guest OS, it would appear to be a normal VM with one non-virtual device.
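
As a rough sketch of what the wedge device’s log might look like (the record layout and the hook function are my assumptions, not any real VMM’s API):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* One entry per guest access to the passed-through device. */
    enum access_type { IO_IN, IO_OUT, PCI_CFG_READ, PCI_CFG_WRITE,
                       MMIO_READ, MMIO_WRITE };

    struct trace_record {
        struct timespec ts;    /* when the access happened */
        enum access_type type;
        uint64_t addr;         /* port number, config offset, or physical address */
        uint32_t size;         /* access width in bytes */
        uint64_t value;        /* data written, or data returned by the device */
    };

    /* Called by the patched VMM whenever the guest driver touches the device. */
    void wedge_log_access(FILE *log, enum access_type type,
                          uint64_t addr, uint32_t size, uint64_t value)
    {
        struct trace_record rec = { .type = type, .addr = addr,
                                    .size = size, .value = value };
        clock_gettime(CLOCK_MONOTONIC, &rec.ts);
        fwrite(&rec, sizeof(rec), 1, log);
    }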

vmreveng.png

To reverse engineer a device, you would configure the VM with the bus:slot:function of the device to pass through. Boot Windows in the VM with the vendor driver installed. Use the device normally, marking the log at various points (“boot probe”, “associating with an AP”). Pass that log on to the open source developer to assist in implementing or improving a driver.

A similar approach that doesn’t involve a VM would be to write a Windows service that loads early, hooks HAL.DLL, and sets protection on any memory mappings of the target device. Similar to copy-on-write, an access to that memory would trigger an exception that the service could handle, recording the data and then permitting the access. This could be distributed to end users to help with remote debugging of proprietary hardware.
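
A user-mode sketch of the memory-trapping half of that idea (it skips the HAL.DLL hooking and assumes the device mapping is visible to the service; the guard-page trick stands in for the copy-on-write-style protection):

    #include <windows.h>
    #include <stdio.h>

    static volatile BYTE *g_mmio_base;   /* assumed: mapping of the target device */
    static SIZE_T g_mmio_size;

    /* Guard-page fault on the watched mapping: log the faulting address, then
     * let the access proceed.  PAGE_GUARD is one-shot, so a real tool would
     * re-arm it (e.g., by single-stepping past the access). */
    static LONG CALLBACK mmio_trace_handler(EXCEPTION_POINTERS *info)
    {
        if (info->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
            ULONG_PTR addr = info->ExceptionRecord->ExceptionInformation[1];
            if (addr >= (ULONG_PTR)g_mmio_base &&
                addr <  (ULONG_PTR)g_mmio_base + g_mmio_size) {
                printf("device memory access at %p\n", (void *)addr);
                return EXCEPTION_CONTINUE_EXECUTION;
            }
        }
        return EXCEPTION_CONTINUE_SEARCH;
    }

    void watch_device_memory(void *base, SIZE_T size)
    {
        DWORD old;
        g_mmio_base = base;
        g_mmio_size = size;
        AddVectoredExceptionHandler(1, mmio_trace_handler);
        VirtualProtect(base, size, PAGE_READWRITE | PAGE_GUARD, &old);
    }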

Anti-debugger techniques are overrated

Most protection schemes include various anti-debugger techniques. They can be as simple as IsDebuggerPresent() or as complex as attempting to detect or crash a particular version of SoftICE. The promise of these techniques is that they will prevent attackers from using their favorite tools. The reality is that they are either too simple, and thus easy to bypass, or too specific to a particular type or version of debugger. When designing software protection, it’s best to build a core that is resistant to reverse engineering of all kinds and not rely on anti-debugger techniques.

One key point that is often overlooked is that anti-debugger techniques, at best, increase the difficulty of the first break. This characteristic is shared with other approaches, including obfuscation. Such techniques do nothing to prevent that first attack from being optimized or packaged for distribution.

In any protection system, there are two kinds of primitives: checks and landmines. Checks produce a changing value or code execution path based on the status of the item checked. Landmines crash, hang, scramble, or otherwise interfere with the attacker’s tools themselves. Anti-debugger techniques come in both flavors.

IsDebuggerPresent() is an example of a simple check. It is extremely general but can be easily bypassed with a breakpoint script or by traditional approaches like patching the import address table (IAT). In fact, since the implementation of this function merely returns a value from process memory, that value can even be overwritten so the check always returns 0 (FALSE). The approach is to find the TIB (Thread Information Block) via the %fs segment register, dereference its pointer to the PEB (Process Environment Block), and overwrite the byte at offset 2. Since it is so general, it has little security value.
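
A minimal sketch of that overwrite on Windows, using the documented winternl.h structures (the raw fs:[0x30] walk shown in the conditional block is the 32-bit x86, MSVC-specific variant):

    #include <windows.h>
    #include <winternl.h>
    #include <intrin.h>

    /* Clear PEB->BeingDebugged (the byte at offset 2) so IsDebuggerPresent()
     * returns FALSE even while a debugger is attached. */
    void hide_debugger(void)
    {
        NtCurrentTeb()->ProcessEnvironmentBlock->BeingDebugged = 0;

    #if defined(_M_IX86)
        /* Equivalent raw walk: TIB via %fs, PEB pointer at offset 0x30 in the
         * TEB, BeingDebugged at offset 2 in the PEB. */
        BYTE *peb = (BYTE *)__readfsdword(0x30);
        peb[2] = 0;
    #endif
    }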

More targeted checks or landmines have been used before, against SoftICE for example. (Note that the SIDT method listed is very similar to the later Red Pill approach — everything old is new again.) To the protection author, targeted anti-debugger techniques are like using a 0day exploit. Once you are detected and the hole disabled or patched, you have to find a new one. That may be a reasonable risk if you are an attacker secretly trying to compromise a few valuable servers. But as a protection author, you’re publishing your technique to the entire world of reverse engineers, people especially adept at figuring it out. You may slow down the first one for a little while, but nothing more than that.
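
For reference, here is a sketch of the SIDT/Red Pill-style check (32-bit x86, GCC inline asm; the 0xd0 threshold is the classic heuristic from that era, not a reliable detection today):

    #include <stdint.h>

    /* SIDT stores the IDT register without needing ring 0.  Under some VMMs
     * and SoftICE-style hooks of that era, the IDT base was relocated to a
     * noticeably high address, so a simple threshold check hinted that a
     * monitor was present. */
    int idt_looks_relocated(void)
    {
        struct __attribute__((packed)) {
            uint16_t limit;
            uint32_t base;
        } idtr;

        __asm__ __volatile__("sidt %0" : "=m"(idtr));
        return (idtr.base >> 24) > 0xd0;
    }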

Anti-debugging techniques do have a small place if their limits are recognized. IsDebuggerPresent() can be used to provide a courtesy notice reminding the user of their license agreement (i.e., the “no reverse engineering” clause). However, since it is so easily bypassed, it should not be used as part of any protection scheme. Debugger-specific checks and landmines can be sprinkled throughout the codebase and woven into the overall scheme via mesh techniques. However, their individual reliability should be considered very low due to constantly improving reversing tools and the ease with which attackers can adapt to any specific technique.

Mesh design pattern: hash-and-decrypt

Hash functions are an excellent way to tie together various parts of a protection mechanism. Our first mesh design pattern, hash-and-decrypt, uses a hash function to derive a key that is then used to decrypt the next stage. Since a cryptographic hash (e.g., SHA-1) is sensitive to a change in even a single bit of input, this pattern provides a strong way to ensure the next stage (code, data, more checks) is not accessible unless all the input bits are correct.

Diagram of hashing and then decrypting

For example, consider a game with different levels, each encrypted with a different AES key. The key to decrypt level N+1 can be derived by hashing together data that is only present in RAM after the player has beaten level N with an unmodified game (e.g., the correct items in inventory, the state of treasure chests, the map of locations visited). If an attacker tries to cheat on level N by modifying the game state, they won’t know which items they need to have, may load up their character with items that are impossible to have at that point in the game, or will leave one or more map positions unmarked as visited. In any of these cases, the hash and thus the next level’s key will be incorrect: any difference in the hashed data produces the wrong key, and the level cannot be decrypted without the exact key.
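
Here is a sketch of the unlock step using OpenSSL (the game-state layout, the choice of AES-256-CBC, and the IV handling are illustrative assumptions):

    #include <stdint.h>
    #include <openssl/sha.h>
    #include <openssl/evp.h>

    /* Hypothetical snapshot of the player state at the end of level N. */
    struct game_state {
        uint8_t inventory[64];
        uint8_t chest_flags[32];
        uint8_t visited_map[128];
    };

    /* Hash the state to derive the key, then try to decrypt level N+1.  Any
     * state that differs from a legitimate play-through yields a different
     * hash, a different key, and a failed decryption. */
    int unlock_next_level(const struct game_state *st,
                          const uint8_t *iv,              /* 16-byte IV stored with the level */
                          const uint8_t *ct, int ct_len,  /* encrypted level data */
                          uint8_t *pt, int *pt_len)       /* decrypted output */
    {
        uint8_t key[SHA256_DIGEST_LENGTH];
        SHA256((const uint8_t *)st, sizeof(*st), key);    /* 32-byte AES-256 key */

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len = 0, ok = 1;

        ok &= EVP_DecryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv);
        ok &= EVP_DecryptUpdate(ctx, pt, &len, ct, ct_len);
        *pt_len = len;
        ok &= EVP_DecryptFinal_ex(ctx, pt + len, &len);   /* padding check almost always fails on a wrong key */
        *pt_len += len;

        EVP_CIPHER_CTX_free(ctx);
        return ok;                                        /* 0 if the derived key was wrong */
    }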

In software protection, the focus is on verifying that security checks are intact and running properly. Hash-and-decrypt would cover code and data locations that might be modified by an attacker who is debugging or patching the application in order to reverse engineer it. This includes locations that might be changed by setting breakpoints (e.g., int 3 opcodes or Detours-style function hooking, debug registers DR0-DR3, the IDTR a la Red Pill) or self-check functions that may be disabled or paused while analyzing the executable. The encrypted stage N+1 can be parts of the application as well as other self-check functions.
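
For instance, a self-check might fold the code bytes of a protected routine into its running hash, so a software breakpoint (0xCC) or a Detours-style trampoline written into the prologue silently corrupts the derived key. The function name and 64-byte window below are illustrative:

    #include <stdint.h>
    #include <openssl/sha.h>

    extern void protected_routine(void);   /* a routine the attacker might hook */

    /* Fold the first 64 bytes of the routine's code into the running hash. */
    void hash_code_region(SHA256_CTX *running)
    {
        const uint8_t *code = (const uint8_t *)(uintptr_t)protected_routine;
        SHA256_Update(running, code, 64);
    }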

To tie together multiple self-check functions, hash-and-decrypt can also be layered. For example, each self-check function, which monitors one or more hook points, can iteratively hash a unique seed along with the data it observes each time it runs. The unlock function hashes each self-check’s hash, decrypts, and then executes the protected code. If an attacker pauses one or more of the self-check threads, the hash will be incorrect when the unlock function runs.

multicheck.png
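
Here is a rough sketch of the layered scheme using OpenSSL’s streaming SHA-256 API (the number of checks, the seeds, and the snapshot-and-combine step are all illustrative assumptions):

    #include <stddef.h>
    #include <stdint.h>
    #include <openssl/sha.h>

    #define NUM_CHECKS 4

    static SHA256_CTX check_ctx[NUM_CHECKS];   /* one running hash per self-check */

    /* Start each check's hash from a unique seed baked into the binary. */
    void self_check_init(void)
    {
        for (int i = 0; i < NUM_CHECKS; i++) {
            uint32_t seed = 0xC0DE0000u + i;   /* illustrative per-check seed */
            SHA256_Init(&check_ctx[i]);
            SHA256_Update(&check_ctx[i], &seed, sizeof(seed));
        }
    }

    /* Each self-check thread calls this every time it runs, folding the data
     * it observed at its hook points into its own running hash. */
    void self_check_tick(int id, const uint8_t *observed, size_t len)
    {
        SHA256_Update(&check_ctx[id], observed, len);
    }

    /* The unlock function hashes a snapshot of every self-check's state into
     * one key.  A check that was paused or fed patched data leaves a hash
     * that no longer matches what the next stage was encrypted under. */
    void derive_unlock_key(uint8_t key[SHA256_DIGEST_LENGTH])
    {
        SHA256_CTX outer;
        SHA256_Init(&outer);
        for (int i = 0; i < NUM_CHECKS; i++) {
            SHA256_CTX snap = check_ctx[i];    /* copy so the check keeps running */
            uint8_t digest[SHA256_DIGEST_LENGTH];
            SHA256_Final(digest, &snap);
            SHA256_Update(&outer, digest, sizeof(digest));
        }
        SHA256_Final(key, &outer);
    }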

To slow down the attacker, invariants that are not security-critical can be included in the hash (“chaff”). For example, the game could hash parts of RAM that are known to hold constants or map data. This keeps the hashed addresses from pointing out what’s important. It falls under the category of obfuscation, but it is a good example of how software protection methods mutually reinforce one another.

Finally, this technique can be implemented in hardware to prevent glitching attacks. In a glitching attack, a pulse is used to disrupt a processor’s execution (usually via the clock line), causing some operation to be performed incorrectly. Attackers can use this to target internal logic and trigger malicious code execution in tamper-resistant devices. To counter this, the internal state of the processor (e.g., pipeline registers, address lines, caches, the instruction register) can be hashed by custom hardware each clock cycle. At the end of each instruction, the final hash is used to unlock the next operation. If a glitch causes any flip-flop in the CPU to hold an incorrect value during any of the intermediate clock cycles, the next operation will not decrypt properly.

Chain, defense-in-depth, and mesh models again

Richard Bejtlich recently discussed my posts on the mesh design model. Thomas Ptacek added a post comparing the three terms I’ve described. Since there is still some confusion, let me try to summarize.

Chain: break any one of N mechanisms and the system is equally compromised. Example: SSL load balancers that terminate and then re-encrypt a session. Attack: extract the private key from the load balancer OR the backend server, whichever is easier.

Defense-in-depth: break all N mechanisms (often in order: 1, 2, 3, …) to compromise the system. Example: multiple separately-configured firewalls (e.g., a DMZ). Attack: traverse firewall 1, then firewall 2, then access the database server.

Mesh: break all N mechanisms at exactly the same time. Sequential, incremental compromise is not possible as it is with a purely defense-in-depth approach. Example: cryptographic key where each bit is independent of all the others. Attack: guess all bits correctly (i.e. brute force).

Each of these models has its strengths and weaknesses and should be used where appropriate. SSH public key authentication is a chain model. The server trusts a signature generated with your private key, which you decrypted using your passphrase. This is a useful chain since it’s much easier to remember a passphrase than a 256-byte random string. But it also means the server is equally compromised if you type in your passphrase from a public terminal, use a weak passphrase and the attacker runs a dictionary attack on your private key file, or the private key is hijacked from the RAM of a persistent SSH connection. That’s a series of risks that I understand and am willing to accept in order to avoid memorizing my DSA key.

Also, defense-in-depth can be used to provide multiple layers of mesh when each layer cannot be interlinked with the others to form a single mesh. For example, the mesh model can be applied to a set of self-checks that my network game client performs on itself to make sure it hasn’t been trojaned. And my server can implement extra interleaved protocol steps (another mesh) to verify that it’s talking to my self-checking game client, and not a generic one that has been swapped in by a cheater. The combination of applying the mesh model in both places is defense-in-depth since both can be attacked independently.

All this terminology isn’t useful without concrete examples. Next, I’ll be providing examples of applying the mesh model to solving common security problems.

Mesh approach versus defense-in-depth

A commenter suggested by email that the mesh concept in my previous post is very similar to defense-in-depth. While they are similar, there are some critical differences that are especially important when you apply them to software protection.

Defense-in-depth comes from military history, where a defender would build a series of positions and fall back to the next one each time the enemy advanced through the current one. This works in security as well. For instance, a web server may be run in a restricted chroot environment so that if the web server is compromised, damage is limited to the files in the restricted directory, not the whole system.

The mesh model, on the other hand, involves a series of interlocking checks and enforcement mechanisms. There is nothing to fall back to because all the defenses are active at the same time, mutually reinforcing each other. This concept is less common than defense-in-depth in network security due to the difficulty of incorporating it into system designs. However, it is extremely common in cryptography.

A block cipher, say AES, is a good example of a mesh design. Plaintext data is encrypted with a 128-bit key and output as ciphertext. Any good block cipher is designed so that even a 1-bit change in the key or plaintext results in a 50% chance of a flip in each ciphertext bit. This is called the “avalanche effect.” What does this mean for an attacker trying to guess the key by brute force to decrypt the data? If he is incorrect by even 1 bit, on average half the plaintext bits will be incorrect. Another way to state this is that incrementally guessing the bits of an AES key is useless, since guessing 127 of the bits correctly still gives you no information about the 128th bit.
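
A quick way to see the avalanche effect with OpenSSL: encrypt the same block under two keys that differ in a single bit and count the ciphertext bits that change (expect roughly 64 of 128 on any run):

    #include <stdio.h>
    #include <stdint.h>
    #include <openssl/evp.h>

    /* Encrypt one 16-byte block with AES-128 in ECB mode (no padding) so we
     * see the raw block transformation. */
    static void aes128_block(const uint8_t key[16], const uint8_t in[16], uint8_t out[16])
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int len;
        EVP_EncryptInit_ex(ctx, EVP_aes_128_ecb(), NULL, key, NULL);
        EVP_CIPHER_CTX_set_padding(ctx, 0);
        EVP_EncryptUpdate(ctx, out, &len, in, 16);
        EVP_CIPHER_CTX_free(ctx);
    }

    int main(void)
    {
        uint8_t key1[16] = {0}, key2[16] = {0}, pt[16] = {0};
        uint8_t ct1[16], ct2[16];
        int diff = 0;

        key2[15] ^= 0x01;                     /* flip a single key bit */
        aes128_block(key1, pt, ct1);
        aes128_block(key2, pt, ct2);

        for (int i = 0; i < 16; i++)          /* count differing ciphertext bits */
            for (uint8_t x = ct1[i] ^ ct2[i]; x; x &= x - 1)
                diff++;

        printf("%d of 128 ciphertext bits changed\n", diff);
        return 0;
    }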

An attacker facing a defense-in-depth design can break down each wall in order, picking an appropriate attack for each one. He only has to be smart enough at any one time to compromise the next wall (i.e., O(n) effort, where n is the number of walls). For a system designed with the mesh model, however, the attacker has to compromise all the walls at the same time to compromise the system (i.e., O(2^n) effort). This is a significant difference in effort.

Of course, the best approach is to use the mesh model plus defense-in-depth. For example, a protocol designer who encrypts data with AES and performs integrity checking with a SHA-256 HMAC (instead of an AES CBC-MAC) is using defense-in-depth, since a flaw in either algorithm or construction is not likely to affect the other.