rdist

June 5, 2008

Interview about DRM on Security Focus

Filed under: Embedded,Misc,PC Architecture,Security,Software protection — Nate Lawson @ 10:52 pm

Security Focus just posted this interview with me, talking about DRM. Here are a few choice quotes.

On authoring software protection for Vista:

The rules of the game are changing recently with Microsoft Vista kernel patch protection. If you’re a rootkit author, you just bypass it. If you’re a software protection designer, you have to play by its rules. For the first time in the PC’s history, it’s not a level playing field any more. Virus scanner authors were the first to complain about this, and it will be interesting to see how this fundamental change affects the balance of power in the future.

On using custom hardware for protection:

Custom hardware often gives you a longer period until the first break since it requires an attacker’s time and effort to get up to speed on it. However, it often fails more permanently once cracked since the designers put all their faith in the hardware protection.

May 21, 2008

Anti-debugging: using up a resource versus checking it

Filed under: Hacking,PC Architecture,Security,Software protection — Nate Lawson @ 2:02 pm

After my last post claiming anti-debugging techniques are overly depended on, various people asked how I would improve them. My first recommendation is to go back to the drawing board and make sure all the components of your software protection design hang together in a mesh. In other words, each component has the property that the system will fail closed if that component is circumvented in isolation. However, anti-debugging does have a place (alongside other components such as obfuscation, integrity checks, etc.), so let's consider how to implement it better.

The first principle of implementing anti-debugging is to use up a resource instead of checking it. The initial idea most novices have is to check a resource to see if a debugger is present. This might be calling IsDebuggerPresent() or, if more advanced, using GetThreadContext() to read the values of the hardware debug registers DR0-DR7. Then the implementation is often as simple as "if (debugged): FailNoisily()" or, if more advanced, a few inline copies of that check scattered throughout the code.
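
Here is a minimal sketch of that naive check-based approach, using Python's ctypes on Windows to call the real IsDebuggerPresent() API. The fail_noisily() function is a hypothetical placeholder for whatever response the protection takes; it is not part of any real protection scheme.

import ctypes
import sys

def fail_noisily():
    # Hypothetical response; a real protection might corrupt state instead of exiting.
    sys.exit("debugger detected")

# The naive check: ask the OS whether a debugger is attached, then branch on it.
if ctypes.windll.kernel32.IsDebuggerPresent():
    fail_noisily()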

The problem with this approach is that it provides a choke point for the attacker: a single bottleneck that, once defeated, circumvents every instance of the check no matter where it appears in the code or how obfuscated it is. For example, IsDebuggerPresent() reads the BeingDebugged flag from the process's PEB in local memory. So attaching a debugger and then setting that variable to zero circumvents all uses of IsDebuggerPresent(). Looking for suspicious values in DR0-DR7 means the attacker merely has to hook GetThreadContext() and return 0 for those registers. In both cases, the attacker could also just skip over the check so it always returned "OK".

Besides the API bottleneck, anti-debugging via checking is fundamentally flawed because it suffers from "time of check, time of use" (TOCTOU) race conditions. An attacker could quickly swap out the values or virtualize accesses to the particular resources in order to lie about their state. Meanwhile, they could still be using the debug registers for their own purposes while stubbing out GetThreadContext(), for example.

Contrast this with anti-debugging by using up a resource. For example, a timer callback could be set up to vector through PEB!BeingDebugged. (Astute readers will notice this is a byte-wide field, so it would actually have to be an offset into a jump table.) This timer would fire normally until a debugger was attached. Then the OS would store a non-zero value in that location, overwriting the timer callback vector. The next time the timer fired, the process would crash. Even if an attacker poked the value back to zero, it would still crash: they would have to find another route to learn the magic value that belongs in that location and store it there after attaching the debugger. To prevent the attacker from merely disabling the timer, the callback should perform some essential operation that keeps the process running.
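
The following is a toy model in pure Python (not Windows code) of that dispatch idea: the byte standing in for PEB!BeingDebugged is used as a jump-table offset rather than checked, so anything that overwrites it breaks the program's own dispatch.

# Toy model: memory[0] stands in for PEB!BeingDebugged, and the timer
# dispatches through it instead of checking it.
memory = bytearray(1)
memory[0] = 0                       # program stores its jump-table offset here

def essential_work():
    print("timer: doing work the program needs in order to keep running")

def crash():
    raise RuntimeError("dispatched through a corrupted vector")

JUMP_TABLE = [essential_work, crash]

def timer_fires():
    JUMP_TABLE[memory[0]]()         # use the byte, don't check it

timer_fires()                       # dispatches to essential_work()
memory[0] = 1                       # "debugger attach": OS sets the flag non-zero
timer_fires()                       # now dispatches to crash()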

Now this is a simple example and there are numerous ways around it, but it illustrates my point. If an application uses a resource that the attacker also wants to use, it’s harder to circumvent than a simple check of the resource state.

An old example of this form of anti-debugging on the C64 was storing loader code in screen memory. As soon as the attacker broke into a debugger, the prompt would overwrite the loader code. Resetting the machine would also overwrite screen memory, leaving no trace of the loader. Attackers had to realize this and jump to a separate routine elsewhere in memory that made a backup copy of screen memory first.

For the x86 hardware debug registers, a program could keep an internal table of functions but do no compile-time linkage. Each function call site would be replaced with code that stores an execute watchpoint for EIP+4 in a debug register, followed by a "JMP SlowlyCrash". Before executing the JMP, the CPU would trigger a debug exception (INT1) since EIP matched the watchpoint. The exception handler would look up the calling EIP in a table, identify the real target function, set up the stack, and return directly there. The target could then return back to the caller. All four debug registers should be used to avoid leaving room for the attacker's breakpoints.

This approach has a number of desirable properties. If the JMP is ever executed, the program goes off into the weeds, but not immediately. That happens as soon as the attacker sets a hardware breakpoint, since their breakpoint overwrites the function's watchpoint and the JMP runs instead of the INT1 handler. The attacker would have to write code that virtualized calls to get and set DR0-DR7 and performed virtual breakpoints based on what the program thought the correct values should be. Such an approach would work, but it would slow down the program enough that timing checks could detect it.

I hope these examples make the case for anti-debugging via using up a resource versus checking it. This is one implementation approach that can make anti-debugging more effective.

[Followup: Russ Osterlund has pointed out in the comments section why my first example is deficient.  There’s a window of time between CreateProcess(DEBUG_PROCESS) and the first instruction execution where the kernel has already set PEB!BeingDebugged but the process hasn’t been able to store its jump table offset there yet.  So this example is only valid for detecting a debugger attach after startup and another countermeasure is needed to respond to attaching beforehand.   Thanks, Russ!]

May 12, 2008

Are single-purpose devices the solution to malware?

Filed under: Hacking,Security,Software protection — Nate Lawson @ 10:43 am

I recently watched this fascinating talk by Jonathan Zittrain, author of The Future of the Internet — And How to Stop It. He covers everything from the Apple II to the iPhone and trusted computing. His basic premise is that malware is driving the resurgence of locked-down, single-purpose devices.

I disagree with that conclusion. I think malware will always infect the most valuable platform. If the iPhone were as widely deployed as Windows PCs, you can bet people would be targeting it with keyloggers, closed platform or not. In fact, people's motivation to find ways around the vendor's protection on their own phones creates a great malware channel (trojaned jailbreak apps, anyone?).

However, I like his analysis of what makes some open systems resilient (his example: Wikipedia defacers) and others susceptible to being gamed (Digg users selling votes). He claims it's a matter of how much members consider themselves part of the system versus outside it. I agree that designing in accountability and aggregated reputation helps, whereas excessive perceived anonymity can lead to antisocial behavior.

April 11, 2008

Designing and Attacking DRM talk slides

Filed under: Crypto,Hacking,Security,Software protection — Nate Lawson @ 4:18 pm

I gave a talk this morning at RSA 2008 on “Designing and Attacking DRM” (pdf of slides). It was pretty wide-ranging, covering everything from how to design good DRM to the latest comparison of Blu-ray protection, AACS vs. BD+. Those interested in the latter should see the last few slides, especially with news that AACS MKBv7 (appearing on retail discs later this month) has already been broken by Slysoft.

The timeline slide (page 25) is an attempt to capture the history of whether discs were released into an unbroken environment or not. You want the line to be mostly green, with a few brief red segments here and there. AACS so far has had the inverse: a long red line with a brief segment of green (a couple of weeks out of the past year and a half).

I also introduced two variables for characterizing the long-term success of a DRM system, L and T. That is, how long each update survives before being hacked (L), and how frequently updates appear (T).

In the case of AACS, L has been extremely short (if you discard the initial 8-month adoption period). Out of three updates, two have been broken before they were widely available and one was broken a few weeks after release.

Additionally, T has been extremely long for AACS. Throwing out the initial year it took to get the first MKB update (v3), they've been following an approximate schedule of one update every 6 months. That is much too long in a software player environment. I don't know of any vendor of a popular Win32 game that would expect it to remain uncracked for 6 months, for example.

Of course, people in glass houses should not throw rocks. As someone who had a part in developing BD+, I am biased toward thinking a different approach than mere broadcast encryption is the only thing that has a chance of success in this rough world. The first BD+ discs were cracked in mid-March, and it remains to be seen how effective future updates will be. Unfortunately, I can’t comment on any details here. We’ll just have to watch and see how things work out the rest of this year.

2008 will prove whether a widely deployed scheme based on software protection is ultimately better than or merely equivalent to the AACS approach. I have a high degree of confidence it will survive in the long run, with both a longer L and a shorter T than the alternative.

March 25, 2008

Wii hacking and the Freeloader

Filed under: C64,Crypto,Embedded,Hacking,Security,Software protection — Nate Lawson @ 9:59 am

tmbinc wrote a great post describing the history of hacking the Wii and why certain holes were not publicized. This comes on the heels of Datel releasing a loader that can be used to play copied games by exploiting an RSA signature verification bug. I last heard of Datel when they made the Action Replay debug cartridge for the C64, and it looks like they’ve stayed around, building the same kind of thing for newer platforms.

First, the hole itself is amazingly bad for a widely-deployed commercial product. I wrote a long series of articles with Thomas Ptacek a year ago on how RSA signature verification requires careful padding checks. You might want to re-read that to understand the background. However, the Wii bug is much worse. The list of flaws includes:

  1. Using strncmp() instead of memcmp() to compare the SHA hash
  2. The padding is not checked at all

The first bug is fatal by itself. strncmp() returns as soon as it reaches a terminating nul byte; as long as the hash matched up to that point, the result is success. If the first byte was nul, effectively no comparison would be done and the check would pass.
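
To make the failure mode concrete, here is a small Python model of strncmp()'s early-out behavior. It is only an illustration of the flaw, not the Wii's actual code, and the example hash values are made up.

def strncmp(a, b, n):
    # Compare up to n bytes, but stop at the first nul, like C's strncmp().
    for i in range(n):
        if a[i] != b[i]:
            return ord(a[i]) - ord(b[i])
        if a[i] == '\x00':
            return 0
    return 0

expected = '\x00' + '\xaa' * 19          # hash bytes taken from the forged signature
computed = '\x00' + '\xbb' * 19          # actual SHA-1 of the tampered content
print(strncmp(expected, computed, 20))   # prints 0: a "match" after one nul byte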

It’s easy to create a chunk of data that hashes to a leading 0x00 byte. Here’s some sample code:

a = "rdist security blog"
import binascii, hashlib
for i in range(256):
    h = hashlib.sha1(chr(i)+a).digest()
    if ord(h[0]) == 0:
        print 'Found match with pad byte', i
        print 'SHA1:', "".join([binascii.b2a_hex(x) for x in h])
        break
else:
    print 'No pre-image found, try increasing the range.'

I got the following for my choice of string:

Found match with pad byte 80
SHA1: 00d50719c58e45c485e7d497e4021b48d814df33

The second bug is more subtle to exploit, but would still be open if only the strncmp() was fixed. It is well-known that if only 1/3 of the modulus length is validated, forgeries can be generated. If only 2/3 of the modulus length is validated, existential forgeries can be found. It would take another series of articles to explain all this, so see the citations of the original article for more detail.

tmbinc questions Datel's motive in releasing an exploit for this bug. He and his team kept it secret so it would remain usable for exploring the system and finding deeper flaws; since the bug was easily patchable in software, releasing it meant it would be quickly closed. It turns out Nintendo fixed it two weeks after the Datel product became available.

I am still amazed how bad this hole was. Since such an important component failed open, it’s clear higher assurance development techniques are needed for software protection and crypto. I continue to do research in this area and hope to be able to publish more about it this year.

March 17, 2008

Apple iPhone bootloader attack

Filed under: Crypto,Embedded,Hacking,Hardware,Security,Software protection — Nate Lawson @ 1:53 pm

News and details of the first iPhone bootloader hack appeared today. My analysis of the publicly-available details released by the iPhone Dev Team is that this has nothing to do with a possible new version of the iPhone, contrary to Slashdot. It involves exploiting low-level software, not the new SDK. It is a good example of how systems built from a series of links in a chain are brittle. Small modifications (in this case, a patch of a few bytes) can compromise this kind of design, and it's hard to verify that all of the links are free of such flaws.

A brief disclaimer: I don’t own an iPhone nor have I seen all the details of the attack. So my summary may be incomplete, although the basic principles should be applicable. My analysis is also completely based on published details, so I apologize for any inaccuracies.

For those who are new to the iPhone architecture, here’s a brief recap of what hackers have found and published. The iPhone has two CPUs of interest, the main ARM11 applications processor and an Infineon GSM processor. Most hacks up until now have involved compromising applications running on the main CPU to load code (aka “jailbreak”). Then, using that vantage point, the attacker will run a flash utility (say, “bbupdater”) to patch the GSM CPU to ignore the type of SIM installed and unlock the phone to run on other networks.

As holes have been found in the usermode application software, Apple has released firmware updates that patch them. This latest attack is a pretty big advance in that now a software attack can fully compromise the bootloader, which provides lower-level control and may be harder to patch.

The iPhone boot sequence, according to public docs, is as follows. The ARM CPU begins executing a secure bootloader (probably in ROM) on power-up. It then starts a low-level bootloader (“LLB”), which then runs the main bootloader, “iBoot”. The iBoot loader starts the OSX kernel, which then launches the familiar Unix usermode environment. This appears to be a traditional chain-of-trust model, where each element verifies the next element is trusted and then launches it.
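
As a conceptual sketch only (not Apple's code; the stage names, digests, and image contents below are illustrative stand-ins), a chain-of-trust boot looks roughly like this: each stage verifies the next image against a trusted reference before running it.

import hashlib

# Illustrative trusted digests that each stage carries for the next stage's image.
TRUSTED = {
    "LLB":   hashlib.sha256(b"llb image").digest(),
    "iBoot": hashlib.sha256(b"iboot image").digest(),
}

def launch_stage(name, image):
    # Verify-then-launch: refuse to run an image that fails verification.
    if hashlib.sha256(image).digest() != TRUSTED[name]:
        raise RuntimeError(name + " failed verification; halting boot")
    print("launching " + name)

launch_stage("LLB", b"llb image")      # secure bootloader verifies and starts LLB
launch_stage("iBoot", b"iboot image")  # LLB verifies and starts iBoot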

Once one link of this chain is compromised, it can fully control all the links that follow it. Additionally, since developers may assume all links of the chain are trusted, they may not protect upstream elements from potentially malicious downstream ones. For example, the secure bootloader might not protect against malicious input from iBoot if part or all of it remains active after iBoot is launched.

This new attack takes advantage of two properties of the bootloader system. The first is that NOR flash is trusted implicitly. The other is that there appears to be an unauthenticated system for patching the secure bootloader.

There are two kinds of flash in the iPhone: NOR and NAND. Each has different properties useful to embedded designers. NOR flash is byte-addressable and thus can be directly executed. However, it is more costly and so usually much smaller than NAND. NAND flash must be accessed via a complicated series of steps and only in page-size chunks. However, it is much cheaper to manufacture in bulk, and so is used as the 4 or 8 GB main storage in the iPhone. The NOR flash is apparently used as a kind of cache for applications.

The first problem is that software in the NOR flash is apparently unsigned. In fact, the associated signature is discarded as verified software is written to flash. So if an attacker can get access to the flash chip pins, he can just store unsigned applications there directly. However, this requires opening up the iPhone and so a software-only attack is more desirable. If there is some way to get an unsigned application copied to NOR flash, then it is indistinguishable from a properly verified app and will be run by trusted software.

The second problem is that there is a way to patch parts of the secure bootloader before iBoot uses them. It seems that the secure bootloader acts as a library for iBoot, providing an API for verifying signatures on applications. During initialization, iBoot copies the secure bootloader to RAM and then performs a series of fix-ups on function pointers, redirecting them back into iBoot itself. This is a standard procedure for embedded systems that work with different versions of software. Just as a Windows PE image gets its imports and relocations fixed up when it is loaded at a new base address, iBoot has a table of offsets and byte patches it applies to the secure bootloader before calling it. This allows a single version of the secure bootloader in ROM to be used with ever-changing iBoot revisions, since iBoot has the intelligence to "fix up" the library before using it.

The hackers have taken advantage of this table to add their own patches. In this case, the patch is to disable the “is RSA signature correct?” portion of the code in the bootloader library after it’s been copied to RAM. This means that the function will now always return OK, no matter what the signature actually is.
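
As a rough illustration (a toy model in Python, not iBoot's actual patch format), the mechanism amounts to applying a table of (offset, bytes) patches to the library image after it is copied to RAM; the attack simply adds one more entry that stubs out the signature check.

# Toy model: a stand-in "library image" with a marker where the RSA check lives.
library = bytearray(b"\x01" * 16 + b"RSA_CHECK" + b"\x01" * 16)

fixups = [
    (0, b"\x90\x90"),                  # a legitimate fix-up redirecting into iBoot
    (16, b"RETURN_OK"),                # the attacker's added patch over the check
]

for offset, patch in fixups:
    library[offset:offset + len(patch)] = patch

print(repr(bytes(library)))            # the signature check is now gone from the image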

There are a number of ways this attack could have been prevented. The first is to use a mesh-based design instead of a chain with a long series of links. This would be a major paradigm shift, but additional upstream and downstream integrity checks could have found that the secure bootloader had been modified and was thus untrustworthy. This would also catch attackers if they used other means to modify the bootloader execution, say by glitching the CPU as it executed.

A simpler patch would be to include self-tests to be sure everything is working. For example, checking a random, known-bad signature at various times during execution would reveal that the signature verification routine had been modified. This would create multiple points that would need to be found and patched out by an attacker, reducing the likelihood that a single, well-located glitch is sufficient to bypass signature checking. This is another concrete example of applying mesh principles to security design.
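
Here is a brief sketch of that self-test idea; verify_signature() is a stand-in for the bootloader's real RSA check (this is not Apple's API), and the self-test feeds it a signature that must never verify.

import hashlib

def verify_signature(data, signature):
    # Stand-in for the real RSA verification routine: here a "signature"
    # is just the SHA-256 of the data, purely for illustration.
    return hashlib.sha256(data).digest() == signature

def self_test():
    # A deliberately bad signature must fail; if it "verifies", the
    # verification routine itself has been patched out.
    bogus = b"\x00" * 32
    if verify_signature(b"self-test data", bogus):
        raise RuntimeError("signature verifier has been tampered with")

self_test()   # run this at several points during execution, not just once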

Hackers are claiming there’s little or nothing Apple can do to counter this attack. It will be interesting to watch this as it develops and see if Apple comes up with a clever response.

Finally, if you find this kind of thing fascinating, be sure to come to my talk “Designing and Attacking DRM” at RSA 2008. I’ll be there all week so make sure to say “hi” if you will be also.

