Wii hacking and the Freeloader

tmbinc wrote a great post describing the history of hacking the Wii and why certain holes were not publicized. This comes on the heels of Datel releasing a loader that can be used to play copied games by exploiting an RSA signature verification bug. I last heard of Datel when they made the Action Replay debug cartridge for the C64, and it looks like they’ve stayed around, building the same kind of thing for newer platforms.

First, the hole itself is amazingly bad for a widely-deployed commercial product. I wrote a long series of articles with Thomas Ptacek a year ago on how RSA signature verification requires careful padding checks. You might want to re-read that to understand the background. However, the Wii bug is much worse. The list of flaws includes:

  1. Using strncmp() instead of memcmp() to compare the SHA-1 hash
  2. Not checking the padding at all

The first bug is fatal by itself. strncmp() stops comparing when it reaches a terminating nul byte, and as long as the hashes match up to that point, the result is success. So if both hashes begin with a zero byte, the remaining 19 bytes are never examined and the check passes.
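
To see why, here is the C comparison modeled in Python. This is a sketch with invented byte values; a memcmp() over all 20 bytes would have caught the mismatch immediately.

def strncmp(s1, s2, n):
    # Emulate C's strncmp(): compare at most n bytes, but stop at the
    # first nul byte, as all C string functions do.
    for i in range(n):
        if s1[i] != s2[i]:
            return s1[i] - s2[i]
        if s1[i] == 0:
            return 0
    return 0

real_hash   = bytes([0x00]) + b'\xaa' * 19  # SHA-1 of the attacker's content
forged_hash = bytes([0x00]) + b'\xbb' * 19  # "expected" hash from the fake signature
print(strncmp(real_hash, forged_hash, 20))  # prints 0 ("match") despite 19 differing bytes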

It’s easy to create a chunk of data that hashes to a leading 0x00 byte since, on average, one try in 256 succeeds. Here’s some sample code:

a = "rdist security blog"
import binascii, hashlib
for i in range(256):
    h = hashlib.sha1(chr(i)+a).digest()
    if ord(h[0]) == 0:
        print 'Found match with pad byte', i
        print 'SHA1:', "".join([binascii.b2a_hex(x) for x in h])
        break
else:
    print 'No pre-image found, try increasing the range.'

I got the following for my choice of string:

Found match with pad byte 80
SHA1: 00d50719c58e45c485e7d497e4021b48d814df33

The second bug is more subtle to exploit, but it would remain open even if the strncmp() were fixed. It is well-known that forgeries for chosen messages can be calculated if only the first 1/3 of the modulus length is validated, and that existential forgeries are possible if only 2/3 is validated. It would take another series of articles to explain all this, so see the citations of the original article for more detail.
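
For the curious, here is a minimal sketch of the classic e = 3 forgery against a verifier that checks the padding and hash at the left edge of the block but ignores everything to their right. The 3072-bit modulus size is my own choice for illustration; any key with a small public exponent and enough unchecked low-order bytes is vulnerable the same way.

import hashlib

def icbrt(n):
    # Integer cube root (floor) by Newton's method.
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

MOD_BITS = 3072
ASN1_SHA1 = bytes.fromhex('3021300906052b0e03021a05000414')

msg = b'message the attacker wants "signed"'
block = b'\x00\x01\xff\x00' + ASN1_SHA1 + hashlib.sha1(msg).digest()

# Put the padded hash in the high-order bytes. The sloppy verifier never
# looks below it, so sig**3 only has to land somewhere in the range
# [target, target + 2**garbage_bits). Since 3*sig**2 is far smaller than
# 2**garbage_bits, rounding the cube root is accurate enough.
garbage_bits = (MOD_BITS // 8 - len(block)) * 8
target = int.from_bytes(block, 'big') << garbage_bits
sig = icbrt(target + (1 << garbage_bits) - 1)

# The block starts with 0x00, so sig**3 is below any 3072-bit modulus and
# "verification" (cubing) never wraps. The high-order bytes match exactly.
assert pow(sig, 3) >> garbage_bits == int.from_bytes(block, 'big')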

tmbinc questions Datel’s motive in releasing an exploit for this bug. He and his team had kept it secret so they could keep using it to explore the system for deeper flaws; since it was easily patchable in software, they knew it would be closed quickly once public. Sure enough, Nintendo fixed it two weeks after the Datel product became available.

I am still amazed at how bad this hole was. Since such an important component failed open, it’s clear that higher-assurance development techniques are needed for software protection and crypto. I continue to do research in this area and hope to be able to publish more about it this year.

Apple iPhone bootloader attack

News and details of the first iPhone bootloader hack appeared today. My analysis of the publicly-available details released by the iPhone Dev Team is that this has nothing to do with a possible new version of the iPhone, contrary to Slashdot. It involves exploiting low-level software, not the new SDK. It is a good example of how systems built as a series of links in a chain are brittle. Small modifications (in this case, a patch of a few bytes) can compromise this kind of design, and it’s hard to verify that no link has such a flaw.

A brief disclaimer: I don’t own an iPhone nor have I seen all the details of the attack. So my summary may be incomplete, although the basic principles should be applicable. My analysis is also completely based on published details, so I apologize for any inaccuracies.

For those who are new to the iPhone architecture, here’s a brief recap of what hackers have found and published. The iPhone has two CPUs of interest: the main ARM11 applications processor and an Infineon GSM processor. Most hacks until now have involved compromising applications running on the main CPU in order to load code (aka “jailbreak”). Then, using that vantage point, the attacker runs a flash utility (say, “bbupdater”) to patch the GSM CPU to ignore the type of SIM installed, unlocking the phone to run on other networks.

As holes have been found in the usermode application software, Apple has released firmware updates that patch them. This latest attack is a pretty big advance in that a software attack can now fully compromise the bootloader, which provides lower-level control and may be harder to patch.

The iPhone boot sequence, according to public docs, is as follows. The ARM CPU begins executing a secure bootloader (probably in ROM) on power-up. It then starts a low-level bootloader (“LLB”), which then runs the main bootloader, “iBoot”. The iBoot loader starts the OSX kernel, which then launches the familiar Unix usermode environment. This appears to be a traditional chain-of-trust model, where each element verifies the next element is trusted and then launches it.
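
A toy model of that design, with hypothetical names and a hash whitelist standing in for the real RSA signature checks:

import hashlib

# Each stage's expected hash would be embedded in the previous stage.
images = {'LLB': b'LLB image', 'iBoot': b'iBoot image', 'kernel': b'kernel image'}
trusted = {name: hashlib.sha1(image).digest() for name, image in images.items()}

def launch(name, image):
    # Verify the next link, then hand over control. If any link skips
    # or fumbles this check, everything after it is untrusted.
    if hashlib.sha1(image).digest() != trusted[name]:
        raise SystemExit('refusing to launch modified ' + name)
    print('launching', name)

for name in ('LLB', 'iBoot', 'kernel'):
    launch(name, images[name])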

Once one link of this chain is compromised, it can fully control all the links that follow it. Additionally, since developers may assume all links of the chain are trusted, they may not protect upstream elements from potentially malicious downstream ones. For example, the secure bootloader might not protect against malicious input from iBoot if part or all of it remains active after iBoot is launched.

This new attack takes advantage of two properties of the bootloader system. The first is that NOR flash is trusted implicitly. The other is that there appears to be an unauthenticated system for patching the secure bootloader.

There are two kinds of flash in the iPhone: NOR and NAND. Each has different properties useful to embedded designers. NOR flash is byte-addressable and thus can be directly executed. However, it is more costly and so usually much smaller than NAND. NAND flash must be accessed via a complicated series of steps and only in page-size chunks. However, it is much cheaper to manufacture in bulk, and so is used as the 4 or 8 GB main storage in the iPhone. The NOR flash is apparently used as a kind of cache for applications.

The first problem is that software in the NOR flash is apparently unsigned. In fact, the associated signature is discarded as verified software is written to flash. So if an attacker can get access to the flash chip pins, he can just store unsigned applications there directly. However, this requires opening up the iPhone and so a software-only attack is more desirable. If there is some way to get an unsigned application copied to NOR flash, then it is indistinguishable from a properly verified app and will be run by trusted software.

The second problem is that there is a way to patch parts of the secure bootloader before iBoot uses them. The secure bootloader apparently acts as a library for iBoot, providing an API for verifying signatures on applications. During initialization, iBoot copies the secure bootloader to RAM and then performs a series of fix-ups on function pointers so that they redirect back into iBoot itself. This is a standard technique in embedded systems that must work with multiple software versions. Just as the Windows loader applies relocation fix-ups to a PE image, iBoot has a table of offsets and byte patches it applies to the secure bootloader before calling it. This allows a single version of the secure bootloader in ROM to be used with ever-changing iBoot revisions, since iBoot has the intelligence to “fix up” the library before using it.

The hackers have taken advantage of this table to add their own patches. In this case, the patch is to disable the “is RSA signature correct?” portion of the code in the bootloader library after it’s been copied to RAM. This means that the function will now always return OK, no matter what the signature actually is.
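
Here is a minimal sketch of that patching mechanism. The offsets, pointer values, and table format are invented; only the idea of a table of offsets and byte patches applied to the RAM copy comes from the published details.

# Secure bootloader image, copied from ROM to RAM by iBoot
bootrom = bytearray(0x10000)

# ARM machine code for "mov r0, #0; bx lr": return 0 ("signature OK")
RETURN_OK = bytes.fromhex('0000a0e31eff2fe1')

fixups = [
    (0x0840, b'\x78\x56\x34\x12'),   # legitimate: repoint a function into iBoot
    (0x1c20, b'\xfc\x9a\x01\x00'),   # legitimate: another version-specific fix-up
    (0x2f00, RETURN_OK),             # attacker-added: gut the RSA check entirely
]

for offset, patch in fixups:
    bootrom[offset:offset + len(patch)] = patch
# iBoot then calls into the patched copy, which now approves any signature.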

There are a number of ways this attack could have been prevented. The first is to use a mesh-based design instead of a chain with a long series of links. This would be a major paradigm shift, but additional upstream and downstream integrity checks could have detected that the secure bootloader had been modified and was thus untrustworthy. Such checks would also catch attackers who modified bootloader execution by other means, say by glitching the CPU as it executed.

A simpler patch would be to include self-tests to be sure everything is working. For example, checking a random, known-bad signature at various times during execution would reveal that the signature verification routine had been modified. This would create multiple points that would need to be found and patched out by an attacker, reducing the likelihood that a single, well-located glitch is sufficient to bypass signature checking. This is another concrete example of applying mesh principles to security design.
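
Something like this sketch, where verify_signature() stands in for the bootloader’s real routine:

import os

def self_test(verify_signature):
    # A random signature is invalid with overwhelming probability. If the
    # verifier accepts it, the "is the RSA signature correct?" code has
    # been patched to always return OK.
    bogus_data = os.urandom(64)
    bogus_sig = os.urandom(256)   # sized for a 2048-bit key
    if verify_signature(bogus_data, bogus_sig):
        raise SystemExit('signature verification has been tampered with')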

Hackers are claiming there’s little or nothing Apple can do to counter this attack. It will be interesting to watch this as it develops and see if Apple comes up with a clever response.

Finally, if you find this kind of thing fascinating, be sure to come to my talk “Designing and Attacking DRM” at RSA 2008. I’ll be there all week, so say “hi” if you’ll be there too.

Memory remanence attack analysis

You have probably heard by now of the memory remanence attack by Halderman et al. They show that it is easy to recover cryptographic keys from RAM after a reset or moving the DIMM to another system. This is important to any software that stores keys in RAM, and they targeted disk encryption. It’s a nice paper with a very polished but responsible publicity campaign, including a video.

Like most good papers, this one combines attacks known for a long time with creative improvements. Memory remanence has been a known issue ever since the first key had to be zeroed after use. In the PC environment, the trusted computing efforts have been aware of it as well (see “Hardware Attacks”, chapter 13; S3 is suspend-to-RAM and SCLEAN is a module that must be run during power-on to clear RAM). However, the Halderman team is publishing the first concrete results in this area, and it should shake things up.

One outcome I do not want to see from this is a blind movement to closed hardware crypto (e.g., hard drives with onboard encryption). Such systems are OK in principle, but in practice they often compromise security in more obvious ways than a warm reboot. For example, a hard drive that stores encryption keys in a special “lock sector” the firmware won’t access without a valid password can be circumvented by simply patching the firmware. Such a system would be less secure in a cold power-on scenario than well-implemented software. The solution here is to ask vendors for documentation of their security implementation before buying, or to buy only hardware that has been reviewed by a third party whose report matches your expectations. (Full disclosure: I perform this kind of review at Root Labs.)

Another observation is that this attack underlines the need to apply software protection techniques to security applications beyond DRM. If an attacker can dump your RAM, you need effective ways to hide keys in memory (e.g., white-box crypto), to obfuscate and tamper-protect the software that uses them, and to randomize each install to prevent “break once, run everywhere” attacks. Yes, this is the exact same threat model DRM has faced for years, but this time you care because you’re the target.

It will be interesting to see how vendors respond to this. Zeroing memory on reboot is an obvious change that addresses some of their methods. A more subtle hack is to set up page mapping and cache configuration such that the key is loaded into a cache line and never evicted (as done for fast IP routing table lookup in this awesome paper). However, none of this stops attacks that move the DIMM to another system. On standard x86 hardware, there’s no place other than RAM to put keys. However, the VIA C7 processors have hardware AES built into the CPU, and it’s possible more vendors will take this approach to providing secure key storage and crypto acceleration.

Whatever the changes, it will probably take a long time before this attack is effectively addressed. Set your encrypted volumes to auto-detach during suspend or a reasonable timeout and keep an eye on your laptop.

C64 screen memory and anti-debugging

I think it’s fun to stir your creativity periodically by analyzing old software protection schemes. I prefer the C64 because emulators are widely available, disks are cheap and easy to import, and it’s the system I became most familiar with as a kid.

One interesting anti-debugging trick was to load the protection code into screen memory. Just like on the PC, the data on your screen is a series of values stored in memory accessible to the main CPU. On the C64, screen memory was typically located at 0x400–0x7FF. Data could be loaded into this region by setting the block addresses in the file’s directory entry (a very simple version of a shared library load address) or by explicitly storing it at that address using the serial load routines.

To keep users from seeing garbage, the foreground and background colors were set to be the same. If you tried to break into a debugger, the prompt would usually overwrite the protection code. This could be worked around by relocating actual screen memory (by reprogramming the VIC-II chip) or by manually loading the code at a different address and disassembling it.

This is an example of anti-debugging based on consuming a shared resource. The logic is that a debugger needs the screen to run, so if the protection code also occupies that resource, activating the debugger disrupts the system. Using up a shared resource is usually much more effective than merely checking for signs that a debugger is present, and the technique is still important today.

TPM hardware attacks (part 2)

Previously, I described a recent attack on TPMs that requires only a short piece of wire. Dartmouth researchers used it to reset the TPM and then insert known-good hashes into the TPM’s PCRs (Platform Configuration Registers). The TPM version 1.2 spec has changes to address such simple hardware attacks.

It takes a bit of work to piece together the 1.2 changes since they aren’t all in one spec. The TPM 1.2 changes spec introduces the concept of “locality”, the LPC 1.1 spec describes new firmware messages, and other information available from Google shows how it all fits together.

In the TPM 1.1 spec, the PCRs were reset when the TPM was reset, and software could write to them on a “first come, first served” basis. However, in the 1.2 spec, setting certain PCRs requires a new locality message. Locality 4 is only active in a special hardware mode. This special hardware mode corresponds in the PC architecture to the SENTER instruction.
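
As a toy model of the idea (my own simplification, not the spec’s actual interface), think of each PCR as carrying a minimum locality required to reset it, where ordinary software runs at locality 0:

class PCR:
    def __init__(self, reset_locality):
        self.reset_locality = reset_locality
        self.value = b'\x00' * 20   # 20-byte SHA-1 state

    def reset(self, current_locality):
        # Only code running at a high enough locality may reset this PCR.
        if current_locality < self.reset_locality:
            raise PermissionError('locality too low to reset this PCR')
        self.value = b'\x00' * 20

pcr17 = PCR(reset_locality=4)        # resettable only at locality 4
pcr17.reset(current_locality=4)      # allowed: inside the SENTER environment
try:
    pcr17.reset(current_locality=0)  # ordinary software: rejected
except PermissionError as e:
    print(e)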

Intel SMX (now “TXT”, formerly “LT”) adds a new instruction called SENTER. AMD has a similar instruction called SKINIT. This instruction performs the following steps:

  1. Load a module into RAM (usually stored in the BIOS)
  2. Lock it into cache
  3. Verify its signature
  4. Hash the module into a PCR at locality 4
  5. Enable certain new chipset registers
  6. Begin executing it

This authenticated code (AC) module then hashes the OS boot loader into a PCR at locality 3, disables the special chipset registers, and continues the boot sequence. Each time the locality level is lowered, it can’t be raised again. This means the AC module can’t overwrite the locality 4 hash and the boot loader can’t overwrite the locality 3 hash.

Locality is implemented in hardware by the chipset using the new LPC firmware commands to encapsulate messages to the TPM. Version 1.1 chipsets will not send those commands. However, a man-in-the-middle device can be built with a simple microcontroller attached to the LPC bus. While more complex than a single wire, it’s well within range of modchip manufacturers.

This microcontroller would be attached to the clock, frame, and 4-bit address/data bus, 6 lines in total. While the LPC bus is idle, this device could drive the frame and A/D lines to insert a locality 4 “reset PCR” message. Malicious software could then load whatever value it wanted into the PCRs. No one has implemented this attack as far as I know, but it has been discussed numerous times.

What is the TCG going to do about this? Probably nothing. Hardware attacks are outside their scope, at least according to their documents.

“The commands that the trusted process sends to the TPM are the normal TPM commands with a modifier that indicates that the trusted process initiated the command… The assumption is that spoofing the modifier to the TPM requires more than just a simple hardware attack, but would require expertise and possibly special hardware.”

— Proof of Locality (section 16)

This shows why drawing an arbitrary attack profile and excluding anything outside it often fails. Too often, the list of excluded attacks does not realistically match the value of the protected data, or it overestimates the cost to attackers.

In the designers’ defense, any effort to add tamper-resistance to a PC is likely to fall short. There are too many interfaces, chips, manufacturers, and use cases involved. In a closed environment like a set-top box, security can be designed to match the only intended use for the hardware. With a PC, legacy support is very important and no single party owns the platform, despite the desires of some companies.

It will be interesting to see how TCG companies respond to the inevitable modchips, if at all.

TPM hardware attacks

Trusted Computing has been a controversial addition to PCs since it was first announced as Palladium in 2002. Recently, a group at Dartmouth implemented an attack first described by Bernhard Kauer earlier this year. The attack is very simple, using only a 3-inch piece of wire. As with the Sharpie DRM hack, people are wondering how a system designed by a major industry group over such a long period could be so easily bypassed.

The PC implementation of version 1.1 of the Trusted Computing architecture works as follows. The boot ROM and then the BIOS are the first software to run on the CPU. The BIOS stores a hash of the boot loader in one of the TPM’s PCRs before executing it. A TPM-aware boot loader hashes the kernel, extends a PCR with that value, and executes the kernel. This continues on down the chain until the kernel is hashing individual applications.

How does software know it can trust this data? In addition to reading the SHA-1 hash from the PCR, it can ask the TPM to sign the response plus a challenge value using an RSA private key. This allows the software to be certain it’s talking to the actual TPM and no man-in-the-middle is lying about the PCR values. If it doesn’t verify this signature, it’s vulnerable to this MITM attack.
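
The shape of that exchange, sketched with HMAC standing in for the TPM’s RSA signature (the real TPM_Quote operation signs with a private key that never leaves the chip):

import hashlib, hmac, os

TPM_KEY = os.urandom(32)   # stand-in for the TPM's signing key

def tpm_quote(pcr_value, challenge):
    # TPM side: bind the current PCR value to the caller's fresh challenge.
    return hmac.new(TPM_KEY, pcr_value + challenge, hashlib.sha1).digest()

# Verifier side: a fresh random challenge defeats replay of old quotes,
# and the keyed signature defeats a man-in-the-middle lying about PCRs.
challenge = os.urandom(20)
reported_pcr = b'\x00' * 20                  # PCR value reported over the bus
quote = tpm_quote(reported_pcr, challenge)   # would come back from the TPM
expected = tpm_quote(reported_pcr, challenge)
print('quote verified:', hmac.compare_digest(quote, expected))

# A man-in-the-middle without the key cannot produce a valid quote for a
# spoofed PCR value:
fake = hmac.new(os.urandom(32), reported_pcr + challenge, hashlib.sha1).digest()
print('spoofed quote verified:', hmac.compare_digest(fake, expected))  # False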

As an aside, the boot loader attack announced by Kumar et al isn’t really an attack on the TPM. They apparently patched the boot loader (a la eEye’s BootRoot) and then leveraged that vantage point to patch the Vista kernel. They got around Vista’s signature check routines by patching them to lie and always say “everything’s ok.” This is the realm of standard software protection and is not relevant to discussion about the TPM.

How does the software know that another component didn’t just overwrite the PCRs with spoofed but valid hashes? PCRs are “extend-only,” meaning they only add new values to the hash chain, they don’t allow overwriting old values. So why couldn’t an attacker just reset the TPM and start over? It’s possible a software attack could cause such a reset if a particular TPM was buggy, but it’s easier to attack the hardware.
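
The extend operation itself is simple; in TPM 1.x, the new PCR value is the SHA-1 hash of the old value concatenated with the new measurement:

import hashlib

def extend(pcr, measurement):
    # TPM 1.x PCR extend: new value = SHA-1(old value || measurement)
    return hashlib.sha1(pcr + measurement).digest()

pcr = b'\x00' * 20   # state after a TPM reset
pcr = extend(pcr, hashlib.sha1(b'boot loader').digest())
pcr = extend(pcr, hashlib.sha1(b'kernel').digest())

# Order matters, and nothing can be un-measured. The only way to reproduce
# a known-good chain after loading something untrusted is to start over
# from the all-zeros reset state, which is exactly what a reset provides.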

The TPM is attached to a very simple bus known as LPC (Low Pin Count), the same bus used for Xbox1 modchips. It has a 4-bit address/data path plus 33 MHz clock, frame, and reset lines, and is designed to host low-speed peripherals like serial/parallel ports and keyboard/mouse.

The Dartmouth researchers simply grounded the LPC reset line with a short wire while the system was running. In the video, you can see that the fan control and other components on the bus were reset along with the TPM, but the system kept running. At this point, the PCRs are clear, just as at boot. Now any software component can store known-good hashes in the TPM, subverting any auditing.

This particular attack was known before the 1.1 spec was released and was addressed in version 1.2 of the specifications. Why did it go unpatched for so long? Because it required non-trivial changes in the chipset and CPU that still aren’t fully deployed.

Next time, we’ll discuss a simple hardware attack that works against version 1.2 TPMs.