TPM hardware attacks (part 2)

Previously, I described a recent attack on TPMs that only requires a short piece of wire. Dartmouth researchers used it to reset the TPM and then insert known-good hashes into its PCRs. The TPM version 1.2 spec has changes to address such simple hardware attacks.

It takes a bit of work to piece together the 1.2 changes since they aren’t all in one spec. The TPM 1.2 changes spec introduces the concept of “locality”, the LPC 1.1 spec describes new firmware messages, and other information available from Google shows how it all fits together.

In the TPM 1.1 spec, the PCRs were reset when the TPM was reset, and software could write to them on a “first come, first served” basis. In the 1.2 spec, however, setting certain PCRs requires a new locality message. Locality 4 is only active in a special hardware mode, which on the PC architecture corresponds to the SENTER instruction.

Intel SMX (now “TXT”, formerly “LT”) adds a new instruction called SENTER. AMD has a similar instruction called SKINIT. SENTER performs the following steps:

  1. Load a module into RAM (usually stored in the BIOS)
  2. Lock it into cache
  3. Verify its signature
  4. Hash the module into a PCR at locality 4
  5. Enable certain new chipset registers
  6. Begin executing it

This authenticated code (AC) module then hashes the OS boot loader into a PCR at locality 3, disables the special chipset registers, and continues the boot sequence. Each time the locality level is lowered, it can’t be raised again. This means the AC module can’t overwrite the locality 4 hash and the boot loader can’t overwrite the locality 3 hash.
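
As a toy model of that monotonic behavior, here’s a sketch in Python. The method names and PCR indices are illustrative, not the real TPM interface:

    class LocalityModel:
        # toy model of TPM 1.2 locality, not the real interface
        def __init__(self):
            self.level = 4                  # SENTER establishes locality 4
            self.pcrs = {}

        def extend(self, index, digest, required):
            if self.level < required:
                raise PermissionError("locality already lowered")
            self.pcrs[index] = digest

        def lower(self, new_level):
            if new_level >= self.level:
                raise ValueError("locality can never be raised")
            self.level = new_level

    tpm = LocalityModel()
    tpm.extend(17, b"AC module hash", required=4)
    tpm.lower(3)                            # AC module drops privilege
    tpm.extend(18, b"boot loader hash", required=3)
    # tpm.extend(17, b"evil", required=4)   # would raise PermissionError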

Locality is implemented in hardware by the chipset, which uses the new LPC firmware commands to encapsulate messages to the TPM. Version 1.1 chipsets will not send those commands. However, a man-in-the-middle device can be built with a simple microcontroller attached to the LPC bus. While more complex than a single wire, it’s well within the reach of modchip manufacturers.

This microcontroller would be attached to the clock, frame, and 4-bit address/data bus, 6 lines in total. While the LPC bus is idle, this device could drive the frame and A/D lines to insert a locality 4 “reset PCR” message. Malicious software could then load whatever value it wanted into the PCRs. No one has implemented this attack as far as I know, but it has been discussed numerous times.
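
To make the idea concrete, here’s a hypothetical MicroPython-style sketch. The pin names and nibble values are placeholders, and real injection would need logic synchronized to the 33 MHz LPC clock (a job for a CPLD, not bit-banged code):

    from machine import Pin  # MicroPython; pin names below are placeholders

    lframe = Pin("LFRAME", Pin.OUT, value=1)              # frame, idle high
    lad = [Pin("LAD%d" % i, Pin.OUT) for i in range(4)]   # 4-bit addr/data

    def drive_nibble(n):
        # put one 4-bit value on the address/data lines
        for bit, pin in enumerate(lad):
            pin.value((n >> bit) & 1)

    def inject(nibbles):
        # waiting for bus idle omitted; assert frame, then clock out a cycle
        lframe.value(0)
        drive_nibble(nibbles[0])       # START field
        lframe.value(1)
        for n in nibbles[1:]:          # cycle type, address, data...
            drive_nibble(n)

    # placeholder payload standing in for a locality 4 "reset PCR" message
    inject([0x5, 0x0, 0x0, 0xF])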

What is the TCG going to do about this? Probably nothing. Hardware attacks are outside their scope, at least according to their documents.

“The commands that the trusted process sends to the TPM are the normal TPM commands with a modifier that indicates that the trusted process initiated the command… The assumption is that spoofing the modifier to the TPM requires more than just a simple hardware attack, but would require expertise and possibly special hardware.”

— Proof of Locality (section 16)

This shows why drawing an arbitrary attack profile and excluding anything outside it often fails. Too often, the list of excluded attacks does not realistically match the value of the protected data, or it overestimates the cost to attackers.

In the designers’ defense, any effort to add tamper-resistance to a PC is likely to fall short. There are too many interfaces, chips, manufacturers, and use cases involved. In a closed environment like a set-top box, security can be designed to match the only intended use for the hardware. With a PC, legacy support is very important and no single party owns the platform, despite the desires of some companies.

It will be interesting to see how TCG companies respond to the inevitable modchips, if at all.

TPM hardware attacks

Trusted Computing has been a controversial addition to PCs since it was first announced as Palladium in 2002. Recently, a group at Dartmouth implemented an attack first described by Bernhard Kauer earlier this year. The attack is very simple, using only a 3-inch piece of wire. As with the Sharpie DRM hack, people are wondering how a system designed by a major industry group over such a long period could be so easily bypassed.

The PC implementation of version 1.1 of the Trusted Computing architecture works as follows. The boot ROM and then the BIOS are the first software to run on the CPU. The BIOS stores a hash of the boot loader in a TPM PCR before executing it. A TPM-aware boot loader then hashes the kernel, extends a PCR with that value, and executes the kernel. This continues on down the chain until the kernel is hashing individual applications.
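
The extend operation itself is just a SHA-1 hash chain. A minimal sketch in Python, with placeholder measurements:

    import hashlib

    def extend(pcr, data):
        # TPM 1.x extend: PCR_new = SHA-1(PCR_old || SHA-1(data))
        measurement = hashlib.sha1(data).digest()
        return hashlib.sha1(pcr + measurement).digest()

    pcr = b"\x00" * 20                       # PCRs are zeroed at reset
    pcr = extend(pcr, b"boot loader image")  # measured by the BIOS
    pcr = extend(pcr, b"kernel image")       # measured by the boot loader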

How does software know it can trust this data? In addition to reading the SHA-1 hash from the PCR, it can ask the TPM to sign the response plus a challenge value using an RSA private key. This allows the software to be certain it’s talking to the actual TPM and no man-in-the-middle is lying about the PCR values. If it doesn’t verify this signature, it’s vulnerable to this MITM attack.
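
Here’s a toy model of that challenge-response. A real TPM signs with an RSA private key; the HMAC below is only a stand-in so the sketch runs:

    import hashlib, hmac, os

    TPM_KEY = b"device-unique-signing-key"   # placeholder for the RSA key

    def tpm_quote(pcr_value, nonce):
        # real TPMs sign (PCR values || nonce) with an RSA private key;
        # HMAC with a device-unique key stands in for that here
        return hmac.new(TPM_KEY, pcr_value + nonce, hashlib.sha1).digest()

    nonce = os.urandom(20)                   # fresh challenge defeats replay
    sig = tpm_quote(b"\x00" * 20, nonce)
    assert hmac.compare_digest(sig, tpm_quote(b"\x00" * 20, nonce))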

As an aside, the boot loader attack announced by Kumar et al isn’t really an attack on the TPM. They apparently patched the boot loader (a la eEye’s BootRoot) and then leveraged that vantage point to patch the Vista kernel. They got around Vista’s signature check routines by patching them to lie and always say “everything’s ok.” This is the realm of standard software protection and is not relevant to a discussion of the TPM.

How does the software know that another component didn’t just overwrite the PCRs with spoofed but valid hashes? PCRs are “extend-only,” meaning new values are folded into the hash chain; old values can never be overwritten. So why couldn’t an attacker just reset the TPM and start over? It’s possible a software attack could cause such a reset if a particular TPM was buggy, but it’s easier to attack the hardware.

The TPM is attached to a very simple bus known as LPC (Low Pin Count). This is the same bus used for Xbox1 modchips. This bus has a 4-bit address/data bus, 33 MHz clock, frame, and reset lines. It’s designed to host low-speed peripherals like serial/parallel ports and keyboard/mouse.

The Dartmouth researchers simply grounded the LPC reset line with a short wire while the system was running. In the video, you can see that the fan control and other components on the bus were reset along with the TPM, but the system kept running. At that point, the PCRs were clear, just as at boot. Any software component could then store known-good hashes in the TPM, subverting any auditing.

This particular attack was known before the 1.1 spec was released and was addressed in version 1.2 of the specifications. Why did it go unpatched for so long? Because it required non-trivial changes in the chipset and CPU that still aren’t fully deployed.

Next time, we’ll discuss a simple hardware attack that works against version 1.2 TPMs.

Hypervisor rootkit detection strategies

Keith Adams of VMware has a blog where he writes about his experiences virtualizing x86. In a well-written post, he discusses resource utilization techniques for detecting a hypervisor rootkit, including the TLB method described in his recent HotOS paper.

We’d better find a way to derail Keith before he brainstorms any more of our techniques, although a co-author of ours has a reasonable claim to have published on TLB usage first. :-) Good thing side channels in an environment as complex as the x86 hardware interface are limitless!

Undetectable hypervisor rootkit challenge

I’m starting to get some queries about the challenge Tom, Peter, and I issued to Joanna. In summary, we’ll be giving a talk at Blackhat showing that hypervisor-based rootkits are not invisible and that the detector always has the fundamental advantage. Joanna’s work is very nice, but her claim that hypervisor rootkits are “100% undetectable” is simply not true. We want to prove that with code, not just words.

Joanna recently responded. In summary, she agrees to the challenge with the following caveats:

  • We can’t intentionally crash or halt the machine while scanning
  • We can’t consume more than 90% of the CPU for more than a second
  • We need to supply five new laptops, not two
  • We both provide source code to escrow before the challenge and it is released afterwards
  • We pay her $416,000

The first two requirements are easy to agree to. Of course, the rootkit also shouldn’t do either of those or it is trivially detectable by the user.

Five laptops? Sure, ok. The concern is that with only two laptops, even a random guess would be right with 50% probability. She is right that we can make guessing a bad strategy by adding more laptops. But we can also do the same by just repeating the test several times. Each time we repeat the challenge, the probability that we’re just getting lucky drops significantly. After five runs, the chance that we won through random guessing is only 3%, the baseline she established for acceptability. But if she wants five laptops instead, that’s fine too.
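
The arithmetic:

    p = 0.5 ** 5      # five independent 50/50 guesses, all correct
    print(p)          # 0.03125, about 3%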

I don’t have a problem open-sourcing the code afterwards. However, I don’t see why it’s necessary either. We can detect her software without knowing exactly how it’s implemented. That’s the point.

The final requirement is not surprising. She claims she has put four person-months of work into the current Blue Pill and that it would take twelve more person-months for her to be confident she could win the challenge. Additionally, she has all the experience of developing Blue Pill over the entire previous year.

We’ve put about one person-month into our detector software and have not been paid a cent to work on it. However, we’re confident even this minimal detector can succeed, hence the challenge. Our Blackhat talk will describe the fundamental principles that give the detector the advantage.

If Joanna’s time estimate is correct, it’s about 16 times harder to build a hypervisor rootkit than to detect it. I’d say that supports our findings.

[Edit: corrected the cost calculation from $384,000]

Fault-tolerant system design

When designing a system, I have a hierarchy of properties. It is somewhat fluid depending on the application but roughly breaks down as follows.

  1. Correct: must perform the basic required function
  2. Robust: … under less-than-perfect operating conditions
  3. Reliable: … handling and recovering from internal errors
  4. Secure: … in the face of a malicious adversary
  5. Fast: … as fast as possible

I’d like to talk about how to achieve reliability. Reliability is the property where the system recovers from out-of-spec conditions and implementation errors with minimal impact on the other properties (i.e., correctness, security). While most system designers have an idea of the desired outcome, surprisingly few have a strategy for getting there. Designing for reliability also produces a system that is easier to secure (ask Dan Bernstein.)

Break down your design into logical components

This should go without saying, but if you have a monolithic design, fault recovery becomes very difficult. If there is implicit linkage between components, over time it will become explicit. Corollary: coupling between components only increases over time in a fielded system, never decreases.

Each component of the system should be able to reset independently or in groups

As you break down the system into components, consider what dependencies a reset of each component triggers. A simple module with fewer dependencies is best. I use the metric of “cross-section” to describe the complexity of inter-module dependencies.

Implement reset in terms of destructors/constructors

Right from the beginning, implement reset. Since you’re already coding the constructors and should have been coding destructors (right?), reset should be easy. If possible, use reset as part of the normal initialization process to be sure it doesn’t become dead code.
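
A minimal sketch of this pattern; the resource and names are placeholders:

    import io

    class Component:
        def __init__(self):
            self._construct()

        def _construct(self):
            self.buf = io.BytesIO()          # stand-in for a real resource

        def _destruct(self):
            self.buf.close()

        def reset(self):
            self._destruct()                 # destructor, then constructor
            self._construct()

    c = Component()
    c.reset()   # exercised during normal startup so it never becomes dead code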

Increase the speed of each component’s recovery and make them restart independently if possible

If components take a long time to recover, there may be pressure from customers or management to ditch the feature. A component should never take longer to recover than a full system reset; otherwise, rethink its design. Independence means that resets can proceed in parallel, which also improves performance.
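
If the resets really are independent, parallelizing them is trivial. A minimal sketch with a placeholder Component:

    from concurrent.futures import ThreadPoolExecutor

    class Component:
        def __init__(self, name):
            self.name = name
        def reset(self):
            print("reset:", self.name)       # stand-in for real recovery

    components = [Component(n) for n in ("net", "storage", "ui")]
    with ThreadPoolExecutor() as pool:       # independent resets in parallel
        list(pool.map(lambda c: c.reset(), components))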

Add a rollback feature to components

In cases where a full reset results in loss of data or takes too long, rollback may be another option. Are there intermediate states that can be safely reverted while keeping the component running?
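
One way to sketch this, assuming the component’s state can be snapshotted cheaply:

    import copy

    class Component:
        def __init__(self):
            self.state = {"connections": [], "pending": []}
            self._snapshot = None

        def checkpoint(self):
            self._snapshot = copy.deepcopy(self.state)

        def rollback(self):
            # revert intermediate state without tearing the component down
            if self._snapshot is not None:
                self.state = copy.deepcopy(self._snapshot)

    c = Component()
    c.checkpoint()
    c.state["pending"].append("half-finished work")
    c.rollback()                             # back to the known-good state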

Add error-injection features to test fault recovery

Every component should have a maintenance interface to inject faults. Or, design a standard test framework that does this. At the very least, it should be possible to externally trigger a reset of every component. It’s even better to allow the tester to inject errors in components to see if they detect and recover properly.
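
A sketch of such a hook; the interface is an assumption, not any standard framework:

    import random

    class Component:
        def __init__(self):
            self.fault_rate = 0.0            # nonzero only under test

        def inject_fault(self, rate):
            self.fault_rate = rate

        def handle(self, request):
            if random.random() < self.fault_rate:
                raise IOError("injected fault")
            return "ok: " + request

    c = Component()
    c.inject_fault(1.0)                      # force the error path
    try:
        c.handle("ping")
    except IOError:
        pass                                 # verify detection and recovery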

Instrument components to get a good report of where a fault appeared

A system is only as debuggable as its visibility allows. A lightweight trace generation feature (e.g., FreeBSD KTR) can give engineers the information needed to diagnose faults. It should be fast enough to always be available, not just in debug builds.
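
A KTR-style trace can be as simple as a ring buffer; the size and fields here are illustrative:

    import time
    from collections import deque

    TRACE = deque(maxlen=1024)               # old entries age out, O(1) cost

    def ktr(event, **details):
        TRACE.append((time.monotonic(), event, details))

    ktr("reset", component="net", reason="watchdog")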

Taxonomy of glitch and side channel attacks

There are a number of things to try when developing such attacks, depending on the device and countermeasures present. We’ll assume that the attacker has possession of several instances of the device and a moderate budget. This limits an attacker to non-invasive and slightly invasive methods.

Timing attacks work at the granularity of entire device operations (request through result) and don’t require any hardware tools. However, hardware may be used to acquire timing information, for example, by using an oscilloscope and counting the clock cycles an operation takes. I call this observation point external since only information about the entire operation (not its intermediate steps) is available. All software, including commonly used applications and operating systems, needs to be aware of timing attacks when working with secrets. The first published timing attack was against RSA, but any kind of CPU access to secret data can reveal information about that data (e.g., cache misses.)
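
The canonical software example is an early-exit comparison, where the runtime reveals how many leading bytes of a guess are correct:

    def insecure_compare(secret, guess):
        if len(secret) != len(guess):
            return False
        for a, b in zip(secret, guess):
            if a != b:
                return False                 # early exit leaks the position
        return True

    # the constant-time alternative in Python's standard library:
    import hmac
    hmac.compare_digest(b"secret", b"guess?")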

A common misconception is that noise alone can prevent timing attacks. Boneh et al disproved this handily when they mounted timing attacks against OpenSSL over a WAN. If there is noise, just take more measurements. Since noise is random but the key is constant, the noise averages out as your sample size grows.
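
A quick simulation shows why; the numbers are arbitrary, but the mean converges on the constant signal as samples accumulate:

    import random, statistics

    def measure(true_time=100.0, noise=10.0):
        return true_time + random.gauss(0, noise)   # one noisy observation

    for n in (10, 1000, 100000):
        samples = [measure() for _ in range(n)]
        print(n, statistics.mean(samples))          # converges toward 100.0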

Power, EM, thermal, and audio side channel attacks measure more detailed internal behavior throughout an operation. If the intermediate state of an operation is visible in a timing attack, I classify it as an internal side channel attack as well (e.g., Percival’s cache timing attack.) The granularity of measurement is important. Thus, thermal and audio attacks are less powerful given the slow response of the signal compared to the speed of the computation. In other words, they have built-in averaging.

Simple side channel attacks (i.e., SPA) involve observing differences in behavior within a single sample. For example, the difference in height of the power consumption peaks during a DES operation might indicate the number of 1 bits in the key for that particular round. Since most crypto is based on an iterative model, similarities and differences between each iteration directly reflect the secret data being processed.
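
As a toy illustration of that DES example (values made up), imagine peak height tracking the Hamming weight of each round key:

    round_keys = [0x3A, 0x1F, 0x40, 0x7E]            # made-up round keys
    peaks = [bin(k).count("1") for k in round_keys]  # simulated peak heights
    print(peaks)   # taller peak -> more 1 bits in that round's key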

Differential side channel attacks (i.e. DPA) are quite a bit different. Instead of requiring an observable, repeatable difference in behavior, any slight variation in behavior can be leveraged using statistics and knowledge of cipher structure. It would take an entire series of articles to explain the various forms of DPA, but I’ll summarize by saying that DPA can automatically extract keys from traces that individually appear completely random.

Glitch attacks (aka fault induction) involve deliberately inducing an error in hardware behavior. They are usually non-invasive but occasionally partially invasive. If power lines are accessible, the power supply can be subjected to a momentary excessive voltage or a brown-out. Removing decoupling capacitors can magnify this effect. If IO lines are accessible, they can be subjected to high-frequency analog signals in an attempt to corrupt the logic behind the IO buffer. But usually these approaches can be prevented by careful engineering.

Most glitch attacks use the clock line since it is especially critical to chip operation. In addition to over-voltage, complex high-frequency waveforms can induce interesting behavior. Flip-flops and latches have timing parameters called “setup and hold,” which indicate how long a 0 or 1 bit needs to be applied before the hardware reliably latches it. High-frequency waveforms at the edge of this limit cause some flip-flops to register a new value (possibly random) and others to keep their old value. Natural manufacturing variances make this impossible to prevent entirely. Pulse, triangle, and sawtooth waveforms provide more possibilities for variation.

Optical and EM glitch attacks induce faults using radiation. Optical attacks are partially invasive in that the chip has to be partially removed from its package (decapping). EM attacks can usually penetrate the housing. The nice thing about this glitching approach is that individual areas of the chip can be targeted, like RAM, which is particularly vulnerable to bit flips. Optical attacks can be done with something as simple as a flash bulb or laser pointer.

With more resources, tools like FIB workstations become available. These allow for fully invasive attacks, where the silicon is modified or tapped at various places to extract information or induce insecure behavior. Such tools are available (Ross Anderson’s group has been using one since the mid-1990s) but are generally not used by the hobbyist hacker community.