Mixed voltage interfacing for design or hacking

Modern digital systems involve a wide array of voltages. Instead of just the classic 5V TTL, they now use components and buses ranging from 3.3V down to 1.0V. Interfacing with these systems is tricky, especially when you have multiple power sources, capacitive loads, and inrush current from devices being powered on. It's important to understand the internal design of all the drivers, both on your board and in the target, to be sure your final product is cheap but reliable.

When doing a security audit of an embedded device, interfacing is one of the main tasks. You have to figure out what signals you want to tap or manipulate, then pick a speed range and signal characteristics. For example, you typically want to sample at 4x the target data rate to ensure a clean signal or, barring that, lock onto a clock to synchronize at 1x. For active attacks such as glitching, you often have to do some analog design for the pulse generator and trigger it from your digital logic. Most importantly, you have to make sure your monitoring board does not disrupt the target system. The Xbox tap board (pdf) built by bunnie is a great case study for this, as is his recent NeTV device.

Building a mass-market board is different from a quick hack for a one-off review, and cost becomes a big concern. Sure, you can use an external level shifter or transceiver chip. However, these come with a lot of trade-offs. They add latency to switching times. They often require dual power supplies, which may not be available for your particular application. They usually require extra control pins (e.g., to select signal direction). They force you to switch a group of pins all at once, not individually. Most importantly, they add cost and take up board space. With a little more thought, you can often come up with a simpler solution.

There was a great article in Circuit Cellar on mixed-voltage digital interfacing. It goes into a lot of the different design approaches, from current-limiting resistors all the way up to BJTs and MOSFETs. The first step is to understand the internals of I/O pins on most digital devices. Below is a diagram I reproduced from the article.

CMOS I/O pin, showing internal ESD diodes

Whether you are using a microcontroller or logic chip, usually the I/O pins have a similar design. The input to the gate is protected by diodes that will conduct current to ground if the input voltage goes more negative than the GND pin or to Vcc if the input is more positive than Vcc. These diodes are normally used to protect against static discharge, but can also be a useful part of your design if you are careful.

The current-limiting resistor can be calculated by figuring out the maximum and minimum voltages your input will see and the maximum current flow you want into your device. You also have to take into account the diode's voltage drop. Here's an example calculation:

Max input level: 6V
Vcc: 5V
Diode drop: 0.7V (typical)
Max current: 100 microamps

R = V / I = (6 - 5 - 0.7) / 0.0001 = 3K ohms
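If you want to sanity-check this in code, here is a minimal C sketch of the same calculation. The function and parameter names are mine, invented for illustration:

#include <stdio.h>

/* Current-limiting resistor for an input clamped by the ESD diode to
 * Vcc. Illustrative helper, not from the Circuit Cellar article. */
static double limit_resistor_ohms(double v_in_max, double vcc,
                                  double v_diode, double i_max)
{
    /* Voltage left across the resistor once the diode clamps the pin
     * at Vcc plus one diode drop */
    double v_across = v_in_max - vcc - v_diode;
    return v_across / i_max;
}

int main(void)
{
    /* The example from the text: 6V input, 5V Vcc, 0.7V drop, 100 uA */
    printf("R = %.0f ohms\n", limit_resistor_ohms(6.0, 5.0, 0.7, 100e-6));
    return 0;
}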

While microcontrollers and 74xx logic devices are often tolerant of some moderate current, FPGAs are extremely sensitive. If you’re using an FPGA, use a level shifter chip or you’ll be sorry. Also, most devices are more sensitive to negative voltage than positive. Even if you’re within the specs for max current, a negative voltage may quickly lead to latch-up.

Level shifters usually take in dual voltages and safely isolate both sides. Some are one-way, which can be used as line drivers or line receivers. Others are bidirectional transceivers, with the direction selected by an extra pin. If you can't afford the extra pins, you can often combine an open-collector transmitter with a line receiver. However, if both sides happen to transmit at the same time, you can fry your chips.

To summarize, mixed voltage interfacing is a skill you’ll need when building your own devices or hacking existing ones. You can get by with just a current-limiting resistor in many cases, but be sure you understand the I/O pin design to avoid costly failures, especially in the field over time. For more assurance or with sensitive parts like FPGAs, you’ll have to use a level shifter.

The Magic Inside Bunnie’s New NeTV

A year ago, what was probably the most important Pastebin posting ever was released by an anonymous hacker. The HDCP master key gave anyone the ability to derive the keys protecting the link between DVD players and TVs. There was no possibility of revocation. The only remaining question was, "who would be the first to deploy this key in an HDCP stripper?"

Last week, the HDCP master key was silently deployed, but surprisingly, not in a stripper or other circumvention device. Instead, it’s enabling a useful new system called the Chumby NeTV. It was created by Bunnie Huang, who is known for inventing the Chumby and hacking the Xbox. He’s driving down the cost of TV-connected hardware with a very innovative approach.

The NeTV displays Internet apps on your TV. You can see Twitter feeds, view photos, and browse the web via an on-screen display. It overlays this information on your video source. You can control it from your iPhone or Android phone. It's simple to install since you merely plug it inline with your cable box or DVD player's HDMI connection to the TV. And in true Bunnie fashion, the hardware and software are all open source.

When I first heard of this last week, I didn’t think much of it. It’s a neat concept, but I don’t have an HDTV. Then, a friend contacted me.

“Have you figured out how the NeTV works? There’s a lot of speculation, but I think I’ve figured it out,” he said. I told him I hadn’t thought much about it, then downloaded the source code to the FPGA to take a look.

I was surprised to find an entire HDCP implementation, but it didn’t quite make sense. There was no decryption block or device keys. I emailed Bunnie, asking how it could do alpha blending without decrypting the video. He wrote back from a plane in Tokyo with a cryptic message, “No decryption involved, just chroma key.”

This was the hint I needed. I went back and watched the demo video. The overlay was not transparent as I had first thought. It was opaque. To do alpha blending, you have to have plaintext video in order to mask off the appropriate bits and combine them. But to apply an opaque overlay, you could just overwrite the appropriate video locations with your substituted data. It would require careful timing, but no decryption.

Chroma key (aka “blue/green screen”) uses color for in-band signaling. Typically, an actor performs in front of a green screen. A computer (or a filter, in the old days) substitutes data from another feed wherever there is green. This is the foundation of most special effects in movies. Most importantly, it is simple and can be performed quickly with a minimum of logic.

The NeTV generates its output signal by combining the input video source and the generated overlay with this same technique. The overlay is mostly filled with pixels of an unusual color (Bunnie called it "magic pink"). The FPGA monitors the input signal position (vertical/horizontal sync, which aren't encrypted) to know where it is within each frame of video. When it is within the pink region of the overlay, it just passes through the encrypted input video. Otherwise, it displays the overlay. The HDCP implementation is needed to encrypt the overlay; otherwise, that part of the screen would appear scrambled when the TV tries to decrypt it. But, indeed, there is no decryption of the input content.
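The per-pixel decision happens in FPGA fabric, but it is simple enough to sketch in C. The constant and function names below are my own, not from the NeTV source; the final XOR models the fact that HDCP encrypts by XORing its cipher output with the pixel data:

#include <stdint.h>

#define MAGIC_PINK 0xFF00FFu  /* illustrative chroma key value */

/* If the overlay is "magic pink" at this screen position, pass the
 * encrypted input pixel through untouched. Otherwise, emit the overlay
 * pixel, encrypted with the same HDCP keystream the TV expects. */
uint32_t output_pixel(uint32_t encrypted_in, uint32_t overlay_rgb,
                      uint32_t hdcp_keystream)
{
    if (overlay_rgb == MAGIC_PINK)
        return encrypted_in;               /* pass-through, never decrypted */
    return overlay_rgb ^ hdcp_keystream;   /* encrypt overlay to match */
}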

This is impressive work, on par with the demoscene. The NeTV synchronizes with every frame of video, no jitter, choosing which pixel stream to output (and possibly encrypt) on-the-fly. But there’s more.

To generate the keystream, the NeTV has to synchronize with the HDCP key exchange between video source and TV. It replicates each step of the process so that it derives the correct stream key. To keep any timing issues with the main CPU from delaying the key exchange, it resets the link after deriving the shared key to be sure everything is aligned again. Since the transport key only depends on the two endpoint device keys, the same shared key is always used.

This is extremely impressive from a technical standpoint, but it's also interesting from a content protection standpoint. The NeTV has no device keys of its own; it derives the ones in use by your video source and TV as needed. It never decrypts video, only encrypts its on-screen display to match. It can't easily be turned into an HDCP stripper since that would require a lot of rework of the internals. (The Revue, with its HDMI transceiver chip and Atom processor, could probably be turned into an HDCP stripper with a similar level of effort.)

Bunnie has done it again with a cheap device that applies his extensive creativity to not just solve a problem, but do it in style. Whatever the outcome of his maverick engineering is in the marketplace, the internals are a thing of beauty.

Memory address layout vulnerabilities

This post is about a programming mistake we have seen a few times in the field. If you live by TAOSSA (The Art of Software Security Assessment), you probably already avoid this, but it's a surprisingly tricky and persistent bug.

Assume you’d like to exploit the function below on a 32-bit system. You control len and the contents of src, and they can be up to about 1 MB in size before malloc() or prior input checks start to error out early without calling this function.

int target_fn(char *src, int len)
{
    char buf[32];
    char *end;

    if (len < 0) return -1;
    end = buf + len;
    if (end > buf + sizeof(buf)) return -1;
    memcpy(buf, src, len);
    return 0;
}

Is there a flaw? If so, what conditions are required to exploit it? Hint: the obvious integer overflow using len is caught by the first if statement.

The bug is an integer overflow in the pointer arithmetic, but it is only exploitable under certain conditions. It depends entirely on the address of buf in memory. If the stack is located at the bottom of the address space on your system, or if buf was located on the heap, it is probably not exploitable. If buf is near the top of the address space, as with most stacks, it may be exploitable. Since exploitability depends on the runtime location of buf, it varies with the containing program itself and how it uses other memory.

The issue is that buf + len may wrap the end pointer to memory below buf. This may happen even for small values of len, if buf is close enough to the top of memory. For example, if buf is at 0xffff0000, a len as small as 64KB can be enough to wrap the end pointer. This allows the memcpy() to become unbounded, up to the end of RAM. If you’re on a microcontroller or other system that allows accesses to low memory, memcpy() could wrap internally after hitting the top of memory and continue storing data at low addresses.
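The fix is to avoid doing pointer arithmetic with the attacker-controlled value at all and compare lengths instead. A minimal sketch of the repaired function:

#include <stddef.h>
#include <string.h>

int target_fn_fixed(char *src, int len)
{
    char buf[32];

    /* Compare the length itself against the buffer size. No pointer
     * arithmetic involves the attacker-controlled value, so there is
     * nothing to wrap. */
    if (len < 0 || (size_t)len > sizeof(buf))
        return -1;
    memcpy(buf, src, (size_t)len);
    return 0;
}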

Of course, these kinds of functions are never this neatly packaged in a small wrapper and easy to find. There's usually a sea of them, and the copy happens many function calls later, based on stored values. In this kind of situation, all of us (maybe even Mark Dowd) need some help sometimes.

There has been a lot of recent work on using SMT solvers to find boundary condition bugs. They are useful, but often limited. Every time you hit a branch, you have to add a constraint (or potentially double your terms, depending on the structure). Also, inferring the runtime contents of RAM is a separate and difficult problem.

We think the best approach for now is to use manual code review to identify potentially problematic sections, and then restrict the search space to that set of functions for automated verification. Despite some promising results, we’re still a long way from automated detection and exploitation of vulnerabilities. As the program verification field advances, additional constraints from ASLR, DEP, and even software protection measures reduce the ease of exploitation.

Over the next few years, it will be interesting to see if attackers can maintain their early lead by using program verification techniques. Microsoft has applied the same approach to defense, and it would be good to see this become general practice elsewhere.

Building the ZoomFloppy slides

At ECCC 2010, I presented these slides on the ZoomFloppy, a new device for accessing Commodore floppy drives from a PC via USB. The firmware, known as xum1541, has been available since fall 2009 for those who want to build their own board, but the ZoomFloppy is the first device that will be a complete product offered for sale. Jim Brain will be manufacturing and selling it by the end of the year.

The ZoomFloppy has a number of features beyond simple disk access, which is implemented in OpenCBM. It can also nibble protected disks using a parallel cable and nibtools. It is software-upgradeable, and this presentation discusses some features planned for the future.

One surprising finding I made was that by running the 1571 drive in double-clocked (2 MHz) mode, the hardware UART is just fast enough to enable transfer of raw bits, directly off the media. No one has ever created a copier that took advantage of this "hidden" mode in the 25 years since the 1571 was introduced. Normally, this kind of transfer requires soldering a parallel cable into your drive. This mode works via the normal serial cable, but requires low-latency control of the bus that is only possible with a microcontroller (not a DB25 printer port).

I also discuss how modern day piracy on the PS3 affected our chip supply and digress a bit to discuss old copy protection schemes. I hope you enjoy the presentation.

(Direct pdf download)

Another reason to MAC ciphertext, not plaintext

Colin Percival and I often disagree about how cryptography fits into software design. My opinion is that custom crypto designs always require intense third-party review, leaving companies with two undesirable possibilities: either the review and maintenance costs of custom crypto make it more expensive than competing designs that don't require crypto (such as keeping critical state on the server), or skipping third-party review leads to security flaws. But we often agree on the concrete details, such as the preference for the encrypt-then-MAC construction.

If you MAC the plaintext, the recipient must decrypt the message and then verify the MAC before rejecting it. An attacker can send garbage or carefully chosen ciphertexts and they can’t be rejected until they are decrypted. If the ciphertext is authenticated, then the server can reject the forged or modified message earlier.
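As a sketch, the encrypt-then-MAC receive path looks like the following. The helper functions are hypothetical stand-ins for a real crypto library, not any specific API:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical primitives standing in for your real crypto library.
 * hmac_verify() must compare tags in constant time internally. */
int hmac_verify(const uint8_t *mac_key, const uint8_t *msg, size_t len,
                const uint8_t *tag);
int decrypt(const uint8_t *enc_key, const uint8_t *ct, size_t len,
            uint8_t *pt_out);

/* The MAC is checked over the raw ciphertext, so the decryption key is
 * never touched unless the tag verifies. */
int recv_message(const uint8_t *mac_key, const uint8_t *enc_key,
                 const uint8_t *ct, size_t len, const uint8_t *tag,
                 uint8_t *pt_out)
{
    if (hmac_verify(mac_key, ct, len, tag) != 0)
        return -1;                  /* reject before any decryption */
    return decrypt(enc_key, ct, len, pt_out);
}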

This has a number of advantages, including some not emphasized in Colin’s post. He points out that this reduces the surface area of your code accessible to an attacker. This is especially true if the higher-level protocol is designed to minimize interpretation of unprotected header fields (and hopefully, not parsing anything unprotected). The earlier the MAC can be performed, the less code accessible to an attacker who is choosing messages.

Another advantage is that this helps prevent side channel attacks. If you decrypt and then validate the plaintext, an attacker can send chosen ciphertext and collect information from a side channel to recover the encryption key. However, if you reject such messages early, the decryption key is never used. The attacker is reduced to replaying valid messages seen previously, which limits (but doesn’t prevent) side channel attacks.

However, one point he gets wrong is the emphasis on counter mode (CTR) to prevent side channel attacks. A former colleague of mine, Josh Jaffe, published a paper on attacking counter mode at CHES 2007. The common lore before his paper was that counter mode limits side channel attacks because the attacker does not get to choose the plaintext input to the block cipher. Instead, a possibly-unknown counter value is fed in as plaintext and is incremented with each block.

Colin’s post states this as follows:

In CTR mode, you avoid passing attacker-provided data to the block cipher (with the possible exception of the nonce which forms part of each block). This reduces the attack surface even further: Using CTR mode, an attacker cannot execute a chosen-ciphertext (or chosen-plaintext) attack against your block cipher, even if (in the case of Encrypt-then-MAC) he can correctly forge the MAC (say, if he stole the MAC key but doesn’t have the Encrypt key). Is this an attack scenario worth considering? Absolutely — the side channel attacks published by Bernstein (exploiting a combination of cache and microarchitectural characteristics) and Osvik, Shamir, and Tromer (exploiting cache collisions) rely on gaining statistical data based on a large number of random tests, and it appears unlikely that such attacks would be feasible in a context where an attacker could not provide a chosen input.
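For reference, the mechanism Colin describes is roughly the following: the block cipher input is a nonce plus counter, never attacker-supplied data. This is a sketch with a hypothetical block_encrypt() standing in for AES:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte block cipher, e.g. AES-128. */
void block_encrypt(const uint8_t key[16], const uint8_t in[16],
                   uint8_t out[16]);

/* CTR mode keystream: block i is E_K(nonce || counter + i). Note how
 * little of the input changes from one block to the next. */
void ctr_keystream(const uint8_t key[16], const uint8_t nonce[8],
                   uint64_t ctr, uint8_t *out, size_t nblocks)
{
    uint8_t in[16];
    for (size_t i = 0; i < nblocks; i++, ctr++, out += 16) {
        memcpy(in, nonce, 8);
        for (int b = 0; b < 8; b++)           /* big-endian counter */
            in[8 + b] = (uint8_t)(ctr >> (56 - 8 * b));
        block_encrypt(key, in, out);
    }
}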

Josh’s paper shattered this assumption by showing how to attack several versions of counter mode with AES. He treats the unknown counter value as part of the key, solving for the combined key+counter using DPA. This gives intermediate values that have an error component. He first solves for varying portions of the inputs, leaving this unknown but constant error term. These error terms then cancel out between rounds, revealing the secret round keys.

His attack works because counter mode only changes a few bits each time the counter is incremented. Seeding an LFSR to use as the counter could make this attack more difficult (see the sketch below). I would hesitate to say it makes it impossible since it seems like a more complex version of Josh's attack could succeed against it. In conclusion, DPA countermeasures are still needed, even with counter mode.
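To illustrate the LFSR idea: each step of a maximal-length LFSR shifts the entire state and mixes in feedback, so consecutive counter values differ in many bit positions, unlike an increment. A minimal sketch using a standard 32-bit maximal-length polynomial (taps 32, 22, 2, 1):

#include <stdint.h>
#include <stdio.h>

/* One step of a 32-bit Galois LFSR. Unlike ctr + 1, which usually
 * flips only the lowest bit or two, each step changes roughly half
 * the state bits. */
static uint32_t lfsr_step(uint32_t s)
{
    uint32_t lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= 0x80200003u;   /* feedback mask for taps 32, 22, 2, 1 */
    return s;
}

int main(void)
{
    uint32_t ctr = 0xACE1u;   /* any nonzero seed; zero is a fixed point */
    for (int i = 0; i < 4; i++) {
        printf("%08x\n", ctr);
        ctr = lfsr_step(ctr);
    }
    return 0;
}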

More details surface on PS Jailbreak

There have been a few new developments regarding the recent PS3 USB exploit. Working with impressive speed, Mathieulh and other developers have released an open-source version of the exploit called PS Groove. A much more detailed analysis of PS Jailbreak was also posted, although it is still not completely clear how the exploit works.

The PS Groove exploit uses an AT90USB board with the excellent LUFA library as I had expected. (By the way, send some donations to Dean Camera if you use that library. He’s a generous developer.) It attaches the proper config descriptors in the right order but contains a different payload. It will also allow you to disconnect the USB device after the exploit is complete.

Now that more details are public, the exploit is quite impressive. It is a heap overflow, not a stack overflow as Gamefreax had suggested. Also, I was right that they had misread the descriptor lengths (0x4D vs. 0xAD).

The exploit involves using various config/interface descriptors to align shellcode on the heap. Then, through some still-unknown mechanism, a heap overflow gives a user-controllable function pointer, which is later called after free(). The bug appears to be related to how the PS3 enumerates Sony's internal test JIG device. This device may be probed by a different portion of the kernel, which trusts the device's USB descriptors more.

Go check out the code and read the analysis. I’d love to hear more about how this exploit works and how it was discovered.