Recovering a private key with only a fraction of the bits

Ever since my first post on breaking DSA, I’ve been meaning to write a clear description of how to recover a private key if you only have a fraction of the bits. For example, power analysis attacks may allow you to derive a few bits of the random k value on each measurement. It turns out you can combine multiple measurements to get a single k value and then recover the DSA private key. Of course, all this also applies to ECDSA.

Since I haven’t had time to put together a good summary article, here are some references for learning this on your own. The first paper in this area was Boneh and Venkatesan (1996). They described the basic Hidden Number Problem.

The next important paper was by Howgrave-Graham and Smart (1999) [1]. They used Babai’s algorithm[2] and LLL lattice reduction to solve for DSA private nonces. This was improved by Nguyen and Shparlinski (2000) [3] to solve for just the k values.

This attack applies any time the DSA nonce isn’t fully random and used only one time. It applies if a few of the bits are constant, if the RNG is biased towards certain values, or if you can recover part of the values by side channel attacks. These references should allow you to implement this attack yourself. It has been repeatedly used in private work, but I haven’t seen much public discussion about applying this to real-world systems.

[1] Howgrave-Grahm and Smart. “Lattice attacks on Digital Signature Schemes.” HP internal publication, 1999.
[2] Laszlo Babai. “On Lovász lattice reduction and the nearest lattice point problem.” Combinatorica 6. No online reference found.
[3] Nguyen and Shparlinski. “The Insecurity of the Elliptic Curve Digital Signature Algorithm with Partially Known Nonces.” Journal of Cryptology, volume 15, pp. 151-176, 2000.

Addendum: I found the following references to improve this list.

The Magic Inside Bunnie’s New NeTV

A year ago, what was probably the most important Pastebin posting ever was released by an anonymous hacker. The HDCP master key gave the ability for anyone to derive the keys protecting the link between DVD players and TVs. There was no possibility of revocation. The only remaining question was, “who would be the first to deploy this key in an HDCP stripper?”

Last week, the HDCP master key was silently deployed, but surprisingly, not in a stripper or other circumvention device. Instead, it’s enabling a useful new system called the Chumby NeTV. It was created by Bunnie Huang, who is known for inventing the Chumby and hacking the Xbox. He’s driving down the cost of TV-connected hardware with a very innovative approach.

The NeTV displays Internet apps on your TV. You can see Twitter feeds, view photos, and browse the web via an on-screen display. It overlays this information on your video source. You can control it from your iPhone or Android phone. It’s simple to install since you merely plug it inline with your cable box or DVD player’s HDMI connection to the TV. And in true Bunnie fashion, the hardware and software is all open source.

When I first heard of this last week, I didn’t think much of it. It’s a neat concept, but I don’t have an HDTV. Then, a friend contacted me.

“Have you figured out how the NeTV works? There’s a lot of speculation, but I think I’ve figured it out,” he said. I told him I hadn’t thought much about it, then downloaded the source code to the FPGA to take a look.

I was surprised to find an entire HDCP implementation, but it didn’t quite make sense. There was no decryption block or device keys. I emailed Bunnie, asking how it could do alpha blending without decrypting the video. He wrote back from a plane in Tokyo with a cryptic message, “No decryption involved, just chroma key.”

This was the hint I needed. I went back and watched the demo video. The overlay was not transparent as I had first thought. It was opaque. To do alpha blending, you have to have plaintext video in order to mask off the appropriate bits and combine them. But to apply an opaque overlay, you could just overwrite the appropriate video locations with your substituted data. It would require careful timing, but no decryption.

Chroma key (aka “blue/green screen”) uses color for in-band signaling. Typically, an actor performs in front of a green screen. A computer (or a filter, in the old days) substitutes data from another feed wherever there is green. This is the foundation of most special effects in movies. Most importantly, it is simple and can be performed quickly with a minimum of logic.

The NeTV generates its output signal by combining the input video source and the generated overlay with this same technique. The overlay is mostly filled with pixels of an unusual color (Bunnie called it “magic pink”). The FPGA monitors the input signal position (vertical/horizontal sync, which aren’t encrypted) to know where it is within each frame of video. When it is within the pink region of the overlay, it just passes through the encrypted input video. Otherwise, it displays the overlay. The HDCP implementation is needed to encrypt the overlay, otherwise this part of the screen will be scrambled when the TV tries to decrypt it. But, indeed, there is no decryption of the input content.

This is impressive work, on par with the demoscene. The NeTV synchronizes with every frame of video, no jitter, choosing which pixel stream to output (and possibly encrypt) on-the-fly. But there’s more.

To generate the keystream, the NeTV has to synchronize with the HDCP key exchange between video source and TV. It replicates each step of the process so that it derives the correct stream key. To keep any timing issues with the main CPU from delaying the key exchange, it resets the link after deriving the shared key to be sure everything is aligned again. Since the transport key only depends on the two endpoint device keys, the same shared key is always used.

This is extremely impressive from a technical standpoint, but it’s also interesting from a content protection standpoint. The NeTV has no device keys of its own; it derives the ones in use by your video source and TV as needed. It never decrypts video, only encrypts its on-screen display to match. It can’t easily be turned into an HDCP stripper since that would require a lot of rework of the internals. (The Revue, with its HDMI transceiver chip and Atom processor could probably be turned into an HDCP stripper with a similar level of effort.)

Bunnie has done it again with a cheap device that applies his extensive creativity to not just solve a problem, but do it in style. Whatever the outcome of his maverick engineering is in the marketplace, the internals are a thing of beauty.

Intermediate cryptography resources

People often ask me for a good introduction to intermediate cryptography. It’s often easy to find basic and dangerous introductions (“public key encryption is like a mailbox”), but the next level isn’t as available.

There’s no single source for this, but you can find good coverage of the main practical topics online. Here are some resources to get you started learning beyond cryptography basics.

Cryptography: an Introduction (Nigel Smart)

I can’t say enough good things about this book. It is a great way to learn about attacks on public key schemes (see part 4) and has good general coverage as well, including elliptic-curve.

Lecture Notes on Cryptography (Bellare and Goldwasser)

Good for understanding how to model block cipher constructions with PRFs and PRPs. When someone says “that construction is not IND-CPA-secure”, this will tell you what that means. Try chapters 5, 6, and 9. Also, see the class notes page for slides and individual chapters of this series.

Tom’s math and crypto libraries (Tom St. Denis)

It’s impossible to understand practical cryptography without looking at implementations. Tom’s libraries are relatively clear and readable and cover the gamut from low-level integer manipulation all the way up to protocols. There are no external dependencies and they are public domain. For extra credit, implement one of the ciphers yourself before looking at his code, then compare to see how you did.

He also includes a large PDF documenting the library, and it’s available as a book as well.

NIST FIPS, SP and RSA PKCS standards

The NIST standards are pretty clear. The RSA ones are a bit more difficult to read. In any case, it’s very helpful to read through these and ask “why?” for each requirement they make. There’s always a reason for every “shall” or “must”. But are there some “shoulds” that should be “shalls”?

Once you’ve moved beyond these resources, the best next level is to read survey papers (like Boneh’s coverage of RSA) in the specific area you’re interested in. If you have your own favorite resources for intermediate cryptography, let me know in the comments below.

Shatner on the future of microchips

AT&T recently published a lot of videos from their archives. I particularly like this video with William Shatner discussing the magic of the microchip. In hindsight, its view of the future is revealing.

First, it’s still a good introduction to how chips are produced. It shows an automated test machine, wire-bonding, and hand-soldering boards. It also shows microscopic views of a 5 micron circuit and a nice animation of clock pulses and gates. All of these processes are basically the same today, just more sophisticated.

On the flip side, its prediction about the computerization of telephones revolutionizing society has come and gone. Wired telephones connected to smart central computers have been surpassed by smart cellphones.

This video also reminded me how there once were more women in technology. Computer science was originally considered a branch of math, and women have often excelled at mathematics. When this video was made in 1980, women made up about 41% of computer science freshmen. That had dropped to about 12.5% by 2007. If you look at the graph from that article, it seems that women participated almost equally in the early 80’s computer craze but sat out the Dotcom boom.

I’m still not sure about all the causes of this, but it is a troubling trend. Any time half of your intelligence sits on the bench, you’re going to be at a disadvantage to other countries. Some studies have shown Chinese women have a more positive view of computers than other cultures. Additionally, all enrollment in computer science is lower as a percentage than at any time since the 1970’s.

What can be done to increase interest in computer science among all students and especially women?

Improving ASLR with internal randomization

Most security engineers are familiar with address randomization (ASLR). In the classic implementation, the runtime linker or image loader chooses a random base offset for the program, its dynamic libraries, heap, stack, and mmap() regions.

At a higher level, these can all be seen as obfuscation. The software protection field has led with many of these improvements because cracking programs is a superset of exploiting them. That is, an attacker with full access to a program’s entire runtime state is much more advantaged than one with only remote access to the process, filtered through an arbitrary protocol. Thus, I predict that exploit countermeasures will continue to recapitulate the historical progress of software protection.

The particular set of obfuscations used in ASLR were chosen for their ease of retrofitting existing programs. The runtime linker/loader is a convenient location for randomizing various memory offsets and its API is respected by most programs, with the most notable exceptions being malware and some software protection schemes. Other obfuscation mechanisms, like heap metadata checksumming, are hidden in the internals of system libraries. Standard libraries are a good, but less reliable location than the runtime linker. For example, many programs have their own internal allocator, reducing the obfuscation gains of adding protection to the system allocator.

A good implementation of ASLR can require attackers to use a memory disclosure vulnerability to discover or heap fung shui to create a known memory layout for reliable exploitation. While randomizing chunks returned from the standard library allocator can make it harder for attackers to create a known state, memory disclosure vulnerabilities will always allow a determined attacker to subvert obfuscation. I expect we’ll see more creativity in exercising partial memory disclosure vulnerabilities as the more flexible bugs are fixed.

ASLR has already forced researchers to package multiple bugs into a single exploit, and we should soon see attackers follow suit. However, once the base offsets of various libraries are known, the rest of the exploit can be applied unmodified. For example, a ROP exploit may need addresses of gadgets changed, but the relative offsets within libraries and the code gadgets available are consistent across systems.

The next logical step in obfuscation would be to randomize the internals of libraries and code generation. In other words, you re-link the internal functions and data offsets within libraries or programs so that code and data are at different locations in DLLs from different systems. At the same time, code generation can also be randomized so that different instruction sequences are used for the same operations. Since all this requires deep introspection, it will require a larger change in how software is delivered.

Fortunately, that change is on the horizon for other reasons. LLVM and Google NaCl are working on link-time optimization and runtime code generation, respectively. What this could mean for NaCl is that a single native executable in LLVM bitcode format would be delivered to the browser. Then, it would be translated to the appropriate native instruction set and executed.

Of course, we already have a form of this today with the various JIT environments (Java JVM, Adobe ActionScript, JavaScript V8, etc.) But these environments typically cover only a small portion of the attack surface and don’t affect the browser platform itself. Still, randomized JIT is likely to become more common this year.

One way to implement randomized code delivery is to add this to the installer. Each program could be delivered as LLVM IR and then native code generation and link addresses could be randomized as it was installed. This would not slow down the installation process significantly but would make each installation unique. Or, if the translation process was fast enough, this could be done on each program launch.

Assuming this was successfully deployed, it would push exploit development to be an online process. That is, an exploit would include a built-in ROP gadget generator and SMT solver to generate a process/system-specific exploit. Depending on the limitations of available memory disclosure vulnerabilities and specific process state, it might not be possible to automatically exploit a particular instance. Targeted attacks would have to be much more targeted and each system compromised would require the individual attention of a highly-skilled attacker.

I’m not certain software vendors will accept the nondeterminism of this approach. Obviously, it makes debugging production systems more difficult and installation-specific. However, logging the random seed used to initiate the obfuscation process could be used to recreate a consistent memory layout for testing.

For now, other obfuscation measures such as randomizing the allocator may provide more return on investment. As ROP-specific countermeasures are deployed, it will become easier to exploit a program’s specific internal logic (flags, offsets, etc.) than to try to get full code execution. It seems that, for now, exploit countermeasures will stay focused on randomizing and adding checksums to data structures, especially those in standard libraries.

But is this level of obfuscation where exploit countermeasures are headed? How long until randomized linking and code generation are part of a mainline OS?

State space explosion in program analysis and crypto

While analyzing some software the other day, I was struck by the duality of cryptanalyzing block ciphers and program analysis techniques. Both present a complex problem and similar tools can be applied to each.

The three main program analysis techniques are dynamic analysis (e.g., execution traces or debugging), symbolic execution, and abstract interpretation. Each has its place but also has unique disadvantages.

Dynamic analysis tests one set of paths through a program with some variance in inputs (and thus program state). Fuzzing is an attempt to increase the path coverage and number of states for each path via random inputs. Smart fuzzing directs the choice of these inputs by discovering constraints via an SMT solver. While dynamic analysis is fast and doesn’t give any false positives (a crash is a crash), it is extremely limited in coverage, both of code paths and program states.

Symbolic execution covers all possible inputs and code paths but has really poor performance. Since it models the exact behavior of the program for each state and code path, it does not lead to false positives or false negatives. The downside is that it is much too slow to handle more than a few simple functions.

Abstract interpretation has characteristics in common with both. It deploys three-valued logic (0, 1, and “unknown”) to predict a program’s behavior. While not fast, it is fast enough to be performed on the whole program (like dynamic analysis) and gives better coverage of inputs without the nondeterminism of fuzzing. Unlike symbolic execution, it is an under-approximation of behavior and thus leaves many questions unanswered. However, unlike fuzzing, you know exactly which states are indeterminate and can iterate on those areas.

One big problem with the two static techniques is state space explosion. Every time a conditional branch is encountered, the number of possible states doubles. Thinking cryptographically, this is analagous to adding one bit to a cipher’s key or a 1-bit S-box.

All modern block ciphers are based on the substitution and permutation primitives. Permutation is a linear operation and is easy to represent with a polynomial. Substitution (e.g., an S-box) is non-linear and increases the degree of the polynomial drastically, usually squaring it.

Algebraic cryptanalysis is a means of solving for a key by treating a cipher as a system of overdetermined equaations. What algorithms like XL do is convert a set of polynomials into linear equations, which are solvable by means such as Gaussian elimination. XL replaces each polynomial term with a single new variable, and then tries to reduce the equations in terms of the new variables. While it hasn’t broken AES yet, algebraic cryptanalysis will need to be accounted for as new ciphers are designed.

The duality between program analysis and cryptanalysis is interesting to me. Would it be useful to represent unknown conditional branches as bits of a key and the entire program as a cipher, then attempt to reduce with XLS? What about converting cipher operations on bits of an unknown key to conditional branches (or jump tables for bytewise operations) and reducing using abstract interpretation?

While this musing doesn’t have practical applications, it’s still fun to find parallels between distinct areas of your work.