SSL gives point-to-point, not end-to-end security

At Summercon, Jon Oberheide gave a talk on Android security. He described a trojan app called RootStrap that checks a remote server for native code updates and executes whatever it downloads. This is possible because Android doesn’t place any restrictions on native code that is installed with, or later downloaded by, an app. The only limitation is that the code runs as the app’s unprivileged UID; beyond that, there are no restrictions.

The more interesting part of the talk gives an overview of how the app installation process interacts with GTalkService. In a follow-up post today, Jon gave more analysis of this installation mechanism. Unlike other parts of Android, this service is not open source, so you have to disassemble the DEX file to see how it works.

When you select “install” on the Market app, the phone doesn’t download and install the app immediately. Instead, Google’s server sends an INSTALL_APP message to GTalkService on the phone, which downloads and installs the app. The message is delivered over SSL and includes a SHA-1 hash of the app’s installer (.apk). While this is better than no authentication, the link between the user’s action and the actual code installed is tenuous.
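For the phone’s side of that check, a minimal sketch (the message fields here are my guesses, not the actual GTalkService protocol) might look like this in Python:

import hashlib

def apk_matches_message(apk_bytes, expected_sha1_hex):
    # Compare the downloaded installer against the SHA-1 carried in INSTALL_APP.
    return hashlib.sha1(apk_bytes).hexdigest() == expected_sha1_hex

# apk = download(install_msg["apk_url"])            # hypothetical message fields
# assert apk_matches_message(apk, install_msg["sha1"])

Note that this only ties the downloaded .apk to whatever hash arrived in the message over SSL, which is exactly the limitation discussed next.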

SSL provides good point-to-point privacy and integrity protection. However, there is no guarantee to upper layers that SSL is indeed in use. Few, if any, programs query the SSL layer to check the state of the connection, do additional cert validation, etc. Even when active, SSL provides point-to-point, not end-to-end security.

In today’s computing environment, there are seldom only two systems involved in a transaction. Even if the apps were stored on a single Google server, they are still compiled and signed on other systems. Anywhere along that production chain, a compromise could lead to apps being trojaned and surreptitiously pushed to many Android phones.

Android does provide some security in its code signing model. The developer’s signature on the .apk is basically a JAR signature. The hash of the APK cert is used to determine if a new app can access the same data as the previous app since it determines which UID an app gets. However, this only protects data created by existing apps from being accessed by other apps that are not signed with the same key. It also doesn’t say anything about the legitimacy of the code since the developer signs it themselves, often with a self-signed cert.

Since it appears the INSTALL_APP message does not itself carry a signature, it is protected by nothing more than SSL. Could an attacker able to inject messages into the Google server replay this message, causing phones everywhere to install their malware? Will phones install apps without the Market service requesting it?

We’ll have to see what happens as more details about GTalkService come out. The installation process should include a challenge/response value for liveness (perhaps this is the “tickle_id” field?). The installed APK should be linked to the phone’s install challenge with a Google signature. After all, Android ships with a list of CAs. Why can’t Google include some limited CA for their own domains to enable this signing?
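As a rough sketch of what such a check could look like (all names and fields here are mine, not anything Google actually ships), the phone would generate a random challenge, send it with the install request, and verify a Google signature binding that challenge to the APK hash before installing:

import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

def verify_install_token(google_pubkey, signature, challenge, apk_bytes):
    # The signed blob covers the phone's liveness challenge and the APK hash,
    # so a replayed or substituted INSTALL_APP message fails verification.
    message = challenge + hashlib.sha256(apk_bytes).digest()
    try:
        google_pubkey.verify(signature, message,
                             padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False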

This is a good example of how SSL only provides point-to-point, not end-to-end security. While SSL is great for transactions, additional protection is needed for application-level functions such as updates, especially in today’s multi-server environment.

A new direction for homebrew console hackers?

A recent article on game console hacking focused on the Wii and a group of enthusiasts who hack it in order to run Linux or homebrew games. The article is very interesting and delves into the debate between those who hack consoles for fun and those who only care about piracy. The fundamental question behind all this: is there a way to separate the efforts of those two groups, limiting one more than the other?

Michael Steil and Felix Domke, who were mentioned in the article, gave a great talk about Xbox 360 security a few years ago. Michael compared the history of Xbox 360 security to the PS3 and Wii, among other consoles. (Here’s a direct link to the relevant portion of the video). Of all the consoles, only the PS3 had not been hacked at the time, although it has since been broken as well. Since the PS3 had an officially supported method of booting Linux, there was less reason for the homebrew community to attack it. It was secure from piracy for about 3 years, the longest of any of the modern consoles.

Michael’s claim was that all of the consoles had been hacked to run homebrew games or Linux, but the ultimate result was piracy. This was likely due to the hobbyists having more skill than the pirates, something which has also been the case in smart phones but less so in satellite TV. The case of the PS3 also supports his theory.

Starting back in the 1980s, there has been a history of software crackers getting jobs designing new protection methods. So what if the homebrew hackers put more effort into protecting their methods from the pirates? There are two approaches they might take: software or hardware protection.

Software protection has been used for exploits before. The original Xbox save game exploit used some interesting obfuscation techniques to limit it to booting Linux only. It stored its payload encrypted in the JPEG header of a penguin image. It didn’t bypass code signature verification completely; instead, it modified the Xbox’s RSA public key to have a trivial factor, which allowed the author to sign his own images with a different private key.
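The trick is easy to illustrate with toy numbers (tiny primes for demonstration only, nothing like the real Xbox key): once the verifier’s modulus is replaced with one whose factors you know, deriving a matching private key and signing your own images is trivial.

# Attacker patches the verifier to use a modulus with known (trivial) factors.
p, q = 1009, 2003
n_patched = p * q
e = 65537

# Knowing the factors, derive a matching private exponent
# (modular inverse via pow() needs Python 3.8+).
d = pow(e, -1, (p - 1) * (q - 1))

# "Sign" a fake image hash with the attacker's key...
h = 123456 % n_patched
sig = pow(h, d, n_patched)

# ...and the patched verifier happily accepts it.
assert pow(sig, e, n_patched) == h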

Even with all this obfuscation, it took only about 3 months for someone to reverse-engineer it. At that point, the same hole could be used to run pirated games. However, this hack didn’t directly enable piracy because modchip-based methods were already in use. So while obfuscation can delay pirates getting access to an exploit, in this case the delay wasn’t much.

Another approach is to embed the exploit in a modchip. These have long been used by pirates to protect their exploits from other pirates. As soon as another group clones an exploit, the price invariably goes down. Depending on the exploitation method and protection skill of the designer, reverse-engineering the modchip can be as hard as developing the exploit independently.

The homebrew community does not release many modchips because of the development cost. But if they did, it’s possible they could reduce the risk of piracy from their exploits. It would be interesting to see a homebrew-only modchip, where games were signed by a key that certified they were independently developed and not just a copy of a commercial game. The modchip could even be a platform for limiting exploitation of new holes that were only used for piracy. In effect, the homebrew hackers would be setting up their own parallel system of control to enforce their own code of ethics.

Software and hardware protection could slow down pirates acquiring exploits. However, the approach that has already proven effective is to divert the attention of homebrew hackers by giving them some officially supported, if limited, access to the hardware. Game console vendors should take into account the dynamics of homebrew hackers versus pirates in order to protect their platform’s revenue.

But what can you do about it, homebrew hackers? Can you design a survivable system for keeping your favorite console safe from piracy while enabling homebrew? Can you enforce a code of ethics within your group via technical measures? If anyone can make this happen, you can.

Attacking RSA exponentiation with fault injection

A new paper, “Fault-Based Attack of RSA Authentication” (pdf) by Pellegrini et al., is making the rounds. The general idea is that an attacker can disrupt an RSA private key operation to cause an invalid signature to be returned, then use that result to extract the private key. If you’re new to fault injection attacks on RSA, I previously wrote an intro that should help.

The main concept to grasp is that public key crypto is brittle. In the case of RSA’s CRT operation, a single bit error in one multiplication result is enough to fully compromise your private key. We’ve known this since 1997. The solution is simple: validate every signature with the public key before returning it to the caller.
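The attack itself takes only a few lines of arithmetic. Here is a toy sketch with tiny primes (illustration only, not real key sizes): a single corrupted multiplication in the mod-p half of a CRT signature lets a gcd against the public modulus recover a prime factor.

from math import gcd

p, q = 1009, 2003
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))    # modular inverse needs Python 3.8+
m = 42                               # message representative

# CRT signing: compute the signature mod p and mod q separately, then combine.
d_p, d_q = d % (p - 1), d % (q - 1)
q_inv = pow(q, -1, p)
s_p, s_q = pow(m, d_p, p), pow(m, d_q, q)
s = s_q + q * ((q_inv * (s_p - s_q)) % p)
assert pow(s, e, n) == m

# Inject a single-bit fault into the mod-p half of the computation.
s_faulty = s_q + q * ((q_inv * ((s_p ^ 1) - s_q)) % p)

# The faulty signature is still correct mod q but wrong mod p, so the gcd
# of (s'^e - m) with n reveals q and the private key is compromised.
assert gcd((pow(s_faulty, e, n) - m) % n, n) == q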

The authors noticed something curious. OpenSSL does verify signatures it generates before returning them, but if it detects a problem, it does not just return an error. Instead, it tries again using a different exponentiation process and returns that signature without validating it.

Think about this for a moment. What conditions could cause an RSA private key operation to compute an invalid answer? An innocent possibility is cosmic radiation, bad RAM, etc. In this case, all computations should be considered unreliable and any retried operation should be checked very carefully. The other and more likely possibility is that the system is under attack by someone with physical proximity. In this case, OpenSSL should generate a very obvious log message and the operation should not be retried. If it is, the result should be checked very carefully.

For whatever reason, the OpenSSL programmers decided to retry with fixed-window exponentiation and trust that since there were no published fault attacks for it, they didn’t have to validate its result. This is a foolhardy attitude — not something you want to see in your crypto library. There had been many other fault injection attacks against various components or implementation approaches for RSA, including right-to-left exponentiation. There was no reason to consider left-to-right exponentiation invulnerable to this kind of attack.

Fixed-window exponentiation is a form of sliding window exponentiation. This is just a table-based optimization, where a window (say, 3 bits wide) is moved across the exponent, computing the final result incrementally. While this may be resistant to some timing attacks (but not cache timing or branch prediction attacks), there is nothing that would prevent fault injection attacks.
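For reference, here is a bare sketch of fixed-window modular exponentiation in Python (the algorithm in general, not OpenSSL’s implementation): precompute a table of small powers, then scan the exponent left to right, doing w squarings and one table multiply per window.

def fixed_window_exp(base, exp, mod, w=3):
    # Precompute base^0 .. base^(2^w - 1) mod mod.
    table = [1]
    for _ in range((1 << w) - 1):
        table.append((table[-1] * base) % mod)

    # Pad the exponent length to a whole number of w-bit windows.
    nbits = max(exp.bit_length(), 1)
    nbits += (-nbits) % w

    result = 1
    for i in range(nbits - w, -1, -w):
        for _ in range(w):
            result = (result * result) % mod              # w squarings per window
        result = (result * table[(exp >> i) & ((1 << w) - 1)]) % mod
    return result

assert fixed_window_exp(7, 123456789, 1000003) == pow(7, 123456789, 1000003)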

Indeed, it turns out to be vulnerable. The authors generate a few thousand signatures with single bit-flips in some window of the signature computation. Then they compare the faulty signatures to a correct signature over the same message. They compute the value for that portion of the private exponent, since there are only 2^w possibilities for that location, where w is the window size in bits. This is repeated until enough of the private key is known.

The method they used to create the faulty signatures was a bit artificial. They built a SPARC system on an FPGA running Linux and OpenSSL, then decreased the power supply voltage until multiplies started to fail. Since the multiplication logic is a relatively long chain, it is often one of the first things to fail. However, a more interesting hardware result would be to attempt this kind of attack on an actual server, because FPGAs behave differently from ASICs. It might require careful targeting of the right power pins on the CPU. Since power pins are numerous in modern systems, this may be more effective than only modulating the system power supply.

This was a nice attack but nothing earth-shattering. The only thing I was floored by (yet again) was the willingness of crypto implementers to perform unsafe operations in the face of an almost certain attack. Shame on OpenSSL.

Smart meter crypto flaw worse than thought

Travis Goodspeed has continued finding flaws in TI microcontrollers, branching out from the MSP430 to ZigBee radio chipsets. A few days ago, he posted a flaw in the chipset’s random number generator. Why is this important? Because the MSP430 and these ZigBee chips are found in many wireless sensor systems, including most Smart Meters.

Travis describes two flaws: the PRNG is a 16-bit LFSR, and it is not seeded with very much entropy. However, the datasheet recommends this random number generator be used to create cryptographic keys. It’s extremely scary to find such a poor understanding of crypto in devices that report billing records and can turn off the power to your house.

The first flaw is that the PRNG is not cryptographically secure. The entropy pool is extremely small (16 bits), which can be brute-forced in a fraction of a second, even if it were used to seed a secure PRNG such as Yarrow. Also, the PRNG is never re-seeded; periodic re-seeding, implemented properly, could have helped.

Even if the entropy pool was much larger, it would still be vulnerable because an LFSR is not a cryptographically-secure PRNG. An attacker who has seen some subset of the output can recreate the LFSR taps (even if they’re secret) and then generate any future sequence from it.
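A toy sketch makes the point (the taps below are the common maximal-length 16-bit polynomial, not necessarily TI’s): with only 16 bits of state, an attacker who observes a short stretch of output can recover the seed by brute force in well under a second.

def lfsr16(state, nbits, taps=(15, 13, 12, 10)):
    out = []
    for _ in range(nbits):
        out.append((state >> 15) & 1)          # output the bit shifting out
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1             # feedback from the tap bits
        state = ((state << 1) | fb) & 0xFFFF
    return out

secret_seed = 0xACE1
observed = lfsr16(secret_seed, 32)             # attacker sees 32 output bits

# Only 2^16 possible seeds, so just try them all.
for guess in range(1, 1 << 16):
    if lfsr16(guess, 32) == observed:
        print("recovered seed: %#x" % guess)
        break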

The second problem is that it is seeded from a random source that has very little entropy. Travis produced a frequency count graph for the range of values returned by the random source, ADCTSTL, a radio register. As you can see from that graph, a few 8-bit values are returned many times (clustered around 0 and 100) and some are not returned at all. This bias could be exploited even if it was used with a cryptographically-secure PRNG.

These problems are each enough to make the system trivially insecure to a simple brute-force attack, as Travis points out. However, it gets worse because the insecure PRNG is used with public-key crypto. The Z-Stack library includes ECC code written by Certicom. I have not reviewed that code, and it seems reasonable to use a library from a company that employs cryptographers. However, the ECC code makes the critical mistake of leaving implementation of primitives such as the PRNG up to the developer. Other libraries (such as OpenSSL, Mozilla’s NSS, and Microsoft’s Crypto API) all provide their own PRNG, even if seeding it has to be left up to the developer. That at least reduces the risk of PRNG flaws.

ECC, like other public key crypto, falls on its face when the design spec is violated. In particular, an ECDSA private key is completely exposed if even a few bits of the random nonce are predictable. Even if the keys were securely generated at the factory, a predictable PRNG completely exposes them in the field. Since this kind of attack depends only on poor entropy, it would still be possible even if TI replaced their PRNG with one that is cryptographically secure.
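The nonce problem is worth spelling out, since it is simple algebra. Here is a toy sketch (tiny made-up numbers; in real ECDSA, r would be the x-coordinate of k*G on the curve and n the group order) of recovering a private key from a single signature whose nonce the attacker can predict:

n = 65521      # toy group order (prime); real curves use ~256-bit values
d = 4660       # the victim's private key
k = 4094       # the "random" per-signature nonce, predictable to the attacker
h = 22136      # hash of the signed message
r = 11111      # stand-in for the x-coordinate of k*G

# The ECDSA signing equation: s = k^-1 * (h + r*d) mod n  (pow() needs Python 3.8+)
s = (pow(k, -1, n) * (h + r * d)) % n

# With k known, one signature gives up the private key directly.
d_recovered = (pow(r, -1, n) * (s * k - h)) % n
assert d_recovered == d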

Given that these chips are used in critical infrastructure such as smart meters and this attack can be mounted remotely, it is important that it be addressed carefully. That will be difficult, since it requires hardware changes to the entropy source and an unknown number of devices are already in the field. Once again, crypto proves fragile, and thorough review is vital.

Timing-independent array comparison

I find it interesting that people tend to follow the same stages of awareness when first hearing about timing attacks. A recent discussion about Django redesigning their cookie scheme shows people at the various stages.

1. The timing difference is too small to attack

This is the result of not understanding statistics or how jitter can be modeled and filtered. Usually, people are thinking that a single measurement for a given input is probably obscured by noise. This is correct. That’s why you take many measurements for the same input and average them. The more noise, the more measurements you need. But computers are fast, so it doesn’t take long to do thousands or even millions of runs per guessed byte.

The basic statistical method is simple. You apply a hypothesis test of the mean of all runs with a given guess for a byte (say, 0) against the mean for all the others (1, 2, …, 255). If there is a significant difference in the means, then you have guessed that byte correctly. If no difference, repeat for the other 255 values. If still no difference, take more measurements.
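Here is a toy sketch of that averaging step (the timings are simulated, with a made-up 200 ns processing difference buried in microseconds of jitter), using a simple Welch’s t statistic to compare the two groups of measurements:

import random
from statistics import mean, variance

def sample_times(extra_ns, count):
    # ~50 us baseline with 2 us of jitter, plus an optional processing difference.
    return [random.gauss(50000, 2000) + extra_ns for _ in range(count)]

N = 10000
correct_guess = sample_times(extra_ns=200, count=N)   # correct byte: slower path
other_guesses = sample_times(extra_ns=0, count=N)     # all other guesses pooled

def welch_t(a, b):
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5

# A large t statistic means the means differ significantly: the guess was correct.
print("t statistic: %.1f" % welch_t(correct_guess, other_guesses))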

The other misconception is that jitter is too great to get anything useful out of the measurements, especially in a network. There is an excellent paper by Crosby and Wallach that debunks this myth by carefully analyzing noise and its causes as well as how to filter it. They conclude that an attacker can reliably detect processing differences as low as 200 nanoseconds on a LAN or 30 microseconds on a WAN given only 1000 measurements. If your server is hosted somewhere an attacker can also buy rackspace, then you are vulnerable to LAN timing attacks.

2. I’ll just add a random delay to my function

Then the attacker just takes more measurements to average out your random delay; it’s only more noise. See #1 above.

3. I’ll add a deadline to my function and delay returning until it is reached

This usually has a higher overhead and is more prone to errors than implementing your function such that its timing does not vary based on the input data. If you make a mistake in your interval calculation, or an attacker is able to “stun” your function somewhere in the middle such that all target computations occur after the interval timer has expired, then you’re vulnerable again. This is similar to avoiding buffer overflow attacks by lengthening the buffer; better to fix the actual problem instead.

4. Ok, ok, I developed a constant-time function

How carefully did you analyze it? Here was one attempt from the Reddit thread, posted after citing my Google talk on crypto flaws:

def full_string_cmp(s1, s2):
    total = 0
    for a, b in zip(s1, s2):
        total += (a != b)
    return not total

The first bug is that this opens a worse hole if the two strings are not the same length. In fact, an attacker can send in a zero-length string, and the result of zip() is then an empty list. This means the for loop never executes and any input is accepted as valid! The fix is to check the two lengths first to make sure they’re equal, or in C, to compare two fixed-length arrays that are guaranteed to be the same size.

The next bug is smaller, but still valid. There’s a timing leak in the comparison of a and b. When you write:

total += (a != b)

Both Python and a C compiler actually generate low-level instructions equivalent to:

if (a != b)
    tmp = 1
else
    tmp = 0
total += tmp

Thus, there is still a small timing leak: the equal and not-equal cases execute different instruction sequences, so their timing differs slightly. A common intuitive mistake among programmers is that single lines of code are atomic. This is why you might find a C version such as “total += (a != b) ? 1 : 0”, which generates the same code. Just because it fits on a single line does not mean it is atomic or constant-time.

5. Ok, I fixed those bugs. Are we done yet?

Let’s see what we have now (adapted from my keyczar posting).

def constant_time_compare(userMsg, correctValue):
    # Refuse to compare values of different lengths (leaks only the length).
    if len(userMsg) != len(correctValue):
        return False
    result = 0
    for x, y in zip(userMsg, correctValue):
        result |= ord(x) ^ ord(y)
    return result == 0

This now uses a bitwise operator (XOR) instead of a logical compare (!=), and thus should be constant-time. The += operator could possibly have a leak if carries take longer than an add without carry, so we went with |= instead. We check the lengths and abort if they don’t match. This does leak timing information about the length of the correct value, so we have to make sure that is not a problem. Usually it’s not, but if these were passwords instead of cryptographic values, it would be better to hash them with PBKDF2 or bcrypt instead of working with them directly. But are we done?

Maybe. But first we need to figure out whether our high-level language has any behavior that affects the timing of this function. For example, what if there is sign-bit handling code in the Python interpreter that behaves differently if x or y is negative? What if the zip() function has an optimization that compares the two sequences first to see if they’re identical before returning the pairs?

The answer is you can never be sure. In C, you have more assurance but still not enough. For example, what if the compiler optimization settings change? Is this still safe? What about the microarchitecture of the CPU? What if Intel came out with a new byte-wise cache alignment scheme in the Pentium 1000?

I hope this has convinced some people that side channel attacks are not easily solved. Important code should be carefully reviewed to have higher assurance this class of attack has been mitigated.

Side channel attacks on cryptographic software

Below is a recent article I wrote for IEEE Security and Privacy magazine titled “Side Channel Attacks on Cryptographic Software” (pdf). It covers simple timing attacks against HMAC validation, AES cache timing, and RSA branch prediction attacks. It’s a survey article, covering excellent research in side channel attacks over the past few years.

I think attackers have been gaining the advantage recently. As CPU microarchitecture gets more complicated, more covert channels appear. Also, the move to virtualization and high-resolution timers gives attackers better-quality measurements and more opportunities to exploit even the tiniest of leaks. We’ll need to come up with more clever ways of modeling and eliminating covert channels, moving crypto operations into hardware, and giving software more control over microarchitectural state.

Let me know what you think.