SSL PKCS padding attack

The first notable attack on SSL 3.0 was Bleichenbacher’s PKCS#1 padding attack (1998). It produced the astonishing result that any side channel revealing whether an RSA decryption result passed its validity checks can be used to iteratively decrypt a previously recorded key exchange. The same approach was recently applied in an attack on the version information added to the PKCS#1 padding in SSL 3.0.

The fix to these attacks is to substitute random data for the PremasterSecret and continue the handshake if the RSA decryption result fails any validity check. This will cause the server and client to calculate different MasterSecret values, which will be detected in the encrypted Finished message and result in an error. (If you need a refresher on the SSL/TLS protocol, see this presentation.)

Note that this approach still leaves a very slight timing channel since the failure path calls the PRNG. Even if the random data is generated up front, the initial padding bytes in the result still have to be checked and a flag set if they are invalid, and that compare and conditional branch is technically a small leak. It’s hard to eliminate all side channels, so countermeasures need to be carefully evaluated.
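As a rough sketch of the countermeasure (illustrative Python only, not taken from any particular TLS implementation; all names are made up), the random replacement is generated before any checks and the result is selected at the end:

    import os

    def choose_premaster_secret(decrypted, client_version):
        # 'decrypted' is the raw result of the RSA private-key operation.
        # Generate the random fallback *before* any validity checks so the
        # failure path doesn't make an extra PRNG call.
        fallback = bytes(client_version) + os.urandom(46)

        secret = decrypted[-48:]
        # Simplified validity checks: PKCS#1 type 2 header, a zero byte
        # before the 48-byte payload, and the version the client claimed.
        # (A careful implementation would also avoid data-dependent
        # branching here; this only shows the overall structure.)
        valid = (len(decrypted) >= 59 and
                 decrypted[0] == 0x00 and decrypted[1] == 0x02 and
                 decrypted[-49] == 0x00 and
                 secret[0] == client_version[0] and
                 secret[1] == client_version[1])

        # Continue the handshake either way; a bad decryption just leads to
        # mismatched MasterSecret values and a failure at the Finished message.
        return secret if valid else fallback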

The attack is quite clever and illustrates the malleability of RSA, which I wrote about in this earlier series of articles (with Thomas Ptacek). It involves successive approximation, zeroing in on the value of the plaintext message m by sending variants of the ciphertext c to the server. The server decrypts each message and either reports “padding incorrect” or continues processing. Remember, this oracle doesn’t have to explicitly report an error message, although that’s the most obvious way to distinguish the decryption result. It could just be a timing difference.

If the server does not report “padding incorrect”, then the attacker knows that the decryption result had the proper PKCS#1 padding (e.g., it starts with the bytes 00 02, among other things) even though the remaining bytes are garbage. If these leading bytes were uncorrelated with the rest of the message, this wouldn’t be useful to an attacker. For example, it’s easy to create a ciphertext that AES decrypts to a value with 00 02 in the first two bytes, but this tells you nothing about the remaining bytes. However, RSA is based on an algebraic primitive (modular exponentiation) and thus does reveal a small amount of information about the remaining bytes, which make up the message the attacker is trying to guess. In fact, an early result about RSA is that an attacker who can learn just the least-significant bit of the plaintext of chosen ciphertexts can decrypt an entire message.

The first step of the attack is to create a number of variants of the ciphertext, which is c = m^e mod n. These are all of the form c' = c * s^e mod n, where s is a random number. Once this is decrypted by raising it to d (the private exponent), the s value can be masked off by multiplying by s^-1 mod n. This property is used in blinding to prevent timing attacks.
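In Python terms, the multiplicative property looks roughly like this (a sketch; pow(s, -1, n) for the modular inverse assumes Python 3.8 or later):

    def make_variant(c, s, e, n):
        # c' = c * s^e mod n; the server's decryption of c' is (m * s) mod n.
        return (c * pow(s, e, n)) % n

    def unblind(ms, s, n):
        # Multiplying by s^-1 mod n strips the factor s, recovering m.
        # This is the same trick RSA blinding uses against timing attacks.
        return (ms * pow(s, -1, n)) % n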

The second step is to send these values to the server. It decrypts each one and reports a “padding incorrect” error or success (but with a garbage result). For the s values that produce correct padding (a result starting with 00 02), the attacker learns that the most-significant bytes of m*s mod n are 00 02. In the paper, Bleichenbacher expresses this case as 2B <= m*s mod n < 3B, where B = 2^(8(k-2)) for a k-byte modulus, which gives an interval of possible values for the message m. As more conforming s values are found, the number of possible intervals shrinks. That is, intersecting the intervals implied by each conforming m*s_i value eliminates false candidates until only one interval is left, the one that contains the desired plaintext message.

The third step is to increase the size of the s_i values until they confine the message m to a single value. At that point, the original plaintext message has been found (after stripping off the blinding value s_i by multiplying by its inverse mod n).
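To make the second and third steps more concrete, here is a heavily simplified sketch of the search against a padding oracle. The oracle function is an assumption standing in for whatever error or timing signal the server leaks, and the interval bookkeeping from the paper is omitted:

    def find_conforming_s(c, e, n, oracle, s_start=2):
        # Scan s values until the server accepts the padding of c * s^e mod n.
        # oracle(ct) is assumed to return True when the decryption of ct is
        # PKCS#1-conforming (starts with 00 02).
        s = s_start
        while not oracle((c * pow(s, e, n)) % n):
            s += 1
        return s

    # Each conforming s constrains m: 2B <= m*s mod n < 3B, where
    # B = 2 ** (8 * (k - 2)) for a k-byte modulus. Intersecting these
    # constraints over successively larger s values narrows the candidate
    # interval for m to a single value; see the paper for the full details.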

While this is a brilliant attack with far-reaching implications, it also illustrates the fragility of algebraic cryptography (i.e., public key) when subjected to seemingly insignificant implementation decisions. What engineer would think reporting an error condition was a bad thing?

SSL design principles talk

I recently gave a talk to a networking class at Cal Poly on the design principles behind SSL. The talk was a bit basic because I had to assume the class had little security experience. My approach was to discuss how it works and stop at each phase of the protocol, asking what security flaws might be introduced by changing or removing elements. This worked well to get them thinking about why SSL has certain components that appear weird at first glance but make sense after closer inspection.

Others have told me that they use a similar technique when learning a new crypto algorithm by starting with the simplest primitive, identifying attacks, and then adding subsequent elements until the whole algorithm is present. If attacks still exist, the algorithm is flawed.

For example, consider DSA, one of the more complex signature schemes. Use the random value k directly (instead of calculating r = (g^k mod p) mod q) and the signature operation is simply:

s = k^-1 * H(m) + x mod q

This introduces a fatal flaw. Since k is sent in place of r, any recipient can compute k^-1 via the extended Euclidean algorithm. The message is usually known, and thus H(m) is too, so x = s - k^-1 * H(m) mod q. This would directly reveal the private key x to any recipient!
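A toy demonstration of the leak, with numbers far too small for real use but large enough to show the arithmetic (assumes Python 3.8+ for the modular inverse via pow):

    q = 101     # group order (toy value)
    x = 57      # private key
    k = 23      # per-signature random value, sent in place of r in the flawed scheme
    h = 88      # H(m), known to any recipient

    k_inv = pow(k, -1, q)        # extended Euclidean algorithm, via Python's pow
    s = (k_inv * h + x) % q      # the flawed signature equation above

    # The recipient sees (k, s) and knows h, so the private key falls right out:
    x_recovered = (s - pow(k, -1, q) * h) % q
    assert x_recovered == x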

The references section at the end of the talk gives a good intro to the design principles behind SSL, especially the Wagner et al paper. My next articles will explain some SSL attacks in more detail.

TPM hardware attacks (part 2)

Previously, I described a recent attack on TPMs that only requires a short piece of wire. Dartmouth researchers used it to reset the TPM and then insert known-good hashes in the TPM’s PCRs. The TPM version 1.2 spec has changes to address such simple hardware attacks.

It takes a bit of work to piece together the 1.2 changes since they aren’t all in one spec. The TPM 1.2 changes spec introduces the concept of “locality”, the LPC 1.1 spec describes new firmware messages, and other information available from Google shows how it all fits together.

In the TPM 1.1 spec, the PCRs were reset when the TPM was reset, and software could write to them on a “first come, first served” basis. However, in the 1.2 spec, setting certain PCRs requires a new locality message. Locality 4 is only active in a special hardware mode, which in the PC architecture corresponds to the SENTER instruction described below.

Intel SMX (now “TXT”, formerly “LT”) adds a new instruction called SENTER. AMD has a similar instruction called SKINIT. This instruction performs the following steps:

  1. Load a module into RAM (usually stored in the BIOS)
  2. Lock it into cache
  3. Verify its signature
  4. Hash the module into a PCR at locality 4
  5. Enable certain new chipset registers
  6. Begin executing it

This authenticated code (AC) module then hashes the OS boot loader into a PCR at locality 3, disables the special chipset registers, and continues the boot sequence. Each time the locality level is lowered, it can’t be raised again. This means the AC module can’t overwrite the locality 4 hash and the boot loader can’t overwrite the locality 3 hash.

Locality is implemented in hardware by the chipset, which uses the new LPC firmware commands to encapsulate messages to the TPM. Version 1.1 chipsets will not send those commands. However, a man-in-the-middle device can be built with a simple microcontroller attached to the LPC bus. While more complex than a single wire, it’s well within the capabilities of modchip manufacturers.

This microcontroller would be attached to the clock, frame, and 4-bit address/data bus, 6 lines in total. While the LPC bus is idle, this device could drive the frame and A/D lines to insert a locality 4 “reset PCR” message. Malicious software could then load whatever value it wanted into the PCRs. No one has implemented this attack as far as I know, but it has been discussed numerous times.

What is the TCG going to do about this? Probably nothing. Hardware attacks are outside their scope, at least according to their documents.

“The commands that the trusted process sends to the TPM are the normal TPM commands with a modifier that indicates that the trusted process initiated the command… The assumption is that spoofing the modifier to the TPM requires more than just a simple hardware attack, but would require expertise and possibly special hardware.”

— Proof of Locality (section 16)

This shows why drawing an arbitrary attack profile and excluding anything that is outside it often fails. Too often, the list of excluded attacks does not realistically match the value of the protected data or underestimates the cost to attackers.

In the designers’ defense, any effort to add tamper-resistance to a PC is likely to fall short. There are too many interfaces, chips, manufacturers, and use cases involved. In a closed environment like a set-top box, security can be designed to match the only intended use for the hardware. With a PC, legacy support is very important and no single party owns the platform, despite the desires of some companies.

It will be interesting to see how TCPA companies respond to the inevitable modchips, if at all.

TPM hardware attacks

Trusted Computing has been a controversial addition to PCs since it was first announced as Palladium in 2002. Recently, a group at Dartmouth implemented an attack first described by Bernhard Kauer earlier this year. The attack is very simple, using only a 3-inch piece of wire. As with the Sharpie DRM hack, people are wondering how a system designed by a major industry group over such a long period could be so easily bypassed.

The PC implementation of version 1.1 of the Trusted Computing architecture works as follows. The boot ROM and then the BIOS are the first software to run on the CPU. The BIOS extends a PCR in the TPM with a hash of the boot loader before executing it. A TPM-aware boot loader hashes the kernel, extends the PCR with that value, and executes the kernel. This continues down the chain until the kernel is hashing individual applications.

How does software know it can trust this data? In addition to reading the SHA-1 hash from the PCR, it can ask the TPM to sign the response plus a challenge value using an RSA private key. This allows the software to be certain it’s talking to the actual TPM and no man-in-the-middle is lying about the PCR values. If it doesn’t verify this signature, it’s vulnerable to this MITM attack.

As an aside, the boot loader attack announced by Kumar et al isn’t really an attack on the TPM. They apparently patched the boot loader (a la eEye’s BootRoot) and then leveraged that vantage point to patch the Vista kernel. They got around Vista’s signature check routines by patching them to lie and always say “everything’s ok.” This is the realm of standard software protection and is not relevant to discussion about the TPM.

How does the software know that another component didn’t just overwrite the PCRs with spoofed but valid hashes? PCRs are “extend-only”: new values can only be folded into the existing hash chain, and old values can never be overwritten. So why couldn’t an attacker just reset the TPM and start over? It’s possible a software attack could cause such a reset if a particular TPM was buggy, but it’s easier to attack the hardware.
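A sketch of the extend operation (just the hash-chaining idea in Python, not real TPM driver code; TPM 1.1 and 1.2 use SHA-1):

    import hashlib

    def pcr_extend(pcr, measurement):
        # New PCR value = SHA-1(old PCR value || new measurement).
        # There is no command to set a PCR directly, only to extend it,
        # so earlier measurements can't be overwritten.
        return hashlib.sha1(pcr + measurement).digest()

    pcr = b"\x00" * 20                                        # value after TPM reset
    pcr = pcr_extend(pcr, hashlib.sha1(b"boot loader").digest())
    pcr = pcr_extend(pcr, hashlib.sha1(b"kernel").digest())

    # After the reset-line attack described below, the PCR is back to all
    # zeros and malicious software can replay this sequence with known-good
    # hashes of components it never actually ran.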

The TPM is attached to a very simple bus known as LPC (Low Pin Count). This is the same bus used for Xbox1 modchips. This bus has a 4-bit address/data bus, 33 MHz clock, frame, and reset lines. It’s designed to host low-speed peripherals like serial/parallel ports and keyboard/mouse.

The Dartmouth researchers simply grounded the LPC reset line with a short wire while the system was running. From the video, you can see that the fan control and other components on the bus were reset along with the TPM, but the system kept running. At this point, the PCRs are clear, just as they are at boot. Now any software component could extend the PCRs with known-good hashes, subverting any auditing.

This particular attack was known before the 1.1 spec was released and was addressed in version 1.2 of the specifications. Why did it go unpatched for so long? Because the fix required non-trivial changes to the chipset and CPU, changes that still aren’t fully deployed.

Next time, we’ll discuss a simple hardware attack that works against version 1.2 TPMs.

RSA public keys are not private (implementation)

Previously, I described a proposed system that will both sign and encrypt code updates using an RSA private key. The goal is to derive the corresponding public key even though it is kept securely within the device.

If you delve into the details of RSA, the public key is the exponent and modulus pair (e, n) and the private key is (d, n). The modulus is shared between the two keys, and e is almost always 3, 17, or 65537. So in the vendor’s system, only d and n are actually secret. If we could figure out d, we’d have broken RSA, so that’s likely a dead end. But if we can figure out e and n, we can decrypt the updates.

Let’s see how to efficiently figure out n using guesses of e and a pair of encrypted updates. The signature verification process, RSA-Verify, can be represented by the standard equation shown below in various forms.

    sig^e mod n = m
    (sig^e - m) mod n = 0
    sig^e - m = k * n

The first step is the standard RSA-Pub-Op, and the result m is the padded hash of the message. The remaining steps in RSA-Verify (i.e., checking the padding and comparing hashes) are not relevant to this attack. In the second equation, m is moved to the left side. Finally, 0 mod n is rewritten in the alternate form k * n. The particular k value is just the number of times the result wrapped around the modulus during exponentiation; it’s not important to us, only the modulus n is.

Now, assume we’ve seen two updates and their signatures, sig1 and sig2. We can convert them to the above form by raising each signature to a guess for e and subtracting the associated message m. Note that m is still encrypted and we don’t know its contents.

    (sig1^e - m1) mod n = 0
    (sig2^e - m2) mod n = 0
    gcd(sig1^e - m1, sig2^e - m2) ≈ n

This gives us two values with n as a common factor. The GCD algorithm finds the common factors of two values very quickly. The reason there is no “=” sign above is that we actually get n times some small prime factors, not just n. Those can easily be removed by trial division by primes up to 1000, since the real factors of n (p and q) are at least 100-digit numbers. Once we have n, we can take each encrypted update and decrypt it with the usual public-key operation, c^e mod n, giving us the plaintext code updates.

I implemented this attack in python for small values of e. Try it out and let me know what you think.
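For readers who just want the gist, the core computation is roughly this (a sketch assuming e = 3 and two captured signature/payload pairs; the names are illustrative and edge cases are skipped):

    from math import gcd

    def recover_modulus(sig1, m1, sig2, m2, e=3):
        # Each sig^e - m is a multiple of n, per the equations above.
        a = pow(sig1, e) - m1
        b = pow(sig2, e) - m2
        n = gcd(a, b)

        # The gcd may be n times a few small extra factors; strip them by
        # trial division (the real factors p and q are enormous).
        for f in range(2, 1000):
            while n % f == 0 and n > f:
                n //= f
        return n

    # With n recovered, each encrypted update c decrypts as pow(c, e, n).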

Doing this attack with e = 65537 is more difficult because it has to work with values millions of bits long. However, it is feasible and was implemented by Jaffe and Kocher in 1997 using an interesting FFT-multiply method. Perhaps they will publish it at some point. Thanks go to them for explaining this attack to me a few years ago.

Finally, I’d like to reiterate what I’ve said before. Public key cryptosystems, and RSA in particular, are extremely fragile. Do not use them in ways they were not designed for. Better yet, seek expert review or assistance for any design involving crypto.

RSA public keys are not private

RSA is an extremely malleable primitive and can hardly be called “encryption” in the traditional sense of block ciphers. (If you disagree, try to imagine how AES could leak full plaintexts to anyone who can submit ciphertexts and get back only “ok” or “decryption incorrect” based on the first couple of bits.) When I am reviewing a system and see the signing operation described as “RSA-encrypt the signature with the private key,” I know I’ll be finding some kind of flaw in that area of the implementation.

The entire PKCS#1 standard exists to add structure to the payload of RSA to eliminate as much malleability as possible. To distinguish the RSA primitive operation from higher-level constructs that use the RSA primitive, I use the following terminology.

RSA-Pub-Op / RSA-Priv-Op: perform the raw RSA operation using the appropriate key (exponent and modulus): data^exp mod n
RSA-Encrypt: add random padding, then transform with RSA-Pub-Op
RSA-Decrypt: transform with RSA-Priv-Op, then check the padding in the result
RSA-Sign: add padding, then transform with RSA-Priv-Op
RSA-Verify: transform with RSA-Pub-Op, check the padding in the result, hash the data, compare with the hash in the payload, and return True if they match, else False
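In rough Python terms, with padding details omitted, the distinction looks like this (a textbook sketch, not a real PKCS#1 implementation):

    import hashlib

    def rsa_op(data, exp, n):
        # RSA-Pub-Op or RSA-Priv-Op: the raw operation with whichever exponent is supplied.
        return pow(data, exp, n)

    def rsa_sign(message, d, n):
        # RSA-Sign: hash, pad (omitted here), then transform with the private key.
        h = int.from_bytes(hashlib.sha256(message).digest(), "big")
        return rsa_op(h, d, n)

    def rsa_verify(message, sig, e, n):
        # RSA-Verify: transform with the public key, check padding (omitted here),
        # and compare against a fresh hash of the message.
        recovered = rsa_op(sig, e, n)
        expected = int.from_bytes(hashlib.sha256(message).digest(), "big")
        return recovered == expected

The secure code-update flow described next is then just RSA-Sign at the manufacturer and RSA-Verify on the device before the image is applied.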

With that out of the way, a common need in embedded systems is secure code updates. A good approach is to RSA-Sign the code updates at the manufacturer. The fielded device then loads the update into RAM, performs RSA-Verify to be sure it is valid, checks version and other info, then applies the update to flash.

One vendor was signing their updates, but decided they also wanted privacy for the update contents. Normally, this would involve encrypting the update with AES-CBC before signing. However, they didn’t want to add another key to the secure processor since the design was nearly complete.

The vendor decided that since they already had an RSA public key in the device for signature verification, they would reuse that key pair for privacy: updates would be encrypted with the corresponding private key at the factory and “decrypted” with the public key on the device. The “public” key was never revealed outside their secure processor, so it was considered secret.

Let’s call the public/private key pair P1 and S1, respectively. Here’s their new update process:

    encrypted_update = RSA-Priv-Op(S1, code_update)
    signature = RSA-Sign(S1, encrypted_update)
    send(encrypted_update, signature)

On receiving the update, the secure processor would do the inverse, verifying the signature and then decrypting the code. (Astute reader: yes, there would be a chaining problem if encrypting more than one RSA block’s worth of data but let’s assume the code update is small.)

What’s wrong? Nothing, if you’re thinking about RSA in terms of conventional crypto. The vendor is encrypting something to a key that is not provided to an attacker. Shouldn’t this be as secure as a secret AES key? Next post, I’ll reveal how the malleability of the RSA primitive allows n to be easily calculated.