DSA requirements for random k value

November 19, 2010November 19, 2010 ~ Nate Lawson ~ 16 Comments

Most public key systems fail catastrophically if you ignore any of their requirements. You can decrypt RSA messages if the padding is not random, for example. With DSA, many implementation mistakes expose the signer’s private key.

Many crypto protocols use a nonce. The traditional requirements of a nonce is that it never be repeated. So a simple counter can suffice, as long as it is safely persisted. Using a timestamp is problematic if the clock ever goes backwards (say, NTP) or you need to generate two messages in a short interval.

In DSA, the k value is not a nonce. In addition to being unique, the value must be unpredictable and secret. This makes it more like a random session key than a nonce. When an implementer gets this wrong, they expose the private key, often with only one or two signatures.

If k is predictable, there is a way to recover the private key from a single signature with straightforward algebra. Since a past weakness in the Debian PRNG resulted in only 32767 possible outputs, an attacker could recover any DSA private key where a single signature was generated on a vulnerable Debian system. The key could have been generated securely on a system without this flaw, but that single signature would compromise it.

But what about the requirement that k be unique? Assume there’s an email signing system that has a secure PRNG but it is rate-limited so two requests that come in quickly from the same process produce the same output. As an attacker, you can’t predict k but you suspect the signing system may have this weakness.

To generate a DSA signature, the signer calculates (r, s) as follows:

r = g^k mod p mod q
s = k^-1 (H(m) + x*r) mod q

If you see a repeated r value, then you know k has been repeated. Build a web crawler to find DSA signatures. Dump its output into a parser for a bunch of formats (X509v3, PGP, PKCS#7, etc.) From each signature file, insert each r value into a key-value database and look for a match. If you find any two matching r values, you can recover the signer’s private key.

The algebra to do so is pretty straightforward. Given two signatures for messages M_A and M_B, we want to solve for k.

S_A = k^-1 (H(M_A) + x*r) mod q
S_B = k^-1(H(M_B) + x*r) mod q

Subtract the two signatures. (The modular reduction step is implicit from here on for readability.)

S_A – S_B = k^-1 (H_A + x*r) – k^-1 (H_B + x*r)

Redistribute. Since the k’s are identical, their inverse is also.

S_A – S_B = k^-1 (H_A + x*r – H_B – x*r)

The x*r values cancel out.

S_A – S_B = k^-1 (H_A – H_B)

Redistribute.

k = (H_A – H_B) / (S_A – S_B)

Once you have k, you can solve for x directly as in the previous post. There is a more complicated technique for solving for x if only a few bits of k are known, requiring more signatures.

It is extremely important that all bits of k be unique, unpredictable, and secret. With two DSA signatures on separate messages with the same k, you can recover the signer’s private key.

Blackhat 2010 video on remote timing attacks

November 9, 2010November 9, 2010 ~ Nate Lawson ~ 2 Comments

The Blackhat staff recently posted the video and slides from our talk on remote timing attacks. You can watch the talk via the playlist below. While there wasn’t enough time to go into detail in every aspect, I think this talk gives a good explanation why developers should take these attacks seriously.

Our conclusions can be summarized as:

Surprisingly small timing differences are visible from remote (< 40 ns LAN, < 25 μs Internet in this talk). This is an improvement over previously published results.
Many factors thought to prevent timing attacks, including geographical distance and competing server load, do not have as big an effect as tradition suggests.
Many common crypto libraries have timing leaks and some are exploitable.
If you deliver software as a product, you can’t make assumptions about your customer’s threat model for side channel leaks. They may just be on Slicehost or EC2 or use a slow embedded processor.
Since some routines that leak timing information (such as “terminate early compare”) are easy to fix, it’s better to be conservative than hope your customer’s environment prevents exploitation.

Since giving this talk at Blackhat, we have an updated version with new results. I had hoped to post it now, but it turns out that perfect is still the enemy of “good enough”. We’ll have more on that soon. In the meantime, I hope this talk is helpful to you.

Next Baysec: November 10 at Irish Bank

October 12, 2010 ~ Nate Lawson

The next Baysec meeting is Wednesday, November 10, 7 – 11 pm at the Irish Bank. Come out and meet fellow security people from all over the Bay Area. As always, this is not a sponsored meeting, there is no agenda or speakers, and no RSVP is needed.

The last Baysec was a blast. Thanks to everyone who came out and for the great conversations. Remember that we usually skip December for the holidays, so don’t miss it.

10 Mark Lane
San Francisco, CA
415.788.7152
http://www.theirishbank.com/

Next Baysec: October 11 at Irish Bank

September 27, 2010 ~ Nate Lawson ~ 4 Comments

The next Baysec meeting is Monday, October 11, 7 – 11 pm at the Irish Bank. Come out and meet fellow security people from all over the Bay Area. As always, this is not a sponsored meeting, there is no agenda or speakers, and no RSVP is needed.

In trying to stay ahead of the game, also mark your calendars for Wednesday, November 10.

10 Mark Lane
San Francisco, CA
415.788.7152
http://www.theirishbank.com/

Building the ZoomFloppy slides

September 21, 2010September 28, 2010 ~ Nate Lawson ~ 6 Comments

At ECCC 2010, I presented these slides on the ZoomFloppy, a new device for accessing Commodore floppy drives from a PC via USB. The firmware, known as xum1541, has been available since fall 2009 for those who want to build their own board, but the ZoomFloppy is the first device that will be a complete product offered for sale. Jim Brain will be manufacturing and selling it by the end of the year.

The ZoomFloppy has a number of features beyond simple disk access, which is implemented in OpenCBM. It can also nibble protected disks using a parallel cable and nibtools. It is software-upgradeable and this presentation discusses some features that are planned for the future.

One surprising finding I made was that by running the 1571 drive in double-clocked (2 MHz mode), the hardware UART is just fast enough to enable transfer of raw bits, directly off the media. No one has every created a copier that took advantage of this “hidden” mode in the 25 years since the 1571 was introduced. Normally, this kind of transfer requires soldering a parallel cable into your drive. This mode works via the normal serial cable, but requires low-latency control of the bus that is only possible with a microcontroller (not DB25 printer port).

I also discuss how modern day piracy on the PS3 affected our chip supply and digress a bit to discuss old copy protection schemes. I hope you enjoy the presentation.

(Direct pdf download)

Another reason to MAC ciphertext, not plaintext

September 10, 2010 ~ Nate Lawson ~ 4 Comments

Colin Percival and I often disagree about how cryptography fits into software design. My opinion is that custom crypto designs always require intense third-party review, leaving companies with two undesirable possibilities. Either the review and maintenance cost of custom crypto increases the expense over competing design options that don’t require crypto (such as keeping critical state on the server) or leads to security flaws if third-party review is skipped. But we often agree on the concrete details, such as the preference for encrypt-then-MAC construction.

If you MAC the plaintext, the recipient must decrypt the message and then verify the MAC before rejecting it. An attacker can send garbage or carefully chosen ciphertexts and they can’t be rejected until they are decrypted. If the ciphertext is authenticated, then the server can reject the forged or modified message earlier.

This has a number of advantages, including some not emphasized in Colin’s post. He points out that this reduces the surface area of your code accessible to an attacker. This is especially true if the higher-level protocol is designed to minimize interpretation of unprotected header fields (and hopefully, not parsing anything unprotected). The earlier the MAC can be performed, the less code accessible to an attacker who is choosing messages.

Another advantage is that this helps prevent side channel attacks. If you decrypt and then validate the plaintext, an attacker can send chosen ciphertext and collect information from a side channel to recover the encryption key. However, if you reject such messages early, the decryption key is never used. The attacker is reduced to replaying valid messages seen previously, which limits (but doesn’t prevent) side channel attacks.

However, one point he gets wrong is the emphasis on counter mode (CTR) to prevent side channel attacks. A former colleague of mine, Josh Jaffe, published a paper on attacking counter mode at CHES 2007. The common lore before his paper was that counter mode limits side channel attacks because the attacker does not get to choose the plaintext input to the block cipher. Instead, a possibly-unknown counter value is fed in as plaintext and is incremented with each block.

Colin’s post states this as follows:

In CTR mode, you avoid passing attacker-provided data to the block cipher (with the possible exception of the nonce which forms part of each block). This reduces the attack surface even further: Using CTR mode, an attacker cannot execute a chosen-ciphertext (or chosen-plaintext) attack against your block cipher, even if (in the case of Encrypt-then-MAC) he can correctly forge the MAC (say, if he stole the MAC key but doesn’t have the Encrypt key). Is this an attack scenario worth considering? Absolutely — the side channel attacks published by Bernstein (exploiting a combination of cache and microarchitectural characteristics) and Osvik, Shamir, and Tromer (exploiting cache collisions) rely on gaining statistical data based on a large number of random tests, and it appears unlikely that such attacks would be feasible in a context where an attacker could not provide a chosen input.

Josh’s paper shattered this assumption by showing how to attack several versions of counter mode with AES. He treats the unknown counter value as part of the key, solving for the combined key+counter using DPA. This gives intermediate values that have an error component. He first solves for varying portions of the inputs, leaving this unknown but constant error term. These error terms then cancel out between rounds, revealing the secret round keys.

His attack works because counter mode only changes a few bits each time the counter is incremented. Seeding an LFSR to use as the counter could make this attack more difficult. I would hesitate to say it makes it impossible since it seems like a more complex version of Josh’s attack could succeed against this. In conclusion, DPA countermeasures are still needed, even with counter mode.