Timing attack in Google Keyczar library

I recently found a security flaw in the Google Keyczar crypto library. The impact was that an attacker could forge signatures for data that was “signed” with the SHA-1 HMAC algorithm (the default algorithm).

Firstly, I’m really glad to see more high-level libraries being developed so that programmers don’t have to work directly with algorithms. Keyczar is definitely a step in the right direction. Thanks to all the people who developed it. Also, thanks to Stephen Weis for responding quickly to address this issue after I notified him (Python fix and Java fix).

The problem was that the HMAC verify function (Python src/keyczar/keys.py, Java src/org/keyczar/HmacKey.java) leaked timing information based on how long a verify operation took to fail. The function was defined as follows for the HMAC mode:

Python

    return self.Sign(msg) == sig_bytes

Java

    return Arrays.equals(hmac.doFinal(), sigBytes);

Since the return value is a SHA-1 hash string, the operation devolves to a byte-by-byte compare against sig_bytes. In both Python and Java, this is a classic sequence comparison that terminates early at the first mismatched byte. This allows an attacker to iteratively try various HMAC values and measure how long the server takes to reject each one. The longer the rejection takes, the more leading bytes he has correct.

It may be non-intuitive, but the symmetric nature of MACs means the correct MAC value for an arbitrary message is a secret on-par with key material. If the attacker knows the correct MAC for a message of his choosing, he can then send that value to forge authentication of the message to the server.

I’ve implemented a simple test server using the Python version of Keyczar. It verifies an HMAC and sends back “yes” or “no” depending on whether the value is correct. I then wrote a client in C that connects to the server and tries various values for the HMAC. It tries each value multiple times and records a set of TSC differences for each. These can be fed to a program like ministat to decide when a significant difference has been confirmed (based on mean and standard deviation).
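
For illustration, here is a minimal Python sketch of the client-side measurement idea. The original client was written in C and read the TSC directly; the host, port, and one-line reply protocol here are assumptions for the example.

    import socket
    import statistics
    import time

    def time_guess(host, port, guess, samples=200):
        """Collect response latencies for one candidate MAC value."""
        timings = []
        for _ in range(samples):
            # Each probe opens a fresh connection, sends the candidate HMAC
            # bytes, and times how long the server takes to answer yes or no.
            with socket.create_connection((host, port)) as conn:
                start = time.perf_counter()
                conn.sendall(guess)
                conn.recv(16)
                timings.append(time.perf_counter() - start)
        return statistics.mean(timings), statistics.stdev(timings)

Comparing the resulting distributions for two candidate guesses (with ministat or similar) then shows whether one of them consistently takes longer to reject.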

I can confirm that localhost tests have a discernible difference, depending on whether each subsequent byte is correct. I have not optimized the attack to work over a LAN or the Internet yet. However, this does not mean remote attacks are infeasible. Where jitter and other noise is present in the samples, an attacker just needs to collect more data to average it out. Remote timing attacks on SSL have been demonstrated where the timing difference was only a few native multiplies.

I recommended changing the verify function to use a timing-independent compare, such as the following.

    # Compute the expected MAC, then compare it to the supplied value without
    # returning early, so the time taken does not depend on how many leading
    # bytes match.
    correctMac = self.Sign(msg)
    if len(correctMac) != len(sig_bytes):
        return False
    result = 0
    for x, y in zip(correctMac, sig_bytes):
        result |= ord(x) ^ ord(y)  # accumulate differences instead of branching
    return result == 0

This function is data-independent, except for revealing the total length of the correctMac string. Since this is not considered important to security, it is acceptable. Of course, this might not be true for another use of this same code, so it cannot be blindly used in other applications.

The lesson from this is that crypto flaws can be very subtle, especially when it comes to transitioning from an abstract concept (“compare”) to a concrete implementation (“loop while bytes are equal”). Keyczar was implemented by some smart people. If you’re a programmer, you should be using a high-level library like Keyczar or GPGME to take advantage of this knowledge. If you ignore this advice and develop your own design, it will likely have many problems worse than this one. For those who have to build crypto, please get a third-party review of your design.

I consider it a failing of the crypto community that these libraries are still so new, while for the past 20 years we’ve focused on providing raw algorithm APIs. But at least now we have a chance to build out a few important high-level libraries, review them carefully, and encourage application developers to use them. It’s not too late.

The Debian PGP disaster that almost was

A year ago, I wrote about the Debian OpenSSL PRNG bug that reduced the entropy of its random seed to 15 bits. There was a little-noticed part of the advisory that said all DSA keys used on the affected systems should be considered compromised. In the rush to find and replace SSL certs and SSH keys generated on Debian or Ubuntu systems, very few people grasped the significance of this other warning. This is important because an attacker can retroactively seek out DSA signatures generated during the vulnerable period and use them to recover your private key.

DSA is a public-key signature algorithm. Unlike RSA, it isn’t useful for encryption or key exchange. Like other public key algorithms, it is extremely sensitive to the choice of parameters. I’ve written about RSA signature flaws (1, 2, 3) that resulted from too much ambiguity in how a signature verify operation was interpreted.

With DSA, the entropy of the random signature value k is critical. It is so critical that knowledge of only a few bits of k can reveal your entire private key to an attacker. Interestingly enough, the Wikipedia article on DSA doesn’t mention this concern. This is why it’s so important to get your crypto reviewed by an expert. Small, obscure flaws can cause immense damage.

To generate a DSA signature, the signer calculates (r, s) as follows:

r = g^k mod p mod q
s = k^-1 * (H(m) + x*r) mod q

The message to be signed is m, H(m) is its SHA-1 hash, g is the group generator, and p and q are the prime domain parameters (q divides p - 1). The value k is a random nonce and x is the signer’s private key. If an attacker knows k and has a single signature (r, s), he can recover the signer’s private key with a simple calculation. In the case of the vulnerable PRNG, he can just repeat this calculation for each of the 32,767 possible values of k. Remember that the message m is not secret, so neither is its SHA-1 hash H(m). The attacker calculates x as follows:

x = ((s * k) - H(m)) * r^-1 mod q
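
For concreteness, here is a hedged Python sketch of that recovery step. The function name is hypothetical; candidate_nonces stands in for regenerating k from each of the 32,767 possible PRNG states, and p, q, g, r, s, and h_m are assumed to be plain integers.

    def recover_private_key(p, q, g, r, s, h_m, candidate_nonces):
        """Recover the DSA private key x from one signature (r, s) on a message
        whose hash is h_m, given a small set of possible nonces k."""
        for k in candidate_nonces:
            if pow(g, k, p) % q != r:           # the correct k must reproduce r
                continue
            r_inv = pow(r, q - 2, q)            # r^-1 mod q (q is prime)
            return ((s * k - h_m) * r_inv) % q  # x = ((s*k) - H(m)) * r^-1 mod q
        return None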

The impact of this attack is that every signature generated on a vulnerable system reveals the signer’s private key. An attacker can find old signatures by crawling your website, examining signed email, analyzing saved packet captures of an SSL exchange, etc. The associated DSA key has to be revoked, regenerated and redistributed. Luckily for Debian, their packages are signed using GnuPG, which did not use the OpenSSL PRNG. But for anyone using other software based on OpenSSL, you need to revoke all DSA keys used to sign data on vulnerable Debian or Ubuntu systems. Even if the key was generated securely, a single insecure signature reveals the entire private key. It’s that bad.

I hope a year has been enough time for people to revoke their DSA keys, even though the warning was somewhat obscure. Thanks to Peter Pearson for interesting discussions about this issue.

Note to WordPress on SSL

Dear WordPress/Automattic:

Your servers do not offer SSL session resumption. This means every SSL connection requires a full handshake, in which the server sends its certificate (3807 bytes) and performs a 2048-bit RSA private-key operation. This happens for every piece of data fetched over SSL, even the tiny button pictures that are smaller than the certificate itself.

[Figure: WP SSL Server Hello message]

You should really enable SSL session resumption. It will save a lot of money in server cost and bandwidth, and your users will be happier too.

Thanks,
Nate

[Edit: WordPress staff replied that this was a mistake in their configuration and now this is fixed.]

SSL is not table salt

While I haven’t written an article in a while, I’m still alive. I just got buried with work, tax prep, and using every spare moment to try to finish up the xum1541. Last week, I attended the iSec Forum and saw a talk about cookie forcing based on work by Chris Evans and Michal Zalewski. This attack involves overwriting SSL-only cookies with a cookie injected into a non-SSL connection. In other words, browsers prevent disclosure of SSL-only cookies, but not deletion or replacement by cookies from an insecure session.

I don’t follow network security closely, so this may be an older attack. However, it reminds me how web application and browser designers treat SSL like table salt: sprinkle a little bit here and there, but be careful not to overuse it. That’s completely the wrong mentality.

WordPress recently notified their users how to enable SSL for the admin interface. While it’s admirable that they are providing more security, the attitude behind the post is a great example of this dangerous mentality. They claim SSL is only recommended when blogging from a public network, even going so far as to suggest it be disabled again when back on a “secure network”. It’s hard to believe performance is the issue, given the CPU gains in the past 13 years.

Attention: if you’re using a web application on a shared network (say, the Internet), you’re not on a secure network. This whole idea that users should pick and choose SSL based on some ephemeral security assessment of the local network is insane. How can you expect anyone, let alone regular users, to perform a security assessment before disabling SSL and then remember to re-enable it before traveling to an insecure network? (You can’t log into your blog and re-enable SSL from the insecure network because you would get compromised doing so.)

Likewise, sites such as Yahoo Mail use SSL for submitting the login password, but then provide a session cookie over plain HTTP. A session cookie is almost as good as a password. As long as the attacker refreshes the hijacked session periodically, the cookie stays valid. (Do any web services implement an absolute session limit?) Even if the user clicks “log out”, the attacker can feed them a fake logout page and keep the cookie active.

All cookies should definitely have their own cryptographic integrity protection and encryption, independent of SSL. But it is clear that the entire attitude toward SSL is wrong, and we will all eventually have to change it. Open wireless networks have helped session hijacking proliferate, no ARP spoofing needed. Soon, malware may contain a MITM kit to compromise any user accessing a website who shares an access point with a rooted system. As this attack becomes more common, perhaps we’ll see the end of SSL as an accessory, and it will be mandated for the entirety of every authenticated session. The prevailing attitude will have to change first.

Comcast to give up BitTorrent blocking?

While discussion about how to implement traffic shaping was occurring, Comcast and BitTorrent.com made a joint announcement last week.  Comcast will apply bulk rate-limiting to customer connections and not target BitTorrent uploads.  And BitTorrent (the company) will cooperate with Comcast to make the protocol more amenable to ISP control in some way to be determined.

My claim was that this was all about centralized control of applications, not about congestion management.  After Comcast began to get heat from the FCC for not announcing their ad-hoc filtering, they decided it would be best to exit this experiment for now.  Comcast’s move seems to confirm my analysis.

Note what was absent in this announcement.  No complaints that aggregate rate-limiting is too hard or expensive to deploy.  No argument that only they knew what was good for their network and that seeding needed to be stopped at all costs.  They simply announced they would move to rate-limiting, which was a simpler and more appropriate response all along.

The strangest part of this whole affair was the MPAA announcement praising BitTorrent for cooperating with Comcast.  However, if you accept the theory that this is all about control, it makes more sense.  The *AA organizations have been stymied by protocols that don’t have a single responsible entity that can be petitioned or shut down.

They may assume this BitTorrent announcement is the start of a new era where ISPs can be used as a lever to bring application and protocol authors back to the negotiating table.  Under cover of “network management”, the ISPs will provide the necessary pressure to get changes made to allow more centralized control.  Only time will tell if this is just a historical footnote or the start of something new.

How RED can solve congestion problems

After my post about enforcing fair bandwidth allocation in over-subscribed networks like consumer broadband, George Ou and I had an email exchange.

If you recall, George was advocating a swapout of all client TCP stacks or a static penalty for using an old stack.  The root issue he was trying to solve was making sure bandwidth allocation was fair between users when they shared a congested gateway.  His concern was that opening multiple TCP connections allowed one user to gain an unfair share of the total bandwidth.

When I proposed RED (“Random Early Detection”) as a solution, George disagreed and said it doesn’t solve anything.  I asked why and he responded:

“RED merely starts dropping packets randomly and before you have full congestion on the router link. Random packet dropping statistically slows all flow rates down the same so it makes flow rates ‘fair’. What RED does not account for is per user fairness and how many flows each user opens up. RED does not neutralize the multi-stream advantage/cheat.”

This seemed like a good opportunity to explain how RED actually works.  It’s a neat approach that requires little intelligence and keeps no per-flow state.  For a brief overview, try its Wikipedia article or the original paper by Sally Floyd and Van Jacobson.  RED takes advantage of randomness to introduce fairness when managing congestion.

When a gateway is uncongested, all packets are forwarded.  In the case of consumer broadband, this means that subscribers can “burst” to higher speeds when others aren’t using the network.  Once the gateway reaches some high water mark, RED is enabled with a varying probability of drop based on how much congestion is encountered.

For each packet, a gateway using RED rolls the dice.  For example, if there’s a lot of congestion the probability might be 50%, equivalent to flipping a coin (“heads: forward the packet, tails: drop it”).  The gateway knows nothing about the packet, including who sent it or the type of application being used.  Each packet is considered in isolation and nothing is remembered about it afterward.
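
Here is a minimal Python sketch of the textbook drop decision from the Floyd and Jacobson paper. The only state is a moving average of the queue length (nothing per flow or per packet), and the threshold and weight values below are illustrative rather than tuned for any real link.

    import random

    # Illustrative RED parameters: below MIN_TH nothing is dropped, above MAX_TH
    # everything is, and in between the drop probability grows linearly.
    MIN_TH, MAX_TH, MAX_P, WEIGHT = 5, 15, 0.5, 0.002
    avg_queue = 0.0

    def should_drop(queue_len):
        """Decide the fate of one incoming packet, knowing nothing about it."""
        global avg_queue
        # Exponentially weighted moving average of the instantaneous queue length.
        avg_queue = (1 - WEIGHT) * avg_queue + WEIGHT * queue_len
        if avg_queue < MIN_TH:
            return False
        if avg_queue >= MAX_TH:
            return True
        drop_p = MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < drop_p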

After thinking about this a bit, you can see how this enforces fairness even if one user has opened multiple TCP connections to try to gain a greater allocation of the bandwidth.  Let’s say the current drop rate is 33% and there are two users, one that has opened two connections and the other who has one.  Before the gateway enables RED, each connection is taking up 33% of the bandwidth but the user with two connections is getting 66% of the total, twice as much.

Once RED is activated, each packet coming in has a 33% chance of being dropped, independent of who sent it.  Since the user with two connections is sending twice as many packets, he will see twice as many drops.  When one of his connections is hit, ordinary TCP congestion control backs that connection off to half the bandwidth it was using before.  In the long run, the user with one connection will occasionally have a packet dropped, but this will happen twice as often to the heavy user.  If the heavy user drops back to one connection, that connection will expand its sending rate until it is using about half the total available bandwidth, and he will actually get better throughput.

This is because TCP backs off to half its bandwidth estimate when congestion is encountered and then increases it linearly from there (the good old AIMD method).  If a user has one connection and packets are being randomly dropped, that connection spends more of its time transmitting at the correctly-estimated rate than multiple connections would.  It’s actually worse for total throughput to have two connections in the back-off state than one, because a connection that is re-probing the available bandwidth spends a lot of time using less than its fair share.  This is an incentive for applications to go back to opening one connection per transfer.

Getting back to consumer broadband, Comcast and friends could enable what’s called Weighted RED on the shared gateways.  The simplest approach is to enable normal RED with a fixed drop ratio of 1/N where N is the number of subscribers sharing that gateway.  For example, ten subscribers sharing a gateway would give a drop ratio of 10%.  This enforces per-user fairness, no matter how many parallel TCP connections or even computers each user has.   With Weighted RED, subscribers that have paid for more bandwidth can be sorted into queues that have lower drop ratios.  This enforces fairness across the network while allowing differentiated subscriptions.
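
As a rough sketch of the tiered version, subscribers could be mapped to queues with different drop ratios; the tier names and numbers below are made up for illustration.

    import random

    # Hypothetical Weighted RED-style policy: higher-paying tiers get lower
    # drop ratios when the shared gateway is congested.
    DROP_RATIO = {"basic": 0.10, "plus": 0.05, "premium": 0.02}

    def should_drop(tier, congested):
        """Per-packet decision for a packet from a subscriber in the given tier."""
        if not congested:
            return False              # allow bursting when the link is idle
        return random.random() < DROP_RATIO[tier]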

As I said before, all this has been known for a long time in the networking community.  If Comcast and others wanted to deal with bandwidth-hogging users and applications while still allowing “bursting”, this is an extremely simple and well-tested way to do so that is available on any decent router.  The fact that they go for more complicated approaches just reveals that they are looking for per-application control, and nothing less.  As long as they have that goal, simple approaches like RED will be ignored.