The HMAC construction turns a cryptographic hash algorithm into a keyed hash. It is commonly used for integrity protection when the sender and recipient share a secret key. It was developed to address various problems with arbitrary keyed hash constructions. So why are developers still rolling their own?
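For reference, here is a minimal Python sketch of the HMAC construction as specified in RFC 2104, checked against the standard library's hmac module. The function name hmac_sha256_manual and the sample key and message are hypothetical. In real code you would simply call hmac.new.

```python
import hashlib
import hmac

def hmac_sha256_manual(key: bytes, msg: bytes) -> bytes:
    """HMAC per RFC 2104: H((K ^ opad) || H((K ^ ipad) || m)), here with H = SHA-256."""
    block_size = 64  # SHA-256 compression function block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # keys longer than a block are hashed first
    key = key.ljust(block_size, b"\x00")    # then zero-padded to exactly one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()   # key-dependent inner hash of the message
    return hashlib.sha256(opad + inner).digest()  # outer hash, keyed again

key = b"shared secret"          # hypothetical shared key
msg = b"amount=100&to=alice"    # hypothetical message
tag = hmac_sha256_manual(key, msg)
assert tag == hmac.new(key, msg, hashlib.sha256).digest()
print(tag.hex())
```

The key is mixed in at the start of both the inner and the outer hash. The outer hash is why the length-extension trick described below does not carry over to HMAC: extending the inner hash does not let an attacker compute the outer one without the key.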
One of the original papers on keyed hash constructions describes the motivations for developing a standard for HMAC. In 1995, there was no standardization, and cryptographers worked only from hunches about what constituted a secure keyed hash. This paper summarized two known attacks on some common schemes that had evolved in the absence of a standard.
The first construction the paper attacks is H(k || m), aka “secret prefix”. The key and the message to be authenticated were concatenated and hashed, and the authenticator was the resulting hash. This was fatally flawed, as I mentioned in my previous talks on web crypto. Standard hash algorithms that use the Merkle-Damgård construction (like SHA-1) are subject to a length-extension attack. An attacker who has seen the authenticator for m1 can trivially create an authenticator for m’ = m1 || pad || m2, knowing only the length of k and not k itself. (The “pad” value makes the input a multiple of the compression function block size and includes the total length hashed.) This flaw was most recently found in the Flickr API.
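The second construction was H(m || k), aka “secret suffix”. The length-extension attack no longer applies, because the key is hashed last: anything an attacker appends to a seen authenticator would come after k, so it does not yield a valid authenticator for any message. However, this construction still maximally exposes you to weaknesses in the hash algorithm. Preneel et al. described two attacks on this approach.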
The first attack is that secret suffix is weaker against offline second-preimage attacks. An attacker who has seen the authenticator for a known plaintext m can compute, without the key, the intermediate hash state just before k is mixed in, and then search offline for a different plaintext m’ that produces the same intermediate state. If that state is identical and m’ has the same length as m, everything hashed afterward (k and the final padding) is identical too, so the output is also the same. This means the attacker can just send m’ with the previously seen authenticator for m, and the two will match.
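To illustrate without a real second-preimage break, the sketch below uses a hypothetical toy Merkle-Damgård hash whose chaining value is only 16 bits (a truncated SHA-256 serves as the compression function), so a brute-force search of roughly $2^{16}$ tries finishes instantly. The names compress, chain, and toy_hash and the sample key and messages are illustrative assumptions. The attacker knows m, computes the state after it without the key, and searches for another block that reaches the same state.

```python
import hashlib

BLOCK = 8          # toy block size in bytes (illustrative)
STATE_BYTES = 2    # toy 16-bit chaining value, deliberately tiny
IV = b"\x00" * STATE_BYTES

def compress(state, block):
    # toy compression function: SHA-256 of (state || block), truncated to 16 bits
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def chain(state, data):
    # iterate the compression function over block-aligned data (no padding)
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(msg):
    # Merkle-Damgard with strengthening: 0x80, zeros, then the 8-byte message length
    padding = b"\x80" + b"\x00" * (-(len(msg) + 9) % BLOCK) + len(msg).to_bytes(8, "big")
    return chain(IV, msg + padding)

key = b"secret k"        # hypothetical one-block key, unknown to the attacker
m = b"pay $10."          # known one-block message
tag = toy_hash(m + key)  # observed secret-suffix authenticator

# Attacker: the state after m needs no key, so search offline for another block reaching it
target = chain(IV, m)
for i in range(1 << 20):
    candidate = i.to_bytes(BLOCK, "big")
    if candidate != m and chain(IV, candidate) == target:
        m_prime = candidate
        break
else:
    raise RuntimeError("no second preimage found in search range")

# Same state before k and same length, so the old tag verifies for m_prime
assert toy_hash(m_prime + key) == tag
```

With a real 128-bit hash, the same loop would need on the order of $2^{128}$ iterations, which is why this attack only matters in combination with the weaker bound discussed next.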
For a secure cryptographic hash function, a second-preimage attack takes about $2^n$ tries, where $n$ is the hash size in bits[1]. However, the secret suffix approach is marginally weaker against this kind of attack. If an attacker has seen $t$ message and authenticator pairs, the effort drops to $2^n / t$, since they can attempt a second-preimage match against any of the authenticators they have seen. This is rarely a problem, because second-preimage attacks are usually much harder than finding collisions. As they have aged, all widely-used hash algorithms have fallen to collisions before second-preimage attacks.
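The $2^n / t$ estimate follows from a simple geometric-trials argument, assuming the hash behaves like a random function. The numbers in the last comment are illustrative only.

```latex
% Each offline try yields a fresh n-bit intermediate value; it succeeds if it
% equals any of the t distinct values the attacker has collected.
p = \frac{t}{2^{n}}, \qquad
\mathbb{E}[\text{tries}] = \frac{1}{p} = \frac{2^{n}}{t}
% Illustrative: n = 128, t = 2^{20} seen pairs  =>  2^{128} / 2^{20} = 2^{108} tries
```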
The other attack is much more powerful. If the attacker can submit a chosen message to be authenticated, she can mount an offline collision search. She searches for two messages of the same length, m and m’, that hash to the same intermediate value. Once they are found, she requests an authenticator for the innocuous message m. Since a collision means the intermediate hash state before k is mixed in is identical (an “internal collision”), the final authenticator for both will be identical. The attacker then sends the evil message m’ with the authenticator for m, the two match, and the message is accepted as authentic.
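The chosen-message attack can be sketched with a hypothetical toy Merkle-Damgård hash whose chaining value is only 16 bits, so a birthday search needs just a few hundred tries. The attacker generates innocuous and evil message variants, keyless and offline, until one of each reaches the same internal state, then has the innocuous one authenticated. All names and messages here are illustrative assumptions; a real attack on MD5 would use a dedicated collision algorithm rather than this generic search.

```python
import hashlib

BLOCK = 8          # toy block size in bytes (illustrative)
STATE_BYTES = 2    # toy 16-bit chaining value, so a birthday search is cheap
IV = b"\x00" * STATE_BYTES

def compress(state, block):
    # toy compression function: SHA-256 of (state || block), truncated to 16 bits
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def chain(state, data):
    # iterate the compression function over block-aligned data (no padding)
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(msg):
    # Merkle-Damgard with strengthening: 0x80, zeros, then the 8-byte message length
    padding = b"\x80" + b"\x00" * (-(len(msg) + 9) % BLOCK) + len(msg).to_bytes(8, "big")
    return chain(IV, msg + padding)

# Offline, without the key: find an innocuous block and an evil block with equal states
good, evil = {}, {}
for i in range(1 << 20):
    g = b"ok" + i.to_bytes(6, "big")   # innocuous variant
    e = b"ev" + i.to_bytes(6, "big")   # evil variant
    sg, se = chain(IV, g), chain(IV, e)
    good[sg], evil[se] = g, e
    if sg in evil or se in good:
        s = sg if sg in evil else se
        m, m_evil = good[s], evil[s]
        break
else:
    raise RuntimeError("no collision found in search range")

key = b"secret k"          # hypothetical one-block key held by the signer
tag = toy_hash(m + key)    # attacker gets the innocuous m authenticated
assert toy_hash(m_evil + key) == tag   # the same tag verifies the evil message
```

Note that the key never enters the search: the collision is found once and works against every key, which is what makes secret suffix so fragile.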
This means the secret suffix construction is insecure if collisions can be found in the underlying hash function. Due to the birthday paradox, this takes $2^{n/2}$ work even for a secure hash function (e.g., $2^{64}$ operations for a 128-bit hash). But it gets worse if the hash is weaker to collisions.
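The $2^{n/2}$ figure comes from the standard birthday approximation for $q$ random $n$-bit values, again assuming the hash behaves like a random function:

```latex
% Probability that q random n-bit values contain at least one collision
\Pr[\text{collision}] \approx 1 - e^{-q(q-1)/2^{n+1}}
% Setting this to 1/2 and solving for q (using q(q-1) \approx q^2):
q \approx \sqrt{2 \ln 2} \cdot 2^{n/2} \approx 1.18 \cdot 2^{n/2}
% n = 128 (MD5):  q \approx 1.18 \cdot 2^{64}
```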
MD5 has multiple demonstrated collisions. Many systems continue to use HMAC-MD5 because a collision alone is not enough to compromise it. Because of the way the key is applied in HMAC, an attacker would have to generate an internal collision with the secret key, which is much harder than colliding with a chosen message[2]. Although this may provide some immediate comfort, it is still important to move to HMAC-SHA256 soon if you are using HMAC-MD5.
In contrast, MD5 with secret suffix is completely compromised due to collisions, especially with the recent advance in chosen-prefix collisions. Currently, this takes about 30 seconds on a laptop. To repeat, under no circumstances should you use an arbitrary hash construction instead of HMAC, and MD5 with secret suffix is completely broken. If you were putting off moving away from MD5(m || k), now would be an excellent time to move to HMAC-SHA256.
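A minimal sketch of that migration in Python, using only the standard library's hmac and hashlib modules. The function names make_tag and verify_tag and the sample key are hypothetical. The constant-time comparison is standard practice for checking authenticators rather than something discussed above.

```python
import hashlib
import hmac

# Before (broken, do not use): tag = hashlib.md5(msg + key).digest()

def make_tag(key: bytes, msg: bytes) -> bytes:
    # After: HMAC-SHA256 from the standard library
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify_tag(key: bytes, msg: bytes, tag: bytes) -> bool:
    # recompute and compare in constant time rather than with ==
    return hmac.compare_digest(make_tag(key, msg), tag)

key = b"shared secret"   # hypothetical key
tag = make_tag(key, b"amount=100&to=alice")
assert verify_tag(key, b"amount=100&to=alice", tag)
assert not verify_tag(key, b"amount=999&to=mallory", tag)
```

Because the tag format changes (32 bytes instead of 16), any stored or in-flight MD5(m || k) values will need to be reissued rather than converted.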
Thanks go to Trevor Perrin and Peter Gutmann for comments on this article.
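[1] This bound does not hold for long messages: an attacker can aim for the intermediate hash state after any block of a long message, and multicollision techniques help build a matching message of the right length. See the work of Kelsey and Schneier, and of Joux, for more details.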
[2] This is a very broad statement about HMAC. A more detailed analysis of its security will have to wait for another post.