Stop using unsafe keyed hashes, use HMAC

The HMAC construction turns a cryptographic hash algorithm into a keyed hash. It is commonly used for integrity protection when the sender and recipient share a secret key. It was developed to address various problems with arbitrary keyed hash constructions. So why are developers still rolling their own?

One of the original papers on keyed hash constructions describes the motivations for developing a standard for HMAC. In 1995, there was no standardization and cryptographers only worked from hunches as to what constituted a secure keyed hash. This paper summarized two known attacks on some common schemes that had evolved in the absence of a standard.

The first construction the paper attacks is H(k || m), aka “secret prefix”. The key and the message to be authenticated were concatenated and hashed. The authenticator was the resulting hash. This was fatally flawed, as I mentioned in my previous talks on web crypto. Standard hash algorithms that use the Merkle-Damgard construction (like SHA-1) are subject to a length-extension attack. An attacker can trivially create an authenticator for m’ where m’ = m1 || pad || m2 if they have seen the authenticator for m1. (The “pad” value makes the input a multiple of the compression function block size and includes the total length hashed). This flaw was most recently found in the Flickr API.

The second construction was H(m || k), aka “secret suffix”. While the length-extension attack no longer applies because k is unknown to the attacker, this still maximally exposes you to weaknesses in the hash algorithm. Preneel et al described two attacks on this approach.

The first attack is that secret suffix is weaker against offline second-preimage attacks. That is, an attacker can take an authenticator for a known plaintext m and calculate their own plaintext m’ that hashes to the same value as the block just before k. If the input to the hash function just before k is identical, then the output is also the same. This means the attacker can just send m’ and the previously seen authenticator for m and the two will match.

For a secure cryptographic hash function, a second-preimage attack takes 2n tries where n is the hash size in bits[1]. However, the secret suffix approach is marginally weaker to this kind of attack. If an attacker has seen t text and authenticator pairs, then the effort is only 2n / t since they can attempt a second-preimage match against any of the authenticators they have seen. This is usually not a problem since second-preimage attacks are usually much harder than finding collisions. As they have aged, all widely-used hash algorithms have fallen to collisions before second-preimage attacks.

The other attack is much more powerful. If the attacker can submit a chosen message to be authenticated, she can attempt an offline collision search. In this case, an attacker searches for two messages, m and m’, that hash to the same value. Once they are found, she requests an authenticator for the innocuous message m. Since a collision means the intermediate hash state before k is mixed in is identical (an “internal collision”), the final authenticator for both will be identical. The attacker then sends the evil message m’ with the authenticator for m, the two match, and the message is accepted as authentic.

This means the secret suffix construction is insecure if collisions can be found in the underlying hash function. Due to the birthday paradox, this takes 2n/2 work even for a secure hash function (e.g., 264 operations for a 128-bit hash). But it gets worse if the hash is weaker to collisions.

MD5 has multiple demonstrated collisions. Many systems continue to use HMAC-MD5 because a collision alone is not enough to compromise it. Because of the way the key is applied in HMAC, an attacker would have to generate an internal collision with the secret key, which is much harder than colliding with a chosen message[2]. Although this may provide some immediate comfort, it is still important to move to HMAC-SHA256 soon if you are using HMAC-MD5.

In contrast, MD5 with secret suffix is completely compromised due to collisions, especially with the recent advance in chosen-prefix collisions. Currently, this takes about 30 seconds on a laptop. To repeat, under no circumstances should you use an arbitrary hash construction instead of HMAC, and MD5 with secret suffix is completely broken. If you were putting off moving away from MD5(m || k), now would be an excellent time to move to HMAC-SHA256.

Thanks go to Trevor Perrin and Peter Gutmann for comments on this article.

[1] This is not true for longer messages. Multicollisions can be used against each block of a longer message. See the work by Kelsey and Schneier and Joux for more details.

[2] This is a very broad statement about HMAC. A more detailed analysis of its security will have to wait for another post.

3 thoughts on “Stop using unsafe keyed hashes, use HMAC

  1. I’m confused by the second method to defeat the hash(m || k) scheme, because most hashes have length padding that is added to the end. Could you help clear this up?

    If m and m’ have the same length, then the length padding for m || k and m’ || k will be the same and a collision with m and m’ will yeild a collision for m || k and m’ || k. However, if m and m’ have a different length then the length padding, AFAIK, will cause problems. Here is my reasoning:

    Since the attacker can choose m and m’, they can ensure that the hashes of m and m’, including padding, are equal. This external collision relies on both m and m’ and their padding. But k is concatenated before the padding which means that the new padding for both will be altered. If the padding for both messages is altered then the collision is no longer gaurenteed. (Unless adding a constant to the pre-image of a collision preserves the collision.)

    The paper explicitly requires an internal collision in order to make this attack work. But if the messages have different padding, then the hashes will be equal right up until the padding is added.

    The attacker can compensate by (assuming he knows the length of k) adjusting the padding he uses to reflect a length of length(m || k). He can then find a collision using the altered padding for m and m’. But still, adding the key to the end of the message, but before the padding, cause the internal state to be different before processing the padding, which should yield different hashes.

    I’ve read the paper by Preneel that you linked to, and it seems as if the problem I mention is because MD5 and SHA1 are not “pure” Merkle–Damgård constructions. The paper defines the “iterative” hash functions to be (from pg. 4):

    H_0 = IV; H_i = f(H_{i-1},m_i), 1 <= i <= t ; hash(m) = H_t

    and in the case of a MAC:

    hash(m) = g(H_t)

    However, for a function that includes length padding, this seems to be more like:

    hash(m) = g(H_t,l(m))

    because the final transform is a function of both the last block and the entire message (which is the length of the entire message).

    Am I missing something here? This second method of attack against hash(m || k) only seems to work for substituting m' of equal length to m when dealing with hashes like SHA or MD5.

    1. I think you’re confused because you’re trying to compare it to the length extension attack. This is a different attack. I said nothing about the length of m and m’ being different. For the case of auth tokens, the input message length is usually constant.

      The attack is to create an internal collision of m and m’ before k is appended. At that point, the chaining variables are identical so adding k and the padding will result in identical final hashes. You are right in that this attack does not allow different lengths for m and m’. However, this is not a serious limitation for the attack since you are choosing m, and thus can make it any length you want in order to match the eventual m’.

      1. OK, thanks. I just wanted to be certain that, in order for that attack to work, the collision had to be for messages of equal lengths.

Comments are closed.