Colin Percival and I often disagree about how cryptography fits into software design. My opinion is that custom crypto designs always require intense third-party review, leaving companies with two undesirable possibilities. Either the review and maintenance cost of custom crypto increases the expense over competing design options that don’t require crypto (such as keeping critical state on the server) or leads to security flaws if third-party review is skipped. But we often agree on the concrete details, such as the preference for encrypt-then-MAC construction.
If you MAC the plaintext, the recipient must decrypt the message and then verify the MAC before rejecting it. An attacker can send garbage or carefully chosen ciphertexts and they can’t be rejected until they are decrypted. If the ciphertext is authenticated, then the server can reject the forged or modified message earlier.
This has a number of advantages, including some not emphasized in Colin’s post. He points out that this reduces the surface area of your code accessible to an attacker. This is especially true if the higher-level protocol is designed to minimize interpretation of unprotected header fields (and hopefully, not parsing anything unprotected). The earlier the MAC can be performed, the less code accessible to an attacker who is choosing messages.
Another advantage is that this helps prevent side channel attacks. If you decrypt and then validate the plaintext, an attacker can send chosen ciphertext and collect information from a side channel to recover the encryption key. However, if you reject such messages early, the decryption key is never used. The attacker is reduced to replaying valid messages seen previously, which limits (but doesn’t prevent) side channel attacks.
However, one point he gets wrong is the emphasis on counter mode (CTR) to prevent side channel attacks. A former colleague of mine, Josh Jaffe, published a paper on attacking counter mode at CHES 2007. The common lore before his paper was that counter mode limits side channel attacks because the attacker does not get to choose the plaintext input to the block cipher. Instead, a possibly-unknown counter value is fed in as plaintext and is incremented with each block.
Colin’s post states this as follows:
In CTR mode, you avoid passing attacker-provided data to the block cipher (with the possible exception of the nonce which forms part of each block). This reduces the attack surface even further: Using CTR mode, an attacker cannot execute a chosen-ciphertext (or chosen-plaintext) attack against your block cipher, even if (in the case of Encrypt-then-MAC) he can correctly forge the MAC (say, if he stole the MAC key but doesn’t have the Encrypt key). Is this an attack scenario worth considering? Absolutely — the side channel attacks published by Bernstein (exploiting a combination of cache and microarchitectural characteristics) and Osvik, Shamir, and Tromer (exploiting cache collisions) rely on gaining statistical data based on a large number of random tests, and it appears unlikely that such attacks would be feasible in a context where an attacker could not provide a chosen input.
Josh’s paper shattered this assumption by showing how to attack several versions of counter mode with AES. He treats the unknown counter value as part of the key, solving for the combined key+counter using DPA. This gives intermediate values that have an error component. He first solves for varying portions of the inputs, leaving this unknown but constant error term. These error terms then cancel out between rounds, revealing the secret round keys.
His attack works because counter mode only changes a few bits each time the counter is incremented. Seeding an LFSR to use as the counter could make this attack more difficult. I would hesitate to say it makes it impossible since it seems like a more complex version of Josh’s attack could succeed against this. In conclusion, DPA countermeasures are still needed, even with counter mode.
4 thoughts on “Another reason to MAC ciphertext, not plaintext”
Whilst MAC’ing the ciphertext does add an additional layer, it only really gives a 2x complexity. Basically, the attacker can still apply known and chosen input plaintext which will be processed by the internal MAC key (so that they can be rejected). The attacker can then use side channel analysis on this key, and once obtained, use it to create valid MACs on their chosen input to allow for DPA of the encryption key.
However, I agree that you should apply authentication to your message packets after encryption, rather than before – but mainly for the reason of reducing accessible code which could be attacked by buffer overflow and such like, rather than protection against side channel analysis.
Yes, I agree you can attack each layer serially. However, “2x complexity” sounds easier than it actually is.
If you’re using AES + HMAC-SHA256, you have to implement two separate attacks since there are different cryptographic functions in use. Also, SHA tends to leak less than AES (depending on implementation, DPA countermeasures, etc.) so it may not even be feasible to attack such a target even if the AES implementation does leak.
It would be interesting to see a paper on DPA against a production system using only replayed messages from a valid source. Depending on the format and variability of the messages, this could be possible. A MAC wouldn’t prevent replay.
Certainly, HMAC makes things harder. I would not necessarily say that SHA leaks less than AES/DES/RSA, etc, but its more that finding predictable ‘intermediate ciphertext’ differences that are within a useful (ie computationally exploitable) bound is much harder. However, the XOR operations on the key which are a fundamental part of a HMAC operation are always an interesting target, and will probably at least provide information on the Hamming weight of the key, if not the full key itself.
Attacking using only replayed messages is of course possible. In fact, it may even be quite easy – it would really come down to how much the underlying crypto implementation leaks (therefore how many sample waveforms are required for successful analysis), and how the implementation works to allow you to have sufficient number of different inputs to gain the requisite waveform captures.
I did start to say/type that this would be harder for low volume traffic such as software updates, but upon reflection this is not true. Although low volume, you have a larger payload, so it would be easy to capture 100s of thousands of operations based on the various different blocks which were being decrypted.
Similarly, with a comms channel that has any reasonable traffic, you would have a reduced size of each payload (maybe), but increased traffic volume (probably). Therefore, the only situations where this would be difficult would be low traffic, low message payload size situations (eg symmetric key loads).
For systems like a software update mechanism, where you might reasonably expect there to be a significant limit on the number of valid (and different) updates you can capture, this would probably be difficult. For systems where you have a comms stream, where you would easily get hundreds of thousands of data then it would probably be quite trivial.
I disagree on your analysis of HMAC. Yes, the key is XORed with IPAD and OPAD. However, this is only the bare key, not a key schedule. With AES, the key bits are duplicated across multiple round keys and expanded to hit many cache lines. So at least timing attacks are much easier on the expanded key schedule than on the bare key.
However, we really need hard examples to discuss this more since DPA is a very system-specific attack. We can go on and on about why something might or might not be more exploitable, but until you measure actual leakage on the real system, you can never be sure. I definitely agree both HMAC and AES will need DPA countermeasures and neither is inherently immune to this attack.
Comments are closed.