Timing-independent array comparison

January 7, 2010August 31, 2010 ~ Nate Lawson

I find it interesting that people tend to follow the same stages of awareness when first hearing about timing attacks. A recent discussion about Django redesigning their cookie scheme shows people at the various stages.

1. The timing difference is too small to attack

This is the result of not understanding statistics or how jitter can be modeled and filtered. Usually, people are thinking that a single measurement for a given input is probably obscured by noise. This is correct. That’s why you take many measurements for the same input and average them. The more noise, the more measurements you need. But computers are fast, so it doesn’t take long to do thousands or even millions of runs per guessed byte.

The basic statistical method is simple. You apply a hypothesis test of the mean of all runs with a given guess for a byte (say, 0) against the mean for all the others (1, 2, …, 255) If there is a significant difference in the means, then you have guessed that byte correctly. If no difference, repeat for the other 255 values. If still no difference, take more measurements.

The other misconception is that jitter is too great to get anything useful out of the measurements, especially in a network. There is an excellent paper by Crosby and Wallach that debunks this myth by carefully analyzing noise and its causes as well as how to filter it. They conclude that an attacker can reliably detect processing differences as low as 200 nanoseconds on a LAN or 30 microseconds on a WAN given only 1000 measurements. If your server is hosted somewhere an attacker can also buy rackspace, then you are vulnerable to LAN timing attacks.

2. I’ll just add a random delay to my function

Then I’ll take more measurements. See above.

3. I’ll add a deadline to my function and delay returning until it is reached

This usually has a higher overhead and is more prone to errors than implementing your function such that its timing does not vary, based on the input data. If you make a mistake in your interval calculation or an attacker is able to “stun” your function somewhere in the middle such that all target computations occur after the interval timer has expired, then you’re vulnerable again. This is similar to avoiding buffer overflow attacks by lengthening the buffer — better to fix the actual problem instead.

4. Ok, ok, I developed a constant-time function

How carefully did you analyze it? After citing my Google talk on crypto flaws, here was one attempt from the Reddit thread:

def full_string_cmp(s1, s2):
    total = 0
    for a, b in zip(s1, s2):
        total += (a != b)
    return not total

The first bug is that this opens a worse hole if the two strings are not the same length. In fact, an attacker can send in a zero-length string and the result of zip() is an empty array. This results in the for() loop never executing and any input being accepted as valid! The fix is to check the two lengths first to make sure they’re equal or in C, compare two fixed-length arrays that are guaranteed to be the same size.

The next bug is smaller, but still valid. There’s a timing leak in the comparison of a and b. When you write:

total += (a != b)

Both Python and a C compiler actually generate low-level instructions equivalent to:

if (a != b)
    tmp = 1
else
    tmp = 0
total += tmp

Thus, there is still a small timing leak. If they are equal, an additional branch instruction is executed. If they are not equal, this is skipped. A common intuitive mistake among programmers is that single lines of code are atomic. This is why you might find a C version of this such as “total += (a != b) ? 1 : 0”, which generates the same code. Just because it fits on a single line does not mean it is atomic or constant-time.

5. Ok, I fixed those bugs. Are we done yet?

Let’s see what we have now (adapted from my keyczar posting).

if len(userMsg) != len(correctValue):
    return False
result = 0
for x, y in zip(userMsg, correctValue):
    result |= ord(x) ^ ord(y)
return result == 0

This now uses an arithmetic operator (XOR) instead of a logical compare (!=), and thus should be constant-time. The += operator could possibly have a leak if carries take longer than an add without carry, so we went with |= instead. We check the lengths and abort if they don’t match. This does leak timing information about the length of the correct value, so we have to make sure that is not a problem. Usually it’s not, but if these were passwords instead of cryptographic values, it would be better to hash them with PBKDF2 or bcrypt instead of working with them directly. But are we done?

Maybe. But first we need to figure out if our high-level language has any behavior that affects the timing of this function. For example, what if there is sign-bit handling code in the Python interpreter that behaves differently if x or y is negative? What if the zip() operator has an optimization that compares two arrays first to see if they’re identical before returning their union?

The answer is you can never be sure. In C, you have more assurance but still not enough. For example, what if the compiler optimization settings change? Is this still safe? What about the microarchitecture of the CPU? What if Intel came out with a new byte-wise cache alignment scheme in the Pentium 1000?

I hope this has convinced some people that side channel attacks are not easily solved. Important code should be carefully reviewed to have higher assurance this class of attack has been mitigated.

17 thoughts on “Timing-independent array comparison”

David LeBlanc says:

January 11, 2010 at 11:26 am

Interesting post, but code review provides little assurance that you’ve really mitigated the problem. Reviewing the assembler will do better, assuming that the reviewer is good at assembler. Then we can even run into really nasty problems like cache locality, and so on. At the end of the day, if you’re worried about perf, you have to test for it.

Another factor that I’m somewhat suspicious of is the principle of least work. Some of this stuff requires enough work that a smaller amount of work will allow you to compromise the network anyway. If you can find a chunk of code where this is really the best available attack, you’ve got a very hardened target. That’s not to say that we should completely ignore the problem, as there are scenarios where it is very important.

Something I’m a little surprised you didn’t cover, except in passing, is that doing a string match on a password is just a bad idea to start with. If you never store clear-text passwords, then clear-text passwords can’t be stolen. If you store the result of a KDF, that’s much better, and if you have 50,000 hashing operations going on, it might be hard to notice if the hash value comparison bails early or not.
1. Nate Lawson says:
  
  January 11, 2010 at 1:19 pm
  
  It’s possible that there are simpler software flaws, but the thing I like about side channel attacks is that they are everywhere. While memory vulns may be present in greater or lesser numbers depending on the developer’s awareness, almost no one checks for side channel attacks. Since my time is scarce, I go for the most likely flaws first.
  
  One nit about your comment on password hashing: the quantity of hashing operations do not significantly help side channel resistance. Even if a single hash was applied, the difficulty of guessing bits of the hash goes up by 2x for each additional bit you try to guess. That’s because you have to generate a 2nd-preimage for each bit of your guess. By the time you’re up to the full hash size, this is obviously infeasible.
  
  However, side channel leaks can help an attacker even with hashed passwords. If they can detect the first few bytes of the password hash with a timing attack, this assists in implementing an offline dictionary search. The attacker can hash candidate passwords and reject them if the first few bytes of the hash don’t match the expected value. Passwords that do match can be tested against the server. So you’ve turned a full-online attack into a partial-offline attack, which is often meaningful.
  
  Better to use a timing-independent compare in all cases. BTW, nice to see you again.
kme says:

January 12, 2010 at 6:46 pm

It strikes me that it would be somewhat handy if we could get some compiler assistance for writing data-independent-time functions. I wonder if the idea of a gcc function attribute that specifies “the execution time of this function may only depend on the values of this (possibly empty) list of variables” is possible – even if it can only warn/error in cases where the condition isn’t met, rather than actually tweak the code generator to enforce it.
1. Nate Lawson says:
  
  January 13, 2010 at 6:11 pm
  
  Since the timing ultimately depends on the processor, microarchitecture, and environment from run-to-run, it’s too difficult for the compiler to see this. Even with multiple test runs with valid and invalid values, it may not be clear the varying timing is a problem. This is a design flaw you’d need to look for manually.
  
  To put it another way, the time spent adding the function decorator could be spent manually reviewing it.
kuza55 says:

January 24, 2010 at 8:20 pm

The difficulty of actually eliminating all timing sources seems to be an argument for adding a deadline.

I realise in a synchronous environment this kills performance, and estimating worst case execution time is not easy, and if you get it wrong you’ll be leaking something (probably less than before, since you could return an error when scheduled, if it is taking too long), but in an concurrent environment it should only increase latency, while keeping the same throughput, and it seems a much more usable approach than digging into what the hell an entire system does (timing wise), when you execute statements in a high level language.

Of course, if you’re performing operations without an upper bound (e.g. comparing arbitrary length strings), you’ve got problems, but it still seems easier to remedy a problem this way than to do what you described above.

Also, while I agree there is a timing difference between taking a branch and not taking one (depending on branch prediction, cache misses, memory stalls, etc, etc, etc), is such a delay at all observable in the noise of a real system? (I realise that if the average noise per set of samples is less than the timing difference you want to check, you can, but this seems unlikely, especially when you add networks to the picture).

Also, has anyone ever released any code to demonstrate a remote timing attack on anything?
1. Nate Lawson says:
  
  January 26, 2010 at 9:48 pm
  
  Good questions overall. The deadline approach is more complicated and subject to failures as you acknowledge. In contrast, nearly all variable-timing cases can be converted to constant time with some thought. So why choose the more fragile and complicated solution?
  
  Branch prediction attacks are quite powerful. I’ll refer you to the papers by Aciicmez and Seifert for details.
  
  The Brumley and Boneh remote timing attack paper did not come with exploit code. I’m sure it would be a nice contribution to implement the attack and release the code, so if you’re looking for a project, it should be fun.
Daniel Cheng says:

January 31, 2010 at 7:59 pm

Just add more (non-random) delay. If the function is too slow, you can’t more measurements.
1. Nate Lawson says:
  
  February 2, 2010 at 9:16 am
  
  That is the “deadline approach”. See the comments above for why that is not a good plan.
2. Rena says:
  
  February 6, 2010 at 11:36 am
  
  Possibly a delay whose length is determined by the bytes being compared? A simple thing like usleep(userMsg[i]) would still be leaking, but I feel like there might be a method that wouldn’t.
  
  For short strings, especially on 64-bit machines, you can do basically what #5 does copying chunks of the string into CPU registers. You’d have to use a buffer whose length is a multiple of the register size, and zero it beforehand (constant time).
  
  The method in #5 still has two flaws, though: the return condition is reversed (returns zero if strings match or are not same length), and if you fixed that, you’d be leaking information about the inputs in the return value. The proper solution would boil down to if(result) return false; else return true; – since this branch depends on whether the entire string matched, it isn’t of any use in a timing attack.
  1. Nate Lawson says:
    
    February 8, 2010 at 9:34 am
    
    The deadline approach is not the right choice. As I mentioned, it’s too easy for a condition to be violated (especially by a malicious attacker influencing your execution environment) and once you’re out of the deadline region, you’re right back in vulnerable territory.
    
    A variable-timing comparison loop is still flawed, even if you use CPU registers. Sure the timing difference is much smaller (N instructions, where N is the length of buffer correctly matched), but it’s still there. Who’s to say an attacker can’t do a branch-prediction timing attack to detect the length of the loop?
    
    Also, you’ve gone to a much lower level, assuming a specific CPU type, which is part of the point of this article. It’s hard to have assurance you have solved timing channels without low-level access to the CPU.
    
    Finally, #5 does not have the flaws you suggest. #4 was actually written separately by a Reddit poster and it assumed success was non-zero. It’s just a difference in the caller’s expectations, not a security flaw. It can easily be corrected by doing “return not result” instead of “return result”. That doesn’t leak information about the inputs other than the ultimate answer (match or no match), which is fine from our security perspective.
Esta O.G. says:

April 8, 2010 at 9:16 am

Your explanation of what code the compiler will generate is inaccurate. Adding a zero to anything is a no-op (except with strict floating-point, but that’s neither here nor there), and most any optimizing compiler in the 21st century is written to utilize this fact.

The actual code generated will be something along the lines of “if(a == b) continue; total++;” — note the absence of a temporary register, and the direct branch to the loop-advancing part. Some compilers, particularly ones for Itanium and Power architectures, will remove the conditional branch altogether by use of a “set register from condition code” operation.
1. Nate Lawson says:
  
  April 8, 2010 at 2:41 pm
  
  Note the word “equivalent to” in the post. I know a compiler wouldn’t assign 0 on every iteration. The point is that there is still a conditional branch, which is a small timing leak. I’m talking to people who think “a = (b != c)” does not vary in timing because it’s all on one line.
Sid says:

July 16, 2010 at 5:52 pm

“2. I’ll just add a random delay to my function

Then I’ll take more measurements. See above.”

What if the time-delay is a hard to invert function of the time taken by the function? Wouldn’t the adversary then have to invert the function to succeed?
1. Sid says:
  
  July 16, 2010 at 6:27 pm
  
  Never mind! I guess one could still compute X since the expected value is linear so one will be able to separate out x+f(x) where f is the hard to invert function.
  1. Nate Lawson says:
    
    July 19, 2010 at 11:17 am
    
    Sid, exactly. If the function is fixed/linear, it is easy to filter out (e.g., by least squares fitting and then subtracting off the function).
    
    http://mathworld.wolfram.com/LeastSquaresFitting.html
    
    On the other hand, if it’s random, it will approximate a normal distribution and can be filtered that way. The random samples that “collide” with the signal do cause trouble, but that only requires more samples on the part of the attacker.
    
    Since it is difficult to be sure you’ve caused an attacker enough trouble (where “enough” is very target-specific), it’s better to remove the signal than increase the noise.
Hinnerk says:

August 31, 2010 at 7:26 am

You missed the comparison of “return result == 0” in the final function.
1. Nate Lawson says:
  
  August 31, 2010 at 9:03 am
  
  Thanks, updated. It was in the article I referred to but didn’t make it along with the cut/paste.