Timing-safe memcmp and API parity

OpenBSD released a new API with a timing-safe bcmp and memcmp. I strongly agree with their strategy of encouraging developers to adopt “safe” APIs, even at a slight performance loss. The strlcpy/strlcat family of functions they pioneered have been immensely helpful against overflows.

Data-independent timing routines are extremely hard to get right, and the farther you are from the hardware, the harder it is to avoid unintended leakage. Your best bet if working in an embedded environment, is to use assembly and thoroughly test on the target CPU under multiple scenarios (interrupts, power management throttling clocks, etc.) Moving up to C creates a lot of pitfalls, especially if you support multiple compilers and versions. Now you are subject to micro-architectural variance, such as cache, branch prediction, bus contention, etc. And compilers have a lot of leeway with optimizing away code with strictly-intended behavior.

While I think the timing-safe bcmp (straightforward comparison for equality) is useful, I’m more concerned with the new memcmp variant. It is more complicated and subject to compiler and CPU quirks (because of the additional ordering requirements), may confuse developers who really want bcmp, and could encourage unsafe designs.

If you ask a C developer to implement bytewise comparison, they’ll almost always choose memcmp(). (The “b” series of functions is more local to BSD and not Windows or POSIX platforms.) This means that developers using timingsafe_memcmp() will be incorporating unnecessary features simply by picking the familiar name. If compiler or CPU variation compromised this routine, this would introduce a vulnerability. John-Mark pointed out to me a few ways the current implementation could possibly fail due to compiler optimizations. While the bcmp routine is simpler (XOR accumulate loop), it too could possibly be invalidated by optimization such as vectorization.

The most important concern is if this will encourage unsafe designs. I can’t come up with a crypto design that requires ordering of secret data that isn’t also a terrible idea. Sorting your AES keys? Why? Don’t do that. Database index lookups that reveal secret contents? Making array comparison constant-time fixes nothing when the index involves large blocks of RAM/disk read timing leaks. In any scenario that involves the need for ordering of secret data, much larger architectural issues need to be addressed than a comparison function.

Simple timing-independent comparison is an important primitive, but it should be used only when other measures are not available. If you’re concerned about HMAC timing leaks, you could instead hash or double-HMAC the data and compare the results with a variable-timing comparison routine. This takes a tiny bit longer but ensures any leaks are useless to an attacker. Such algorithmic changes are much safer than trying to set compiler and CPU behavior in concrete.

The justification I’ve heard from Ted Unangst is “API parity“. His argument is that developers will not use the timing-safe routines if they don’t conform to the ordering behavior of memcmp. I don’t get this argument. Developers are more likely to be annoyed with the sudden performance loss of switching to timing-safe routines, especially for comparing large blocks of data. And, there’s more behavior that should intentionally be broken in a “secure memory compare” routine, such as two zero-length arrays returning success instead of an error.

Perhaps OpenBSD will reconsider offering this routine purely for the sake of API parity. There are too many drawbacks.