Timing-safe memcmp and API parity

OpenBSD released a new API with a timing-safe bcmp and memcmp. I strongly agree with their strategy of encouraging developers to adopt “safe” APIs, even at a slight performance loss. The strlcpy/strlcat family of functions they pioneered have been immensely helpful against overflows.

Data-independent timing routines are extremely hard to get right, and the farther you are from the hardware, the harder it is to avoid unintended leakage. Your best bet if working in an embedded environment, is to use assembly and thoroughly test on the target CPU under multiple scenarios (interrupts, power management throttling clocks, etc.) Moving up to C creates a lot of pitfalls, especially if you support multiple compilers and versions. Now you are subject to micro-architectural variance, such as cache, branch prediction, bus contention, etc. And compilers have a lot of leeway with optimizing away code with strictly-intended behavior.

While I think the timing-safe bcmp (straightforward comparison for equality) is useful, I’m more concerned with the new memcmp variant. It is more complicated and subject to compiler and CPU quirks (because of the additional ordering requirements), may confuse developers who really want bcmp, and could encourage unsafe designs.

If you ask a C developer to implement bytewise comparison, they’ll almost always choose memcmp(). (The “b” series of functions is more local to BSD and not Windows or POSIX platforms.) This means that developers using timingsafe_memcmp() will be incorporating unnecessary features simply by picking the familiar name. If compiler or CPU variation compromised this routine, this would introduce a vulnerability. John-Mark pointed out to me a few ways the current implementation could possibly fail due to compiler optimizations. While the bcmp routine is simpler (XOR accumulate loop), it too could possibly be invalidated by optimization such as vectorization.

The most important concern is if this will encourage unsafe designs. I can’t come up with a crypto design that requires ordering of secret data that isn’t also a terrible idea. Sorting your AES keys? Why? Don’t do that. Database index lookups that reveal secret contents? Making array comparison constant-time fixes nothing when the index involves large blocks of RAM/disk read timing leaks. In any scenario that involves the need for ordering of secret data, much larger architectural issues need to be addressed than a comparison function.

Simple timing-independent comparison is an important primitive, but it should be used only when other measures are not available. If you’re concerned about HMAC timing leaks, you could instead hash or double-HMAC the data and compare the results with a variable-timing comparison routine. This takes a tiny bit longer but ensures any leaks are useless to an attacker. Such algorithmic changes are much safer than trying to set compiler and CPU behavior in concrete.

The justification I’ve heard from Ted Unangst is “API parity“. His argument is that developers will not use the timing-safe routines if they don’t conform to the ordering behavior of memcmp. I don’t get this argument. Developers are more likely to be annoyed with the sudden performance loss of switching to timing-safe routines, especially for comparing large blocks of data. And, there’s more behavior that should intentionally be broken in a “secure memory compare” routine, such as two zero-length arrays returning success instead of an error.

Perhaps OpenBSD will reconsider offering this routine purely for the sake of API parity. There are too many drawbacks.

In Defense of JavaScript Crypto

Thai Duong wrote a great post outlining why he likes JavaScript crypto, although it’s not as strong a defense as you might guess from the title. While he makes some fair points of some limited applications of JavaScript, his post is actually a great argument against those pushing web page JS crypto.

First, he starts off with a clever Unicode attack on JS AES by Bleichenbacher. It is a great way to illustrate how the language, with its bitwise and type hostility, actively works against crypto implementers. Though Thai points out lots of different ways to work around these problems, I disagree that it’s clear sailing for developers once your crypto library deals with these issues. You’ll get to pick up where your low-level library left off.

Oh, and those of you were looking for defense of web page crypto for your latest app? Sorry, that’s still dumb. Google’s End-to-End will only be shipped as a browser extension.

The most ironic part of Thai’s post involves avoiding PCI audit by shipping JS to the browser to encrypt credit card numbers. Back in 2009, I gave a Google Tech Talk on a variety of crypto topics. Afterwards, a Google engineer came up to me and gave exactly Thai’s example as the perfect use case for JavaScript crypto. “We avoid the web server being in scope for PCI audits by shipping JS to the user,” he said. “This way we can strip off the SSL as soon as possible at the edge and avoid the cost of full encryption to the backend.”

He went on to describe a race-condition prone method of auditing Google’s own web servers, hashing the JS file served by each to look for compromised copies. When I pointed out this was trivial to bypass, he said it didn’t really matter because PCI is a charade anyway.

While he’s right that PCI is more about full employment for auditors & vendors than security, news about NSA tapping the Google backbone also shows why clever ways to avoid end-to-end encryption often create unintended weaknesses. I hope Google no longer underestimates their exposure to nation-state adversaries after the Snowden revelations, but this use-case for JS crypto apparently hasn’t died yet.