Thought experiment on protocols and noise

I hesitate to call this an interview question because I don’t think on-the-spot puzzle solving predicts a good engineering hire. On the other hand, I try to explore some simple thought experiments with candidates who have a security background.

One of these involves a protocol that has messages authenticated by an HMAC. There’s a message (with appropriate internal format) and a SHA-256 HMAC that covers it. As the implementer, you receive a message that doesn’t verify. In other words, your calculated MAC isn’t the same as the one in the message. What do you do?

“Throw an error” is a common response. But is there something more clever you could do? What if you could tell whether the message had been tampered with or if this was an innocent network error due to noise? How could you do that?

Some come up with the idea of calculating the Hamming distance (or some other comparison) between the computed and received HMACs. If the two are close, differing in only a few bits, the MAC field itself was probably corrupted in transit and the message is likely intact, since a modified message would produce a completely different MAC thanks to the avalanche property of secure hash functions. If they differ widely, the message itself may have been altered, whether by noise or by an attacker.
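
Here is a minimal sketch of that check in Python, just to make the idea concrete (the 8-bit threshold and the classification strings are mine, purely illustrative):

    import hmac, hashlib

    def hamming_distance(a: bytes, b: bytes) -> int:
        """Count differing bits between two equal-length byte strings."""
        return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

    def classify_failure(key: bytes, message: bytes, received_mac: bytes) -> str:
        """Illustrative only: bucket a failed MAC check by Hamming distance."""
        computed = hmac.new(key, message, hashlib.sha256).digest()
        if hmac.compare_digest(computed, received_mac):
            return "valid"
        # Hypothetical threshold: a "close" MAC suggests noise hit the MAC field,
        # since a corrupted message would change about half the MAC bits.
        if hamming_distance(computed, received_mac) <= 8:
            return "MAC field corrupted in transit; message probably intact"
        return "message (or most of the MAC) altered; noise or tampering"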

OK, so now you can distinguish whether the errors landed in the MAC or in the message itself. Is this helpful, and is it secure? Consider:

  • If you return an error, which one do you return? At what point in the processing?
  • Does knowing which type of error occurred help an attacker? Which kind of attacker?
  • If you choose to allow up to 8 flipped bits in the MAC while still accepting the message, is that secure? If so, at how many bits does it become insecure? Does the position of the bits matter? (A rough calculation follows this list.)
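
One way to put a rough number on that last question (my back-of-the-envelope math, not part of the original exercise): accepting any received MAC within Hamming distance t of the correct 256-bit value multiplies a blind forger’s per-guess success probability by the number of acceptable strings.

    from math import comb, log2

    MAC_BITS = 256
    for t in (0, 1, 8, 32, 64):
        acceptable = sum(comb(MAC_BITS, k) for k in range(t + 1))
        print(f"t={t:3d}: effective MAC strength ~{MAC_BITS - log2(acceptable):.0f} bits")

For t = 8 this works out to roughly 207 bits of blind-forgery resistance, which sounds comfortable, but it says nothing about the error-handling and oracle questions above.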

There comes a moment when every engineer comes up with some “clever” idea like the above. If she’s had experience attacking crypto, the next thought is one of intense unease. The unschooled engineer has no such qualms, and thus provides full employment for Root Labs.

Catching up on recent crypto developments

When I started this blog, the goal was to write long-form posts that could serve as a standalone intro to security and crypto topics. Rather than write about the history of the NSA as planned, I’ll try writing a few short notes in hopes that they’ll fit better within the time I have. (Running a company and then launching a new one the past few years has limited my time.)

Heartbleed has to be the most useful SSL bug ever. It has launched not just one, but two separate rewrites of OpenSSL. I’m hoping it will also give the IETF more incentive to reject layering violations like the heartbeat extension. Security protocols are for security, not path MTU discovery.

Giving an attacker a way to ask you to say a specific phrase is never a good idea. Worse would be letting them tell you what to say under encryption.

Earlier this year, I was pleased to find out that a protocol I designed and implemented has been in use for millions (billions?) of transactions over the past few years. During design, I spent days slaving over field order and dependencies in order to force implementations to be as simple as possible. “Never supply the same information twice in a protocol” was the mantra, eliminating many length fields and relying on a version bump at the start of the messages if the format ever changed. Because I had to create a variant cipher mode, I spent 5x the initial design time scrutinizing the protocol for flaws, then paid a third party for a review.
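
To illustrate the principle, here is a made-up layout (not the actual protocol): the version comes first, the field order and sizes are fixed, and the only variable length is implied by the message itself rather than carried in a redundant field an attacker could make disagree.

    import struct

    VERSION = 1  # bumped if the format ever changes; nothing else is negotiable

    def encode_message(nonce: bytes, payload: bytes, mac: bytes) -> bytes:
        """Hypothetical wire format: version | nonce | payload | mac."""
        assert len(nonce) == 12 and len(mac) == 16
        return struct.pack("!B", VERSION) + nonce + payload + mac

    def decode_message(msg: bytes):
        if len(msg) < 29:
            raise ValueError("truncated message")
        if msg[0] != VERSION:
            raise ValueError("unknown version")
        # The payload length is implied by the total message length: the same
        # information is never supplied twice, so it can never disagree.
        return msg[1:13], msg[13:-16], msg[-16:]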

As part of the implementation, I provided a full test harness and values covering all the valid and error paths. I also wrote a fuzzer and ran that for days over the final code to check for any possible variation in behavior, seeding it with the test cases. I encouraged the customer to integrate these tests into their release process to ensure changes to the surrounding code (e.g., 32/64 bit arch) didn’t break it. Finally, I helped with the key generation and production line design to be sure personalization would be secure too.
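
A sketch of what such a harness might look like if driven by a vector file (the file format, field names, and the use of HMAC-SHA256 as a stand-in for the actual cipher mode are all hypothetical):

    import hmac, hashlib, json, sys

    def run_vectors(path: str) -> int:
        """Run known-answer tests from a JSON file; return the number of failures."""
        failures = 0
        for vec in json.load(open(path)):
            key = bytes.fromhex(vec["key"])
            msg = bytes.fromhex(vec["message"])
            computed = hmac.new(key, msg, hashlib.sha256).hexdigest()
            if hmac.compare_digest(computed, vec["mac"]) != vec["should_verify"]:
                failures += 1
                print(f"FAIL: {vec['name']}")
        return failures

    if __name__ == "__main__":
        # Meant to run in the release pipeline so unrelated changes (compiler,
        # 32- vs 64-bit builds) can't silently break the crypto.
        sys.exit(1 if run_vectors(sys.argv[1]) else 0)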

I firmly believe this kind of effort is required for creating security and crypto that is in widespread use. This shouldn’t be extraordinary, but it sadly seems to be so today. It was only through the commitment of my customer that we were able to spend so much effort on this project.

If you have the responsibility to create something protecting money or lives, I hope you’ll commit to doing the same.

Digging Into the NSA Revelations

Last year was a momentous one in revelations about the NSA, technical espionage, and exploitation. I’ve been meaning for a while to write about the information that has been revealed by Snowden and what it means for the public crypto and security world.

Part of the problem has been the slow release of documents and their high-level nature. We’ve now seen about 6 months of releases, each covering a small facet of the NSA. Each article attempts to draw broad conclusions about the purpose, intent, and implementation of complex systems, based on leaked, codeword-laden PowerPoint slides. I commend the journalists who have combed through this material as it is both vague and obfuscated, but I often cringe at the resulting articles.

My rule of thumb whenever a new “earth shattering” release appears is to skip the article and go straight for the backing materials. (Journalists, please post your slide deck sources to a publicly accessible location in addition to burying them in your own site’s labyrinth of links.) By doing so, I’ve found that some of the articles are accurate, but there are always a number of unwarranted conclusions as well. Because of the piecemeal release process, there often aren’t enough additional sources to interpret each slide deck properly.

I’m going to try to address the revelations we’ve seen by category: cryptanalysis, computer exploitation, software backdoors, network monitoring, etc. There have been multiple revelations in each category over the past 6 months, but examining them in isolation has resulted in reversals and loose ends.

For example, the first conclusion upon the revelation of PRISM was that the NSA could directly control equipment on a participating service’s network in order to retrieve emails or other communications. Later, the possibility of this being an electronic “drop box” system emerged. As of today, I’m unaware of any conclusive proof as to which of these vastly differing implementations (or others) PRISM actually referred to.

However, this implementation difference has huge ramifications for what the participating services were doing. Did they provide wholesale access to their networks? Or were they providing court-ordered information via a convenient transfer method after reviewing the requests? We still don’t know for sure, but additional releases seem to confirm that at least many Internet providers did not intentionally provide wholesale access to the NSA.

Unwarranted jumping to conclusions has created a new sport, the vendor witch hunt. For example, the revelation of DROPOUTJEEP, an iPhone rootkit, was accompanied by allegations that Apple cooperated with the NSA to create it. It’s great that Jacob Appelbaum worked with Der Spiegel, applying his technical background, but he is overreaching here.

Jacob said, “either they [NSA] have a huge collection of exploits that work against Apple products … or Apple sabotaged it themselves.” This ignores a third option, which is that reliable exploitation against a limited number of product versions can be achieved with only a small collection of exploits.

The two critical pieces of information that were underplayed here are that the DROPOUTJEEP description was dated October 1, 2008 and says “the initial release will focus on installing the implant via close access methods” (i.e., physical access) and “status: in development”.

What was October 2008 like? Well, there were two iPhones, the original and just-released 3G model. There were iOS versions 1.0 – 1.1.4 and 2.0 – 2.1 available as well. Were there public exploits for this hardware and software? Yes! The jailbreak community had reliable exploitation (Pwnage and Pwnage 2.0) on all of these combinations via physical access. In fact, these exploits were in the boot ROM and thus unpatchable and reliable. Meanwhile, ex-NSA TAO researcher Charlie Miller publicly exploited iOS 1.x from remote in summer 2007.

So the NSA in October 2008 was in the process of porting a rootkit to iOS, with the advantage of a publicly-developed exploit in the lowest levels of all models of the hardware, and targeting physical installation. Is it any wonder that such an approach would be 100% reliable? This is a much simpler explanation and is not particularly flattering to the NSA.

One thing we should do immediately is stop the witch hunts based on incomplete information. Some vendors and service providers have assisted the NSA and some haven’t. Some had full knowledge of what they were doing, some should have known, and others were justifiably unaware. Each of their stories is unique and should be considered separately before assuming the worst.

Next time, I’ll continue with some background on the NSA that is essential to interpreting the Snowden materials.

Cyber-weapon authors catch up on blog reading

One of the more popular posts on this blog was the one pointing out how Stuxnet was unsophisticated. Its use of traditional malware methods and lack of protection for the payload indicated that the authors were either “Team B” or in a big hurry. The post was intended to counteract the breathless praise in the press for the advent of sophisticated “cyber-weapons”.

This year, the New York Times published more information that lends support to both theories. The authors may not have had a lot of time, due to political pressure and concern about Iran’s progress. The uneasy partnership between the US and Israel may have led to both parties keeping their best tricks in their back pockets.

A lot of people seemed skeptical about the software protection method I described, called “secure triggers”. (I had also written about this before, calling it “hash-and-decrypt”.) The general idea is to gather information about the environment and hash it into a cryptographic key, which is then used to decrypt the payload. If even one bit of that information is wrong, the payload can’t be decrypted. The analyst has to brute-force the proper environment, which can be made infeasible if there’s enough entropy and/or the validation step is slow enough.
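
A minimal sketch of the idea (my toy construction, not the routine from any actual malware; a real design would use a proper authenticated cipher rather than this ad-hoc keystream):

    import hashlib

    def derive_key(environment_fields):
        """Hash target-specific environment strings into a candidate key."""
        return hashlib.sha256("|".join(environment_fields).encode()).digest()

    def try_decrypt(payload: bytes, key: bytes, key_check: bytes):
        """Return the plaintext only if the derived key matches the stored check.

        An analyst without the right environment must brute-force it; adding
        entropy or iterating the hash many times makes that infeasible.
        """
        if hashlib.sha256(key).digest() != key_check:
            return None  # one wrong bit of environment info and nothing decrypts
        keystream = b""
        counter = 0
        while len(keystream) < len(payload):
            keystream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
            counter += 1
        return bytes(p ^ k for p, k in zip(payload, keystream))

    # The dropper would gather fields like directory names, locale, and other
    # machine-specific strings, e.g. derive_key([program_dir, keyboard_layout])
    # (hypothetical field names).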

The critics claimed that secure triggers were too complicated or unable to withstand malware analyst scrutiny. However, this approach had been used successfully in everything from Core Impact to Blu-ray to Team Twiizers exploits, so it was feasible. Either the malware developers were not aware of this technique or there were other constraints, such as time, preventing it from being used.

Now we’ve got Gauss, which uses (surprise!) this exact technique. And, it turns out to be somewhat effective in preventing Kaspersky from analyzing the payload. We either predicted or caused the future, take your pick.

Is this the endgame? Not even close, but it does mean we’re ready for the next stage.

The malware industry has had a stable environment for a while. Targeted attacks were rare, and most new malware authors hadn’t spent a lot of effort building in custom protection for their payloads. Honeypots and local analysis methods assume the code and behavior remain stable between the malware analyst’s environment and the intended target.

In the next stage, proper use of mechanisms like secure triggers will divide malware analysis into two phases: infection and payload. The infection stage can be analyzed with traditional techniques in order to find the security flaws exploited, propagation method, etc. The payload stage will change drastically, with more effort being spent on in situ analysis.

When the payload only decrypts and runs on a single target system, the malware analyst will need direct access to the compromised host. There are several forms this might take. The obvious one is providing a remote shell to the analyst to log in, attach a debugger, try to get a memory dump of the process, etc. This is dangerous because it involves giving an outsider access to a trusted system, and one that might be critical to other operations. Even if a whole-system memory dump is generated, say by physical access or a cold-boot attack, there is still going to be a lot of sensitive information there.

Another approach is emulation. The analyst uses a VM that turns all local syscalls into remote ones. This is connected to the compromised target host (or a clone of it), which runs a daemon to answer the API queries. The malware sample or relevant portions of it (such as the hash-and-decrypt routine) are run in the analyst’s VM, but the information the routine gathers comes directly from the compromised host. This allows the analyst to gather the relevant information while not having full access to the compromised machine.
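
Here is a rough sketch of that forwarding layer (entirely hypothetical wire format and names; a real system would intercept far more than environment variables):

    import json, os, socket

    def serve_queries(port=9999):
        """Daemon on the compromised host (or a clone): answers analyst queries."""
        srv = socket.create_server(("0.0.0.0", port))
        while True:
            conn, _ = srv.accept()
            with conn:
                req = json.loads(conn.recv(4096))
                if req.get("op") == "getenv":
                    reply = {"value": os.environ.get(req["name"], "")}
                else:
                    reply = {"error": "unsupported"}
                conn.sendall(json.dumps(reply).encode())

    def remote_getenv(name, target=("compromised-host", 9999)):
        """Analyst-side stub: the trigger routine runs in the analyst's VM, but
        every environment read is answered by the real target."""
        with socket.create_connection(target) as conn:
            conn.sendall(json.dumps({"op": "getenv", "name": name}).encode())
            return json.loads(conn.recv(4096))["value"]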

In the next phase after this, malware authors add more anti-emulation checks to their payload decryption routine. They try to prevent this routine from being run in isolation, in an emulator. Eventually, you end up in a cat-and-mouse game of Core Wars on the live hardware. Malware keeps a closely-synchronized global heartbeat so that any attempt to dump and restart it on a single host corrupts its state irrecoverably. The payload, its triggers, and encryption keys evolve in coordination with the other hosts on the network and are tied closely to each machine’s identity.

Is this where we’re headed? I’m not sure, but I do know that software protection measures are becoming more, not less, relevant.

RSA repeats earlier claims, but louder

Sam Curry of RSA was nice enough to respond to my post. Here are a few points that jumped out at me from what he wrote:

  • RSA is in the process of fixing the downgrade attack that allows an attacker to choose PKCS #1 v1.5, even if the key was generated by a user who selected v2.0.
  • They think they also addressed the general attack via their RAC 3.5.4 middleware update. More info is needed on what that fix actually is. I haven’t seen the words “firmware update” or “product recall” in any of their responses, so there’s no evidence they decided to fix the flaw in the token itself.
  • We shouldn’t call it “SecurID” even though the product name is “RSA SecurID 800”. Or to put it another way, “When we want brand recognition, call it ‘SecurID’. When it’s flawed, call it ‘PKCS #1 v1.5.'”

However, his main point is that, since this is a privilege escalation attack, any gain RSA has given the attacker is not worth mentioning. In his words:

“Any situation where the attacker has access to your smartcard device and has your PIN, essentially compromises your security. RSA maintains that if an attacker already has this level of access, the additional risk of the Bleichenbacher attack does not substantially change the already totally compromised environment.”

Note the careful use of “substantially change” and “totally compromised environment”. They go further on this tack, recommending the following mitigation approaches:

  • (Tokens) should not be left parked in the USB port any longer than necessary
  • The owner needs to maintain control of their PIN
  • The system which the device is being used on should be running anti-malware

Their security best practices amount to recommending that users limit access to the token while it is in a state where it will perform crypto operations for the user (or an attacker). This is good general advice, but it is not directly relevant to this attack for two reasons:

  1. The attack allows recovery of keys protected by the token, and then no further access to it is required
  2. It takes only a short amount of time and can be performed in stages

First, the attack allows key recovery (but not of the private key, as RSA points out over and over). There are three levels of potential compromise of a token like this one:

  1. Temporary online access: attacker can decrypt messages by sending them to the token until it’s disconnected
  2. Exposure of wrapped keys: attacker can decrypt past or future messages offline, until the wrapped keys are changed
  3. Exposure of the master private key: attacker can recover future wrapped keys until the private key is changed

RSA is claiming there’s no important difference between #1 and #2. But the whole point of a physical token is to drive a wedge between these exact cases. Otherwise, you could store your keys on your hard drive and get the same effect — compromise of your computer leads to offline ability to decrypt messages. To RSA, that difference isn’t a “substantial change”.

By screwing up the implementation of their namesake algorithm, RSA turned temporary access to a token into full access to any wrapped keys protected by it. But sure, the private key itself (case #3) is still safe.

Second, they continue to insist that end-user behavior can be important to mitigating this attack. The research paper shows that it takes only a few thousand automated queries to recover a wrapped key (i.e., minutes of access). Even if you’re lightning fast in unplugging your token, the attack can be performed in stages. There’s no need for continuous access to the token.

After the wrapped keys are recovered, they can be used for offline decryption until they are changed; no further access to the token is needed.

The conclusion is really simple: the RSA SecurID 800 token fails to protect its secrets. An attacker with software-only access (even remote) to the token can recover its wrapped keys in only a few minutes each. A token whose security depends on how fast you unplug it isn’t much of a token.

Why RSA is misleading about SecurID vulnerability

There’s an extensive rebuttal RSA wrote in response to a paper showing that their SecurID 800 token has a crypto vulnerability. It’s interesting how RSA’s response walks around the research without directly addressing it. A perfectly accurate (but inflammatory) headline could also have been “RSA’s RSA Implementation Contained Security Flaw Known Since 1998”.

The research is great and easy to summarize:

  • We optimized Bleichenbacher’s PKCS #1 v1.5 attack by about 5-10x
  • There are a number of different oracles that give varying attacker advantage
  • Here are a bunch of tokens vulnerable to this improvement of the 1998 attack

Additional interesting points from the paper:

  • Aladdin eTokenPro is vulnerable to a simple Vaudenay CBC padding attack as well. Even worse!
  • RSA implemented the worst oracle of the set the authors enumerate, giving the most attacker advantage (a sketch of what such an oracle looks like follows this list).
  • If you use PKCS #1 v2.0, you should be safe against the Bleichenbacher attack. Unless you use RSA’s implementation, which always sets a flag in generated keys that allows selecting v1.5 and performing a slight variant of this attack.
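
To make “oracle” concrete, here is roughly what a leaky PKCS #1 v1.5 unpadding check looks like (illustrative Python, not any vendor’s code). Every distinguishable failure, whether a distinct error code or a timing difference, tells Bleichenbacher’s attack something about the plaintext, and more detail means fewer queries for the attacker.

    def unpad_pkcs1_v15(em: bytes, key_bytes: int) -> bytes:
        """Strip PKCS #1 v1.5 encryption padding: 0x00 0x02 <nonzero pad> 0x00 <msg>.

        Returning distinct errors per check is what turns this into an oracle.
        """
        if len(em) != key_bytes:
            raise ValueError("wrong length")        # leak 1
        if em[0] != 0x00 or em[1] != 0x02:
            raise ValueError("bad header bytes")    # leak 2: the classic signal
        sep = em.find(b"\x00", 2)
        if sep == -1:
            raise ValueError("no separator found")  # leak 3
        if sep < 10:
            raise ValueError("padding too short")   # leak 4
        return em[sep + 1:]

The only safe behavior is a single indistinguishable failure path; every extra distinction narrows the attacker’s search.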

The real conclusion is that none of the manufacturers seemed to take implementation robustness seriously. Even the two implementations that were safe from these attacks were only safe because implementation flaws caused them to not provide useful information back to the attacker.

The first counterclaim RSA makes is that this research does not compromise the private key stored on the token. This is true. However, it allows an attacker to decrypt and recover other “wrapped” keys encrypted by the token’s key pair. This is like saying an attacker is running a process with root access but doesn’t know the root password. She can effectively do all the same things as if she did have the password, at least until the process is killed.

RSA is ignoring the point that even a legitimate user should not be able to recover these encrypted “wrapped” keys. They can only cause the token to unwrap and use them on the operator’s behalf, not recover the keys themselves. So this attack definitely qualifies as privilege escalation, even if performed by the authorized user herself.

The second claim is that this attack requires local access and a PIN. This is also correct, although it depends on some assumptions. PKCS #11 is an API, so RSA really has no firm knowledge how all their customers are using it. Some applications may proxy access to the token via a web frontend or other network access. An application may cache the PIN. As with other arguments that privilege escalation attacks don’t matter, it assumes a lot about the customer and attacker profile that RSA has no way of knowing.

The final claim is that OAEP (PKCS #1 v2.0) is not subject to this vulnerability. This is true. But this doesn’t address the issue raised in the paper where RSA’s implementation sets flags in the key to allow the user to choose v2.0 or v1.5. Hopefully, they’ll be fixing this despite not mentioning it here.

RSA has taken a lot of heat due to the previous disclosure of all the SecurID seeds, so perhaps the press has focused on them unfairly. After all, the research paper shows that many other major vendors had the same problem. My conclusion is that we have a long way to go in getting robust crypto implementations in this token market.