Thought experiment on protocols and noise

I hesitate to call this an interview question because I don’t think on-the-spot puzzle solving equates to a good engineering hire. On the other hand, I try to explore some simple thought experiments with candidates who have a security background.

One of these involves a protocol that has messages authenticated by an HMAC. There’s a message (with appropriate internal format) and a SHA-256 HMAC that covers it. As the implementer, you receive a message that doesn’t verify. In other words, your calculated MAC isn’t the same as the one in the message. What do you do?

“Throw an error” is a common response. But is there something more clever you could do? What if you could tell whether the message had been tampered with or if this was an innocent network error due to noise? How could you do that?

Some come up with the idea of calculating the Hamming distance or some other closeness measure between the computed and received HMACs. Because of the avalanche property of secure hash functions, even a single flipped bit in the message would change roughly half the bits of the MAC. So if the two MACs are close, the message itself probably wasn’t corrupted and the MAC field was damaged in transit; if they differ widely, the message itself was altered, possibly due to an attack.
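
To make the idea concrete, here is a minimal Python sketch of that comparison; the 8-bit threshold anticipates the question below, the key handling is assumed, and this is an illustration rather than a recommendation:

    import hmac, hashlib

    def hamming_distance(a: bytes, b: bytes) -> int:
        """Count differing bits between two equal-length byte strings."""
        return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

    def classify_failure(key: bytes, message: bytes, received_mac: bytes) -> str:
        """Illustrative only: guess whether a mismatch looks like noise in the
        MAC field or corruption/tampering of the message itself."""
        computed = hmac.new(key, message, hashlib.sha256).digest()
        if hmac.compare_digest(computed, received_mac):
            return "ok"
        # Avalanche property: a corrupted message changes roughly half of the
        # 256 MAC bits; a few flipped bits suggest the MAC field took the hit.
        if hamming_distance(computed, received_mac) <= 8:
            return "mac-field-noise?"
        return "message-corrupted-or-tampered?"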

OK, so you can distinguish between a few bit errors in the MAC field and corruption of the message itself. Is this helpful, and is it secure? Consider:

  • If you return an error, which one do you return? At what point in the processing?
  • Does knowing which type of error occurred help an attacker? Which kind of attacker?
  • If you chose to allow up to 8 flipped bits in the MAC while still accepting the message, is that secure? If so, at what number of bits would it be insecure? Does the position of the bits matter?

There comes a moment when every engineer comes up with some “clever” idea like the above. If she’s had experience attacking crypto, the next thought is one of intense unease. The unschooled engineer has no such qualms, and thus provides full employment for Root Labs.

Has HTML5 made us more secure?

Brad Hill recently wrote an article claiming that HTML5 has made us more secure, not less. His essential claim is that over the last 10 years, browsers have become more secure. He compares IE6, ActiveX, and Flash in 2002 (when he started in infosec) with HTML5 in order to make this point. While I think his analysis is true for general consumers, it doesn’t apply to more valuable targets, who are indeed less secure with the spread of HTML5.

HTML5 is a broad grouping of features, and there are two parts that I think contribute most to increased vulnerability. First, there is the growing flexibility in parsing elements for JavaScript, CSS, SVG, etc., including the interpretation of relationships between them. Second, there’s the exposure of complex decoders for images, video, audio, storage, 3D graphics, etc. to untrusted sources.

If you look at the vulnerability history for these two groups, both are common culprits in flaws that lead to untrusted code execution. They still regularly exhibit “game over” vulnerabilities in both Firefox and Chrome. Displaying a PNG was exploitable as recently as 2012; selecting a font via CSS was exploitable in 2010. In many cases, these types of bugs are interrelated: a flaw in a codec may require heap grooming via JavaScript to be reliably exploitable. By standardizing more parsing and more complex decoders, HTML5 gives remote, untrusted content access to the components that are still the biggest source of code execution vulnerabilities in the browser, despite attempts to audit and harden them.

Additionally, it exposes elements that have not had this kind of attention. WebGL hands over access to your 3D graphics stack, something which even CERT thinks is worth disabling. If you want to know the future of exploitation, you need to keep an eye on the console and iPhone/Android hacking groups. 3D shaders were the first software exploit of the Xbox 360, a platform that is much more secure than any browser. And Windows GDI was remotely exploitable in 2009. Firefox WebGL is built on top of Mesa, which is software from the bad old days of 1993. How is it going to do any better than Microsoft’s most secure platform?

As an aside, a rather poor PR battle about WebGL is worth addressing here. An article by a group called Context in 2011 raised some of these same issues, but their exploit was only a DoS. Mozilla devs jumped on this right away. Their solution is a whitelist and blacklist for graphics drivers. A blacklist is great for everyone after a 0-day has been discovered, fixed, and deployed, but not so good before then.

Call me a luddite, but I measure security by what I can easily disable or route around and ignore. Flash is easily blocked and can be uninstalled. JavaScript can be disabled with a browser setting or filtered. But HTML5? Well, that’s knit into pretty much every area of the browser. You want to disable WebGL? No checkbox, but at least there’s about:config. Just make sure no one set “webgl.force-enabled” or whatever the next software update adds to your settings. Want to disable parts of CSS but not page layout? Want a no-codec browser? Get out the compiler.

Browser vendors don’t care about the individual target getting compromised; they care about the masses. The cost/benefit tradeoffs for these two groups are completely opposite. Otherwise, we’d see vendors competing for who could remove as many features as possible to produce the qmail of browsers.

Security happens in waves. If you’re an ordinary user, the work of Microsoft and Google in particular has paid off for you over the past 10 years. But woe to you if you manage high-value targets. The game of whack-a-mole with the browser vendors has been getting worse, not better. The more confident they get from their bug bounties and hardening, the more likely they are to add complex, deeply intertwined features. And so the pendulum begins swinging back the other way for everyone.

Cyber-weapon authors catch up on blog reading

One of the more popular posts on this blog was the one pointing out how Stuxnet was unsophisticated. Its use of traditional malware methods and lack of protection for the payload indicated that the authors were either “Team B” or in a big hurry. The post was intended to counteract the breathless praise in the press for the advent of sophisticated “cyber-weapons”.

This year, the New York Times published more information that supports both theories. The authors may not have had a lot of time due to political pressure and concern about Iran’s progress, and the uneasy partnership between the US and Israel may have led to both parties keeping their best tricks in their back pockets.

A lot of people seemed skeptical about the software protection method I described called “secure triggers”. (I had written about this before also, calling it “hash-and-decrypt”.) The general idea is to gather information about the environment in order to generate a cryptographic key, which is used to decrypt the payload. If even one bit of info is incorrect, the payload can’t be decrypted. The analyst has to brute-force the proper environment, which can be made infeasible if there’s enough entropy and/or the validation method is too slow.
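
A minimal sketch of the idea, using Python’s hashlib and the AES-GCM primitive from the cryptography package; the environment field names are hypothetical and chosen only to illustrate the structure:

    import hashlib
    from cryptography.exceptions import InvalidTag
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def derive_trigger_key(environment: dict) -> bytes:
        """Hash selected environment facts into a 256-bit key."""
        h = hashlib.sha256()
        for field in ("hostname", "domain", "installed_app_path"):  # hypothetical fields
            h.update(environment.get(field, "").encode())
        return h.digest()

    def try_payload(environment: dict, nonce: bytes, encrypted_payload: bytes):
        """Decrypt the payload only if every environment fact matches; the
        AES-GCM tag doubles as the 'is this the intended target?' check."""
        key = derive_trigger_key(environment)
        try:
            return AESGCM(key).decrypt(nonce, encrypted_payload, None)
        except InvalidTag:
            return None  # wrong environment: the analyst is left brute-forcing inputs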

The critics claimed that secure triggers were too complicated or unable to withstand malware analyst scrutiny. However, this approach had been used successfully in everything from Core Impact to Blu-ray to Team Twiizers exploits, so it was feasible. Either the malware developers were not aware of this technique or there were other constraints, such as time, preventing it from being used.

Now we’ve got Gauss, which uses (surprise!) this exact technique. And, it turns out to be somewhat effective in preventing Kaspersky from analyzing the payload. We either predicted or caused the future, take your pick.

Is this the endgame? Not even, but it does mean we’re ready for the next stage.

The malware industry has had a stable environment for a while. Targeted attacks were rare, and most new malware authors hadn’t spent a lot of effort building in custom protection for their payloads. Honeypots and local analysis methods assume the code and behavior remain stable between the malware analyst’s environment and the intended target.

In the next stage, proper use of mechanisms like secure triggers will divide malware analysis into two phases: infection and payload. The infection stage can be analyzed with traditional techniques in order to find the security flaws exploited, propagation method, etc. The payload stage will change drastically, with more effort being spent on in situ analysis.

When the payload only decrypts and runs on a single target system, the malware analyst will need direct access to the compromised host. There are several forms this might take. The obvious one is providing a remote shell to the analyst to log in, attach a debugger, try to get a memory dump of the process, etc. This is dangerous because it involves giving an outsider access to a trusted system, and one that might be critical to other operations. Even if a whole-system memory dump is generated, say by physical access or a cold-boot attack, there is still going to be a lot of sensitive information there.

Another approach is emulation. The analyst uses a VM that turns all local syscalls into remote ones. This is connected to the compromised target host (or a clone of it), which runs a daemon to answer the API queries. The malware sample or relevant portions of it (such as the hash-and-decrypt routine) are run in the analyst’s VM, but the information the routine gathers comes directly from the compromised host. This allows the analyst to gather the relevant information while not having full access to the compromised machine.
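
A toy version of that split, with the daemon address and wire format invented for illustration: the sample’s key-derivation routine runs in the analyst’s VM, but every fact it gathers is fetched from the real target.

    import hashlib, json, socket

    def remote_env_lookup(field: str, host: str = "10.0.0.5", port: int = 9999) -> str:
        """Ask a daemon on the compromised host for one environment fact."""
        with socket.create_connection((host, port)) as conn:
            conn.sendall(json.dumps({"query": field}).encode() + b"\n")
            reply = conn.makefile("r").readline()
        return json.loads(reply)["value"]

    def derive_key_remotely(fields) -> bytes:
        """Run the hash-and-decrypt key derivation locally, fed by remote answers."""
        h = hashlib.sha256()
        for field in fields:
            h.update(remote_env_lookup(field).encode())
        return h.digest()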

In the next phase after this, malware authors add more anti-emulation checks to their payload decryption routine. They try to prevent this routine from being run in isolation, in an emulator. Eventually, you end up in a cat-and-mouse game of Core Wars on the live hardware. Malware keeps a closely-synchronized global heartbeat so that any attempt to dump and restart it on a single host corrupts its state irrecoverably. The payload, its triggers, and encryption keys evolve in coordination with the other hosts on the network and are tied closely to each machine’s identity.

Is this where we’re headed? I’m not sure, but I do know that software protection measures are becoming more, not less, relevant.

SSL optimization and security talk

I gave a talk at Cal Poly on recently proposed changes to SSL. I covered False Start and Snap Start, both designed by Google engineer Adam Langley. Snap Start has been withdrawn, but there are some interesting design tradeoffs in these proposals that merit attention.

False Start provides a minor improvement over stock SSL, which takes two round trips in the initial handshake before application data can be sent. It saves one round trip on the initial handshake at the cost of sending data before checking whether someone has modified the server’s handshake messages. It provides no benefit on subsequent connections, since stock SSL session resumption also takes only one round trip.

The False Start designers were aware of this risk, so they suggested the client whitelist ciphersuites for use with False Start. The assumption is that an attacker could get the client to provide ciphertext but wouldn’t be able to decrypt it if the encryption was secure. This is true most of the time, but is not sufficient.

The BEAST attack is a good example where ciphersuite whitelists are not enough. If a client used False Start as described in the standard, it couldn’t detect an attacker spoofing the server version in a downgrade attack. Thus, even if both the client and server supported TLS 1.1, which is secure against BEAST, False Start would have made the client insecure. Stock SSL would detect the version downgrade attack before sending any data and thus be safe.

The False Start standard (or at least implementations) could be modified to only allow False Start if the TLS version is 1.1 or higher. But this wouldn’t prevent downgrade attacks against TLS 1.1 or newer versions. You can’t both be proactively secure against the next protocol attack and use False Start. This may be a reasonable tradeoff, but it does make me a bit uncomfortable.
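
In code, the mitigation amounts to a client-side gate like the following sketch; the whitelist contents and version constant are my own illustration, not taken from the draft or any real implementation:

    # Hypothetical client-side policy, not actual NSS/OpenSSL logic.
    FALSE_START_CIPHER_WHITELIST = {
        "ECDHE-RSA-AES128-GCM-SHA256",
        "ECDHE-ECDSA-AES128-GCM-SHA256",
    }
    TLS_1_1 = (3, 2)  # TLS on-the-wire version numbering

    def may_false_start(negotiated_version, cipher_name) -> bool:
        """Send early application data only if the (still unauthenticated)
        handshake claims a version and ciphersuite we trust. A downgrade
        from anything newer than this floor still goes undetected."""
        return negotiated_version >= TLS_1_1 and cipher_name in FALSE_START_CIPHER_WHITELIST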

Snap Start removes both round trips for subsequent connections to the same server, one round trip better than stock SSL session resumption. Additionally, it allows rekeying, whereas session resumption reuses the same shared key. The security cost is that Snap Start removes the server’s random contribution.

SSL is designed to fail safe. For example, neither party solely determines the nonce. Instead, the nonce is derived from both client and server randomness. This way, poor PRNG seeding by one of the participants doesn’t affect the final output.

Snap Start lets the client determine the entire nonce, and the server is expected to check it against a cache to prevent replay. There are measures to limit the size of the cache, but a cache can’t tell you how good the entropy is. Therefore, the nonce may be unique but still predictable. Is this a problem? Probably not, but I haven’t analyzed how a predictable nonce affects all the various operating modes of SSL (e.g., ECDH, client cert auth, SRP auth, etc.).
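
A rough sketch of the difference, with invented names: stock SSL mixes randomness from both sides, while a Snap Start-style server can only reject a repeated client nonce, not a predictable one.

    import hashlib, os

    def stock_style_nonce(client_random: bytes) -> bytes:
        """Stock SSL: both parties contribute, so one bad PRNG isn't fatal."""
        server_random = os.urandom(32)
        return client_random + server_random

    SEEN_NONCES = set()  # bounded by various measures in the real proposal

    def accept_snap_start_nonce(client_nonce: bytes) -> bool:
        """Snap Start style: a replay cache catches duplicates, but it cannot
        tell whether a fresh-looking nonce was actually predictable."""
        digest = hashlib.sha256(client_nonce).digest()
        if digest in SEEN_NONCES:
            return False
        SEEN_NONCES.add(digest)
        return True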

The key insight behind both of these proposed changes is that latency is an important barrier to SSL adoption, even with session resumption built in from the beginning. Also, Google is willing to shift responsibility for SSL security toward the client in order to save on latency. This makes sense when you own a client and your security deployment model is to ship frequent client updates. It’s less clear that the tradeoff is worth it for SSL applications other than HTTP, or for deployments with different security models.

I appreciate the work people like Adam have been doing to improve SSL performance and security. Obviously, unprotected HTTP is worse than some reduction in SSL security. However, these kinds of protocol changes need careful study across SSL’s many users before their full impact is known. I remain cautious about adopting them.

More on the evolution of password security

Last time, we covered three factors that affect actual security of a password:

  1. Entropy — How many possibilities does the attacker need to consider?
  2. Guess rate — How quickly can the attacker try guesses, often determined by vantage point.
  3. Responses — What can the admin do about guessing attempts?

There’s another factor that will soon come into play, if it hasn’t already — the ongoing exposure of actual passwords as more sites are compromised. We’ve seen the simplest form of this when password reuse on an unimportant account leads to elevated access of a more important one. But that’s only the tip of the iceberg.

With massive compromises of plaintext passwords, attackers now have a growing source of wordlists derived from actual usage. Not only can you add the most common passwords to a wordlist, but you can sort them in decreasing order of frequency. An astute attacker could even apply machine learning techniques like clustering and classification to determine which other words are missing. This could be used to identify popular memes (such as Korean pop stars) and to predict new words that are likely to be used in the future.
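
The frequency-sorting step alone is only a few lines of Python (the input file name here is hypothetical):

    from collections import Counter

    # Build a wordlist ordered by how often each password appeared in a leak.
    with open("leaked_passwords.txt", encoding="utf-8", errors="ignore") as f:
        counts = Counter(line.strip() for line in f if line.strip())

    with open("wordlist_by_frequency.txt", "w", encoding="utf-8") as out:
        for password, _count in counts.most_common():
            out.write(password + "\n")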

Hashed passwords posted after compromises increase attacker knowledge as well. Sure, your password hasn’t been exposed immediately, but it remains available, forever, to anyone with the right wordlist or enough computing power. As more of these hashes are cracked, the global picture gets clearer, and you may be vulnerable to a targeted attack long after the original site is gone.

At a higher level, compromised passwords are useful not only for identifying missing words within a group, but also for identifying the templates people use to construct passwords. After a compromise, not only are your password and its close variants vulnerable, but so are the passwords of anyone who uses the same scheme to choose theirs. For example, automated analysis could determine that more users put the site name after the base word than before it, or that the digit 4 is more common as the first numeric value, but only among English speakers.
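
The template analysis is just as mechanical. A toy sketch that reduces each password to a structural shape and counts the popular ones (the shape encoding is my own, for illustration):

    import re
    from collections import Counter

    def template(password: str) -> str:
        """Reduce 'Summer2012!' to the shape 'Ul+d+!': uppercase, lowercase run,
        digit run, then the literal symbol."""
        shape = []
        for ch in password:
            if ch.isupper():
                shape.append("U")
            elif ch.islower():
                shape.append("l")
            elif ch.isdigit():
                shape.append("d")
            else:
                shape.append(ch)
        # Collapse repeated class markers so runs of any length compare equal.
        return re.sub(r"([Uld])\1+", r"\1+", "".join(shape))

    def popular_templates(passwords, n=10):
        """Most common structural patterns across a set of leaked passwords."""
        return Counter(template(p) for p in passwords).most_common(n)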

All of these factors mean that attackers face less entropy as more passwords are revealed. Site compromises not only reveal passwords themselves, but the thinking of the users behind the passwords. Trends in word choice give a more optimal order for cracking. Higher-level templates used to generate passwords are also revealed. Even your joke passwords on useless sites reveal something of your thought patterns.

The only answer may be to take password selection out of the hands of users. Truly random but memorable passwords don’t reveal anything beyond the password itself. And where possible, passwords can be avoided completely. For example, tokens or out-of-band communication can often be used for authentication. Since most devices are connected, such tokens can be shared between paired devices.

All that’s certain is that attackers will be winning the password game for years to come, and there are still many rich patterns to be mined from previous compromises.

On the evolving security of password schemes

Passwords have been around for millennia. The oldest use was to prove to sentries whether you were friend or foe. (I use “password” in this article to cover a wide variety of schemes, including passphrases and numerical PINs.)

Choosing a proper password has always required considering the attacker’s capabilities and limitations. There are several factors at play:

  1. Entropy — How many possibilities does the attacker need to consider? It may be lower than you’d think, since it depends on how much the attacker already knows about you. If she has retrieved paper from your shredder, there may be only a few possibilities to try, even though the password itself is complex and would be impossible for a stranger to guess.
  2. Guess rate — How quickly can the attacker try guesses? This is often determined by the attacker’s vantage point. If they have hashes from your server, an offline attack is many times faster than an online attack. It can also be limited by the attacker’s own hardware and by the hashing algorithm.
  3. Responses — What can you do about guessing? Can you disable a user’s account if there are too many bad attempts? Require secondary authentication? Can you shut down the entire system?

Unix systems in the 1980s and 90s limited passwords to 8 characters because of the underlying DES cipher key size. Because this is too short to use multiple words, most recommendations from that time tried to maximize the entropy of every character while remaining memorable. A common suggestion was to take the first letter of each word in a phrase and mix in some punctuation and numerals. This kind of scheme persists to this day, with many websites enforcing a minimum (and sometimes maximum) length and the use of an uppercase letter or numeral.

Password cracking programs such as Crack (Unix) and Cracker Jack (DOS) targeted this scheme. To mirror user behavior, they would take a dictionary (wordlist) and append numerals or change case. A useful strategy was to start with a common wordlist and add local terms such as sports teams or city names. After a few passwords were cracked, you could identify patterns (such as user nationality or college major) and add similar terms to your set. But as long as the user didn’t pick too short a password or use an actual word (or close variant) as the base string, they would usually be secure against Crack.
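
Reproducing that Crack-style mangling is straightforward; a toy rule set modeled on the description above (the specific suffixes are arbitrary):

    def mangle(word: str) -> set:
        """Yield Crack-style candidate passwords derived from one wordlist entry."""
        variants = {word, word.lower(), word.upper(), word.capitalize(), word[::-1]}
        for base in list(variants):
            for suffix in ("0", "1", "12", "123", "!", "99"):
                variants.add(base + suffix)
        return variants

    # Usage: run a local wordlist (sports teams, city names) through mangle()
    # and hash each candidate with the target system's password scheme.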

With the advent of the FreeBSD-MD5 scheme in the early 90’s, passwords could now be arbitrarily long. This brought login systems in line with PGP, which had supported long passwords for a while. The recommended scheme then changed to “use a difficult-to-guess passphrase.” However, not many concrete recommendations were made for what makes a passphrase difficult enough.

Many users thought that just having any passphrase was difficult enough. Who could guess all the letters and spaces among multiple words? While this might have been true if attackers stuck to Cracker Jack, it ignores the fact that attackers can change strategies. Each word can be treated like a single character as before. As long as the words were in a dictionary, multi-word passphrases might have less entropy than a password constructed the old way. Newer tools like John the Ripper help target passphrases.
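
A quick back-of-the-envelope comparison shows why, treating each word as a single symbol. The dictionary and character-set sizes below are illustrative assumptions:

    import math

    dictionary_words = 10_000   # assumed size of a common-words dictionary
    passphrase_words = 4
    printable_chars = 95        # ASCII printable characters
    old_style_length = 8

    passphrase_bits = passphrase_words * math.log2(dictionary_words)  # ~53.2 bits
    old_style_bits = old_style_length * math.log2(printable_chars)    # ~52.6 bits

    # Four common dictionary words buy roughly the same entropy as eight truly
    # random printable characters, and far less if the attacker knows the words
    # come from a much smaller list of popular choices.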

In choosing a password, consider the entropy for multiple attacker vantage points. How much advantage would a co-worker have over a random stranger? Do they realize you like good Scotch and might use those names in your passphrase? Know you like Will Ferrell movies and might use a quote from one? A good passphrase is one where even your spouse would not have an advantage over a stranger.

Additional entropy can be gained by varying your scheme. Misspell or make up words, Dr. Seuss style (but don’t use words from his books!). Ever heard of an “omliyeti”? Me neither, but it might be memorable. Don’t capitalize the first word or put the punctuation (if any) at the end. Put spaces in the middle of words but run adjacent words together.

Admins can suggest schemes to help users pick good passwords, and they can attempt to crack their users’ choices to gauge password strength. But a user might still pick a low-entropy password that happens to pass this check. Fortunately, the other two factors above (guess rate and responses) are independent of entropy yet still have a big impact on actual password security.

The bcrypt and scrypt password hashing algorithms have greatly slowed the attacker’s guess rate. They use hash functions that are intentionally slow (and in the case of scrypt, memory intensive). More importantly, they have a tunable difficulty parameter that allows the admin to keep pace with Moore’s Law.
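
For example, with the Python bcrypt bindings, the cost factor is the knob the admin turns as hardware improves; the specific round count here is an arbitrary illustration:

    import bcrypt

    # Each +1 to rounds roughly doubles the work an attacker must do per guess.
    hashed = bcrypt.hashpw(b"correct horse battery staple", bcrypt.gensalt(rounds=12))

    def verify(password: bytes, stored_hash: bytes) -> bool:
        return bcrypt.checkpw(password, stored_hash)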

Responses can be very important as well. PINs can be numeric and short because access is usually limited to online guessing with lockout after a few tries. One approach I’ve used before is to seed the password file with fake accounts that have easier passwords than the rest (but still hard enough to prevent online guessing). If anyone logs in to them, we know the password file has been retrieved and someone is cracking it.
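
A sketch of that tripwire; the seeded account names and the alerting hook are hypothetical:

    # Accounts that exist only to detect offline cracking of a stolen password file.
    HONEYPOT_ACCOUNTS = {"jsmith2", "backup_admin", "printer_svc"}

    def on_successful_login(username: str, alert) -> None:
        """If a seeded account ever authenticates, assume the password file
        has been retrieved and cracked."""
        if username in HONEYPOT_ACCOUNTS:
            alert(f"canary account {username!r} logged in; password file likely compromised")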

Another response would be to require secondary authentication. Google does this with their text message authentication. Duo Security provides a phone app. This can be required all the time or activated when the user logs in from a new IP address or doesn’t have the prerequisite cookie.
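
The trigger logic can be as simple as the following sketch, using the criteria from the paragraph above (the function shape is mine):

    def requires_second_factor(known_ips: set, source_ip: str, has_device_cookie: bool) -> bool:
        """Require a second factor (SMS code, phone app) for unfamiliar logins."""
        return source_ip not in known_ips or not has_device_cookie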

Password security is a difficult problem, especially with a varied user base. However, most admins focus too much on increasing entropy of user choices and not enough on decreasing the attacker’s guess rate and implementing responses to limit their access when they do get a hit.