SSL optimization and security talk

February 27, 2012February 24, 2012 ~ Nate Lawson ~ 12 Comments

I gave a talk at Cal Poly on recently proposed changes to SSL. I covered False Start and Snap Start, both designed by Google engineer Adam Langley. Snap Start has been withdrawn, but there are some interesting design tradeoffs in these proposals that merit attention.

False Start provides a minor improvement over stock SSL, which takes two round trips in the initial handshake before application data can be sent. It saves one round trip on the initial handshake at the cost of sending data before checking for someone modifying the server’s handshake messages. It doesn’t provide any benefit on subsequent connections since the stock SSL resume protocol only takes one round trip also.

The False Start designers were aware of this risk, so they suggested the client whitelist ciphersuites for use with False Start. The assumption is that an attacker could get the client to provide ciphertext but wouldn’t be able to decrypt it if the encryption was secure. This is true most of the time, but is not sufficient.

The BEAST attack is a good example where ciphersuite whitelists are not enough. If a client used False Start as described in the standard, it couldn’t detect an attacker spoofing the server version in a downgrade attack. Thus, even if both the client and server supported TLS 1.1, which is secure against BEAST, False Start would have made the client insecure. Stock SSL would detect the version downgrade attack before sending any data and thus be safe.

The False Start standard (or at least implementations) could be modified to only allow False Start if the TLS version is 1.1 or higher. But this wouldn’t prevent downgrade attacks against TLS 1.1 or newer versions. You can’t both be proactively secure against the next protocol attack and use False Start. This may be a reasonable tradeoff, but it does make me a bit uncomfortable.

Snap Start removes both round trips for subsequent connections to the same server. This is one better than stock SSL session resumption. Additionally, it allows rekeying whereas session resumption uses the same shared key. The security cost is that Snap Start removes the server’s random contribution.

SSL is designed to fail safe. For example, neither party solely determines the nonce. Instead, the nonce is derived from both client and server randomness. This way, poor PRNG seeding by one of the participants doesn’t affect the final output.

Snap Start lets the client determine the entire nonce, and the server is expected to check it against a cache to prevent replay. There are measures to limit the size of the cache, but a cache can’t tell you how good the entropy is. Therefore, the nonce may be unique but still predictable. Is this a problem? Probably not, but I haven’t analyzed how a predictable nonce affects all the various operating modes of SSL (e.g., ECDH, client cert auth, SRP auth, etc.)

The key insight between both of these proposed changes to SSL is that latency is an important issue to SSL adoption, even with session resumption being built in from the beginning. Also, Google is willing to shift the responsibility for SSL security towards the client in order to save on latency. This makes sense when you own a client and your security deployment model is to ship frequent client updates. It’s less clear that this tradeoff is worth it for SSL applications besides HTTP or other security models.

I appreciate the work people like Adam have been doing to improve SSL performance and security. Obviously, unprotected HTTP is worse than some reductions in SSL security. However, careful study is needed for the many users of these kinds of protocol changes before their full impact is known. I remain cautious about adopting them.

Why stream ciphers shouldn’t be used for hashing

January 31, 2012October 28, 2012 ~ Nate Lawson ~ 11 Comments

I recently saw a blog post that discussed using RC4 as an ad-hoc hash in order to show why CBC mode is better than ECB. While the author’s example is merely an attempt to create a graphic, it reminded me to explain why a stream cipher shouldn’t be used as as a cryptographic hash.

A stream cipher like RC4 only has one input (the key) and one output, a variable-length keystream. During initialization, the key is expanded and stored in an internal buffer. When the user wants to encrypt or decrypt (both are the same operation), the buffer is updated in some way and keystream bits are output. It’s up to the caller to take that keystream data and XOR it with the plaintext to get the ciphertext (or vice versa). Very simple, right? You just initialize the stream cipher’s state with a key and then turn the crank whenever you want keystream bits.

A cryptographic hash algorithm like SHA-1 also has one input (the data) and one output, the digest. A variable-length stream of input data is crunched in blocks, giving a final output digest that should be difficult to invert, among other properties.

At first glance, it seems that a stream cipher can be used as a cryptographic hash by setting the data to hash as the key, turning the crank, and using some of the keystream as the digest. The reasoning goes, “since it should be difficult to recover the original stream cipher key merely by seeing some of the keystream, the output is usable as a hash”. While this may sound reasonable, it is often wrong, leading to various security problems.

There are numerous, vital design distinctions between stream ciphers and hashes. First, a stream cipher is designed to output an extremely long keystream sequence while a hash digest is a relatively small, fixed-length output. There are design differences that arise from expanding a key vs. compressing input. Also, resistance against a chosen input attack is a requirement for a cryptographic hash, while it may not have been considered for a stream cipher. What could an attacker gain if they can choose the input keys? By definition, they already know the secret key in this case.

The RC4 weakness that led to WEP being broken was a related-key attack. Even though an attacker could not choose WEP keys, the RC4 key was the concatenation of a counter and the secret key. Thus, subsequent outputs of the keystream are derived from closely related input keys.

But to use RC4 for hashing, it would have to be resistant not only to related key attacks, but to a chosen key attack. In this case, the attacker can target weaknesses in your key schedule algorithm by maliciously choosing many keys versus merely knowing that some relation exists between unknown keys that the attacker can’t choose. While chosen-IV attacks are part of the consideration for stream ciphers, I haven’t heard of full chosen-key resistance being an important design criteria. (Please correct me if I’m out of date on this, especially with eStream).

In contrast, resistance to a chosen-input attack is the very definition of a cryptographic hash algorithm. This resistance comes at a performance cost. Turning a hash algorithm into a stream cipher can be done (say, an HMAC using a key and counter), but it’s slower than stream ciphers that were designed as such. Stream cipher designs are optimized for performance and are usually not focused on preventing chosen-key attacks. An interesting corrolary is that analyzing a stream cipher’s key scheduling algorithm as a hash function (e.g., collision resistance) is often a good way to understand its possible weaknesses.

To summarize, don’t use cryptographic primitives for non-standard purposes. There are often built-in assumptions based on the original intended application that could compromise your modified design.

More on the evolution of password security

January 18, 2012October 28, 2012 ~ Nate Lawson ~ 2 Comments

Last time, we covered three factors that affect actual security of a password:

Entropy — How many possibilities does the attacker need to consider?
Guess rate — How quickly can the attacker try guesses, often determined by vantage point.
Responses — What can the admin do about guessing attempts?

There’s another factor that will soon come into play, if it hasn’t already — the ongoing exposure of actual passwords as more sites are compromised. We’ve seen the simplest form of this when password reuse on an unimportant account leads to elevated access of a more important one. But that’s only the tip of the iceberg.

With massive compromises of plaintext passwords, attackers now have a growing source of wordlists derived from actual usage. Not only can you add the most common passwords to a wordlist, but you can even sort them in decreasing order of frequency. An astute attacker could even apply machine learning techniques like clustering and classification to determine which other words are missing. This could be used to identify popular memes (such as Korean pop stars), and lead to new words that are likely to be used in the future.

Hashed passwords posted after compromises are increasing attacker knowledge as well. Sure, your password hasn’t immediately been exposed but it remains available to anyone with the right wordlist or enough computing power, forever. As more of these are cracked, the global picture gets clearer, and you may be vulnerable to a targeted attack long after the original site is gone.

At a higher level, not only are compromised passwords useful in identifying missing words within a group, but they’re also useful in identifying the templates people use to construct passwords. After a compromise, not only are your password and close variants now vulnerable, but also people using the same scheme to choose their passwords. For example, automated analysis could determine that more users put the site name after than before the base word. Or the number 4 is more common as the first numeric value, but only with English speakers.

All of these factors mean that attackers face less entropy as more passwords are revealed. Site compromises not only reveal passwords themselves, but the thinking of the users behind the passwords. Trends in word choice give a more optimal order for cracking. Higher-level templates used to generate passwords are also revealed. Even your joke passwords on useless sites reveal something of your thought patterns.

The only answer may be to take password selection out of the hands of users. Truly random but memorable passwords don’t reveal anything beyond the password itself. And where possible, passwords can be avoided completely. For example, tokens or out-of-band communication can often be used for authentication. Since most devices are connected, such tokens can be shared between paired devices.

All that’s certain is that attackers will be winning the password game for years to come, and there are still many rich patterns to be mined from previous compromises.

On the evolving security of password schemes

January 10, 2012October 28, 2012 ~ Nate Lawson ~ 14 Comments

Passwords have been around for millenia. The oldest use was to prove to sentries whether you were friend or foe. (I use “password” in this article to include a wide variety of schemes, including passphrases and numerical PINs).

Choosing a proper password has always required considering the attacker’s capabilities and limitations. There are several factors at play:

Entropy — How many possibilities does the attacker need to consider? It may be lower than you’d think since it only depends on how little an attacker knows about you. If she has retrieved paper from your shredder, there may only be a few possibilities to try, even though the password itself is complex and impossible for someone else to guess.
Guess rate — How quickly can the attacker try guesses? This is often determined by the attacker’s vantage point. If they have hashes from your server, an offline attack is many times faster than an online attack. It can also be limited by the attacker’s own hardware and by the hashing algorithm.
Responses — What can you do about guessing? Can you disable a user’s account if there are too many bad attempts? Require secondary authentication? Can you shut down the entire system?

Unix systems in the 1980’s and 90’s limited passwords to 8 characters because of the the underlying DES cipher key size. Because this is too short to use multiple words, most recommendations from that time tried to maximize the entropy of every character while remaining memorable. A common suggestion was to take the first letter of each word in a phrase and mix in some punctuation and numerals. This kind of scheme persists to this day, with many websites enforcing a minimum (and sometimes maximum) length and the use of an uppercase letter or numeral.

Password cracking programs such as Crack (Unix) and Cracker Jack (DOS) targeted this scheme. To mirror user behavior, they would take a dictionary (wordlist) and append numerals or change case. A useful strategy would be to start with a common wordlist and add in local terms such as sports teams or city names. After a few passwords were cracked, you could identify patterns (such as user nationality or college major) and add similar terms to your set. But as long as the user didn’t use too short of a password or an actual word or close variant as the base string, they would usually be secure against Crack.

With the advent of the FreeBSD-MD5 scheme in the early 90’s, passwords could now be arbitrarily long. This brought login systems in line with PGP, which had supported long passwords for a while. The recommended scheme then changed to “use a difficult-to-guess passphrase.” However, not many concrete recommendations were made for what makes a passphrase difficult enough.

Many users thought that just having any passphrase was difficult enough. Who could guess all the letters and spaces among multiple words? While this might have been true if attackers stuck to Cracker Jack, it ignores the fact that attackers can change strategies. Each word can be treated like a single character as before. As long as the words were in a dictionary, multi-word passphrases might have less entropy than a password constructed the old way. Newer tools like John the Ripper help target passphrases.

In choosing a password, consider the entropy for multiple attacker vantage points. How much advantage would a co-worker have over a random stranger? Do they realize you like good Scotch and might use those names in your passphrase? Know you like Will Farrell movies and might use a quote from one? A good passphrase is one where even your spouse would not have an advantage over a stranger.

Additional entropy can be gained by varying it. Misspell or make up words, Dr. Seuss style (but don’t use words from his books!) Ever heard of a “omliyeti”? Me neither, but it might be memorable. Don’t capitalize the first word or put the punctuation (if any) at the end. Put spaces in the middle of words but run the beginning/ends together.

Admins can suggest schemes to help users pick good passwords, and they can attempt to crack their choices to establish password strength. But a user might still pick a low-entropy password that happens to pass this check. Fortunately, the second two factors above (guess rate and responses) are independent of entropy yet still have a big impact on actual password security.

The bcrypt and scrypt password hashing algorithms have greatly slowed the attacker’s guess rate. They use hash functions that are intentionally slow (and in the case of scrypt, memory intensive). More importantly, they have a tunable difficulty parameter that allows the admin to keep pace with Moore’s Law.

Responses can be very important as well. PINs can be numeric and short because access is usually limited to online guessing with lockout after a few tries. One approach I’ve used before is to seed the password file with fake accounts that have easier passwords than the rest (but still hard enough to prevent online guessing). If anyone logs in to them, we know the password file has been retrieved and someone is cracking it.

Another response would be to require secondary authentication. Google does this with their text message authentication. Duo Security provides a phone app. This can be required all the time or activated when the user logs in from a new IP address or doesn’t have the prerequisite cookie.

Password security is a difficult problem, especially with a varied user base. However, most admins focus too much on increasing entropy of user choices and not enough on decreasing the attacker’s guess rate and implementing responses to limit their access when they do get a hit.

Mixed voltage interfacing for design or hacking

January 6, 2012October 28, 2012 ~ Nate Lawson

Modern digital systems involve a wide array of voltages. Instead of just the classic 5V TTL, they now use components and busses ranging from 3.3V down to 1.0V. Interfacing with these systems is tricky, especially when you have multiple power sources, capacitive loads, and inrush current from devices being powered on. It’s important to understand the internal design of all drivers, both in your board and the target, to be sure your final product is cheap but reliable.

When doing a security audit of an embedded device, interfacing is one of the main tasks. You have to figure out what signals you want to tap or manipulate. You pick a speed range and signal characteristics. For example, you typically want to sample at 4x the target data rate to ensure a clean signal or barring that, lock onto a clock to synchronize at 1x. For active attacks such as glitching, you often have to do some analog design for the pulse generator and trigger it from your digital logic. Most importantly, you have to make sure your monitoring board does not disrupt the system. The Xbox tap board (pdf) built by bunnie is a great case study for this, as well as his recent NeTV device.

Building a mass market board is different than a quick hack for a one-off review, and cost becomes a big concern. Sure, you can use an external level shifter or transceiver chip. However, these come with a lot of trade-offs. They add latency to switching times. They often require dual power supplies, which may not be available for your particular application. They usually require extra control pins (e.g., to select signal direction). They force you to switch a group of pins all at once, not individually. Most importantly, they add cost and take up board space. With a little more thought, you can often come up with a simpler solution.

There was a great article in Circuit Cellar on mixed-voltage digital interfacing. It goes into a lot of the different design approaches, from current-limiting resistors all the way up to BJTs and MOSFETs. The first step is to understand the internals of I/O pins on most digital devices. Below is a diagram I reproduced from the article.

CMOS I/O Pin design — CMOS I/O pin, showing internal ESD diodes

Whether you are using a microcontroller or logic chip, usually the I/O pins have a similar design. The input to the gate is protected by diodes that will conduct current to ground if the input voltage goes more negative than the GND pin or to Vcc if the input is more positive than Vcc. These diodes are normally used to protect against static discharge, but can also be a useful part of your design if you are careful.

The current limiting resistor can be calculated by figuring out the maximum and minimum voltages your input will see and the maximum current flow you want into your device. You also have to take into account the diode’s voltage drop. Here’s an example calculation:

Max input level: 6V
Vcc: 5V
Diode drop: 0.7V (typical)
Max current: 100 microamps

R = V / I = (6 – 5 – 0.7) / 0.0001 = 3K ohms

While microcontrollers and 74xx logic devices are often tolerant of some moderate current, FPGAs are extremely sensitive. If you’re using an FPGA, use a level shifter chip or you’ll be sorry. Also, most devices are more sensitive to negative voltage than positive. Even if you’re within the specs for max current, a negative voltage may quickly lead to latch-up.

Level shifters usually take in dual voltages and safely isolate both sides. Some are one-way, which can be used as line drivers or line receivers. Others are bidirectional transceivers, and the direction is selected by an extra pin. If you can’t afford the extra pins, you can often combine an open-collector transmitter with a line receiver. However, if you happen to drive the lines by transmitting at the same time as the other side, you can fry your chips.

To summarize, mixed voltage interfacing is a skill you’ll need when building your own devices or hacking existing ones. You can get by with just a current-limiting resistor in many cases, but be sure you understand the I/O pin design to avoid costly failures, especially in the field over time. For more assurance or with sensitive parts like FPGAs, you’ll have to use a level shifter.

The lost Van Jacobson paper that could save the Internet

December 30, 2011October 28, 2012 ~ Nate Lawson ~ 7 Comments

One of my heroes has always been Van Jacobson. His 1988 paper on solving TCP congestion is an enjoyable read, with cross-discipline appeal. The history of all this is fascinating, such as congestion control’s roots in hydrodynamics theory. (If you want to kill an afternoon, you can read my collection of the history of Internet working in the 80’s and 90’s. I especially like the notes on tuning Sun’s IP stack with hand-coded assembly.)

Since the old days, the IETF has taken over and our congestion problems are more or less solved, right? Well, not exactly. There’s a new congestion storm brewing with our endpoints that is largely the impetus for the network neutrality dispute.

Back in 2008, I wrote some articles about how Random Early Detection (RED) would be more effective than deep packet inspection in solving the congestion apparently caused by Bittorrent. At the time, some ISPs were terminating Bittorrent uploads, supposedly in order to manage their bandwidth. I thought network admins ignored RED because they were control freaks, and deep packet inspection gives you a lot of control over user behavior. But a lost Van Jacobson paper with a diagram of a toilet might be the key to the new congestion problem.

Jim Gettys of Bell Labs has been blogging for about a year on a phenomenon known as “bufferbloat“. This refers to the long queues created by the large buffers of routers, firewalls, cable modems, and other intermediate gateways. Because of Moore’s Law making RAM cheaper and lack of queue management, packets are queued for a long time during congestion instead of being dropped quickly. This misleads TCP congestion control and leads to even more congestion.

Back when RAM was expensive and networks were slow, packets were dropped immediately when congestion was encountered. This created a responsive control system. The transmitter could be sure a packet had been dropped if it didn’t get an ACK within a couple standard deviations of the average round-trip time.

Think of such a network as a stiff spring. As the transmitter “pushed” on one end of the spring, the response force was quickly “felt”, and the sender could back off when the network bandwidth was fully allocated.

Now, increase the bandwidth and intermediate router buffer sizes but maintain the same control system. More bandwidth means that it is normal to have many packets in flight (increased window size). Larger buffers mean more of those packets can be delayed without being dropped. If they are dropped, it happens long after the first congestion actually occurred and the buffer started filling up. Multiply this effect by each hop in the route to the destination.

This gives a control system more like a set of loose springs with gaps in the middle. The transmitter increases the window size until congestion is encountered, probing the available bandwidth. Instead of the first excess packet being dropped, it gets queued somewhere. This happens to many of the packets, until the intermediate buffer is full. Finally, a packet gets dropped but it’s too late — the sender has exceeded the network capacity by the available bandwidth plus the combined sizes of one or more of the intermediate buffers.

Network equipment manufacturers make this worse through a cycle of escalation. When a fast network meets a slower one, there has to be congestion. For example, a wireless router typically offers 50-100 Mbps speeds but is connected to a 5-10 Mbps Internet connection. If the manufacturer provides larger buffers, bursty traffic can be absorbed without packet loss, at least for a little while. But all packets experience a higher latency during this period of congestion, and the delay between transmission and drop grows, making the sender oscillate between over and under utilization.

The congestion problem was solved long ago by RED. When a router starts to experience congestion, it immediately applies an algorithm to fairly drop packets from the queue, weighted by each sender’s portion of bandwidth used. For example, with a simple random algorithm, a sender who is transmitting 50% of the total bandwidth is twice as likely to be dropped as someone using 25%.

Besides dropping packets, the router can also set an explicit congestion notification (ECN) bit on a packet. This communicates a warning to the sender that future packets will be dropped if it keeps increasing the window size. This is better than just dropping the packet since it avoids discarding useful data that the packet is carrying.

It turns out that RED is not enabled on many Internet routers. Jim wrote a fascinating post why. In short, ISPs avoided deploying RED due to some bugs in the original paper and the requirement for manually tuning its parameters. ISPs don’t want to do that and haven’t. But years ago, Van Jacobson had begun to write a paper on how to fix RED.

The lost paper was never published. One roadblock was that the diagram of a toilet offended a reviewer. Also, Van changed jobs and never got around to properly finishing it. He lost the draft and the FrameMaker software for editing it. But recently, the original draft was found and converted into a usable format.

Much remains to be done. This is truly a hard problem. Jim Gettys and others have been building tools to analyze bufferbloat and writing new articles. They’re trying to raise visibility of this issue and come up with a new variant of RED that can be widely deployed. If you’re interested in helping, download the tools or check out Netalyzr.

There’s no single correct solution to eliminating bufferbloat, but I’m hoping a self-tuning algorithm based on RED can be widely deployed in the coming years.