SSL optimization and security talk

I gave a talk at Cal Poly on recently proposed changes to SSL. I covered False Start and Snap Start, both designed by Google engineer Adam Langley. Snap Start has been withdrawn, but there are some interesting design tradeoffs in these proposals that merit attention.

False Start provides a minor improvement over stock SSL, which takes two round trips in the initial handshake before application data can be sent. It saves one round trip on the initial handshake at the cost of sending data before checking whether someone has modified the server's handshake messages. It doesn't provide any benefit on subsequent connections, since the stock SSL session resumption protocol also takes only one round trip.

The False Start designers were aware of this risk, so they suggested that the client whitelist ciphersuites for use with False Start. The assumption is that an attacker could get the client to send ciphertext but wouldn't be able to decrypt it if the encryption were secure. This is true most of the time, but it is not sufficient.

The BEAST attack is a good example of where ciphersuite whitelists are not enough. If a client used False Start as described in the standard, it couldn't detect an attacker spoofing the server version in a downgrade attack until after it had already sent application data. Thus, even if both the client and server supported TLS 1.1, which is secure against BEAST, False Start would have made the client insecure. Stock SSL would detect the version downgrade attack before sending any data and thus be safe.
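
The following is a minimal sketch of why stock SSL catches the downgrade and False Start does not. It assumes a drastically simplified handshake. The Finished check is modeled as an HMAC-SHA256 over the handshake messages each side saw; real TLS 1.0/1.1 instead applies its PRF to MD5 and SHA-1 transcript hashes. The key exchange is assumed to have already produced a shared secret. `finished_mac` and `run_handshake` are hypothetical names used only for illustration.

```python
import hashlib
import hmac


def finished_mac(secret: bytes, transcript: list, label: bytes) -> bytes:
    # Simplified Finished value: HMAC over a hash of every handshake message
    # this side has seen. Real TLS 1.0/1.1 uses its PRF over MD5 || SHA-1.
    digest = hashlib.sha256(b"".join(transcript)).digest()
    return hmac.new(secret, label + digest, hashlib.sha256).digest()


def run_handshake(false_start: bool, attacker_downgrades: bool) -> dict:
    secret = b"shared-master-secret"  # assume key exchange already succeeded
    client_hello = b"ClientHello max_version=TLS1.1"
    server_hello_sent = b"ServerHello version=TLS1.1"
    # A MITM rewrites the ServerHello so the client thinks TLS 1.0 was chosen.
    server_hello_seen = (b"ServerHello version=TLS1.0"
                         if attacker_downgrades else server_hello_sent)

    # The server computes Finished over what it actually sent; the attacker
    # relays it unchanged (forging it would require the master secret).
    server_finished = finished_mac(secret, [client_hello, server_hello_sent],
                                   b"server finished")
    sent_request = False
    if false_start:
        # False Start: the request (and its cookie) goes out under whatever
        # version the client was told, before the server's Finished is checked.
        sent_request = True

    expected = finished_mac(secret, [client_hello, server_hello_seen],
                            b"server finished")
    detected = not hmac.compare_digest(server_finished, expected)
    if not false_start and not detected:
        sent_request = True  # stock SSL: send only after Finished verifies
    return {"sent_request": sent_request, "detected": detected}


print(run_handshake(false_start=True, attacker_downgrades=True))
# {'sent_request': True, 'detected': True}
print(run_handshake(false_start=False, attacker_downgrades=True))
# {'sent_request': False, 'detected': True}
```

Both runs detect the tampering. The difference is that the False Start client has already sent its request by the time it does.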

The False Start standard (or at least implementations) could be modified to only allow False Start if the TLS version is 1.1 or higher. But this wouldn't prevent downgrade attacks to TLS 1.1 or to any newer version that is still eligible for False Start, should a new attack against that version be found. You can't both be proactively secure against the next protocol attack and use False Start. This may be a reasonable tradeoff, but it does make me a bit uncomfortable.
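
Here is a sketch of the client-side check described above. It assumes a ciphersuite whitelist plus a minimum-version gate. `may_false_start`, `FALSE_START_WHITELIST` and the particular suites listed are hypothetical choices. Version tuples use the wire encoding, where TLS 1.0 is (3, 1). Both inputs come from the not-yet-authenticated ServerHello, so an attacker who rewrites TLS 1.2 to TLS 1.1 still passes the gate.

```python
# Wire encodings of protocol versions: TLS 1.0 = (3, 1), 1.1 = (3, 2), 1.2 = (3, 3).
TLS1_0, TLS1_1, TLS1_2 = (3, 1), (3, 2), (3, 3)

# Example whitelist; which suites belong here is an implementation choice.
FALSE_START_WHITELIST = {
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
}


def may_false_start(negotiated_version, ciphersuite, min_version=TLS1_1):
    # Both arguments come from the server's ServerHello, which the client has
    # not yet authenticated when it makes this decision.
    return ciphersuite in FALSE_START_WHITELIST and negotiated_version >= min_version


# Client and server both speak TLS 1.2; a MITM rewrites the ServerHello.
print(may_false_start(TLS1_0, "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"))  # False: gate blocks 1.0
print(may_false_start(TLS1_1, "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"))  # True: 1.2 -> 1.1 goes unnoticed
```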

Snap Start removes both round trips for subsequent connections to the same server. This is one round trip better than stock SSL session resumption. Additionally, it allows rekeying, whereas session resumption reuses the same shared master secret. The security cost is that Snap Start removes the server's random contribution.
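
The round-trip arithmetic from the last few paragraphs can be written as a small table. It counts only SSL handshake round trips before the client may send application data, ignoring TCP setup. The mode names are invented for this sketch.

```python
# Round trips before the client may send application data, per the text above.
# TCP setup is ignored; mode names are invented for this sketch.
ROUND_TRIPS = {
    ("full", "stock"): 2,
    ("full", "false_start"): 1,
    ("resume", "stock"): 1,
    ("resume", "false_start"): 1,  # no gain over stock resumption
    ("resume", "snap_start"): 0,
}


def handshake_latency_ms(rtt_ms: float, handshake: str, mode: str) -> float:
    # Handshake delay added before the first request can leave the client.
    return ROUND_TRIPS[(handshake, mode)] * rtt_ms


for key in ROUND_TRIPS:
    print(key, handshake_latency_ms(100, *key))
```

On a 100 ms round-trip path, this is the difference between 200 ms, 100 ms, and no handshake delay at all.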

SSL is designed to fail safe. For example, neither party solely determines the nonce. Instead, the nonce is derived from both client and server randomness. This way, poor PRNG seeding by one of the participants doesn’t affect the final output.
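
The "both parties contribute" point can be made concrete with the TLS 1.2 master-secret derivation from RFC 5246. There, the PRF is built from HMAC-SHA256; TLS 1.0/1.1 instead XOR an MD5-based and a SHA-1-based expansion. The "nonce" in the text corresponds to the ClientHello.random and ServerHello.random values fed in as the seed. The demo assumes a client whose broken PRNG always emits the same random value, and also fixes the pre-master secret to isolate the effect of the randoms. The server's fresh random still yields a different master secret on each connection.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    # P_hash from RFC 5246, section 5:
    #   A(0) = seed, A(i) = HMAC(secret, A(i-1))
    #   output = HMAC(secret, A(1) + seed) + HMAC(secret, A(2) + seed) + ...
    out = b""
    a = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    # TLS 1.2: PRF(pre_master_secret, "master secret",
    #              ClientHello.random + ServerHello.random)[0..47]
    return p_sha256(pre_master, b"master secret" + client_random + server_random, 48)


if __name__ == "__main__":
    import os

    weak_client_random = bytes(32)          # badly seeded client: always zeros
    pre_master = b"\x03\x03" + bytes(46)    # also fixed, to isolate the randoms
    a = master_secret(pre_master, weak_client_random, os.urandom(32))
    b = master_secret(pre_master, weak_client_random, os.urandom(32))
    print(a != b)  # server randomness still makes each derivation differ
```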

Snap Start lets the client determine the entire nonce, and the server is expected to check it against a cache to prevent replay. There are measures to limit the size of the cache, but a cache can't tell you how good the entropy is. Therefore, the nonce may be unique but still predictable. Is this a problem? Probably not, but I haven't analyzed how a predictable nonce affects all the various operating modes of SSL (e.g., ECDH, client cert auth, SRP auth, etc.).
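
Below is a minimal sketch of a server-side replay cache in the spirit of Snap Start's client-chosen nonces. It assumes each nonce arrives with a client timestamp, so entries older than a fixed window can be evicted to bound the cache size. `ReplayCache` and its parameters are hypothetical, not Snap Start's actual data structure. The last call shows the point made above: a counter-style nonce has never been seen, so the cache accepts it, even though anyone could have predicted it.

```python
import time
from typing import Dict, Optional


class ReplayCache:
    """Hypothetical cache of client-chosen nonces, bounded by a time window."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.seen: Dict[bytes, float] = {}

    def accept(self, nonce: bytes, client_time: float,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Evict nonces older than the window so the cache stays bounded.
        self.seen = {n: t for n, t in self.seen.items() if now - t <= self.window_s}
        if abs(now - client_time) > self.window_s:
            return False  # outside the window: can't be checked, so reject
        if nonce in self.seen:
            return False  # exact replay
        self.seen[nonce] = client_time
        return True


cache = ReplayCache(window_s=60)
counter_nonce = (1).to_bytes(32, "big")
print(cache.accept(counter_nonce, client_time=1000.0, now=1000.0))  # True
print(cache.accept(counter_nonce, client_time=1000.0, now=1001.0))  # False: replay
# A predictable but never-before-seen nonce is accepted:
print(cache.accept((2).to_bytes(32, "big"), client_time=1001.0, now=1001.0))  # True
```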

The key insight behind both of these proposed changes to SSL is that latency is an important obstacle to SSL adoption, even though session resumption has been built in from the beginning. Also, Google is willing to shift responsibility for SSL security toward the client in order to save on latency. This makes sense when you own a client and your security deployment model is to ship frequent client updates. It's less clear that this tradeoff is worth it for SSL applications other than HTTP, or for deployments with other security models.

I appreciate the work people like Adam have been doing to improve SSL performance and security. Obviously, unprotected HTTP is worse than some reductions in SSL security. However, these kinds of protocol changes affect many users, and careful study is needed before their full impact is known. I remain cautious about adopting them.

12 thoughts on “SSL optimization and security talk”

  1. I agree that false start and snap start represent tradeoffs of a bit of security for better startup latency.

    I hadn’t heard that snap start was formally withdrawn, but I think it was probably for the better. Not that I know of anything broken about it, it just seemed like too fundamental of a change to try to make via the extension mechanism. It more or less created a new protocol.

    But false start is not nearly so complicated and ambitious. When all the parameters align perfectly, it simply relaxes an ordering constraint (otherwise it does nothing). It is true that this could allow an attacker to see some of the first blocks of AES CBC ciphertext, but I don’t see how that could enable the BEAST attack directly. Even if there were other channels by which the attacker could prompt the client to re-transmit the chosen plaintext, I don’t think the APIs would let him ‘false start’ a second block of it adaptively (after he knows the next IV).

    Of course, these are famous last words, and it may turn out that only a minor API change or a better imagination is needed to enable a real attack. But in the absence of a fully defined weakness, we need to weigh our abstract concerns against the very real benefit of saving half a roundtrip of latency on every full handshake.

    1. This should be clear from the slides, but perhaps I said this verbally instead.

      Situation: secret cookie in client request, attacker wants to use BEAST
      Both client and server support TLS 1.1, which is immune to this attack

      With False Start: the MITM downgrades the server's advertised version to TLS 1.0, and the client sends its request containing the secret to the server, encrypted with the vulnerable reused IV.

      Without False Start: the MITM is detected when the client validates the server's Finished message, so no client request is sent.

      1. Yes False Start allows an attacker to downgrade the handshake to TLS 1.0 and see the client’s first bit of ciphertext output. But this is not sufficient to enable the BEAST attack because that requires an *adaptive* chosen plaintext capability. I.e., the attacker has to know in advance what IV will be used to encrypt his chosen plaintext. For that initial application data record, the IV would be that of the handshake data record used to send the immediately previous Finished message. The whole point of FS though is that these records would be combined into a single packet (with no opportunity for the attacker to adapt), but you’re right, I don’t see anywhere it requires that explicitly.

        In theory, one could construct a client app (like a web browser) that began a TLS handshake, took attacker-chosen plaintext over some other channel, and then sent that as app data to the server (a la FS), all before receiving and validating the server's Finished message. Such a client app could be vulnerable to BEAST but, wow, that would be one crazy implementation.

        Probably the FS doc should be updated to forbid this (as well as multiple batches of False Start messages).

        It would be great if you would join the IETF [TLS] WG mailing list and mention this. If not, I’d be happy to bring it up there.

      2. Sure, feel free to bring it up there. Thanks.

        Framing and timing of messages must be explicitly defined. It’s entirely possible that SSL messages will be fragmented in arbitrary ways by the lower-level transport, and the protocol still must remain secure.

        I remain skeptical that False Start, simple though it is, is worth the potential security risk.

  2. BEAST is NOT an MITM in the HTTPS connection. BEAST merely stops bits moving in that connection after the server’s Finished is relayed to the client. BEAST uses a plain HTTP connection to drive the adaptive plaintext to the client. FalseStart merely makes it so the real server need not even see anything at all. But it hardly helps the client that the server saw a handshake. The server can’t really do anything with what looks like a dropped connection — all it has is an IP address, and not even the real client’s but the BEAST’s.

    1. I don’t understand what you mean by saying this is not a MITM attack. I mean MITM in the most basic sense — client initiates a connection and the attacker in the middle interferes.

      Sure, the attacker doesn’t need to forward anything on to the server (pure impersonation is enough), but she definitely has to be in the path of the connection or control DNS.

      1. Well, I did qualify ‘MITM’ with “in the HTTPS connection”. Specifically BEAST is not trying to impersonate the server in the handshake. It is an MITM in a more general sense, of course. The question in this comment thread though is whether FalseStart makes the BEAST attack worse. I believe the answer is “not all that much, no”. Sure, the attacker now doesn’t need connectivity to the real server, but so what? It’s not like letting the real server see a handshake from the real client is somehow bad for BEAST. And BEAST may need to keep the user’s system talking by simply letting enough bits move that the user doesn’t give up anyways. Also, some users will need to login to get cookies worth stealing (most will already be logged in). Can BEAST reach that many more victims by not needing connectivity to the real server? Probably not enough to make a real difference.

        But maybe I’m missing something?

      2. False Start makes it worse because even if both the client and server support TLS 1.1 or later, which are not vulnerable to BEAST, the client would still be vulnerable to a downgrade attack. It has nothing to do with “letting enough bits move” or “not needing connectivity”.

        Perhaps you could re-read the slides or the other comments above and re-ask your question.

  3. My guess is that the non-random nonce isn’t an issue, given that the whole mess goes into the PRF in order to derive keys. It’s pretty reasonable to model this thing as a random oracle (rather than an actual PRF, which is dubious). Hence all you really need are /distinct/ nonces. That’s my crypto-nerd answer anyway.

    PS The damn thing shouldn’t be called a PRF. It’s not necessarily safe to use a real PRF here, at least the way they’re using it in the various TLS protocols.

    1. I think you’re right also (assuming you’re talking about Snap Start here).

      However, there is definitely a loss of resilience for implementations since a slight mistake in the nonce-checking code on the server side, out-of-date shared cache for multiple servers in the same zone, etc. would leave the system vulnerable to replay. It’s like blacklisting bad chars in user input vs. building the system to handle any kind of length-counted array.

      I think it’s much safer to just generate a shared nonce with client/server PRNG inputs than to try to check for replay and keep systems synchronized. Again, it’s an argument from implementation complexity and not theoretical security.
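
The thread above turns on whether False Start hands the attacker an *adaptive* chosen-plaintext capability. This sketch shows why that matters for CBC with TLS 1.0-style chained IVs. If the attacker knows the next IV before choosing plaintext, one injected block confirms or refutes a guess at a secret block. The sketch rests on the following assumptions:

- `toy_block_encrypt` is a keyed HMAC truncated to 16 bytes, standing in for AES; only the encryption direction is needed here.
- MAC and padding are omitted.
- The class and function names are hypothetical.

Real BEAST also shifts block boundaries to recover one byte at a time; this shows only the whole-block guess check.

```python
import hashlib
import hmac

BLOCK = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES: a keyed, deterministic map on 16-byte blocks.
    # Not invertible, but CBC encryption only uses the forward direction.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class Tls10CbcSender:
    """CBC encryption with TLS 1.0-style chained IVs: each record starts from
    the last ciphertext block of the previous record, which is on the wire."""

    def __init__(self, key: bytes, iv: bytes):
        self.key = key
        self.last_block = iv

    def send(self, plaintext: bytes) -> list:
        assert len(plaintext) % BLOCK == 0, "sketch omits MAC and padding"
        out = []
        for i in range(0, len(plaintext), BLOCK):
            c = toy_block_encrypt(self.key, xor(self.last_block, plaintext[i:i + BLOCK]))
            out.append(c)
            self.last_block = c
        return out


def check_guess(sender: Tls10CbcSender, c_prev: bytes, c_target: bytes, guess: bytes) -> bool:
    # The attacker reads the next IV off the wire (the last ciphertext block sent),
    # then chooses plaintext so the cipher input becomes guess XOR c_prev.
    next_iv = sender.last_block
    chosen = xor(xor(guess, c_prev), next_iv)
    return sender.send(chosen)[0] == c_target


sender = Tls10CbcSender(key=b"k" * 16, iv=bytes(16))
secret = b"Cookie: s=SECRET"  # exactly one 16-byte block
c_prev, c_secret = sender.send(b"GET / HTTP/1.1\r\n" + secret)
print(check_guess(sender, c_prev, c_secret, b"Cookie: s=WRONG!"))  # False
print(check_guess(sender, c_prev, c_secret, secret))               # True
```

If the IV for the attacker's block is fixed before the attacker sees it, as when the handshake and first application record go out in one batch, the attacker cannot compute `chosen`, and the check fails.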
