More certs may indicate less security

In my last post, I mentioned how warning users when a previously-seen cert changes may generate false positives for some sites. If a website has multiple servers with different certs, the browser may often generate spurious errors for that site. But could this be a symptom of a genuine security problem?
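To make the false-positive problem concrete, here is a minimal Python sketch of a check that remembers the first cert seen for each host. The `CertMemory` class and the host name are hypothetical, invented for illustration; real browsers and add-ons may well do this differently.

```python
import hashlib


class CertMemory:
    """Remembers the first certificate fingerprint seen for each host."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def check(self, host: str, der_cert: bytes) -> str:
        # Fingerprint the raw (DER-encoded) certificate bytes.
        fingerprint = hashlib.sha256(der_cert).hexdigest()
        if host not in self._seen:
            self._seen[host] = fingerprint
            return "first-seen"
        if self._seen[host] == fingerprint:
            return "unchanged"
        # The stored fingerprint is kept, so every later mismatch also warns.
        return "changed"
```

With a site that hands out a different cert from each server, a second visit can return "changed" even though nothing is wrong, which is exactly the spurious warning described above.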

Citibank appears to have one certificate per server. You can verify this yourself by going to their website multiple times, clearing your browser each time. Clicking on the SSL icon to the left of the URL will show a different cert.

Here are the first 4 bytes of three serial numbers of certs observed at Citibank:

  • 43:8e:67:66
  • 61:22:d4:81
  • 3e:f4:5b:7c
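The same observation can be scripted instead of clicking through the browser. The sketch below is an assumption-laden illustration, not the method used for this post: `fetch_der` and `distinct_fingerprints` are hypothetical helper names, and it compares SHA-256 fingerprints of the whole cert rather than serial numbers, since that needs nothing beyond the Python standard library.

```python
import hashlib
import ssl
from typing import Callable


def fetch_der(host: str, port: int = 443) -> bytes:
    """Open a fresh TLS connection and return the server's cert as DER bytes."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)


def distinct_fingerprints(fetch: Callable[[], bytes], attempts: int) -> set[str]:
    """Call fetch() repeatedly and collect the distinct cert fingerprints."""
    return {hashlib.sha256(fetch()).hexdigest() for _ in range(attempts)}
```

Something like `distinct_fingerprints(lambda: fetch_der("www.example.com"), 10)` would then report how many different certs turned up in ten connections. A load balancer that pins clients by source address could hide the variation, so seeing a single fingerprint does not prove there is only one cert.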

The Citibank certs are all identical except for a few fields. As you would expect, the domain name (CN) field is identical for each. The organizational unit (OU) differs (e.g., “olb-usmtprweb3” versus “…web1”), but this field is not interpreted by browsers and is more of a convenience. The web server’s public key is different in each cert. And, of course, the serial number and signature fields also differ, as they should for all certs.

On the other hand, Wells Fargo appears to have only one cert. This cert (serial 41:c5:cd:90) is the same even after accessing their site via a proxy to ensure some load-balancing magic isn’t getting in the way. It’s easy to ignore this difference, but there might be something else going on.

Protecting the web server’s private key is one of the most important operational security duties. If it is discovered, all past and present encrypted sessions are compromised. (Yes, I know about DHE, but it’s not widely used.) After cleaning up the mess, the organization needs to get a new certificate and revoke the old one. This is no easy task as CRLs and OCSP both have their downsides.
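Whether a stolen private key exposes past sessions depends on the key exchange in the negotiated cipher suite. As a rough sketch, the hypothetical helper below (`offers_forward_secrecy` is a name made up here) classifies IANA-style suite names from TLS 1.2 and earlier. It assumes the name contains `_WITH_`, so it does not handle TLS 1.3 or OpenSSL-style names.

```python
def offers_forward_secrecy(suite: str) -> bool:
    """Return True if an IANA-style suite name uses ephemeral (EC)DH key exchange."""
    key_exchange, separator, _ = suite.partition("_WITH_")
    if not separator:
        raise ValueError(f"not a TLS <= 1.2 IANA-style suite name: {suite!r}")
    tokens = key_exchange.split("_")
    if tokens[0] == "TLS":
        tokens = tokens[1:]
    # DHE and ECDHE use a fresh key pair per session; plain RSA and static DH
    # derive the session keys from the server's long-term private key.
    return bool(tokens) and tokens[0] in {"DHE", "ECDHE"}
```

Under this classification, `TLS_DHE_RSA_WITH_AES_256_CBC_SHA` counts as forward-secret while `TLS_RSA_WITH_RC4_128_SHA` does not, so only sessions using the latter kind would be readable after the fact with a stolen key.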

One key question to ask an opsec department is “have you ever done a live cert revocation?” It’s one of those things that has to be experienced to be understood. In the recent Comodo fiasco, leaf cert revocations were embedded in browser software updates because the existing revocation mechanisms weren’t reliable enough.

Since web servers run commodity operating systems, most big sites use a hardware security module (HSM) to protect the private key. This is a dedicated box with some physical tamper resistance that is optimized for doing private key operations. By limiting the API to the server, HSMs can be hardened to prevent compromise, even if the server is hacked. The main downsides are that HSMs are expensive and may not live up to the original security guarantees as the API surface area grows.

Now, back to the two banks. Why would one have multiple certs but not the other? Certificates cost money, so if you’re offloading SSL to a single accelerator, there’s no reason to give it multiple certs. If each server has a dedicated HSM, you could use separate certs or just generate one and export it to all the others. You need to do this anyway for backup purposes.

This is just supposition, but one thing this could indicate is a different approach to securing the private key. Instead of generating one cert and private key, you create one per server and store it without an HSM. If a server gets compromised, you revoke that server’s cert and move on. This might seem like a good idea to some since the cost of a cert must be lower than an HSM. However, the ineffectiveness of revocation today shows this to be a dangerous choice.

There may be other explanations for this. Perhaps Citi uses individual HSMs and Wells Fargo has a single SSL accelerator with plaintext HTTP in the backend. Perhaps they got a bargain on certs by buying in bulk. However, any time a system has more keys than necessary, it can lead to complicated key management. Or worse, it may indicate a weaker system design overall.

There’s no way to know the real story, but it’s good food for thought for anyone else who might be considering multiple certs as a substitute for strong private key protection. Cert revocation doesn’t currently work and should not be relied on.

14 thoughts on “More certs may indicate less security”

  1. I hope there are not many sites that do this, because I hope for certificate pinning and/or a Perspectives-like mechanism to rise as viable ways to achieve stronger server authentication. The more Citi-like sites there are, the more operationally annoying that will be.

    I am not sure if we can draw much in the way of inferences about opsec from externally-visible certificate behavior (except for obvious things like Debian weak keys, etc.), although I agree that if they were hoping to use revocation, they are in big trouble.

    FWIW, the SSL Observatory shows that the many certs for {www,signon}.citibank.com are all from the same issuer (“CN=VeriSign Class 3 Extended Validation SSL SGC CA”). So at least it’s not the case that the certs are from different issuers, which would be doubly confusing…

    1. I agree this is just one possible hypothesis (among many) for this behavior. I wanted to use it as a jumping off point for emphasizing revocation is a clean-up, not security, mechanism. Obviously, we can’t be sure what the reasons are for the banks in question but I thought these points were worth making:

      • Revocation should not be relied on regularly. It’s an emergency option after everything else has failed
      • Fewer keys are almost always better than many keys
  2. DHE not widely used? I noticed something is ‘about’ to change: a large market share of users will be using it soon.

    A default installation of Firefox 4, Chrome 10, or Opera 11, default Apache 2.2.x with mod_ssl (at least on Debian and Redhat), and Apache 1.x on OpenBSD with their patches all prefer TLS_DHE_* (like TLS_DHE_RSA_WITH_AES_256_CBC_SHA) above a lot of others and choose it every single time in my small tests.

    Apache has 60% of the market, Firefox has more than 25% of the market, and in some regions, like Europe, it is the most used browser (about 37%). Soon a very large part of the Firefox 3.x users will upgrade to Firefox 4.x. The upgrade rate from Firefox 3.5 to Firefox 3.6 was almost 100%. Supposedly Chrome has about 10% of the market, Opera has about 1%.

    The bad news:

    Google webservers do not choose TLS_DHE_*.

    Any combination with IE or IIS pretty much any version chooses things like TLS_RSA_WITH_RC4_128_SHA.

    I haven’t done any other tests, but this probably covers a large part of the market already.

  3. Loading the same private key into multiple HSM’s can be done pretty safely. Typically you’d do it by physically carrying keys on smart cards between HSM’s. No unencrypted private keys would ever be outside an HSM boundary, and only the HSM’s whose public keys were included in the ACL at the time the SSL private key is created, would be able to receive the SSL key.

    1. Right, and you can use secret splitting to give each officer only part of the key. I think people who say “never move a private key” haven’t really dealt with this. You have to backup/escrow your private key somewhere anyway.

  4. “You have to backup/escrow your private key somewhere anyway.”

    Maybe that is why they have more than 1. :-)

  5. IMHO, saying that cert revocation doesn’t work is a bit imprecise, which isn’t typical of you. It works poorly in some scenarios (namely SSL/TLS, which is the topic). There are other scenarios where it works reasonably well, depending on the size of the CRL and whether you can get OCSP to work.

    1. Revocation doesn’t work in today’s SSL CA world. In any system, revocation is a vital emergency feature but should not be relied on as the only security measure.

      The only system that has handled massive revocations is broadcast encryption for satellite TV. However, even they had to go through a rough time when revocation was ineffective until later designs got it right.

  6. I agree that this creates problems with systems that may try to add additional layers of security to existing SSL/TLS implementations by warning users of changed certs. However, without any details from Citibank, we can’t know why they are doing this – for all we know, they may have multiple HSMs, each with a different private key. Also, it is probably useful to include some idea of the general size comparison for each of the banks when performing a comparison like this – some quick wikipedia’ing (not a canonical source, I know, but what I can do over lunch) shows that Wells Fargo has around 70M customers, and Citibank around 200M (almost 3x the Wells Fargo number, which is interesting given the 3-fold increase in certs, but not necessarily to be relied upon in and of itself). I can also say that my anecdotal experience is that I see far more Citigroup brands and customers outside of America than I do Wells Fargo.

    You also mention at the start that if a cert is compromised, it will leak data both forward and back (notwithstanding any ephemeral key exchange methods, as you note). This, coupled with the size and geographical diversity of Citigroup, seems just as likely a reason to have multiple certificates to me – if one private key is compromised, you have only some of your traffic compromised.

    This is especially true if you have different geographical areas managing your private keys for the purpose of disaster resilience. Keys (protected by a _good_ HSM) are far more likely to be exposed during a moment of human interaction (eg loading of the keys, backup of the keys, etc), and so it may make sense to ensure that different geographic regions have different private keys. Indeed, I would suggest that having a single key in this case would complicate key management more, as you then need to establish a secure key delivery method between sites, and key distribution is definitely an active area for key compromise.

    Of course, all of this is conjecture. Certainly revocation should not be relied upon, but I don’t draw a line from multiple certs to an increased reliance on revocation, myself.

    1. Thanks for the detailed comment. A couple responses:

      1. Larger bank size does not require more certs. There are many more than 3 certs at Citibank.com.

      2. The effort required to compromise one of the private keys should be equivalent to compromising all of them. None of these certs is a higher privilege than the others. I think the partitioning argument has some validity but is a pretty small benefit.

      3. No evidence these servers are in different datacenters

      4. If you’re going to back up any of these private keys (escrow), you have the same key distribution problem as securely installing them in multiple HSMs. If you don’t back them up, then you’re back to relying on revocation when the HSM fails.

      I agree it’s impossible to say what the situation is here. However, I think managing multiple certs/private keys is harder than a single key.

      1. There are two obvious approaches, both mentioned further up:

        1) escrow the private key across multiple PIN-protected smart cards using secret sharing, issuing 1 card to each of several security officers. Initializing a new HSM requires getting two or three officers in a room with their smart cards, entering their PIN’s in a secured console. This is how we did it where I used to work. There isn’t TOO much that can go wrong with this, short of multiple officers colluding and carrying out fancy technical attacks against the cards.

        2) Load all HSM’s ahead of time: if you run ten servers, buy 12 HSM’s and generate a public/private keypair on each one (K1,K2…K12). Export all 12 public keys onto a smart card; the private keys never leave the HSM’s. On one HSM, generate your SSL secret key M, and at the time of creation, export M under the public keys K1…K12. That lets all 12 HSM’s import M. The ACL’s are set up so that after the initial creation, no HSM can ever export M. Install 10 of the HSM’s in your servers, and lock up the other two as spares. This requires some capacity planning since if you want to install more HSM’s during the 1-year certificate validity period, you’ll have to buy another cert, repeat the key creation operation, and deal with multiple keys. We talked about doing it that way but decided it was overkill.

        Key creation and loading in our organization was an infrequent enough operation that we ALWAYS screwed something up in the process. It seemed to me that a normal civilian business environment didn’t have much hope of consistently getting it right, due to the usual equipment and personnel changes that happen all the time. Only a military organization (or maybe a specialist operation like Verisign), that could develop very mechanical procedures and follow them every day, had much hope of working out all the kinks and keeping them worked out. But, as far as I know, we never spilled any keys.

      2. Thanks for the detailed comment. The problem with #2 is that someone else could add a 13th public key in the time between exporting the public keys and importing the shared key M. Or they could swap their own public key for one of the 12.

        Secret splitting the private key seems like a simpler way to get a backup of the private key as well as make importing it into a new HSM require positive action.
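
As a rough illustration of the secret splitting discussed in the comments above, here is a minimal Python sketch. It is an n-of-n XOR split, meaning every officer's share is required. That is an assumption made for brevity: it is not the threshold scheme that "two or three officers" out of a larger group would need, and it is not any particular HSM's mechanism. The function names are hypothetical.

```python
import secrets


def split_secret(secret: bytes, officers: int) -> list[bytes]:
    """Split secret into one share per officer; all shares are needed to rebuild it."""
    if officers < 2:
        raise ValueError("need at least two officers to split a secret")
    # All but the last share are pure randomness of the same length.
    shares = [secrets.token_bytes(len(secret)) for _ in range(officers - 1)]
    # The last share is the secret XORed with every random share.
    last = secret
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]


def combine_shares(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the original secret."""
    result = bytes(len(shares[0]))
    for share in shares:
        result = bytes(a ^ b for a, b in zip(result, share))
    return result
```

Any subset smaller than the full set of shares is statistically independent of the secret, so a single lost or stolen card reveals nothing. The cost is that losing any one share loses the key, which is one reason real deployments tend to prefer a threshold scheme.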
