Last time, we covered three factors that affect actual security of a password:
- Entropy: How many possibilities does the attacker need to consider?
- Guess rate: How quickly can the attacker try guesses? This is often determined by the attacker's vantage point.
- Responses: What can the admin do about guessing attempts?
There’s another factor that will soon come into play, if it hasn’t already — the ongoing exposure of actual passwords as more sites are compromised. We’ve seen the simplest form of this when password reuse on an unimportant account leads to elevated access of a more important one. But that’s only the tip of the iceberg.
With massive compromises of plaintext passwords, attackers now have a growing source of wordlists derived from actual usage. Not only can you add the most common passwords to a wordlist, but you can even sort them in decreasing order of frequency. An astute attacker could even apply machine learning techniques like clustering and classification to determine which other words are missing. This could be used to identify popular memes (such as Korean pop stars), and lead to new words that are likely to be used in the future.
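As a rough illustration of the frequency-ordering step, here is a minimal Python sketch. The function name `frequency_wordlist` and the sample data are made up for this example; a real leak would be read from a file and contain millions of entries.

```python
from collections import Counter

def frequency_wordlist(leaked_passwords):
    """Return the unique passwords, most common first (ties broken alphabetically)."""
    counts = Counter(leaked_passwords)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [password for password, _ in ordered]

# Hypothetical sample standing in for a plaintext dump.
leaked = ["123456", "password", "123456", "qwerty", "password", "123456", "dragon"]
wordlist = frequency_wordlist(leaked)
print(wordlist)
```

Trying guesses in this order means the attacker reaches the most popular choices first, which is exactly what a uniform "number of possibilities" estimate fails to account for.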
Hashed passwords posted after compromises are increasing attacker knowledge as well. Sure, your password hasn’t immediately been exposed, but its hash remains available to anyone with the right wordlist or enough computing power, forever. As more of these are cracked, the global picture gets clearer, and you may be vulnerable to a targeted attack long after the original site is gone.
At a higher level, not only are compromised passwords useful in identifying missing words within a group, but they’re also useful in identifying the templates people use to construct passwords. After a compromise, not only are your password and close variants now vulnerable, but so are the passwords of people who use the same scheme to choose theirs. For example, automated analysis could determine that more users put the site name after the base word than before it. Or it could find that the number 4 is the most common first digit, but only among English speakers.
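One simple way to picture template mining is to collapse each password into runs of character classes and count which shapes recur. The sketch below is only an assumption about how such an analysis might start; the class letters (`l`, `u`, `d`, `s`), the function names, and the sample passwords are all invented for illustration.

```python
from collections import Counter
from itertools import groupby

def char_class(ch):
    """Map a character to a coarse class: lower, upper, digit, or symbol."""
    if ch.islower():
        return "l"
    if ch.isupper():
        return "u"
    if ch.isdigit():
        return "d"
    return "s"

def template(password):
    """Collapse a password into class runs, e.g. 'Summer2012' becomes 'u1l5d4'."""
    runs = groupby(password, key=char_class)
    return "".join(f"{cls}{len(list(run))}" for cls, run in runs)

def template_counts(passwords):
    """Count how often each template appears in a list of passwords."""
    return Counter(template(pw) for pw in passwords)

# Hypothetical sample standing in for cracked passwords.
sample = ["monkey42", "dragon12", "Summer2012", "linkedin!", "tigers99"]
counts = template_counts(sample)
print(counts.most_common())
```

Once the popular shapes are known, an attacker can generate candidates shape by shape instead of searching the whole space, so everyone who shares a scheme with a compromised user is affected.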
All of these factors mean that attackers face less entropy as more passwords are revealed. Site compromises not only reveal passwords themselves, but the thinking of the users behind the passwords. Trends in word choice give attackers a better order for cracking. Higher-level templates used to generate passwords are also revealed. Even your joke passwords on useless sites reveal something of your thought patterns.
The only answer may be to take password selection out of the hands of users. Truly random but memorable passwords don’t reveal anything beyond the password itself. And where possible, passwords can be avoided completely. For example, tokens or out-of-band communication can often be used for authentication. Since most devices are connected, such tokens can be shared between paired devices.
All that’s certain is that attackers will be winning the password game for years to come, and there are still many rich patterns to be mined from previous compromises.
Hey Nate, I’ve been working for the last couple of years on incorporating various machine learning techniques to improve offline password cracking tools. For example: http://goo.gl/VJDzh. I fully agree with you that the explosion of publicly available password lists has really changed the face of the password cracking scene. That being said, there are a few nitpicks that I have.
First, entropy is probably the wrong word to use to describe the expected amount of work an attacker has to put into cracking a password. For a much longer (and more boring) description of why, see http://goo.gl/NY7h9, but it really boils down to the fact that human-selected passwords have an uneven probability distribution. Entropy doesn’t do a good job of characterizing this distribution in a way that’s useful for a defender. That is, it’s generally worse for a site to have a few users choose really bad passwords while some users choose really good passwords than for most users to choose mediocre passwords. Both policies might result in the password groups having the same entropy value, though.
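A toy calculation makes the point concrete. The two distributions below are invented and deliberately exaggerated: in `skewed`, half of all users share one password and the rest are spread over 2^18 others, while in `uniform` everyone picks evenly from 1024 passwords. Both are represented as hypothetical (probability per password, number of passwords) groups.

```python
import math

def shannon_entropy(groups):
    """Shannon entropy in bits for groups of (probability_per_password, count)."""
    return -sum(count * p * math.log2(p) for p, count in groups)

def success_after(groups, guesses):
    """Fraction of accounts cracked when guessing the most likely passwords first."""
    cracked = 0.0
    for p, count in sorted(groups, reverse=True):
        take = min(count, guesses)
        cracked += take * p
        guesses -= take
        if guesses == 0:
            break
    return cracked

# Invented distributions chosen so that both have the same entropy.
skewed = [(0.5, 1), (0.5 / 2**18, 2**18)]
uniform = [(1 / 2**10, 2**10)]

for name, groups in (("skewed", skewed), ("uniform", uniform)):
    print(name, shannon_entropy(groups), success_after(groups, 1))
```

Both distributions come out at 10 bits, yet a single guess cracks half the accounts in the skewed case and roughly one in a thousand in the uniform case.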
Second, I’m actually kind of optimistic about the future of passwords. I agree with Herley and van Oorschot’s paper, http://research.microsoft.com/apps/pubs/default.aspx?id=154077, that we’ll be stuck with human-generated passwords for the near future and that’s not necessarily a bad thing. Quite honestly, except in certain circumstances like file encryption, the importance of passwords being resistant to offline cracking attacks is much overstated. Online guess limiting, using blacklists to ban default and common passwords, and educating users about the dangers of password reuse can be quite effective. The thing is, most sites can’t force users to use truly random passwords (well, they can’t force the users and stay in business, that is), and out-of-band communication often just isn’t feasible.
Thanks for the detailed comments.
I agree average entropy is not a good measure of password uncertainty. Most of my comments are directed towards realizing what factors can decrease the entropy of an individual password (or template scheme). There is definitely a different strategy to manage a large site and keep the total number of compromised passwords low.
I disagree that resistance to offline attacks is overstated. Instead, so many sites still store plaintext passwords that they are the biggest current problem. As they are fixed, we’ll also have to eliminate trivial hashing (say, bare MD5).
The most important point I was trying to make with this series is that even conscientious users can be affected by security flaws outside their control and awareness.
For example, let’s say people adopt the XKCD password template scheme (4 “random” words chosen by the person). You and your university friends happen to “randomly” choose different words from your school’s anthem. One of your friends uses this password on a plaintext password site that gets compromised. An attacker tool does a web search on the phrase, and the song’s entire lyrics go into a .edu dictionary. Now your bcrypt-protected password on your bank account is also compromised, despite your password being quite reasonable in isolation.
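To put rough numbers on this scenario, here is a back-of-the-envelope sketch. The 2048-word list corresponds to the 11 bits per word assumed in the XKCD comic; the 60 distinct words in the anthem are a made-up figure for illustration.

```python
import math

def search_space(vocabulary_size, words=4):
    """Number of ordered phrases of `words` words drawn from the vocabulary."""
    return vocabulary_size ** words

generic = search_space(2048)  # four words from a 2048-word list
anthem = search_space(60)     # hypothetical: four words from a 60-word anthem

print(math.log2(generic), "bits for the generic list")
print(round(math.log2(anthem), 1), "bits once the anthem is in the dictionary")
print(generic // anthem, "times fewer guesses needed")
```

Under these assumptions the search drops from 2^44 phrases to about 13 million, which is within reach even against a slow hash like bcrypt.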
This scenario isn’t so far off. I predict the next stage will be iterations on simple templates (word + digit + word). The phase after that will be account-specific wordlists, keyed off your Facebook/LinkedIn profiles. Knowing what is coming should make people aware that even good password template schemes have a fixed lifetime.