Last time, we covered three factors that affect the actual security of a password:
- Entropy: How many possibilities does the attacker need to consider?
- Guess rate: How quickly can the attacker try guesses? This is often determined by vantage point.
- Responses: What can the admin do about guessing attempts?
There’s another factor that will soon come into play, if it hasn’t already: the ongoing exposure of actual passwords as more sites are compromised. We’ve seen the simplest form of this when password reuse on an unimportant account leads to elevated access to a more important one. But that’s only the tip of the iceberg.
With massive compromises of plaintext passwords, attackers now have a growing source of wordlists derived from actual usage. Not only can they add the most common passwords to a wordlist, they can also sort it in decreasing order of frequency so the likeliest guesses are tried first. An astute attacker could even apply machine learning techniques like clustering and classification to infer which related words are missing from the list. This could be used to identify popular memes (such as Korean pop stars) and predict new words that are likely to be used in the future.
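To make the frequency-ordering idea concrete, here is a minimal sketch in Python. The function name and the sample data are made up for illustration; a real dump would be read from a file and would be far larger.

```python
from collections import Counter

def frequency_ordered_wordlist(leaked_passwords):
    """Return the unique passwords, most frequently seen first."""
    counts = Counter(leaked_passwords)
    # Sort by descending count; break ties alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [password for password, _ in ranked]

# Made-up sample standing in for a leaked plaintext dump.
sample_leak = ["123456", "password", "123456", "letmein", "password", "123456"]

wordlist = frequency_ordered_wordlist(sample_leak)
print(wordlist)  # ['123456', 'password', 'letmein']
```

Trying guesses in this order means the passwords shared by the most users fall first, which is exactly what makes leaked plaintext so valuable.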
Hashed passwords posted after compromises are increasing attacker knowledge as well. Sure, your password hasn’t immediately been exposed, but its hash remains available forever to anyone with the right wordlist or enough computing power. As more of these hashes are cracked, the global picture gets clearer, and you may be vulnerable to a targeted attack long after the original site is gone.
At a higher level, compromised passwords are useful not only for identifying missing words within a group, but also for identifying the templates people use to construct passwords. After a compromise, not only are your password and its close variants vulnerable, but so is everyone who uses the same scheme to choose their passwords. For example, automated analysis could determine that more users put the site name after the base word than before it. Or that the number 4 is more common as the first numeric value, but only among English speakers.
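One simple way such template analysis might work is to collapse each password into runs of character classes and count the resulting shapes. The sketch below assumes a four-class scheme (L, U, D, S) chosen purely for illustration, and the sample passwords are invented.

```python
from collections import Counter
from itertools import groupby

def char_class(ch):
    """Map a character to a coarse class label."""
    if ch.islower():
        return "L"  # lowercase letter
    if ch.isupper():
        return "U"  # uppercase letter
    if ch.isdigit():
        return "D"  # digit
    return "S"      # anything else: symbols, spaces, uncased letters

def template(password):
    """Collapse a password into runs of classes, e.g. 'Monkey42!' -> 'U1L5D2S1'."""
    classes = (char_class(ch) for ch in password)
    return "".join(f"{cls}{len(list(run))}" for cls, run in groupby(classes))

def common_templates(passwords):
    """Return (template, count) pairs, most common first."""
    return Counter(template(p) for p in passwords).most_common()

# Invented sample standing in for cracked passwords.
sample = ["Monkey42!", "Dragon17!", "summer2012", "winter2013", "Tigger99?"]
print(common_templates(sample))  # [('U1L5D2S1', 3), ('L6D4', 2)]
```

An attacker who knows the popular shapes can generate guesses that fit them first, even for base words that have never appeared in a leak.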
All of these factors mean that the effective entropy attackers face shrinks as more passwords are revealed. Site compromises reveal not only the passwords themselves, but also the thinking of the users behind them. Trends in word choice give a better order for cracking. Higher-level templates used to generate passwords are also revealed. Even your joke passwords on useless sites reveal something of your thought patterns.
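As a rough illustration of the entropy point, the sketch below compares the Shannon entropy of a uniform choice among four passwords with a skewed choice among the same four. The data is invented, and Shannon entropy is only a proxy here: it does not directly equal the number of guesses an attacker needs, but it moves in the same direction.

```python
import math
from collections import Counter

def shannon_entropy_bits(observations):
    """Shannon entropy, in bits, of the empirical distribution of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Four passwords chosen equally often versus one clear favorite (invented data).
uniform = ["a", "b", "c", "d"]
skewed = ["a"] * 5 + ["b", "c", "d"]

print(shannon_entropy_bits(uniform))  # 2.0
print(shannon_entropy_bits(skewed) < shannon_entropy_bits(uniform))  # True
```

The more an attacker learns about which choices are popular, the further real-world password selection drifts from the uniform case that naive strength estimates assume.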
The only answer may be to take password selection out of the hands of users. Truly random but memorable passwords don’t reveal anything beyond the password itself. And where possible, passwords can be avoided completely. For example, tokens or out-of-band communication can often be used for authentication. Since most devices are connected, such tokens can be shared between paired devices.
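One common way to get random but memorable passwords is a passphrase of words drawn uniformly from a fixed list. The sketch below uses Python's `secrets` module for the random choice. The tiny word list is a placeholder for illustration only; a practical list holds thousands of unique words (Diceware-style lists have 7776).

```python
import math
import secrets

# Placeholder list for illustration; far too small for real use.
WORDS = ["anchor", "breeze", "copper", "dapple", "ember", "fennel", "garnet", "harbor"]

def random_passphrase(words, count=5, separator=" "):
    """Pick `count` words uniformly at random using a cryptographic RNG."""
    return separator.join(secrets.choice(words) for _ in range(count))

def passphrase_entropy_bits(wordlist_size, count):
    """Entropy in bits when each word is an independent uniform choice."""
    return count * math.log2(wordlist_size)

print(random_passphrase(WORDS))
# With a 7776-word list, five words give roughly 64.6 bits.
print(round(passphrase_entropy_bits(7776, 5), 1))
```

Because every word is chosen by the generator rather than the user, a leak of one such passphrase says nothing about the next, and the entropy figure holds even if the attacker knows the word list and the method.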
All that’s certain is that attackers will be winning the password game for years to come, and there are still many rich patterns to be mined from previous compromises.
