Why stream ciphers shouldn’t be used for hashing

I recently saw a blog post that used RC4 as an ad-hoc hash while illustrating why CBC mode is better than ECB. The author was only trying to generate a graphic, but it reminded me to explain why a stream cipher shouldn’t be used as a cryptographic hash.

A stream cipher like RC4 only has one input (the key) and one output, a variable-length keystream. During initialization, the key is expanded and stored in an internal buffer. When the user wants to encrypt or decrypt (both are the same operation), the buffer is updated in some way and keystream bits are output. It’s up to the caller to take that keystream data and XOR it with the plaintext to get the ciphertext (or vice versa). Very simple, right? You just initialize the stream cipher’s state with a key and then turn the crank whenever you want keystream bits.
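To make the "initialize, then turn the crank" description concrete, here is a minimal sketch of textbook RC4 in Python. The class and function names (`RC4`, `keystream`, `xor_with_keystream`) are made up for this example and do not come from any library. It is for illustration only; RC4 should not be used in new designs.

```python
class RC4:
    """Textbook RC4: one input (the key), one output (the keystream)."""

    def __init__(self, key: bytes) -> None:
        if not 1 <= len(key) <= 256:
            raise ValueError("RC4 keys are 1 to 256 bytes long")
        # Key schedule: expand the key into a 256-byte permutation.
        s = list(range(256))
        j = 0
        for i in range(256):
            j = (j + s[i] + key[i % len(key)]) & 0xFF
            s[i], s[j] = s[j], s[i]
        self.s = s
        self.i = 0
        self.j = 0

    def keystream(self, n: int) -> bytes:
        """Turn the crank: update the internal state and emit n bytes."""
        out = bytearray()
        for _ in range(n):
            self.i = (self.i + 1) & 0xFF
            self.j = (self.j + self.s[self.i]) & 0xFF
            self.s[self.i], self.s[self.j] = self.s[self.j], self.s[self.i]
            out.append(self.s[(self.s[self.i] + self.s[self.j]) & 0xFF])
        return bytes(out)


def xor_with_keystream(cipher: RC4, data: bytes) -> bytes:
    """The caller's job: XOR keystream with the data (encrypt == decrypt)."""
    ks = cipher.keystream(len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


ciphertext = xor_with_keystream(RC4(b"Key"), b"Plaintext")
recovered = xor_with_keystream(RC4(b"Key"), ciphertext)
print(ciphertext.hex(), recovered)
```

Encrypting and decrypting each start from a freshly initialized state with the same key, which is why the same function serves both directions.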

A cryptographic hash algorithm like SHA-1 also has one input (the data) and one output, the digest. A variable-length stream of input data is crunched in blocks, giving a final output digest that should be difficult to invert, among other properties.

At first glance, it seems that a stream cipher can be used as a cryptographic hash by setting the data to hash as the key, turning the crank, and using some of the keystream as the digest. The reasoning goes, “since it should be difficult to recover the original stream cipher key merely by seeing some of the keystream, the output is usable as a hash”. While this may sound reasonable, it is often wrong, leading to various security problems.

There are several important design differences between stream ciphers and hashes. First, a stream cipher is designed to output an extremely long keystream, while a hash produces a relatively small, fixed-length digest, so one design expands a key while the other compresses its input. Second, resistance to chosen-input attacks is a requirement for a cryptographic hash, while it may not have been considered at all for a stream cipher. After all, what could an attacker gain by choosing the cipher’s keys? If they chose the key, they already know it.

The RC4 weakness that led to WEP being broken was a related-key attack. Even though an attacker could not choose WEP keys, the RC4 key for each packet was the concatenation of a public IV (often just a counter) and the secret key. Thus, the keystreams for successive packets were derived from closely related keys.
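A small sketch shows how closely related those per-packet keys are. The function name `wep_packet_key`, the sample secret, and the big-endian counter encoding are assumptions made for illustration; real implementations differed in how they generated the IV.

```python
def wep_packet_key(iv: int, secret: bytes) -> bytes:
    """Per-packet RC4 key: a 3-byte IV followed by the shared secret."""
    return iv.to_bytes(3, "big") + secret


secret = bytes.fromhex("0102030405")  # hypothetical 40-bit shared secret
keys = [wep_packet_key(iv, secret) for iv in range(3)]
for key in keys:
    print(key.hex())
```

Every key ends in the same secret bytes and differs only in an IV that the attacker can see on the wire, which is the "related key" situation described above.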

But to use RC4 for hashing, it would have to be resistant not only to related-key attacks, but also to chosen-key attacks. In that setting, the attacker can target weaknesses in the key schedule by maliciously choosing many keys, rather than merely knowing that some relation exists between unknown keys they can’t choose. While chosen-IV attacks are part of the consideration for stream ciphers, I haven’t heard of full chosen-key resistance being an important design criterion. (Please correct me if I’m out of date on this, especially with eSTREAM.)
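As a sketch of what "choosing the key" means here, the code below implements the ad-hoc construction (data as the key, a keystream prefix as the digest) and shows two structural collisions that fall straight out of the textbook RC4 key schedule. The names `rc4_keystream` and `naive_rc4_hash` and the 20-byte digest length are assumptions for this example. These are only the simplest collisions, not a sophisticated attack, and they rely on the key schedule reading the key as `key[i % len(key)]` for `i` in 0..255.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Textbook RC4: run the key schedule, then emit n keystream bytes."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def naive_rc4_hash(data: bytes, digest_len: int = 20) -> bytes:
    """The ad-hoc 'hash': the data is the key, the keystream is the digest."""
    return rc4_keystream(data, digest_len)


# Collision 1: a key and its repetition index to the same byte sequence.
print(naive_rc4_hash(b"AB") == naive_rc4_hash(b"ABAB"))  # True

# Collision 2: bytes past the 256th are never read by the key schedule.
prefix = bytes(256)
print(naive_rc4_hash(prefix + b"x") == naive_rc4_hash(prefix + b"y"))  # True
```

A real hash avoids both problems by design, for example by padding the input with its length. Nothing in a stream cipher's key schedule is required to do the same, because its keys are normally fixed-length secrets rather than attacker-supplied data.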

In contrast, resistance to a chosen-input attack is the very definition of a cryptographic hash algorithm. This resistance comes at a performance cost. Turning a hash algorithm into a stream cipher can be done (say, an HMAC using a key and counter), but it’s slower than stream ciphers that were designed as such. Stream cipher designs are optimized for performance and are usually not focused on preventing chosen-key attacks. An interesting corollary is that analyzing a stream cipher’s key scheduling algorithm as a hash function (e.g., for collision resistance) is often a good way to understand its possible weaknesses.
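The "HMAC using a key and counter" idea can be sketched with Python's standard `hmac` and `hashlib` modules. The function names, the choice of SHA-256, and the 8-byte big-endian counter are assumptions for this example, and the sketch omits a nonce, so reusing a key would repeat the keystream. Treat it as an illustration of the construction, not a vetted cipher.

```python
import hashlib
import hmac


def hmac_keystream(key: bytes, n: int) -> bytes:
    """Generate n keystream bytes as HMAC(key, 0) || HMAC(key, 1) || ..."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        block = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:n])


def xor_with_keystream(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream."""
    ks = hmac_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


message = b"attack at dawn"
ciphertext = xor_with_keystream(b"secret key", message)
print(xor_with_keystream(b"secret key", ciphertext) == message)  # True
```

Each 32 bytes of keystream costs a full HMAC computation (two hash invocations), which is where the performance gap relative to a purpose-built stream cipher comes from.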

To summarize, don’t use cryptographic primitives for non-standard purposes. There are often built-in assumptions based on the original intended application that could compromise your modified design.

11 thoughts on “Why stream ciphers shouldn’t be used for hashing”

  1. Maybe this is a discussion that should be taken offline, but as a person new to security, I would like to know the fundamental differences between stream ciphers and hashes when applied in real-world examples. For example, if passwords were encrypted using RC4 as opposed to SHA-1 (not even sure it can be called encryption), what exactly would that result in, how would it be achieved, etc.? I am probably blind to how this differs from what is offered by SSL, as my understanding is that it is simply an exchange of keys which are then hashed using an algorithm such as SHA. I’d be happy to discuss this via email, especially considering I am a newbie to the security world.

    1. It would be called “password hashing”, not “encryption”. Also, that’s not how SSL works.

      Why don’t you read the archive of this blog (started in 2007)? You’ll come across a number of posts, including a presentation that’s an intro to SSL.

      1. Thanks Nate. Will have a look at it. Dumb question. What is the difference between hashing and encryption?

  2. How does this apply to bcrypt’s Eksblowfish-based approach for hashing passwords?

    This system uses a static well-known key (“OrpheanBeholderScryDoubt”) and uses the salt for the (thought to be extremely expensive) key setup, then runs the Blowfish algorithm a shitload of times (alternating the salt and password for the key at each step). The output is the password’s “hash”.

    1. No, it uses that “Orphean” string as plaintext and the password/salt as the key.

      As stated in this post, it means Blowfish must be strong against related-key attacks since passwords may differ in only small amounts. But password hashing is quite different from regular hashing (e.g., plain SHA-1), which is what the post was really about.

  3. I think it may be worthwhile having a post on the differences between using hashes for securing passwords, and using hashes for securing data integrity / authenticity (as part of a signature). I see many people get this wrong, in terms of confusion around the differences between collision and pre-image recovery and how they relate to the two common uses for hashing.

  4. Another big problem is that RC4 is not resistant to first and second preimage attacks, and it is absolutely not collision resistant if you use 256-bit keys.

    It’s really easy to find two keys that generate the same first bytes of keystream, even for “long” keystreams. As an example:

    data 1:

    and data 2:

    generate the same first 248 bits of keystream:


    1. Absolutely.

      Chosen key resistance is much harder than related key resistance, and to use a cipher (stream or block) as a hash function, it needs to be strong against both. That’s the summary of this article.

    1. Block ciphers have generally had related-key attack resistance as a design criterion since before AES was chosen. There’s no real reason why stream ciphers can’t have the same criterion, and modern ones might. That’s why I mentioned eSTREAM.

      What I’m saying is that without this being an explicit design criterion, it’s unwise to use them in this way.
