The RFC Editor has announced a new document, RFC 6070, with test vectors for PKCS5 PBKDF2. The document grow out of my implementation of SCRAM for GNU SASL. During interop testing, more than one other implementation turned out to have mistakes in the PBKDF2 implementation. It didn’t help that there weren’t any stable test vectors for PBKDF2, so that we could do black-box testing of our PBKDF2 implementations against well-known and stable test vectors. Debugging this was time consuming. The document addresses this problem.
So what is PBKDF2?
Briefly, the PBKDF2 is the state of the art way to convert a password string into a cryptographic binary key. There are many ways to design such an algorithm. To understand why PBKDF2 is designed the way it is we need some basic understanding. The typical attack you would be concerned with is if someone compromises your authentication servers and steals the credential database. This is what happened with the recent Gawker attack, where passwords were apparently stored in clear text.
The first security improvement compared to storing passwords in clear text is to only store a hashed or encrypted form of the password. Hashing is preferable over encryption in this context, because with encryption it is possible to go back to the unencrypted form (if you know the key) whereas with a hash it is computationally infeasible to do so.
The downside of hashing is that it is impossible to be compatible with all kind of authentication systems: the hash of the password in the database needs to be computed using the same algorithm as the system will be using. Unfortunately there are many different schemes out there to do password hashing, including the old Unix crypt(3) algorithm and the Windows LM-hash. You normally never know in advance what kind of systems your hashed credential database needs to work for, so you have to pick some set of algorithms and hope for the best.
The next security improvements are iterations and salting.
Iterations means to run the hash function not once, but hundreds or even thousands of times, on the input data (e.g., password). The extra time is hardly noticeable for the user, but for an attacker who is testing an entire dictionary of common passwords against your hash, making the work cost thousand times higher quickly makes the attack vector computationally infeasible.
Salting means to include something other than the password in the input to the hash function, typically a random string. Without salting, a pre-computed rainbow table would provide a dictionary attack against all hashed password databases (one rainbow table per hashing algorithm). By using a sufficiently long and random salt, a rainbow table for one system is not applicable to another system. Since generating a rainbow table is time consuming, this adds another barrier to the attacker.
PBKDF2 combines these three concepts — hashing, iteration and salting — into one construct, and it is designed flexible enough so that it can work on any pseudo-random function. In practice, I’ve only ever seen HMAC-SHA1 be used.
Some final words on the publication behind RFC 6070: Out of my earlier publications, this was the document that (by a large margin) spent the least time in the IETF process. You see various milestones of the document in the RFC 6070 datatracker history page. The time in the publication process was from 2010-08-09 to 2011-01-06. One month was in the AUTH48 state, of which the initial 20 days were waiting for me to respond. Thank you to my sponsoring AD, Sean Turner, for making this possible!
How does it compare to SRP?
Pingback: Tweets that mention On Password Hashing and RFC 6070 « Simon Josefsson's blog -- Topsy.com
James: Interesting question! SRP solves a different problem, it is a client-server protocol. However, SRP also specify a hashed password form that can be stored at the server. The hashed password form can only be used for SRP. The data that is stored for SRP is computed like this:
As you can see, it does use hashing and salting. PBKDF2 is better because it uses HMAC-SHA1 instead of SHA1. (The SRP specification allows use of HMAC-SHA1, although it would be incompatible.) The exponentiation is similar to iteration, although I’m not sure which of PBKDF2 with reasonable iteration count and the SRP verifier-generation would take the longest to compute. Left as an exercise for the reader.
A significant practical disadvantage is that you need to re-generate the verifier if you change usernames, requiring access to the password. This was, incidentally, one of the serious problems DIGEST-MD5 had.