What have we learned from the latest 30 years of password storage? We went from plaintext passwords in databases to hashed passwords, salted and even peppered passwords. Distributed computing required algorithms to adapt to new types of workload. So, what are the best practices, and what makes a truly resistant password hash against offline cracking?
Let’s review the basics of password hashing and the usage of the different algorithms used by the industry.
Don’t look back, password hashing is a one-way street
The process of hashing takes a text or byte stream as an input and processes it by a one-way function. The goal is for the result to be computationally difficult to convert back into the original input. These algorithms outputs are of fixed length, no matter the length of the input. The slightest change in the input will cause a completely different output because of the “avalanche effect”1https://en.wikipedia.org/wiki/Avalanche_effect. However, the same input will always lead to the same output since the result is deterministic.
Hashes are often used to store passwords. In fact, they are used to store a value associated with the password without having to manipulate it. This comes in handy in case of malicious database access: the attacker has the hashes but not the passwords themselves.
Hashing should be used each time that an application must store sensitive information that the software does not need to access in clear text during runtime. For example, hashing is not the solution for storing birthdates since one might need to read it and apply business logic based on it.
Hashes can also be used to check files integrity. One can compute the hash for a file, store it apart from it. These hashes are often seen on download pages as “checksum”2https://www.sentinelone.com/blog/what-is-hash-how-does-it-work/3https://security.stackexchange.com/questions/194600/checksum-vs-hash-differences-and-similarities. The user who will download the file will be able to compute and check that the hashes match. If this is not the case, it can mean that the file was corrupted or replaced during the transfer.
Collision: where multiple roads lead to Rome
For a hashing algorithm, a collision is the fact that two different inputs produce the same output (which is an undesirable property of a hashing algorithm). This happens because the output has a definite length and so has a limited number of possible results, but the input is not limited. The longer the result is, the lesser is the risk of collision, but there is always a risk of collision when a hashing algorithm are used.
The false friends of hashing: encoding and encryption
Encoding is made to be easily reverted, its main purpose is to make the transfer of the data easier. It differs from hashing since hashing is made so it cannot be reverted. Encoding is also confused as a form of encryption. It cannot be used for this purpose either.
Encryption is made to be reverted when providing the right key. Like hashing, it is used to secure sensitive data (on the fly or at rest), but the data remains accessible to the application for processing when needed. Encryption should be avoided to store users’ passwords because the software does not need to know the clear text password value of users anyway. Moreover, this means that the encryption key must be accessible by the application, consequently, the key and the passwords themselves may be exposed during execution.
A rainbow in a blink of an eye
Older hashing algorithms like SHA1, SHA2 and MD5 are not recommended anymore to hash passwords because they are quick and cheap to compute, so it is easier to do a brute force attack. In other words, if you know the output of the algorithm, you can test multiple inputs until you find one that matches the targeted output. For example, the 14 million passwords of the infamous rockyou.txt 4https://www.kaggle.com/wjburns/common-password-list-rockyoutxt wordlist were all computed in less than 3 seconds in SHA1 and in MD5 on a consumer laptop.
Another weak point of these algorithms is the existence of rainbow tables to instantly retrieve a matching input for a specific output. Rainbow tables are pre-computed tables associating inputs with output for a certain hashing algorithm, so if the output of the algorithm is in the rainbow table the input can be retrieved with only one access to the table5https://en.wikipedia.org/wiki/Rainbow_table6https://project-rainbowcrack.com/table.htm. These tables can take a lot of hard drive space based on pre-computed values, but they are also commonly accessible on the Internet through specialized websites.
All these hashes algorithms can be attacked using brute force, so, the goal is to improve them to make the offline attacks way more expensive (in terms of time and resources) for the attackers. To counter attacks based on rainbow tables and make brute force attacks harder in case of database leaks, software engineers added a grain of salt and a pinch of pepper to their passwords.
The salt is a small set of bytes or characters added to the input that is different for each user. It can be seen as an extension of the password to make it truly unique before hashing it. After the password and the salt are concatenated, they are hashed together. The salt is stored in clear along with the hash to reuse it when the password needs to be checked again. The salt itself must be long enough to be able to be unique for each user7https://crackstation.net/hashing-security.htm.
SHA1(“badpassword”) = 8a29aaf5687129c1d27b90578fc33ecc49d069dc
You can look up the hashed string on Google8https://www.google.com/search?q=8a29aaf5687129c1d27b90578fc33ecc49d069dc and you will instantly find the original password.
Let’s add a salt, “rand0mSa1t” for example:
SHA1(“rand0mSa1tbadpassword”) = 9838d4bf956d723acadc4cf2d03b1b3e22c4483d
You will not find this string on Google because no one can pre-compute salted hashes.
In the database, the salt is often in another column of separated by a sign: rand0mSa1t.9838d4bf956d723acadc4cf2d03b1b3e22c4483d
This technique makes rainbow tables inefficient but does not protect from brute-forcing since someone having access to the hash has often also access to the salt.
The pepper uses the same technique as the salt but the set of bytes or the characters added to the input are stored separately from the hashes. Another difference from the salt is that the pepper is usually the same for all the users. If the pepper is leaked or is found through brute force attack, the protection offered by it disappears. A brute force attack can be done when an attacker has access to the password hashes and knows one of the passwords of an account, he/she could use it to attempt to brute force the pepper9https://en.wikipedia.org/wiki/Pepper_(cryptography).
Make password hashing great again
The correct way to hash passwords nowadays is by using modern algorithms like BCRYPT, ARGON2 or PBKDF2 which are computationally slower and include salt by default10https://blog.cloudflare.com/how-developers-got-password-security-so-wrong/.
To compare them against the older algorithms, we computed only the first thousand passwords of the rockyou.txt wordlist. It took 5.5 seconds for ARGON2(i) and 4 minutes and 37 seconds for BCRYPT (using default work factor). The work factor should be carefully chosen to make an attack harder while keeping the computation time acceptable for legitimate users when they connect to the application.
A repository including the different scripts used to benchmark the hashing algorithms is available here.
The fact that these algorithms include salt by default prevents the creation of rainbow tables for them.
From file checksums performance to passwords safety
Multiple hashing algorithms currently exist and they all have a purpose that needs to be respected for them to be efficient. Some are made to perform file checksums and must not be used to secure passwords. For this purpose, prefer algorithms especially designed for this like BCRYPT.
Finally, do not attempt to create or customize a hashing algorithm. You may weaken the security provided by the original solution. If you need a good example of a misuse of BCRYPT resulting in the creation of weak hashes, don’t miss or article Password Hashing: Be careful about what you hash.