Safe Password Hashing

This section explains why hash functions are used to protect passwords, and how to hash passwords effectively.

Why should I hash passwords supplied by users of my application?
Why are common hashing functions such as md5() and sha1() unsuitable for passwords?
How should I hash my passwords, if the common hash functions are not suitable?
What is a salt?
How do I store my salts?

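Why should I hash passwords supplied by users of my application?

Password hashing is one of the most basic security requirements, and it must be considered when designing any application or service that accepts passwords from users. Without hashing, any passwords held in the data store can be stolen if that store is compromised. Stolen passwords can be used immediately to take over accounts in the application or service, and the damage spreads further if users reuse the same account name and password on other services.

Applying a hashing algorithm to users' passwords before storing them in the database makes it hard for an attacker to determine the original password, while the application can still compare a submitted password against the stored hash.

It is important to note, however, that password hashing only protects passwords against unauthorized access to the data store. It does not protect them from malicious code injected into the application or service itself.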

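Why are common hashing functions such as md5() and sha1() unsuitable for passwords?

Hashing algorithms such as MD5, SHA1 and SHA256 are designed to be very fast and efficient. With modern techniques and hardware, it has become trivial to brute-force the output of these algorithms in order to recover the original input.

Because a modern computer can test candidate inputs against these hashes so quickly, many security professionals strongly advise against using them for password hashing.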
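As a rough illustration of how little work a fast hash demands from an attacker, the following sketch brute-forces the MD5 hash of a short password. The password 'zebu', the four-letter lowercase search space, and the function name bruteForceMd5() are assumptions made for the example. Real attacks use far larger search spaces and far faster tooling.

```php
<?php
// Hypothetical leaked value: the unsalted MD5 hash of a 4-letter lowercase password.
$leaked = md5('zebu');

// Try every 4-letter lowercase candidate (26^4 = 456,976 guesses).
function bruteForceMd5(string $target): ?string
{
    $letters = range('a', 'z');
    foreach ($letters as $a) {
        foreach ($letters as $b) {
            foreach ($letters as $c) {
                foreach ($letters as $d) {
                    $guess = $a . $b . $c . $d;
                    if (md5($guess) === $target) {
                        return $guess; // the original input has been recovered
                    }
                }
            }
        }
    }
    return null; // not found in this search space
}

$start   = microtime(true);
$found   = bruteForceMd5($leaked);
$elapsed = microtime(true) - $start;

printf("Recovered '%s' in %.3f seconds\n", $found, $elapsed);
```

The elapsed time depends on the machine, but the whole search space is exhausted with fewer than half a million hash computations.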

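How should I hash my passwords, if the common hash functions are not suitable?

The two most important considerations when hashing passwords are the computational cost and the salt. The more computationally expensive the hashing algorithm, the longer it takes to brute-force its output.

PHP provides a native password hashing API that safely handles both hashing and verifying passwords.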
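A minimal sketch of the native API follows, using password_hash() to create a hash and password_verify() to check a login attempt. The passwords are placeholders, and storage of the hash is left out.

```php
<?php
// Registration: hash the password before it is stored.
$hash = password_hash('correct horse battery staple', PASSWORD_DEFAULT);

// Login: check the submitted password against the stored hash.
$ok  = password_verify('correct horse battery staple', $hash);
$bad = password_verify('wrong guess', $hash);

var_dump($ok);  // bool(true)
var_dump($bad); // bool(false)
```

Only $hash needs to be stored. The plain-text password is never written anywhere.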

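The recommended algorithm for hashing passwords is bcrypt, which is built on the Blowfish cipher. It is also the default used by the password hashing API, because it is significantly more computationally expensive than MD5 or SHA1 while remaining scalable: its cost can be tuned.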
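The sketch below shows one way the cost can be set and later raised, using password_get_info() and password_needs_rehash(). The cost values 11 and 12 are assumptions chosen for illustration. A suitable value depends on the hardware the application runs on.

```php
<?php
// Hash with an explicit bcrypt cost (assumed value; tune it for your hardware).
$hash = password_hash('secret', PASSWORD_BCRYPT, ['cost' => 11]);

// The algorithm and its options can be read back from the hash itself.
$info = password_get_info($hash);
echo $info['algoName'], "\n";        // bcrypt
echo $info['options']['cost'], "\n"; // 11

// If the policy later moves to a higher cost, older hashes can be detected
// and replaced the next time the user logs in successfully.
$needsRehash = password_needs_rehash($hash, PASSWORD_BCRYPT, ['cost' => 12]);
if ($needsRehash && password_verify('secret', $hash)) {
    $hash = password_hash('secret', PASSWORD_BCRYPT, ['cost' => 12]);
}
```

Each increment of the cost doubles the work per hash, for the application and for an attacker alike.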

The crypt() function can also be used for password hashing, but it is recommended only for interoperability with other systems. Use the native password hashing API whenever possible.
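For the interoperability case, a hash produced by crypt() can be checked as sketched below. The fixed salt string is an assumption used only to keep the example short. Real code would need a fresh random salt for every password, which is one reason the native API is preferable.

```php
<?php
// Illustration only: a fixed bcrypt salt prefix ("$2y$", cost 10, 22 salt characters).
$salt = '$2y$10$' . 'abcdefghijklmnopqrstuv';
$hash = crypt('secret', $salt);

// Verifying with crypt(): re-hash using the stored hash as the salt, then
// compare with hash_equals(), which takes the same time wherever strings differ.
$valid = hash_equals($hash, crypt('secret', $hash));

// password_verify() also understands crypt()-style hashes.
$alsoValid = password_verify('secret', $hash);

var_dump($valid, $alsoValid); // bool(true) bool(true)
```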

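What is a salt?

In cryptography, a salt is additional data added during the hashing process. It is used to reduce the chance that the output can be looked up in a table of precomputed hashes and their inputs, known as a rainbow table.

Put simply, a salt is a small piece of extra data that makes hashes dramatically harder to crack. Many extensive tables of precomputed hashes and their original inputs are published online. Using a salt makes it much less likely that a given hash value appears in one of those tables.

password_hash() creates a random salt if one is not provided. This is generally the easiest and most secure approach.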
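The effect of the automatically generated salt can be seen in a short sketch: hashing the same placeholder password twice gives two different strings, whereas an unsalted MD5 hash is identical every time and is therefore a candidate for table lookup.

```php
<?php
$password = 'secret';

// Unsalted fast hash: the same input always gives the same output.
$unsaltedMatches = md5($password) === md5($password);

// password_hash() picks a new random salt on every call.
$first  = password_hash($password, PASSWORD_DEFAULT);
$second = password_hash($password, PASSWORD_DEFAULT);
$saltedMatches = $first === $second;

var_dump($unsaltedMatches); // bool(true)
var_dump($saltedMatches);   // bool(false)

// Both salted hashes still verify against the original password.
var_dump(password_verify($password, $first));  // bool(true)
var_dump(password_verify($password, $second)); // bool(true)
```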

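How do I store my salts?

When password_hash() or crypt() is used, the salt is included in the returned password hash. The return value should be stored in the database exactly as it is, because it also contains information about the hash function that was used and can be passed directly to password_verify() or crypt() to verify a password.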

Warning

To avoid timing attacks, always use password_verify() rather than re-hashing the password and comparing the result with the stored hash.

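The following diagram shows the format of a value returned by crypt() or password_hash(). As it shows, the value is self-contained: it includes the algorithm that was used and the salt required for later verification.

[Figure: the components of the value returned by password_hash() and crypt(): in order, the algorithm used, that algorithm's options, the salt used, and the hashed password.]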
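The same layout can be inspected in code. The sketch below splits a bcrypt hash into its components with a regular expression. The function name splitBcryptHash() is hypothetical, and the pattern assumes the bcrypt layout (a 2-digit cost, 22 salt characters, and 31 hash characters). Other algorithms use different layouts.

```php
<?php
// Split a bcrypt hash string into the parts shown in the figure.
function splitBcryptHash(string $hash): ?array
{
    $pattern = '/^\$(2[abxy])\$(\d{2})\$([.\/A-Za-z0-9]{22})([.\/A-Za-z0-9]{31})$/';
    if (!preg_match($pattern, $hash, $m)) {
        return null; // not a bcrypt-style hash
    }
    return [
        'algorithm' => $m[1],       // algorithm identifier, e.g. "2y"
        'cost'      => (int) $m[2], // algorithm option
        'salt'      => $m[3],       // encoded salt
        'digest'    => $m[4],       // encoded password hash
    ];
}

$parts = splitBcryptHash(password_hash('secret', PASSWORD_BCRYPT, ['cost' => 10]));
print_r($parts);
```

Application code normally has no need to split the hash. Passing the whole string to password_verify() is enough.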

User Contributed Notes 3 notes

alf dot henrik at ascdevel dot com

11 years ago

I feel like I should comment on some of the claims being posted as replies here.

For starters, speed IS an issue with MD5 in particular and also SHA1. I've written my own MD5 bruteforce application just for the fun of it, and using only my CPU I can easily check a hash against about 200 million hashes per second. The main reason for this speed is that for most attempts you can bypass 19 out of 64 steps in the algorithm. For longer input (> 16 characters) it won't apply, but I'm sure there are some ways around it.

If you search online you'll see people claiming to be able to check against billions of hashes per second using GPUs. I wouldn't be surprised if it's possible to reach 100 billion per second on a single computer alone these days, and it's only going to get worse. It would require a watt monster with 4 dual high-end GPUs or something, but still possible.

Here's why 100 billion per second is an issue:
Assume most passwords contain a selection of 96 characters. A password with 8 characters would then have 96^8 = 7.21389578984e+15 combinations.
With 100 billion per second it would then take 7.21389578984e+15 / 1e11 / 3600 = ~20 hours to figure out what it actually says. Keep in mind that you'll need to add the numbers for 1-7 characters as well. 20 hours is not a lot if you want to target a single user.
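The arithmetic above can be checked with a short sketch. The function name hoursToExhaust() is made up for the example, and the guess rate of 100 billion per second is the commenter's estimate, not a measured figure.

```php
<?php
// Hours needed to try every password of $minLen..$maxLen characters
// drawn from $alphabet symbols, at $rate guesses per second.
function hoursToExhaust(int $alphabet, int $minLen, int $maxLen, float $rate): float
{
    $combinations = 0.0;
    for ($n = $minLen; $n <= $maxLen; $n++) {
        $combinations += $alphabet ** $n;
    }
    return $combinations / $rate / 3600;
}

$rate = 100e9; // assumed: 100 billion guesses per second

printf("Exactly 8 characters: %.1f hours\n", hoursToExhaust(96, 8, 8, $rate)); // about 20.0
printf("1 to 8 characters:    %.1f hours\n", hoursToExhaust(96, 1, 8, $rate)); // about 20.2
```

Including the shorter lengths adds only about 1% to the total, because each extra character multiplies the search space by 96.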

So in essence:
There's a reason why newer hash algorithms are specifically designed not to be easily implemented on GPUs.

Oh, and I can see there's someone mentioning MD5 and rainbow tables. If you read the numbers here, I hope you realize how incredibly stupid and useless rainbow tables have become in terms of MD5. Unless the input to MD5 is really huge, you're just not going to be able to compete with GPUs here. By the time a storage medium is able to deliver far beyond 3TB/s, the CPUs and GPUs will have reached much higher speeds.

As for SHA1, my belief is that it's about a third slower than MD5. I can't verify this myself, but it seems to be the case judging by the numbers presented for MD5 and SHA1. The issue with speed is basically very much the same here as well.

The moral here:
Please do as told. Don't ever use MD5 or SHA1 for hashing passwords again. We all know passwords aren't going to be that long for most people, and that's a major disadvantage. Adding long salts will help for sure, but unless you want to add some hundred bytes of salt, there are going to be fast bruteforce applications out there ready to reverse engineer your passwords or your users' passwords.

swardx at gmail dot com

9 years ago

A great read..

https://nakedsecurity.sophos.com/2013/11/20/serious-security-how-to-store-your-users-passwords-safely/

Serious Security: How to store your users’ passwords safely

In summary, here is our minimum recommendation for safe storage of your users’ passwords:

    Use a strong random number generator to create a salt of 16 bytes or longer.
    Feed the salt and the password into the PBKDF2 algorithm.
    Use HMAC-SHA-256 as the core hash inside PBKDF2.
    Perform 20,000 iterations or more. (June 2016.)
    Take 32 bytes (256 bits) of output from PBKDF2 as the final password hash.
    Store the iteration count, the salt and the final hash in your password database.
    Increase your iteration count regularly to keep up with faster cracking tools.

Whatever you do, don’t try to knit your own password storage algorithm.
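The quoted recipe could be sketched in PHP as follows, using the built-in hash_pbkdf2(), random_bytes(), and hash_equals(). The function names pbkdf2Hash() and pbkdf2Verify() and the hex-encoded record layout are assumptions for illustration. The manual page above recommends password_hash() over hand-assembled schemes like this one.

```php
<?php
// Derive a PBKDF2-HMAC-SHA-256 hash with a fresh 16-byte random salt.
function pbkdf2Hash(string $password, int $iterations = 20000): array
{
    $salt = random_bytes(16);
    $hash = hash_pbkdf2('sha256', $password, $salt, $iterations, 32, true);

    // Store the iteration count, the salt and the 32-byte hash together.
    return [
        'iterations' => $iterations,
        'salt'       => bin2hex($salt),
        'hash'       => bin2hex($hash),
    ];
}

// Recompute the hash with the stored parameters and compare in constant time.
function pbkdf2Verify(string $password, array $record): bool
{
    $candidate = hash_pbkdf2(
        'sha256',
        $password,
        hex2bin($record['salt']),
        $record['iterations'],
        32,
        true
    );
    return hash_equals(hex2bin($record['hash']), $candidate);
}

$record = pbkdf2Hash('secret');
var_dump(pbkdf2Verify('secret', $record)); // bool(true)
var_dump(pbkdf2Verify('guess', $record));  // bool(false)
```

Because the iteration count is stored with each record, it can be raised for new hashes without invalidating old ones.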

tamas at microwizard dot com

4 years ago

While reading the comments, some old math lessons came to mind and got me thinking. Using constants in a mathematical algorithm does not change the complexity of the algorithm itself.

The reason for salting is to defeat rainbow tables (sorry guys, this is the only reason), because they shortcut the "actual" processing power needed.
(((Longer stored hashes AND longer passwords increase the complexity of cracking, NOT adding a salt ALONE.)))

PHP's salting functions return all the information needed for checking passwords, therefore this information should be treated as a constant from a broader point of view. It is also a target for rainbow tables (sure: for much, much larger ones).

What is the solution?
The solution is to store the password hash and the salt in different places.
The implementation is yours. Any two different places will be good enough.

Yes, it will make problems for hackers. They need to understand your system. No password-cracking speed-up will work for them without reimplementing your whole system.

These are my two cents.
