Is Argon2 actually better than Bcrypt?
The first rule of user passwords is never store them as plain text. The second rule is to use Bcrypt. Argon2 is a newer algorithm that gets recommended over Bcrypt, but the main principle is still the same. Use a slow hashing algorithm that's designed for passwords.
But why is Argon2 superior to Bcrypt? On the surface, Argon2 has fewer foot-guns than Bcrypt. Bcrypt has a maximum password length of 72 bytes, and those 72 bytes need to be a null-terminated string. Argon2, on the other hand, doesn't have a password length limit and works with any binary data. I think this alone is a good enough reason to use Argon2 over Bcrypt in newer systems. Still, newer algorithms aren't always more secure. Is it possible that Argon2 is weaker than Bcrypt?
Password hashing generally needs to complete < 1000 ms. In this scenario, bcrypt is stronger than pbkdf2, scrypt, and argon2.
Password cracking is mostly done with GPUs these days because they're much better at hashing than CPUs. Hashing is just a series of small computations, and the GPU's many tiny compute units excel at this kind of work. The table below compares the hashing speed of SHA-256, a fast and simple hashing algorithm, across different hardware. The CPU benchmarks were run using Go's crypto/sha256 standard library package, and the GPU numbers are from publicly available benchmarks. The device cost is either the rough second-hand market price or the retail price, whichever is cheaper.
| Device | Hashing speed (hashes per second) | Cost (hashes per second per dollar) | Power efficiency (hashes per second per watt) |
|---|---|---|---|
| Hetzner CCX23 (CPU) | 40,365,462 | ? | ? |
| MacBook Air M1 8 cores (CPU, 2020) | 90,800,749 | 302,669 | 3,026,691 |
| MacBook Pro M3 Pro 11 cores (CPU, 2023) | 171,638,888 | 156,177 | 5,721,296 |
| GTX 1080 (GPU, 2016) | 2,439,500,000 | 16,263,333 | 13,552,777 |
| RTX 5090 (GPU, 2025) | 28,353,300,000 | 14,183,741 | 48,885,000 |
The CPU numbers can be improved further, but we see that GPUs are roughly 50 to 100 times faster and 5 to 10 times more power-efficient.
The design of Argon2 attempts to close this gap by using a significant amount of memory. Argon2 can be configured to use anywhere from a few kilobytes to gigabytes of memory per hash. While GPUs are fast at pure computation, their memory (VRAM), like regular RAM, has a limit on how fast it can transfer data (memory bandwidth). Argon2's massive memory usage saturates the GPU's memory bandwidth and puts a hard limit on how fast the GPU can calculate hashes. The faster L1 and L2 caches aren't useful either, as they aren't large enough to handle multiple processes at once. Additionally, once you set the memory configuration high enough, the GPU quickly runs out of memory for more processes and has to leave many of its compute units idle.
Argon2 has 3 parameters: memory size, iteration count, and degree of parallelism. For a memory size of m bytes and an iteration count of t, with parallelism set to 1 (single thread), the total number of bytes read from and written to memory is:
(3 × t - 1) × m
For example, Argon2 at 16 mebibytes and 3 iterations will read and write approximately 128 mebibytes. To be exact, Argon2 also has 3 variants (Argon2i, Argon2d, and Argon2id), but they have similar performance when using the same parameters, so I'll be treating them as the same algorithm.
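The formula above, and the bandwidth bound it implies, can be sketched in a few lines of Go. The function names are illustrative, and the 320e9 bytes per second figure is an assumption (the GTX 1080's advertised memory bandwidth), not a number from this article. Depending on whether gigabytes or gibibytes are used, the result can differ by a few percent from a published estimate.

```go
package main

import "fmt"

const MiB = 1 << 20

// argon2Traffic returns the bytes read and written for one single-lane hash,
// using the (3t - 1) × m formula. memory is in bytes.
func argon2Traffic(memory, iterations uint64) uint64 {
	return (3*iterations - 1) * memory
}

// bandwidthBound returns the upper bound on hashes per second for a device
// that can move bandwidth bytes per second.
func bandwidthBound(bandwidth float64, memory, iterations uint64) float64 {
	return bandwidth / float64(argon2Traffic(memory, iterations))
}

func main() {
	for _, m := range []uint64{16 * MiB, 64 * MiB, 256 * MiB} {
		traffic := argon2Traffic(m, 3)
		fmt.Printf("%3d MiB, t=3: %4d MiB moved per hash\n", m/MiB, traffic/MiB)
	}
	// Assumed bandwidth figure; substitute the value for the GPU in question.
	fmt.Printf("bound at 16 MiB: %.0f hashes/s\n", bandwidthBound(320e9, 16*MiB, 3))
}
```

Since the traffic is proportional to m, quadrupling the memory size divides the bandwidth bound by four.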
The tables below show the hashing speed estimated from the GPU's memory bandwidth and the actual hashing speed at different memory configurations for the Nvidia GTX 1080 (2016) and Nvidia RTX 5090 (2025). The actual hashing speeds are from benchmarks run with John the Ripper, an open-source password cracking tool.
Nvidia GTX 1080 (2016):

| Memory size | Bandwidth-estimated hashing speed (hashes per second) | Actual hashing speed (hashes per second) |
|---|---|---|
| 16 MiB | 2,441 | 1,742 |
| 64 MiB | 610 | 387 |
| 256 MiB | 153 | 34 |
Nvidia RTX 5090 (2025):

| Memory size | Bandwidth-estimated hashing speed (hashes per second) | Actual hashing speed (hashes per second) |
|---|---|---|
| 16 MiB | 13,672 | 11,465 |
| 64 MiB | 3,418 | 2,400 |
| 256 MiB | 854 | 178 |
On both the GTX 1080 and RTX 5090, the benchmark numbers generally align with our estimates at 16 and 64 mebibytes of memory. As expected, the GPUs also see a linear slowdown as they need to read and write more bytes. However, both GPUs experience a quadratic slowdown when the memory parameter is increased from 64 to 256 mebibytes. This indicates that both GPUs become limited by their memory size at around 64 mebibytes. After this point, the GPU has to do twice the work with half the compute units whenever the memory requirement doubles. I'm unsure why the GTX 1080 sees lower bandwidth utilization than the RTX 5090 (about 70% vs 80%), but its smaller compute power relative to the available memory bandwidth may be relevant.
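To see the linear and quadratic regimes in the numbers, the slowdown between adjacent memory settings can be computed from the measured speeds in the tables above. This is a small illustrative sketch; the `slowdown` helper is not from any library.

```go
package main

import "fmt"

// slowdown returns how many times slower hashing becomes when moving from
// one memory setting to the next.
func slowdown(before, after float64) float64 {
	return before / after
}

func main() {
	// Measured hashes per second at 16, 64, and 256 MiB, from the tables above.
	gpus := []struct {
		name  string
		speed [3]float64
	}{
		{"GTX 1080", [3]float64{1742, 387, 34}},
		{"RTX 5090", [3]float64{11465, 2400, 178}},
	}
	for _, g := range gpus {
		// Each step quadruples the memory: 4x would be linear, 16x quadratic.
		fmt.Printf("%s: 16->64 MiB %.1fx, 64->256 MiB %.1fx\n",
			g.name, slowdown(g.speed[0], g.speed[1]), slowdown(g.speed[1], g.speed[2]))
	}
}
```

The first step comes out at roughly 4.5x to 4.8x, close to linear, while the second comes out at roughly 11x to 13.5x, approaching the 16x that a fully quadratic slowdown would give.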
On the other hand, Bcrypt closes the gap between CPUs and GPUs by making small, random reads from memory. Aside from bandwidth, VRAM also has a limit on transaction speed, as each transaction is made at a set size (bus width). Individual Bcrypt transactions are smaller than the bus width of modern GPUs, so a significant amount of the memory bandwidth is wasted. This theoretically makes Bcrypt around 4 to 8 times slower than Argon2 with similar memory transaction usage on a GPU.
However, Bcrypt only requires 4 kibibytes of memory per hash. This is small enough to run many processes using the shared memory of the GPU's Streaming Multiprocessors (SMs). While the L1 and L2 caches are read-only and managed by the hardware, the shared memory is a programmable memory pool that allows direct read and write access. This moves the limiting factor from the memory bandwidth to the size of the shared memory.
Unlike Argon2, Bcrypt only has a single cost parameter. Increasing this value by 1 doubles the workload and the hashing time, so the work grows as 2^cost. Because the memory footprint stays fixed, the slowdown on a GPU scales linearly with the workload rather than quadratically.
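Because each step in cost doubles the work, a hashing speed measured at one cost value can be converted to another by dividing by a power of two. Here is a minimal sketch; `scaleBcryptSpeed` is a hypothetical helper, and the 19,050 hashes per second at cost 9 is the RTX 5090 figure reported in a table later in this article.

```go
package main

import (
	"fmt"
	"math"
)

// scaleBcryptSpeed converts a hashing speed measured at one cost value to the
// expected speed at another, using the rule that each +1 doubles the work.
func scaleBcryptSpeed(speed float64, fromCost, toCost int) float64 {
	return speed * math.Pow(2, float64(fromCost-toCost))
}

func main() {
	const measured = 19050.0 // hashes per second at cost 9
	for _, cost := range []int{9, 11, 13} {
		fmt.Printf("cost %2d: %.1f hashes/s\n", cost, scaleBcryptSpeed(measured, 9, cost))
	}
}
```

Going from cost 9 to cost 11 divides the speed by 4, and going to cost 13 divides it by 16.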
An ideal hash function is one that's as fast as possible for the defender (CPU) but as slow as possible for the attacker (GPU). Below is a table listing the hashing times of Argon2 and Bcrypt on a Hetzner CCX23 instance using Go's golang.org/x/crypto packages. I saw inconsistent performance between different Hetzner instances, likely stemming from different server hardware, so I've opted to use the benchmark results from the best-performing instance.
| Hash function | Hashing time (milliseconds) |
|---|---|
| Argon2 (16 MiB, 3 iterations, 1 thread) | 23 |
| Argon2 (64 MiB, 3 iterations, 1 thread) | 107 |
| Argon2 (256 MiB, 3 iterations, 1 thread) | 460 |
| Bcrypt (cost 9) | 24 |
| Bcrypt (cost 11) | 100 |
| Bcrypt (cost 13) | 405 |
Next, the tables below show the hashing speeds of Argon2 and Bcrypt on the GTX 1080 and RTX 5090. The attacker advantage quantifies how much faster the GPU is compared to a Hetzner CCX23 instance. It is calculated by dividing the GPU's hashing speed by the CPU's hashing speed. The GPU hashing speeds of Bcrypt were derived from Hashcat benchmarks and scaled according to the cost value. Hashcat is another popular password cracking tool and is often used to benchmark GPUs. Note that Hashcat benchmarks can vary somewhat depending on the Hashcat version, server environment, and tuning parameters used. I've aimed to use the best available benchmarks for each GPU, but the hashing speeds may still be optimized further.
Nvidia GTX 1080 (2016):

| Hash function | Hashing speed (hashes per second) | Attacker advantage |
|---|---|---|
| Argon2 (16 MiB, 3 iterations, 1 thread) | 1,742 | 40 |
| Argon2 (64 MiB, 3 iterations, 1 thread) | 387 | 41 |
| Argon2 (256 MiB, 3 iterations, 1 thread) | 34 | 15 |
| Bcrypt (cost 9) | 1,457 | 34 |
| Bcrypt (cost 11) | 364 | 36 |
| Bcrypt (cost 13) | 91 | 36 |
Nvidia RTX 5090 (2025):

| Hash function | Hashing speed (hashes per second) | Attacker advantage |
|---|---|---|
| Argon2 (16 MiB, 3 iterations, 1 thread) | 11,465 | 263 |
| Argon2 (64 MiB, 3 iterations, 1 thread) | 2,400 | 257 |
| Argon2 (256 MiB, 3 iterations, 1 thread) | 178 | 82 |
| Bcrypt (cost 9) | 19,050 | 469 |
| Bcrypt (cost 11) | 4,763 | 476 |
| Bcrypt (cost 13) | 1,190 | 481 |
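The attacker advantage calculation can be sketched as follows. This assumes the CPU computes one hash at a time on a single thread, so its hashing speed is simply 1000 divided by the hashing time in milliseconds; `attackerAdvantage` is an illustrative name.

```go
package main

import "fmt"

// attackerAdvantage divides the GPU's hashing speed by the CPU's hashing
// speed, where the CPU speed is derived from the time one hash takes.
func attackerAdvantage(gpuHashesPerSec, cpuMillisPerHash float64) float64 {
	cpuHashesPerSec := 1000 / cpuMillisPerHash
	return gpuHashesPerSec / cpuHashesPerSec
}

func main() {
	// GTX 1080 speed and Hetzner CCX23 time for Argon2 at 16 MiB,
	// taken from the tables above.
	fmt.Printf("%.1f\n", attackerAdvantage(1742, 23))
}
```

With the rounded values from the tables this gives about 40. Results computed this way can be off by a point or so from the table entries, since the published timings are rounded to whole milliseconds.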
Argon2 at 64 mebibytes and below provides very similar effectiveness to Bcrypt against the GTX 1080, but Argon2 is more effective against the RTX 5090. Above 64 mebibytes, both GPUs see a quadratic slowdown and Argon2 becomes the strongest option. Using a much more optimized CPU implementation from John the Ripper to benchmark the Hetzner server also yields similar trends.
While Argon2 at a high memory configuration is our ideal solution, hashing user passwords shouldn't take more than 100 to 200 milliseconds on a regular web server. Even at 100 milliseconds per hash, the server would only be able to handle 10 login requests per second per CPU thread. Using multiple threads per hash to cut down the hashing time isn't useful in this context either, as it reduces the number of concurrent processes the server can handle. Unfortunately, using more than 64 mebibytes of memory with Argon2 pushes the hashing time well above that line.
Focusing on Argon2 at lower memory configurations, the table below shows the estimated attacker advantage of Argon2 and Bcrypt on various Nvidia GPUs, with both algorithms configured to run at about 20 milliseconds per hash on the Hetzner server. The Argon2 hashing speeds were derived from the RTX 5090 results and adjusted based on each GPU's memory bandwidth. The numbers for Bcrypt were derived from each GPU's Hashcat benchmarks. Note that the Argon2 number for the GTX 1080 was also derived from the RTX 5090 for consistency. The actual benchmark showed lower memory bandwidth utilization, and it's likely that older GPUs like the GTX 1080 Ti and Tesla V100 also show similar performance characteristics on current implementations.
| GPU | Argon2 attacker advantage | Bcrypt attacker advantage |
|---|---|---|
| GTX 1080 (2016) | 47 | 34 |
| GTX 1080 Ti (2017) | 71 | 47 |
| RTX 2080 Ti (2018) | 90 | 77 |
| RTX 3090 (2020) | 137 | 164 |
| RTX 3090 Ti (2022) | 148 | 174 |
| RTX 4090 (2022) | 148 | 360 |
| RTX 5090 (2025) | 263 | 457 |
| Tesla V100 SXM2 16GB (2018) | 132 | 119 |
| A100 PCIe 40GB (2020) | 228 | 207 |
| A100 SXM4 40GB (2020) | 228 | 243 |
| H100 PCIe 80GB (2022) | 294 | 425 |
| H100 SXM5 80GB (2022) | 441 | 556 |
For both consumer and data center GPUs, older models are relatively faster at Argon2 while newer models are relatively faster at Bcrypt. At least for Nvidia GPUs, clock speed and shared memory have improved more than memory bandwidth and VRAM in general, though the gap is smaller for data center GPUs. Overall, the gap between Bcrypt and Argon2 at low memory configurations isn't very big, though I'd still give the edge to Argon2. It should be noted that GPU implementations of Bcrypt have been around for over a decade and have been heavily optimized. While current implementations of Argon2 are already close to the limit allowed by the memory bandwidth, it's possible that we might still see 10 to 20% performance improvements in the future.
Aside from GPUs, field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) are other types of hardware to consider. Both FPGAs and ASICs allow the program to be implemented directly in hardware. They offer vastly superior cost and energy efficiency, but at a high initial cost.
Below is the same SHA-256 benchmark table with the U3S23H, an ASIC-based Bitcoin miner, added. Bitcoin mining uses 2 rounds of SHA-256, so it's not an exact comparison, but normalizing for it gives us a good estimate. Regardless, we can say that it's roughly 1,000 times faster and more efficient than GPUs.
| Device | Hashing speed (hashes per second) | Cost (hashes per second per dollar) | Power efficiency (hashes per second per watt) |
|---|---|---|---|
| MacBook Pro M3 Pro 11 cores (CPU, 2023) | 171,638,888 | 156,177 | 5,721,296 |
| RTX 5090 (GPU, 2025) | 28,353,300,000 | 14,183,741 | 48,885,000 |
| U3S23H (ASIC, 2025) | 2,320,000,000,000,000 | 66,666,666,666 | 210,526,315,789 |
Bcrypt can also run on these devices because of its low memory requirements. The table below compares the Bcrypt hashing performance of GPUs and the ZTEX 1.15y, an FPGA board.
| Device | Hashing speed (hashes per second) | Cost (hashes per second per dollar) | Power efficiency (hashes per second per watt) |
|---|---|---|---|
| GTX 1080 (GPU, 2016) | 23,316 | 155 | 129 |
| RTX 5090 (GPU, 2025) | 304,800 | 152 | 525 |
| ZTEX 1.15y (FPGA, 2011) | 116,666 | 1,666 | 3,888 |
Despite being released in 2011, the FPGA board is around 10 times more cost-efficient and roughly 7 to 30 times more power-efficient than the GPUs at running Bcrypt. On the other hand, the massive memory requirements of Argon2 relative to these devices make it inefficient to run on them, which makes Argon2 the stronger hash function.
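The comparison can be checked directly against the Bcrypt table by dividing the FPGA's figures by each GPU's. The sketch below uses only numbers from that table; the `device` type and `ratios` function are illustrative names.

```go
package main

import "fmt"

// device holds one row of the Bcrypt table: hashes per second, hashes per
// second per dollar, and hashes per second per watt.
type device struct {
	name      string
	speed     float64
	perDollar float64
	perWatt   float64
}

// ratios returns how many times better device a is than device b on each axis.
func ratios(a, b device) (speed, cost, power float64) {
	return a.speed / b.speed, a.perDollar / b.perDollar, a.perWatt / b.perWatt
}

func main() {
	fpga := device{"ZTEX 1.15y", 116666, 1666, 3888}
	gpus := []device{
		{"GTX 1080", 23316, 155, 129},
		{"RTX 5090", 304800, 152, 525},
	}
	for _, g := range gpus {
		s, c, p := ratios(fpga, g)
		fmt.Printf("vs %s: %.1fx speed, %.1fx per dollar, %.1fx per watt\n",
			g.name, s, c, p)
	}
}
```

Raw speed favors the newest GPU, so the FPGA's edge shows up in the per-dollar and per-watt columns.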
It's hard to predict future improvements to GPUs and hashing implementations, but based on current data, I think there's enough reason to always pick Argon2 over Bcrypt. Argon2 with a high memory configuration will always be the best option. Even at lower memory configurations, Argon2 is as secure as Bcrypt against GPUs and significantly stronger against dedicated hardware. We also can't forget that Argon2 is more flexible and less error-prone than Bcrypt. At the same time, Bcrypt is still holding strong after over 20 years, and I don't think you need to ditch it if you're already using it.
Special thanks to Alexander Peslyak (Solar Designer) for publishing benchmark results for Argon2 and answering some of my questions.