This tutorial will teach you how to create secure hashes using built-in functionality from Python’s hashlib module.

Understanding the significance of hashing and how to programmatically compute secure hashes can be helpful, even if you do not work in application security. But why?

Well, when working on Python projects, you’ll likely come across instances where you are concerned about storing passwords and other sensitive info in databases or source code files. In such cases, it’s safer to run the hashing algorithm on sensitive info and store the hash instead of the information.

In this guide, we’ll cover what hashing is and how it is different from encryption. We’ll also go over the properties of secure hash functions. Then, we’ll use common hashing algorithms to compute the hash of plaintext in Python. To do this, we’ll use the built-in hashlib module.

For all of this and more, let’s get started!

What Is Hashing?

Hashing takes a message string as input and produces a fixed-length output called the hash. In other words, for a given hashing algorithm, the length of the output hash is always the same, regardless of the length of the input. But how is hashing different from encryption?

In encryption, the message or plain text is encrypted using an encryption algorithm that gives an encrypted output. We can then run the decryption algorithm on the encrypted output to get back the message string.


Hashing works differently. As we just saw, encryption is invertible: you can go from the plaintext to the encrypted message and back again.

Unlike encryption, hashing is not an invertible process, meaning we cannot recover the input message from its hash.
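The fixed-length property described at the start of this section is easy to observe. The minimal sketch below uses hashlib, which is introduced later in this tutorial, to hash inputs of very different lengths. The sample inputs are arbitrary choices made for illustration.

```python
import hashlib

# Inputs of very different lengths: empty, short, and one million bytes
inputs = [b"", b"Geekflare", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # SHA256 always yields 32 bytes, shown as 64 hex characters
    print(f"input length {len(data):>9} -> hash length {len(digest)}")

# Collect the distinct output lengths: a single value means the length never varied
hash_lengths = {len(hashlib.sha256(data).hexdigest()) for data in inputs}
print(hash_lengths)  # {64}
```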


Properties of Hash Functions

Let’s quickly go over some properties that hash functions should satisfy:

  • Deterministic: Hash functions are deterministic. Given a message m, the hash of m is always the same.
  • Preimage Resistant: We've already covered this when we said hashing is not an invertible operation. The preimage resistance property states that, given an output hash h, it's infeasible to find a message m such that hash(m) = h.
  • Collision Resistant: It should be difficult (or computationally infeasible) to find two different message strings m1 and m2 such that the hash of m1 is equal to the hash of m2. This property is called collision resistance.
  • Second Preimage Resistant: This means that, given a message m1 and its hash hash(m1), it's infeasible to find a different message m2 such that hash(m1) = hash(m2).
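Collision resistance means that finding collisions is computationally infeasible, not impossible. A function with a fixed-length output must have collisions, because there are more possible inputs than outputs.

The toy sketch below makes this concrete. It keeps only the first 2 bytes (16 bits) of a SHA256 digest, a deliberate weakening chosen for illustration, so a brute-force search finds a collision after a few hundred tries. The same search against the full 256-bit output is computationally infeasible. This is not an attack on SHA256, and the helper names are hypothetical.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA256 digest (deliberately weak, for illustration)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_truncated_collision(n_bytes: int = 2) -> tuple[bytes, bytes]:
    """Brute-force two different messages whose truncated digests are equal."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = f"message-{i}".encode()
        prefix = truncated_sha256(msg, n_bytes)
        if prefix in seen:
            return seen[prefix], msg
        seen[prefix] = msg


m1, m2 = find_truncated_collision(2)
print(m1, m2)
print(truncated_sha256(m1, 2).hex(), truncated_sha256(m2, 2).hex())
# The full 256-bit digests of the two messages still differ
print(hashlib.sha256(m1).digest() == hashlib.sha256(m2).digest())  # False
```

With only 2^16 possible truncated values, the pigeonhole principle guarantees that the loop terminates.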

Python’s hashlib Module

Python’s built-in hashlib module provides implementations of several hashing and message digest algorithms, including the SHA and MD5 algorithms.

To use the constructors and built-in functions from the Python hashlib module, you can import it into your working environment like so:

import hashlib

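The hashlib module provides two attributes, algorithms_available and algorithms_guaranteed. algorithms_guaranteed is the set of algorithm names supported by this module on all platforms. algorithms_available is the set of names available in the running Python interpreter, which can include extra algorithms provided by OpenSSL.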

Therefore, algorithms_guaranteed is a subset of algorithms_available.


Start a Python REPL, import hashlib and access the algorithms_available and algorithms_guaranteed constants:

>>> hashlib.algorithms_available
# Output
{'md5', 'md5-sha1', 'sha3_256', 'shake_128', 'sha384', 'sha512_256', 'sha512', 'md4', 
'shake_256', 'whirlpool', 'sha1', 'sha3_512', 'sha3_384', 'sha256', 'ripemd160', 'mdc2', 
'sha512_224', 'blake2s', 'blake2b', 'sha3_224', 'sm3', 'sha224'}
>>> hashlib.algorithms_guaranteed
# Output
{'md5', 'shake_256', 'sha3_256', 'shake_128', 'blake2b', 'sha3_224', 'sha3_384', 
'sha384', 'sha256', 'sha1', 'sha3_512', 'sha512', 'blake2s', 'sha224'}

We see that algorithms_guaranteed is indeed a subset of algorithms_available.
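Because both attributes are ordinary Python sets of algorithm names, you can check the subset relationship in code instead of by eye. The sketch below does that and lists the extras available in your interpreter. The extras depend on how your Python and OpenSSL were built, so your list may differ from the output above.

```python
import hashlib

# Both attributes are sets of algorithm-name strings, so set operators work on them
print(hashlib.algorithms_guaranteed <= hashlib.algorithms_available)  # True

# Names available in this interpreter but not guaranteed on every platform
extras = sorted(hashlib.algorithms_available - hashlib.algorithms_guaranteed)
print(extras)
```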

How to Create Hash Objects in Python


Next, let’s learn how to create hash objects in Python. We’ll compute the SHA256 hash of a message string using the following methods:

  • The generic new() constructor
  • Algorithm-specific constructors

Using the new() Constructor

Let’s initialize the message string:

>>> message = "Geekflare is awesome!"

To instantiate the hash object, we can use the new() constructor and pass in the name of the algorithm as shown:

>>> sha256_hash = hashlib.new("SHA256")

We can now call the update() method on the hash object with the message string as the argument:

>>> sha256_hash.update(message)

If you do so, you’ll run into an error, because update() accepts only bytes-like objects, not str:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing

To get the encoded string, call the encode() method on the message string and pass the result to update(). After doing so, you can call the hexdigest() method to get the SHA256 hash corresponding to the message string.

sha256_hash.update(message.encode())
sha256_hash.hexdigest()
# Output:'b360c77de704ad8f02af963d7da9b3bb4e0da6b81fceb4c1b36723e9d6d9de3d'
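str.encode() uses UTF-8 by default, and the hash is computed over bytes, not characters. Text containing non-ASCII characters can therefore hash differently if another system encodes it differently. The sketch below uses an arbitrary example word to show that the same text under two encodings gives two different hashes. Passing the encoding explicitly makes the choice visible.

```python
import hashlib

text = "café"  # contains a non-ASCII character

utf8 = hashlib.sha256(text.encode("utf-8")).hexdigest()
default = hashlib.sha256(text.encode()).hexdigest()  # str.encode() defaults to UTF-8
latin1 = hashlib.sha256(text.encode("latin-1")).hexdigest()

print(utf8 == default)  # True: same encoding, same bytes
print(utf8 == latin1)   # False: same text, different bytes, different hash
```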

Instead of encoding the message string using the encode() method, you can also define it as a string of bytes by prefixing the string literal with b. Create a fresh hash object first, since update() appends to any data the existing object has already seen:

sha256_hash = hashlib.new("SHA256")
message = b"Geekflare is awesome!"
sha256_hash.update(message)
sha256_hash.hexdigest()
# Output: 'b360c77de704ad8f02af963d7da9b3bb4e0da6b81fceb4c1b36723e9d6d9de3d'

The obtained hash is the same as the previous hash, which confirms the deterministic nature of hash functions.
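Because update() appends rather than replaces, feeding a message in pieces gives the same digest as hashing it all at once. The sketch below checks this and then shows what happens when one object is reused for a second message. The split point of the sample message is arbitrary.

```python
import hashlib

part1, part2 = b"Geekflare is ", b"awesome!"

incremental = hashlib.sha256()
incremental.update(part1)
incremental.update(part2)  # appended after part1, not a fresh start

one_shot = hashlib.sha256(part1 + part2)
print(incremental.hexdigest() == one_shot.hexdigest())  # True

# Feeding the full message twice into one object hashes the doubled message
reused = hashlib.sha256()
reused.update(b"Geekflare is awesome!")
reused.update(b"Geekflare is awesome!")
print(reused.hexdigest() == one_shot.hexdigest())  # False
```

This incremental behavior is what makes hashing large inputs in chunks possible.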

In addition, a small change in the message string should cause the hash to change drastically. This is known as the avalanche effect.

To verify this, let’s change the ‘a’ in ‘awesome’ to ‘A’, and compute the hash:

message = "Geekflare is Awesome!"
h1 = hashlib.new("SHA256")
h1.update(message.encode())
h1.hexdigest()
# Output: '3c67f334cc598912dc66464f77acb71d88cfd6c8cba8e64a7b749d093c1a53ab'

We see that the hash changes completely.
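To put a number on "changes completely," you can count how many of the 256 output bits differ between the two digests. For a well-behaved hash, you would expect roughly half of them to flip. The sketch below does this for the two messages above. The helper name bit_difference is our own, and the exact count is not something to rely on.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.sha256(b"Geekflare is awesome!").digest()
d2 = hashlib.sha256(b"Geekflare is Awesome!").digest()

diff = bit_difference(d1, d2)
print(f"{diff} of {len(d1) * 8} bits differ")
```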

Using the Algorithm-Specific Constructor

In the previous example, we used the generic new() constructor and passed in “SHA256” as the name of the algorithm to create the hash object.

Instead of doing so, we can also use the sha256() constructor as shown:

sha256_hash = hashlib.sha256()
message = "Geekflare is awesome!"
sha256_hash.update(message.encode())
sha256_hash.hexdigest()
# Output: 'b360c77de704ad8f02af963d7da9b3bb4e0da6b81fceb4c1b36723e9d6d9de3d'

The output hash is identical to the hash we obtained earlier for the message string “Geekflare is awesome!”.
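Both kinds of constructor also accept the initial data directly, which saves a separate update() call when you only have one message to hash. This minimal sketch confirms that the two styles agree.

```python
import hashlib

data = b"Geekflare is awesome!"

# Pass the data to the constructor instead of calling update() afterwards
via_new = hashlib.new("sha256", data).hexdigest()
via_specific = hashlib.sha256(data).hexdigest()

print(via_new == via_specific)  # True
```

The generic new() form is handy when the algorithm name comes from configuration. The specific constructor is slightly more direct when the algorithm is fixed.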

Exploring the Attributes of Hash Objects

The hash objects have a few useful attributes:

  • The digest_size attribute is the size of the resulting hash in bytes. For example, the SHA256 algorithm produces a 256-bit hash, which is equivalent to 32 bytes.
  • The block_size attribute is the internal block size of the hashing algorithm, in bytes.
  • The name attribute is the name of the algorithm that we can use in the new() constructor. Looking up the value of this attribute can be helpful when the hash objects don’t have descriptive names.

We can check these attributes for the sha256_hash object we created earlier:

>>> sha256_hash.digest_size
32
>>> sha256_hash.block_size
64
>>> sha256_hash.name
'sha256'
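The same attributes let you compare algorithms side by side. They also explain the length of hexdigest(), which encodes each byte of the raw digest() as two hex characters. The sketch below loops over a few algorithms from algorithms_guaranteed. The choice of algorithms is arbitrary.

```python
import hashlib

message = b"Geekflare is awesome!"
sizes = {}

for name in ("sha1", "sha256", "sha512", "sha3_256", "blake2b"):
    h = hashlib.new(name, message)
    # digest() returns raw bytes; hexdigest() uses two hex characters per byte
    assert len(h.digest()) == h.digest_size
    assert len(h.hexdigest()) == 2 * h.digest_size
    sizes[h.name] = (h.digest_size, h.block_size)
    print(f"{h.name:9} digest_size={h.digest_size:3} block_size={h.block_size}")
```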

Next, let’s look at some interesting applications of hashing using Python’s hashlib module.

Practical Examples of Hashing


Verifying Integrity of Software and Files

As developers, we download and install software packages all the time. This is true whether you’re working on a Linux distro, Windows, or macOS.

However, some mirrors for software packages may not be trustworthy. Many projects publish the hash (or checksum) beside the download link, so you can verify the integrity of the downloaded software by computing its hash and comparing it with the official one.

This applies to files on your machine as well. Because even the smallest change in a file’s contents changes the hash drastically, you can check whether a file has been modified by recomputing and comparing its hash.

Here’s a simple example. Create a text file ‘my_file.txt’ in the working directory, and add some content to it.

$ cat my_file.txt
This is a sample text file.
We are  going to compute the SHA256 hash of this text file and also
check if the file has been modified by
recomputing the hash.

You can then open the file in read binary mode ('rb'), read in the contents of the file and compute the SHA256 hash as shown:

>>> import hashlib
>>> with open("my_file.txt","rb") as file:
...     file_contents = file.read()
...     sha256_hash = hashlib.sha256()
...     sha256_hash.update(file_contents)
...     original_hash = sha256_hash.hexdigest()

Here, the variable original_hash is the hash of ‘my_file.txt’ in its current state.

>>> original_hash
# Output: '53bfd0551dc06c4515069d1f0dc715d002d451c8799add29f3e5b7328fda9f8f'
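Reading the whole file with file.read() is fine for small files like this one. For large downloads, you can feed the file to update() in chunks so it never has to fit in memory. The sketch below does this. The function name sha256_of_file and the 64 KiB chunk size are our own choices.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally so large files never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Usage: write a small temporary file and check the result against a one-shot hash
content = b"This is a sample text file.\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
try:
    chunked = sha256_of_file(tmp.name, chunk_size=8)  # tiny chunks force several update() calls
finally:
    os.remove(tmp.name)

print(chunked == hashlib.sha256(content).hexdigest())  # True
```

On Python 3.11 and later, hashlib.file_digest(f, "sha256") offers a built-in way to do the same thing for a file opened in binary mode.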

Now modify the file ‘my_file.txt’. You can remove the extra leading whitespace before the word ‘going’. 🙂

Compute the hash yet again and store it in the computed_hash variable.

>>> import hashlib
>>> with open("my_file.txt","rb") as file:
...     file_contents = file.read()
...     sha256_hash = hashlib.sha256()
...     sha256_hash.update(file_contents)
...     computed_hash = sha256_hash.hexdigest()

You can then add a simple assert statement that checks whether computed_hash is equal to original_hash.

>>> assert computed_hash == original_hash

If the file is modified (which is true in this case), you should get an AssertionError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
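An assert works well in the REPL, but Python removes assert statements when run with the -O flag. For a check that must always run, an explicit comparison that returns a bool is safer. The sketch below compares data against a published checksum and normalizes case and stray whitespace first. The function name is our own. The "abc" input is the standard SHA256 test vector.

```python
import hashlib


def matches_published_checksum(data: bytes, published_hex: str) -> bool:
    """Return True if the SHA256 of data equals a published hex checksum."""
    computed = hashlib.sha256(data).hexdigest()  # hexdigest() is always lowercase
    # Published checksums may be uppercase or carry stray whitespace, so normalize first
    return computed == published_hex.strip().lower()


# Hypothetical usage with the file from this example:
# with open("my_file.txt", "rb") as f:
#     ok = matches_published_checksum(f.read(), original_hash)

ok = matches_published_checksum(
    b"abc", "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD\n"
)
print(ok)  # True
```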

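You can also use hashing when storing sensitive info, such as passwords in databases. Store a hash of the password instead of the password itself. To authenticate a user, hash the password they enter and compare the result with the stored hash. For passwords specifically, a single fast hash such as SHA256 is not enough on its own. Use a random per-password salt and a deliberately slow key-derivation function such as hashlib.pbkdf2_hmac() or hashlib.scrypt().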
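Here is a minimal sketch of salted password hashing using hashlib.pbkdf2_hmac() with SHA256. It assumes a 16-byte random salt and an iteration count of 600,000. Treat both as tunable assumptions rather than fixed requirements, and check current guidance for your setting. The function names are hypothetical. In a real database, you would store the salt and derived key for each user.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware and current guidance


def hash_password(password: str, *, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store; each password gets a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes, *, iterations: int = ITERATIONS) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, stored_key)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))  # False
```

The random salt means two users with the same password get different stored keys. hmac.compare_digest() avoids leaking, through timing, how many leading bytes matched.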

Conclusion

I hope this tutorial helped you learn about generating secure hashes with Python. Here are the key takeaways:

  • Python’s hashlib module provides ready-to-use implementations of several hashing algorithms. You can get the list of algorithms guaranteed on your platform using hashlib.algorithms_guaranteed.
  • To create a hash object, you can use the generic new() constructor with the syntax: hashlib.new("algo-name"). Alternatively, you can use the constructors corresponding to the specific hashing algorithms, like so: hashlib.sha256() for the SHA256 hash.
  • After initializing the message string to be hashed and the hash object, you can call the update() method on the hash object, followed by the hexdigest() method to get the hash.
  • Hashing can come in handy when checking the integrity of software artifacts and files, storing sensitive info in databases, and more.

Next, learn how to code a random password generator in Python.