Chapter 4: Hash Functions

Imagine a magical paper shredder that always produces the same pattern of confetti for the same document, no matter how many times you shred it. That’s the essence of a hash function!

How Hash Functions Work

  1. You feed any amount of data into the hash function.

  2. It produces a fixed-size output, called a hash or digest (e.g., 256 bits for SHA-256).

  3. The same input always produces the same output.

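  4. For a cryptographic hash function, it’s computationally infeasible to reverse the process (find an input that produces a given output) or to find two different inputs that produce the same output (a collision).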
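Properties 2 and 3 in the list above are easy to check with Python’s standard hashlib module. This minimal sketch hashes inputs of very different lengths (the sample inputs are arbitrary) and confirms that every SHA-256 digest is 256 bits long and that hashing the same input twice gives the same result.

```python
import hashlib

# Inputs of very different sizes: empty, three bytes, and one megabyte
inputs = [b"", b"abc", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).digest()
    # Property 2: the output size is fixed at 32 bytes (256 bits)
    print(len(data), "bytes in ->", len(digest) * 8, "bits out")
    # Property 3: hashing the same input again gives the same output
    assert hashlib.sha256(data).digest() == digest
```

Every line reports 256 bits out, whether the input is empty or a megabyte long.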

Hash functions are like digital fingerprints: a short value that, in practice, identifies a piece of data without revealing the data itself.
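The fingerprint idea is what makes integrity checks work: if you know the expected digest of some data, you can recompute it when the data arrives and compare. Below is a minimal sketch; the helper name `verify_integrity` and the sample messages are hypothetical, chosen only for illustration.

```python
import hashlib

def verify_integrity(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of data matches the expected hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex

original = b"Transfer 100 coins to Alice"
fingerprint = hashlib.sha256(original).hexdigest()  # published alongside the data

received_ok = b"Transfer 100 coins to Alice"
received_tampered = b"Transfer 900 coins to Alice"

print(verify_integrity(received_ok, fingerprint))        # True
print(verify_integrity(received_tampered, fingerprint))  # False
```

A single changed digit is enough for the check to fail.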

A Simple Example: Custom Hash Function

Let’s start with a simple (but insecure) hash function to illustrate the concept:

def simple_hash(message):
    hash_value = 0
    for char in message:
        # Multiply the running value by 31, add the character's code point,
        # and keep only the low 32 bits
        hash_value = (hash_value * 31 + ord(char)) & 0xFFFFFFFF
    # Format as exactly 8 hex digits, padding with leading zeros
    return hex(hash_value)[2:].zfill(8)

# Example usage
message1 = "Hello, World!"
message2 = "Hello, World"

print(f"Hash of '{message1}': {simple_hash(message1)}")
print(f"Hash of '{message2}': {simple_hash(message2)}")

Output:

Hash of 'Hello, World!': 5955b815
Hash of 'Hello, World': e1d9798c

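This simple hash function demonstrates the basic concept, but it’s not cryptographically secure: its output is only 32 bits, and its simple arithmetic makes collisions easy to find. In practice, we use well-tested cryptographic hash functions such as SHA-256.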
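Here is one concrete way to see the weakness of `simple_hash`: collisions can be found by hand. The strings "Aa" and "BB" both reduce to 65 × 31 + 97 = 66 × 31 + 66 = 2112, so they hash to the same value. The sketch repeats `simple_hash` unchanged so it runs on its own.

```python
def simple_hash(message):
    hash_value = 0
    for char in message:
        # Multiply by 31, add the character code, keep only the low 32 bits
        hash_value = (hash_value * 31 + ord(char)) & 0xFFFFFFFF
    return hex(hash_value)[2:].zfill(8)

# Two different inputs with the same hash: a collision
print(simple_hash("Aa"), simple_hash("BB"))  # 00000840 00000840

# Concatenating colliding pairs produces even more collisions
print(simple_hash("AaAa") == simple_hash("BBBB") == simple_hash("AaBB"))  # True
```

Because colliding blocks can be combined freely, an attacker can generate as many colliding inputs as they like. Doing the same for SHA-256 is, as far as anyone knows, computationally infeasible.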

SHA-256 in Action

SHA-256 is widely used, including in cryptocurrencies such as Bitcoin. Python’s built-in hashlib module makes it easy to try:

import hashlib

def sha256_hash(message):
    # Encode the string as UTF-8 bytes, then return the digest as 64 hex characters
    return hashlib.sha256(message.encode()).hexdigest()

# Example usage
message1 = "Hello, World!"
message2 = "Hello, World"

print(f"SHA-256 of '{message1}': {sha256_hash(message1)}")
print(f"SHA-256 of '{message2}': {sha256_hash(message2)}")

Output:

SHA-256 of 'Hello, World!': dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
SHA-256 of 'Hello, World': 03675ac53ff9cd1535ccc7dfcdfa2c458c5218371f418dc136f2d19ac1fbe8a5

Notice that removing a single character (the exclamation mark) produces a completely different hash. This is known as the avalanche effect.
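To put a number on the avalanche effect, you can count how many of the 256 output bits differ between the two digests. For a well-designed hash you would expect roughly half of them (around 128) to flip, though the exact count varies from pair to pair. The helper name `sha256_bits` is just for this sketch.

```python
import hashlib

def sha256_bits(message: str) -> int:
    """Return the SHA-256 digest of a UTF-8 string as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big")

a = sha256_bits("Hello, World!")
b = sha256_bits("Hello, World")

# XOR leaves a 1 wherever the two digests differ; count those bits
differing = bin(a ^ b).count("1")
print(f"{differing} of 256 bits differ ({differing / 256:.0%})")
```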

Pros and Cons of Hash Functions

Pros:

  1. Fast to compute

  2. Deterministic (same input always produces the same output)

  3. Avalanche effect (small changes in input cause large changes in output)

  4. Useful for data integrity checks and, when combined with salting and a deliberately slow algorithm, password storage

Cons:

  1. Collisions must exist, because infinitely many possible inputs map to a fixed number of outputs (though finding one for a secure hash function is computationally infeasible)

  2. Cannot recover the original input: the hash is one-way by design, which is a limitation whenever you need the data back

  3. Not suitable for encrypting data (since they’re not reversible)
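The lists above name both "fast to compute" and password storage, and these pull in opposite directions: a fast hash is also fast for an attacker trying millions of password guesses. The usual remedy is a random per-user salt plus a deliberately slow function. Below is a minimal sketch using `hashlib.pbkdf2_hmac` from Python’s standard library; the helper names `hash_password` and `check_password` and the iteration count are illustrative assumptions, not requirements. Dedicated functions such as bcrypt, scrypt, or Argon2 are common alternatives.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor (assumption); tune for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store; the password itself is never stored."""
    salt = os.urandom(16)  # random per-user salt
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)

salt, stored_key = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored_key))  # True
print(check_password("wrong guess", salt, stored_key))                   # False
```

The salt means two users with the same password get different stored keys, and the iteration count makes each guess expensive.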

Real-world Applications

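  1. Password storage: Systems store a salted hash of each password instead of the password itself, ideally using a deliberately slow password-hashing function (such as bcrypt, scrypt, Argon2, or PBKDF2) rather than plain SHA-256.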

  2. Data integrity: Hashes verify that data hasn’t been altered during transmission.

  3. Blockchain: Each block stores the hash of the previous block, so altering any earlier block changes its hash and breaks every link after it.
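The linking idea can be sketched as a toy hash chain. This is a simplification under stated assumptions: Bitcoin hashes an 80-byte binary block header with SHA-256 applied twice, while this sketch double-hashes a plain text string, and the names `block_hash`, `build_chain`, and `is_valid` are hypothetical.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(prev_hash: str, data: str) -> str:
    """Hash a block's data together with the previous block's hash.
    Simplified: SHA-256 applied twice to a text string, not a real block header."""
    payload = (prev_hash + data).encode()
    return hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()

def build_chain(records):
    """Link each record to the one before it through its hash."""
    chain, prev = [], GENESIS_PREV
    for data in records:
        h = block_hash(prev, data)
        chain.append({"prev_hash": prev, "data": data, "hash": h})
        prev = h
    return chain

def is_valid(chain) -> bool:
    """Check that every stored hash and every back-link still matches."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev or block_hash(prev, block["data"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])
print(is_valid(chain))  # True

chain[0]["data"] = "Alice pays Bob 500"  # tamper with an early block
print(is_valid(chain))  # False
```

Recomputing the tampered block’s hash would not help: the next block still stores the old value in `prev_hash`, so the mismatch just moves one link down the chain.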

What We Learned

  • Hash functions create a deterministic, fixed-size output for any input.

  • They’re like digital fingerprints, ensuring data integrity.

  • SHA-256 is a widely used, secure hash function.

  • Hash functions are crucial for password security and blockchain technology.

Quick Check: Did You Get It?

Let’s see if you caught the main ideas:

  1. What do we call the fixed-size output of a hash function? (Hint: It starts with ‘H’)

  2. Which hash function is commonly used in Bitcoin? (Hint: It’s three letters and a number)

  3. What effect describes how small input changes cause large output changes? (Hint: It’s also a snowy mountain hazard)

Think about your answers, then check below!

Answers:
  1. Hash

  2. SHA-256

  3. Avalanche effect

Great job if you got them all!