
Hash Map / Dictionary Complete Guide

Data Structure • easy

Hash maps store key-value pairs and are one of the most important tools for reducing nested loops to single-pass solutions.

They are powered by hash functions that map keys to buckets. Collisions happen and are resolved internally with chaining or probing.

In practice, hash maps are everywhere: caching, joins, deduplication, indexing, rate limits, and identity maps.

Typical Complexity Baseline

Metric               Value
Insert/find average  O(1)

See It

Concept Diagram

A small animated SVG that illustrates the core idea: the moving parts you need to keep in your head as you read the rest of the guide.

Hash map keys land in buckets

The key is hashed and reduced modulo the bucket count. Collisions chain inside one bucket.

key → hash(key) % 8 → bucket
[0]·  [1]·  [2]·  [3]·  [4]·  [5]·  [6]·  [7]·
Step 1/15: Empty map with 8 buckets. We'll insert several keys, force a collision, then look one up.

Watch First

Video Explainer

A quick visual walkthrough before the detailed sections; pick the format that helps you learn fastest.

Mechanics

Core Concepts

The building blocks and terminology you need before everything else clicks. Skim or deep-read.

Hash function

What it is A hash function turns a key (like "user:42") into a deterministic integer. The table then maps that integer to a bucket index, typically by taking it modulo the bucket count.

Why it matters Distribution quality controls performance. If hashes spread keys evenly, bucket chains stay short and lookups stay close to O(1) on average. Good hash functions are fast and distribute keys uniformly; they do not need to be cryptographic for hash-map use, but they must avoid clustering.
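As a concrete sketch, here is the two-step mapping in TypeScript, using the standard 32-bit FNV-1a constants; `bucketIndexFor` is a hypothetical helper name, not part of any library:

```typescript
// FNV-1a: a fast, non-cryptographic string hash with good distribution.
function fnv1a(key: string): number {
  let hash = 2166136261
  for (let i = 0; i < key.length; i += 1) {
    hash ^= key.charCodeAt(i)        // mix in one character
    hash = Math.imul(hash, 16777619) // multiply by the FNV prime in 32 bits
  }
  return hash >>> 0 // interpret as an unsigned 32-bit integer
}

// Fold the integer hash into a bucket index (hypothetical helper).
function bucketIndexFor(key: string, bucketCount: number): number {
  return fnv1a(key) % bucketCount
}

// Deterministic: the same key always maps to the same bucket.
const a = bucketIndexFor("user:42", 8)
const b = bucketIndexFor("user:42", 8)
// a === b, and both fall in the range 0..7
```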

Buckets

What it is Buckets are storage slots in the table. Each bucket holds zero or more entries, usually as a linked list or dynamic array.

Why it matters Bucket count determines collision pressure. Too few buckets means long chains and slower operations. More buckets means better speed but higher memory use.

Collision handling

What it is A collision happens when two different keys land in the same bucket index. Common strategies are separate chaining and open addressing.

Why it matters Collisions are normal; correct handling is what makes hash maps trustworthy. Correctness comes from storing the full key with each entry and checking key equality during lookup, not from hoping hashes are unique. Without that, inserts overwrite each other and lookups return wrong values.
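A minimal chaining sketch: each bucket is an array of full entries, and lookup confirms key equality rather than trusting the hash. The `Entry` shape here is illustrative:

```typescript
type Entry = { key: string; value: number }

// Lookup inside one bucket: compare full keys, because two different
// keys can share the same bucket index.
function findInBucket(bucket: Entry[], key: string): number | null {
  for (const entry of bucket) {
    if (entry.key === key) return entry.value
  }
  return null
}

// "alice" and "carol" have collided into the same bucket.
const bucket3: Entry[] = [
  { key: "alice", value: 1 },
  { key: "carol", value: 2 },
]
findInBucket(bucket3, "carol") // → 2
findInBucket(bucket3, "dave")  // → null
```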

Load factor

What it is Load factor is entryCount / bucketCount (often written n / m). It measures how full the table is.

Why it matters As the load factor rises, collisions increase and average lookup cost climbs, so most implementations resize before it gets too high. Load factor is the control metric for balancing speed against memory: a high load factor means more entries per bucket and a higher collision probability.
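The metric itself is one division. The 0.75 threshold below is a common default, but real implementations vary:

```typescript
// Load factor = entries / buckets.
function loadFactor(entryCount: number, bucketCount: number): number {
  return entryCount / bucketCount
}

// Resize once the table is three-quarters full (common default; varies).
function shouldResize(entryCount: number, bucketCount: number, threshold = 0.75): boolean {
  return loadFactor(entryCount, bucketCount) >= threshold
}

shouldResize(11, 16) // → false (load factor 0.6875)
shouldResize(12, 16) // → true  (load factor 0.75)
```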

Rehashing

What it is Rehashing rebuilds the table at a larger capacity: it allocates a new bucket array and reinserts every existing entry so each key lands in its new bucket index.

Why it matters Rehashing keeps long-term performance stable. A single resize is O(n), but amortized over many inserts it preserves near-O(1) average insertion and lookup, preventing every operation from getting slower as data grows.
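A rehash sketch under the chaining model; the `Entry` type and the `hash` parameter are illustrative. Every entry must be re-bucketed because hash % bucketCount changes when the bucket count changes:

```typescript
type Entry = { key: string; value: number }

// Allocate a table twice as large and reinsert every entry at its new index.
function rehash(oldBuckets: Entry[][], hash: (key: string) => number): Entry[][] {
  const newBuckets: Entry[][] = Array.from({ length: oldBuckets.length * 2 }, () => [])
  for (const bucket of oldBuckets) {
    for (const entry of bucket) {
      // Recompute the index against the NEW length; it may differ from the old one.
      newBuckets[hash(entry.key) % newBuckets.length].push(entry)
    }
  }
  return newBuckets
}
```

One resize is O(n), but doubling means each entry is moved O(1) times on average across the table's lifetime.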

Walkthrough

Putting It All Together

Every frame of the concept diagram, laid out top-to-bottom so you can scroll through the algorithm at your own pace.

Legend: key → hash(key) % 8 → bucket index. A '·' marks an empty bucket.

Step 1/15
[0]·  [1]·  [2]·  [3]·  [4]·  [5]·  [6]·  [7]·
Empty map with 8 buckets. We'll insert several keys, force a collision, then look one up.

Step 2/15
[0]·  [1]·  [2]·  [3]·  [4]·  [5]·  [6]·  [7]·   ← "alice"
Insert 'alice'. hash('alice') % 8 = 3.

Step 3/15
[0]·  [1]·  [2]·  [3]alice  [4]·  [5]·  [6]·  [7]·
Place 'alice' in bucket 3.

Step 4/15
[0]·  [1]·  [2]·  [3]alice  [4]·  [5]·  [6]·  [7]·   ← "bob"
Insert 'bob'. hash('bob') % 8 = 6.

Step 5/15
[0]·  [1]·  [2]·  [3]alice  [4]·  [5]·  [6]bob  [7]·
Place 'bob' in bucket 6.

Step 6/15
[0]·  [1]·  [2]·  [3]alice  [4]·  [5]·  [6]bob  [7]·   ← "carol"
Insert 'carol'. hash('carol') % 8 = 3: collision with 'alice'.

Step 7/15
[0]·  [1]·  [2]·  [3]alice→carol  [4]·  [5]·  [6]bob  [7]·
Chain 'carol' onto bucket 3 (separate chaining).

Step 8/15
[0]·  [1]·  [2]·  [3]alice→carol  [4]·  [5]·  [6]bob  [7]·   ← "dave"
Insert 'dave'. hash('dave') % 8 = 0.

Step 9/15
[0]dave  [1]·  [2]·  [3]alice→carol  [4]·  [5]·  [6]bob  [7]·
Place 'dave' in bucket 0.

Step 10/15
[0]dave  [1]·  [2]·  [3]alice→carol  [4]·  [5]·  [6]bob  [7]·   ← "eve"
Insert 'eve'. hash('eve') % 8 = 5.

Step 11/15
[0]dave  [1]·  [2]·  [3]alice→carol  [4]·  [5]eve  [6]bob  [7]·
Place 'eve' in bucket 5.

Step 12/15
[0]dave  [1]·  [2]·  [3]alice→carol  [4]·  [5]eve  [6]bob  [7]·   ← "frank"
Insert 'frank'. hash('frank') % 8 = 3: another collision in bucket 3.

Step 13/15
[0]dave  [1]·  [2]·  [3]alice→carol→frank  [4]·  [5]eve  [6]bob  [7]·
Bucket 3's chain is now 3 long. As the load factor climbs, the map will resize.

Step 14/15
[0]dave  [1]·  [2]·  [3]alice→carol→frank  [4]·  [5]eve  [6]bob  [7]·   ← lookup "carol"
Lookup 'carol'. hash('carol') % 8 = 3.

Step 15/15
[0]dave  [1]·  [2]·  [3]alice→carol→frank  [4]·  [5]eve  [6]bob  [7]·
Walk the chain in bucket 3: alice ≠ carol, then carol = carol: found. O(1) average; O(chain length) worst case.
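The frames above can be replayed in a few lines of code. The bucket indices are hard-coded to match the diagram; they stand in for real hash outputs, which would differ:

```typescript
// The diagram's bucket assignments, standing in for hash(key) % 8.
const bucketIndex: Record<string, number> = {
  alice: 3, bob: 6, carol: 3, dave: 0, eve: 5, frank: 3,
}
const buckets: string[][] = Array.from({ length: 8 }, () => [])

// Inserts: collisions simply append to the bucket's chain.
for (const key of ["alice", "bob", "carol", "dave", "eve", "frank"]) {
  buckets[bucketIndex[key]].push(key)
}
// buckets[3] is now ["alice", "carol", "frank"]

// Lookup 'carol': walk bucket 3's chain, comparing full keys.
const found = buckets[bucketIndex["carol"]].find((k) => k === "carol")
// found === "carol", after scanning past "alice"
```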

Tradeoffs

How It Compares

When this technique wins, when it loses, and what to reach for instead.

vs. Array

Choose this when You need key-based lookup that is not tied to a numeric index.

Tradeoff Memory overhead is higher than compact arrays.

vs. Trie

Choose this when You need exact-key retrieval.

Tradeoff Tries perform better for prefix-heavy workloads.

vs. Balanced Tree Map

Choose this when You need the fastest average-case lookup and do not need ordering.

Tradeoff Tree maps keep keys sorted and support range queries, which hash maps cannot do efficiently.

In Production

Real-World Stories

Where this shows up at real companies and what the on-call engineer learned the hard way.

Redis

Redis dictionaries are hash-table based and power core key lookup behavior.

Takeaway Fast key indexing is foundational for low-latency caching.

Shopify

Order and session services use hash-style key lookup for idempotency and dedup checks.

Takeaway Hash maps help enforce correctness in high-throughput commerce workflows.

Cloudflare

Rate-limiting and bot protection paths rely on quick key counters by IP/token signatures.

Takeaway Hash maps turn high-cardinality counting into practical real-time control.
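Per-key counting of this kind is one map increment per request. This sketch omits time windows and eviction, which a real rate limiter needs:

```typescript
const requestCounts = new Map<string, number>()

// Record one request for a client key; false once the client exceeds the limit.
function recordRequest(clientKey: string, limit: number): boolean {
  const next = (requestCounts.get(clientKey) ?? 0) + 1
  requestCounts.set(clientKey, next)
  return next <= limit
}

recordRequest("203.0.113.7", 2) // → true  (1st request)
recordRequest("203.0.113.7", 2) // → true  (2nd request)
recordRequest("203.0.113.7", 2) // → false (limit exceeded)
```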

Code

Implementation

A clean reference implementation in TypeScript.

Build your own bucket table, hash function, collision chain, and resize flow so the structure is explicit rather than hidden behind built-in map classes.

Complexity: Average insert/get O(1), resize O(n), space O(n)

Hash map implementation from scratch (chaining)

type HashNode = { key: string; value: number; next: HashNode | null }

class SimpleHashMap {
  private buckets: Array<HashNode | null>
  private entryCount = 0

  constructor(private capacity = 16) {
    this.buckets = new Array<HashNode | null>(capacity).fill(null)
  }

  // FNV-1a hash of the key, folded into a bucket index via modulo.
  private hashKey(key: string): number {
    let hashValue = 2166136261
    for (let index = 0; index < key.length; index += 1) {
      hashValue ^= key.charCodeAt(index)
      hashValue = Math.imul(hashValue, 16777619)
    }
    return (hashValue >>> 0) % this.capacity
  }

  // Double the capacity and reinsert every entry once load factor hits 0.75.
  private resizeIfNeeded(): void {
    const loadFactor = this.entryCount / this.capacity
    if (loadFactor < 0.75) {
      return
    }

    const oldBuckets = this.buckets
    this.capacity *= 2
    this.buckets = new Array<HashNode | null>(this.capacity).fill(null)
    this.entryCount = 0

    for (const bucketHead of oldBuckets) {
      let currentNode = bucketHead
      while (currentNode) {
        this.set(currentNode.key, currentNode.value)
        currentNode = currentNode.next
      }
    }
  }

  // Insert or update: existing keys are updated in place; new entries are
  // prepended to the bucket's chain.
  set(key: string, value: number): void {
    this.resizeIfNeeded()
    const bucketIndex = this.hashKey(key)
    let currentNode = this.buckets[bucketIndex]

    while (currentNode) {
      if (currentNode.key === key) {
        currentNode.value = value
        return
      }
      currentNode = currentNode.next
    }

    this.buckets[bucketIndex] = { key, value, next: this.buckets[bucketIndex] }
    this.entryCount += 1
  }

  // Walk the bucket's chain comparing full keys; return null if absent.
  get(key: string): number | null {
    const bucketIndex = this.hashKey(key)
    let currentNode = this.buckets[bucketIndex]

    while (currentNode) {
      if (currentNode.key === key) {
        return currentNode.value
      }
      currentNode = currentNode.next
    }

    return null
  }
}

// Usage: return the first value seen twice, using the map as a frequency counter.
function firstDuplicateUsingCustomHashMap(values: number[]): number | null {
  const countsByValue = new SimpleHashMap()

  for (const value of values) {
    const key = String(value)
    const previousCount = countsByValue.get(key) ?? 0
    const nextCount = previousCount + 1
    countsByValue.set(key, nextCount)

    if (nextCount > 1) {
      return value
    }
  }

  return null
}

Watch Out

Common Problems and Failure Modes

The bugs and edge cases that bite engineers most often. Skim before you ship.

  • Assuming worst-case behavior is always O(1); adversarial keys can degrade performance.
  • Using mutable objects as keys without stable hash/equality semantics.
  • Ignoring memory cost when key cardinality grows rapidly.
  • Reimplementing map logic manually instead of using battle-tested library primitives.
  • Forgetting that hash maps do not preserve sorted order by default.
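The last pitfall is easy to demonstrate: the built-in JavaScript `Map` iterates in insertion order, so sorted output needs an explicit sort:

```typescript
const scores = new Map<string, number>()
scores.set("zoe", 3)
scores.set("amy", 1)

// Map iterates in insertion order, not key order.
const insertionOrder = [...scores.keys()] // ["zoe", "amy"]
const sortedKeys = [...scores.keys()].sort() // ["amy", "zoe"]
```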

Pro Tips

Tips and Tricks

Small habits that pay off every time you reach for this pattern.

  • Hash map is usually the first check when you need fast existence or frequency lookup.
  • Store exactly what you need as value (count, index, metadata), not entire objects by default.
  • When duplicates matter, count frequencies instead of boolean presence.
  • Think in two passes if one pass gets branch-heavy and error-prone.
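The frequency-count habit, sketched with the built-in `Map`:

```typescript
// Count occurrences instead of tracking mere presence.
function countFrequencies<T>(values: T[]): Map<T, number> {
  const counts = new Map<T, number>()
  for (const value of values) {
    counts.set(value, (counts.get(value) ?? 0) + 1)
  }
  return counts
}

const counts = countFrequencies(["a", "b", "a"])
// counts.get("a") === 2, counts.get("b") === 1
```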

Pattern Recognition

When to Use

Concrete signals that tell you this is the right pattern both at work and on LeetCode.

Real-system usage signals

  • Average O(1) lookup/insert/delete for key-based access patterns.
  • Transforms expensive membership checks into fast key probes.
  • Helps express intent clearly in business logic and data pipelines.

LeetCode-specific tips (pattern-identification signals)

  • Identification signal: you need fast existence checks, frequency counts, or key-to-value lookup during one pass.
  • Identification signal: brute force compares every pair, and the optimized version hints at remembering what was seen earlier.
  • When wording includes "duplicate", "count occurrences", or "find complement", a hash map is usually the first candidate.
  • For Hash Map / Dictionary questions, start by naming the core invariant before writing code.
  • Use the constraint section to set time/space target first, then pick the data structure/algorithm.
  • Solve one tiny example by hand and map each transition to your variables before implementing.
  • Run adversarial cases: empty input, duplicates, max-size input, and sorted/reverse patterns when relevant.
  • During interviews, explain why your approach is the right pattern for this prompt, not just why the code works.
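The "find complement" signal maps directly to the classic Two Sum pattern: one pass, storing value → index and probing for target − value before inserting:

```typescript
// Return indices of two numbers summing to target, or null if none exist.
function twoSum(nums: number[], target: number): [number, number] | null {
  const indexByValue = new Map<number, number>()
  for (let i = 0; i < nums.length; i += 1) {
    const j = indexByValue.get(target - nums[i]) // was the complement seen earlier?
    if (j !== undefined) return [j, i]
    indexByValue.set(nums[i], i)
  }
  return null
}

twoSum([2, 7, 11, 15], 9) // → [0, 1]
```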

Practice

LeetCode Progression

A curated easy-to-hard problem ladder. Each one reinforces a specific aspect of the pattern.

#    Problem                        Difficulty  Typical Complexity
1    Contains Duplicate             Easy        O(n)
2    Valid Anagram                  Easy        O(n)
3    Two Sum                        Easy        O(n)
4    Group Anagrams                 Medium      O(n*k log k)
5    Top K Frequent Elements        Medium      O(n log k)
6    Subarray Sum Equals K          Medium      O(n)
7    Longest Consecutive Sequence   Medium      O(n)
8    4Sum II                        Medium      O(n^2)
9    Minimum Window Substring       Hard        O(n)
10   LFU Cache                      Hard        O(1) amortized