Back to Data Structures and Algorithms

Disjoint Set Union (Union-Find) Complete Guide

Data Structure • extra hard

Union-Find (Disjoint Set Union) tracks which items are in the same connected component.

It is extremely efficient for repeated connectivity checks with dynamic unions.

Path compression + union by rank/size makes operations nearly constant in practice.

Typical Complexity Baseline

MetricValue
Complexity Note 1Near O(1) amortized

See It

Concept Diagram

A small animated SVG that illustrates the core idea the moving parts you need to keep in your head when you read the rest of the guide.

Union-find merge two sets

find returns each set's root. union attaches the smaller-rank root under the larger one.

union by rank · find walks parent pointers to the root1234567
Step 1/157 elements, all in their own singleton set. parent[i] = i.

Watch First

Video Explainer

A quick visual walkthrough before the detailed sections pick the format that helps you learn fastest.

Mechanics

Core Concepts

The building blocks and terminology you need before everything else clicks. Skim or deep-read.

Parent array

What it is Points each node to representative parent.

Why it matters Parent array is a required building block for understanding how Disjoint Set Union (Union-Find) stays correct and performant on large inputs.

Find

What it is Resolve component representative.

Why it matters Find is a required building block for understanding how Disjoint Set Union (Union-Find) stays correct and performant on large inputs.

Union

What it is Merge two components.

Why it matters Union is a required building block for understanding how Disjoint Set Union (Union-Find) stays correct and performant on large inputs.

Rank/size

What it is Heuristic for shallow trees.

Why it matters Rank/size is a required building block for understanding how Disjoint Set Union (Union-Find) stays correct and performant on large inputs.

Path compression

What it is Flattens trees during find operations. Rewire node directly to root during find.

Why it matters Path compression is a required building block for understanding how Disjoint Set Union (Union-Find) stays correct and performant on large inputs. Rewire node directly to root during find. In Disjoint Set Union (Union-Find), this definition helps you reason about correctness and complexity when inputs scale.

Representative

What it is Canonical root id for component.

Why it matters Canonical root id for component. In Disjoint Set Union (Union-Find), this definition helps you reason about correctness and complexity when inputs scale.

Union by rank

What it is Attach smaller-depth tree under larger-depth root.

Why it matters Attach smaller-depth tree under larger-depth root. In Disjoint Set Union (Union-Find), this definition helps you reason about correctness and complexity when inputs scale.

Amortized complexity

What it is Average cost across operation sequence.

Why it matters Average cost across operation sequence. In Disjoint Set Union (Union-Find), this definition helps you reason about correctness and complexity when inputs scale.

Disjoint sets

What it is Non-overlapping groups of elements.

Why it matters Non-overlapping groups of elements. In Disjoint Set Union (Union-Find), this definition helps you reason about correctness and complexity when inputs scale.

Walkthrough

Putting It All Together

Every frame of the concept diagram, laid out top-to-bottom so you can scroll through the algorithm at your own pace.

Step 1/15
union by rank · find walks parent pointers to the root1234567

7 elements, all in their own singleton set. parent[i] = i.

Step 2/15
union by rank · find walks parent pointers to the root1234567

union(1, 2): roots are 1, 2. Different → attach 2 under 1.

Step 3/15
union by rank · find walks parent pointers to the root1234567

Set {1, 2} formed.

Step 4/15
union by rank · find walks parent pointers to the root1234567

union(3, 4): attach 4 under 3.

Step 5/15
union by rank · find walks parent pointers to the root1234567

Three groups now: {1,2}, {3,4}, plus singletons {5}, {6}, {7}.

Step 6/15
union by rank · find walks parent pointers to the root1234567

union(5, 6): attach 6 under 5.

Step 7/15
union by rank · find walks parent pointers to the root1234567

Groups: {1,2}, {3,4}, {5,6}, {7}.

Step 8/15
union by rank · find walks parent pointers to the root1234567

union(2, 4): first find(2). Walk to root → 1. (Then find(4) → 3.) Highlighting find(4) walk.

Step 9/15
union by rank · find walks parent pointers to the root1234567

Roots 1 ≠ 3 → merge. Attach 3 under 1 (so the {3,4} subtree dangles under 1).

Step 10/15
union by rank · find walks parent pointers to the root1234567

Set {1, 2, 3, 4} formed in one tree rooted at 1.

Step 11/15
union by rank · find walks parent pointers to the root1234567

union(1, 5): roots 1 and 5. Attach 5 under 1.

Step 12/15
union by rank · find walks parent pointers to the root1234567

Now {1, 2, 3, 4, 5, 6} all share root 1.

Step 13/15
union by rank · find walks parent pointers to the root1234567

union(7, 6): find(7) → 7 (still singleton). find(6) → 5 → 1.

Step 14/15
union by rank · find walks parent pointers to the root1234567

Attach 7 under 1.

Step 15/15
union by rank · find walks parent pointers to the root1234567

All 7 elements in one connected component. Each find/union is amortized O(α(n)) with path compression + union by rank.

Tradeoffs

How It Compares

When this technique wins, when it loses, and what to reach for instead.

vs. Graph traversal per query

Choose this when Choose Union-Find for many connectivity checks over many unions.

Tradeoff Traversal better when full path details are required.

vs. Adjacency matrix

Choose this when Choose Union-Find for sparse dynamic edge streams.

Tradeoff Matrix offers O(1) direct edge existence lookup but high memory.

vs. Topological sort

Choose this when Choose Union-Find for undirected connectivity grouping.

Tradeoff Topological sort is for directed acyclic dependency ordering.

In Production

Real-World Stories

Where this shows up at real companies and what the on-call engineer learned the hard way.

Networking platforms

Union-Find style component grouping helps reason about cluster and segment connectivity.

Takeaway Connectivity grouping is a common infrastructure primitive.

GIS tools

Spatial region merging and connectivity analyses use disjoint-set logic.

Takeaway Union-Find is practical for geometric grouping workflows.

Data quality pipelines

Entity resolution systems merge records into clusters of same real-world entity.

Takeaway Union operations model transitive equivalence elegantly.

Code

Implementation

A clean reference implementation in TypeScript, Python, and Go. Switch languages with the toggle.

Maintain parent + rank arrays and compress path on every find.

Complexity: Amortized almost O(1)

Union-Find with path compression

class UnionFind {
  private readonly parentByNode: number[]
  private readonly rankByNode: number[]

  constructor(nodeCount: number) {
    this.parentByNode = Array.from({ length: nodeCount }, (_, index) => index)
    this.rankByNode = new Array<number>(nodeCount).fill(0)
  }

  find(nodeId: number): number {
    if (this.parentByNode[nodeId] !== nodeId) {
      this.parentByNode[nodeId] = this.find(this.parentByNode[nodeId])
    }
    return this.parentByNode[nodeId]
  }
}

Watch Out

Common Problems and Failure Modes

The bugs and edge cases that bite engineers most often. Skim before you ship.

  • Skipping path compression and losing near-constant behavior.
  • Not handling out-of-range node ids.
  • Using recursion for find without stack consideration in some languages.
  • Confusing rank and size semantics during union.
  • Expecting Union-Find to return explicit path, not just connectivity.

Pro Tips

Tips and Tricks

Small habits that pay off every time you reach for this pattern.

  • Use union-find when repeated connectivity checks occur in dynamic grouping problems.
  • Always implement path compression and union by rank/size for near-constant performance.
  • Initialize parent pointers clearly for each node/index domain.
  • Great for cycle detection in undirected graph edge streams.

Pattern Recognition

When to Use

Concrete signals that tell you this is the right pattern both at work and on LeetCode.

Real-system usage signals

  • Fast connectivity queries in evolving graphs.
  • Simple implementation with strong practical performance.
  • Core building block for Kruskal MST and cycle detection.

LeetCode-specific tips (pattern-identification signals)

  • Identification signal: repeated connectivity queries interleaved with merge/union operations.
  • Identification signal: you need to know whether two nodes belong to same component quickly many times.
  • Dynamic graph connectivity with incremental edges is a classic Union-Find trigger.
  • For Disjoint Set Union (Union-Find) questions, start by naming the core invariant before writing code.
  • Use the constraint section to set time/space target first, then pick the data structure/algorithm.
  • Solve one tiny example by hand and map each transition to your variables before implementing.
  • Run adversarial cases: empty input, duplicates, max-size input, and sorted/reverse patterns when relevant.
  • During interviews, explain why your approach is the right pattern for this prompt, not just why the code works.

Practice

LeetCode Progression

A curated easy-to-hard problem ladder. Each one reinforces a specific aspect of the pattern.

#ProblemDifficultyTypical Complexity
1Number of ProvincesMediumO(n^2)
2Redundant ConnectionMediumO(n alpha(n))
3Accounts MergeMediumO(n alpha(n))
4Satisfiability of Equality EquationsMediumO(n alpha(n))
5Number of Connected Components in an Undirected GraphMediumO(V+E)
6Most Stones Removed with Same Row or ColumnMediumO(n alpha(n))
7Graph Valid TreeMediumO(V+E)
8Min Cost to Connect All PointsMediumO(n^2 log n)
9Number of Islands IIHardO(k alpha(n))
10Swim in Rising WaterHardO(n^2 log n)