Union-Find (Disjoint Set Union)
Lesson, slides, and applied problem sets.
View SlidesLesson
Union-Find (Disjoint Set Union)
Why this module exists
Union-Find is a data structure for tracking connected components. It answers "are X and Y connected?" in nearly constant time and efficiently merges groups. It's essential for graph connectivity problems, Kruskal's MST algorithm, and many interview questions.
This module covers:
- Basic Union-Find structure
- Path compression optimization
- Union by rank
- Applications: connected components, redundant edges, account merging
1) What is Union-Find?
Union-Find (also called Disjoint Set Union or DSU) maintains a collection of disjoint sets and supports two operations:
- Find(x): Which set does x belong to? (returns the "representative")
- Union(x, y): Merge the sets containing x and y
Initial: {1} {2} {3} {4} {5}
Union(1,2): {1,2} {3} {4} {5}
Union(3,4): {1,2} {3,4} {5}
Union(1,3): {1,2,3,4} {5}
Find(2) == Find(4)? Yes (same component)
Find(2) == Find(5)? No (different components)
2) Tree Representation
Each set is represented as a tree. Each element points to its parent; the root is the representative.
1 3
/ /
2 4
Find(2) → follow parent → 1 (root)
Find(4) → follow parent → 3 (root)
Implementation:
type UnionFind struct {
parent []int
}
func NewUnionFind(n int) *UnionFind {
parent := make([]int, n)
for i := range parent {
parent[i] = i // Each element is its own parent initially
}
return &UnionFind{parent: parent}
}
class UnionFind:
def __init__(self, n):
self.parent = list(range(n)) # Each is its own parent
3) Basic Find
Follow parent pointers until reaching the root:
func (uf *UnionFind) Find(x int) int {
for uf.parent[x] != x {
x = uf.parent[x]
}
return x
}
def find(self, x):
while self.parent[x] != x:
x = self.parent[x]
return x
Problem: Trees can become tall (O(n) per find).
4) Path Compression
While finding the root, make every node point directly to the root:
func (uf *UnionFind) Find(x int) int {
if uf.parent[x] != x {
uf.parent[x] = uf.Find(uf.parent[x]) // Path compression
}
return uf.parent[x]
}
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # Path compression
return self.parent[x]
This flattens the tree, making future finds O(1).
Before: After Find(4):
1 1
/ / | \
2 2 3 4
/
3
|
4
5) Basic Union
Connect two sets by making one root point to the other:
func (uf *UnionFind) Union(x, y int) {
rootX := uf.Find(x)
rootY := uf.Find(y)
if rootX != rootY {
uf.parent[rootX] = rootY
}
}
def union(self, x, y):
root_x = self.find(x)
root_y = self.find(y)
if root_x != root_y:
self.parent[root_x] = root_y
6) Union by Rank
To keep trees shallow, attach the smaller tree under the larger one:
type UnionFind struct {
parent []int
rank []int // Approximate tree height
}
func NewUnionFind(n int) *UnionFind {
parent := make([]int, n)
rank := make([]int, n)
for i := range parent {
parent[i] = i
rank[i] = 0
}
return &UnionFind{parent: parent, rank: rank}
}
func (uf *UnionFind) Union(x, y int) bool {
rootX := uf.Find(x)
rootY := uf.Find(y)
if rootX == rootY {
return false // Already connected
}
// Attach smaller rank tree under larger rank tree
if uf.rank[rootX] < uf.rank[rootY] {
uf.parent[rootX] = rootY
} else if uf.rank[rootX] > uf.rank[rootY] {
uf.parent[rootY] = rootX
} else {
uf.parent[rootY] = rootX
uf.rank[rootX]++
}
return true
}
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
root_x, root_y = self.find(x), self.find(y)
if root_x == root_y:
return False # Already connected
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
elif self.rank[root_x] > self.rank[root_y]:
self.parent[root_y] = root_x
else:
self.parent[root_y] = root_x
self.rank[root_x] += 1
return True
7) Complexity with Optimizations
With both path compression and union by rank:
| Operation | Time |
|---|---|
| Find | O(α(n)) ≈ O(1) |
| Union | O(α(n)) ≈ O(1) |
Where α(n) is the inverse Ackermann function — grows so slowly it's effectively constant (< 5 for any practical n).
8) Application: Count Connected Components
Given n nodes and edges, count connected components:
func countComponents(n int, edges [][]int) int {
uf := NewUnionFind(n)
components := n
for _, edge := range edges {
if uf.Union(edge[0], edge[1]) {
components-- // Merged two components
}
}
return components
}
def count_components(n, edges):
uf = UnionFind(n)
components = n
for a, b in edges:
if uf.union(a, b):
components -= 1
return components
9) Application: Redundant Connection
Given n nodes and n edges forming a tree plus one extra edge, find the edge that creates a cycle:
func findRedundantConnection(edges [][]int) []int {
n := len(edges)
uf := NewUnionFind(n + 1) // 1-indexed nodes
for _, edge := range edges {
if !uf.Union(edge[0], edge[1]) {
return edge // Already connected = this edge creates cycle
}
}
return nil
}
def find_redundant_connection(edges):
n = len(edges)
uf = UnionFind(n + 1) # 1-indexed
for a, b in edges:
if not uf.union(a, b):
return [a, b] # Creates cycle
return None
Key insight: If Union returns false (already connected), adding this edge creates a cycle.
10) Application: Accounts Merge
Given accounts where each account has a name and emails, merge accounts that share any email:
func accountsMerge(accounts [][]string) [][]string {
emailToID := make(map[string]int) // email -> first account index
uf := NewUnionFind(len(accounts))
// Build connections
for i, account := range accounts {
for j := 1; j < len(account); j++ {
email := account[j]
if firstID, exists := emailToID[email]; exists {
uf.Union(i, firstID) // Same email = same person
} else {
emailToID[email] = i
}
}
}
// Group emails by root account
rootToEmails := make(map[int][]string)
for email, id := range emailToID {
root := uf.Find(id)
rootToEmails[root] = append(rootToEmails[root], email)
}
// Build result
result := [][]string{}
for root, emails := range rootToEmails {
sort.Strings(emails)
name := accounts[root][0]
merged := append([]string{name}, emails...)
result = append(result, merged)
}
return result
}
def accounts_merge(accounts):
email_to_id = {} # email -> account index
uf = UnionFind(len(accounts))
# Build connections
for i, account in enumerate(accounts):
for email in account[1:]:
if email in email_to_id:
uf.union(i, email_to_id[email])
else:
email_to_id[email] = i
# Group emails by root
from collections import defaultdict
root_to_emails = defaultdict(set)
for email, id in email_to_id.items():
root = uf.find(id)
root_to_emails[root].add(email)
# Build result
result = []
for root, emails in root_to_emails.items():
name = accounts[root][0]
result.append([name] + sorted(emails))
return result
11) Union-Find vs DFS/BFS
Both can solve connectivity problems. When to use which?
Union-Find is better when:
- Processing edges one at a time (online/streaming)
- Need to check connectivity frequently
- Building incrementally
DFS/BFS is better when:
- Need to traverse paths
- Want to find shortest path
- Processing graph once
12) Common Variations
Union-Find with Size: Track component size instead of rank; useful for weighted quick-union.
type UnionFind struct {
parent []int
size []int
}
func (uf *UnionFind) Union(x, y int) bool {
rootX, rootY := uf.Find(x), uf.Find(y)
if rootX == rootY {
return false
}
// Attach smaller to larger
if uf.size[rootX] < uf.size[rootY] {
rootX, rootY = rootY, rootX
}
uf.parent[rootY] = rootX
uf.size[rootX] += uf.size[rootY]
return true
}
13) Common Mistakes
- Forgetting path compression: Without it, trees can become O(n) tall
- Not returning union status: Need to know if components were merged
- 1-indexed vs 0-indexed: Check problem constraints
- Mapping strings to integers: For string-based problems, use a map
14) Complexity Summary
| Operation | Naive | With Optimizations |
|---|---|---|
| Find | O(n) | O(α(n)) ≈ O(1) |
| Union | O(n) | O(α(n)) ≈ O(1) |
| n operations | O(n²) | O(n α(n)) ≈ O(n) |
Outcomes
After this module, you should be able to:
- Implement Union-Find with path compression and union by rank
- Apply Union-Find to count connected components
- Detect cycles in graphs using Union-Find
- Merge groups based on shared elements (accounts merge pattern)
Practice Set
- Number of Connected Components (Medium) — Count components using Union-Find
- Redundant Connection (Medium) — Find cycle-creating edge
- Accounts Merge (Medium) — Group by shared elements
Start with Connected Components to understand the basic structure. Redundant Connection shows how Union returning false indicates a cycle. Accounts Merge combines Union-Find with string mapping.