Union-Find (Disjoint Set Union)

Lesson, slides, and applied problem sets.

Lesson

Union-Find (Disjoint Set Union)

Why this module exists

Union-Find is a data structure for tracking connected components. It answers "are X and Y connected?" in nearly constant time and efficiently merges groups. It's essential for graph connectivity problems, Kruskal's MST algorithm, and many interview questions.

This module covers:

Basic Union-Find structure
Path compression optimization
Union by rank
Applications: connected components, redundant edges, account merging

1) What is Union-Find?

Union-Find (also called Disjoint Set Union or DSU) maintains a collection of disjoint sets and supports two operations:

Find(x): Which set does x belong to? (returns the "representative")
Union(x, y): Merge the sets containing x and y

Initial:    {1} {2} {3} {4} {5}

Union(1,2): {1,2} {3} {4} {5}
Union(3,4): {1,2} {3,4} {5}
Union(1,3): {1,2,3,4} {5}

Find(2) == Find(4)?  Yes (same component)
Find(2) == Find(5)?  No  (different components)

2) Tree Representation

Each set is represented as a tree. Each element points to its parent; the root is the representative.

    1          3
   /          /
  2          4

Find(2) → follow parent → 1 (root)
Find(4) → follow parent → 3 (root)

Implementation:

type UnionFind struct {
    parent []int
}

func NewUnionFind(n int) *UnionFind {
    parent := make([]int, n)
    for i := range parent {
        parent[i] = i  // Each element is its own parent initially
    }
    return &UnionFind{parent: parent}
}

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # Each is its own parent

3) Basic Find

Follow parent pointers until reaching the root:

func (uf *UnionFind) Find(x int) int {
    for uf.parent[x] != x {
        x = uf.parent[x]
    }
    return x
}

def find(self, x):
    while self.parent[x] != x:
        x = self.parent[x]
    return x

Problem: Trees can become tall (O(n) per find).

4) Path Compression

While finding the root, make every node point directly to the root:

func (uf *UnionFind) Find(x int) int {
    if uf.parent[x] != x {
        uf.parent[x] = uf.Find(uf.parent[x])  // Path compression
    }
    return uf.parent[x]
}

def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])  # Path compression
    return self.parent[x]

This flattens the tree, making future finds O(1).

Before:     After Find(4):
    1           1
   /          / | \
  2          2  3  4
 /
3
|
4

5) Basic Union

Connect two sets by making one root point to the other:

func (uf *UnionFind) Union(x, y int) {
    rootX := uf.Find(x)
    rootY := uf.Find(y)
    if rootX != rootY {
        uf.parent[rootX] = rootY
    }
}

def union(self, x, y):
    root_x = self.find(x)
    root_y = self.find(y)
    if root_x != root_y:
        self.parent[root_x] = root_y

6) Union by Rank

To keep trees shallow, attach the smaller tree under the larger one:

type UnionFind struct {
    parent []int
    rank   []int  // Approximate tree height
}

func NewUnionFind(n int) *UnionFind {
    parent := make([]int, n)
    rank := make([]int, n)
    for i := range parent {
        parent[i] = i
        rank[i] = 0
    }
    return &UnionFind{parent: parent, rank: rank}
}

func (uf *UnionFind) Union(x, y int) bool {
    rootX := uf.Find(x)
    rootY := uf.Find(y)

    if rootX == rootY {
        return false  // Already connected
    }

    // Attach smaller rank tree under larger rank tree
    if uf.rank[rootX] < uf.rank[rootY] {
        uf.parent[rootX] = rootY
    } else if uf.rank[rootX] > uf.rank[rootY] {
        uf.parent[rootY] = rootX
    } else {
        uf.parent[rootY] = rootX
        uf.rank[rootX]++
    }

    return true
}

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)

        if root_x == root_y:
            return False  # Already connected

        if self.rank[root_x] < self.rank[root_y]:
            self.parent[root_x] = root_y
        elif self.rank[root_x] > self.rank[root_y]:
            self.parent[root_y] = root_x
        else:
            self.parent[root_y] = root_x
            self.rank[root_x] += 1

        return True

7) Complexity with Optimizations

With both path compression and union by rank:

Operation	Time
Find	O(α(n)) ≈ O(1)
Union	O(α(n)) ≈ O(1)

Where α(n) is the inverse Ackermann function — grows so slowly it's effectively constant (< 5 for any practical n).

8) Application: Count Connected Components

Given n nodes and edges, count connected components:

func countComponents(n int, edges [][]int) int {
    uf := NewUnionFind(n)
    components := n

    for _, edge := range edges {
        if uf.Union(edge[0], edge[1]) {
            components--  // Merged two components
        }
    }

    return components
}

def count_components(n, edges):
    uf = UnionFind(n)
    components = n

    for a, b in edges:
        if uf.union(a, b):
            components -= 1

    return components

9) Application: Redundant Connection

Given n nodes and n edges forming a tree plus one extra edge, find the edge that creates a cycle:

func findRedundantConnection(edges [][]int) []int {
    n := len(edges)
    uf := NewUnionFind(n + 1)  // 1-indexed nodes

    for _, edge := range edges {
        if !uf.Union(edge[0], edge[1]) {
            return edge  // Already connected = this edge creates cycle
        }
    }

    return nil
}

def find_redundant_connection(edges):
    n = len(edges)
    uf = UnionFind(n + 1)  # 1-indexed

    for a, b in edges:
        if not uf.union(a, b):
            return [a, b]  # Creates cycle

    return None

Key insight: If Union returns false (already connected), adding this edge creates a cycle.

10) Application: Accounts Merge

Given accounts where each account has a name and emails, merge accounts that share any email:

func accountsMerge(accounts [][]string) [][]string {
    emailToID := make(map[string]int)  // email -> first account index
    uf := NewUnionFind(len(accounts))

    // Build connections
    for i, account := range accounts {
        for j := 1; j < len(account); j++ {
            email := account[j]
            if firstID, exists := emailToID[email]; exists {
                uf.Union(i, firstID)  // Same email = same person
            } else {
                emailToID[email] = i
            }
        }
    }

    // Group emails by root account
    rootToEmails := make(map[int][]string)
    for email, id := range emailToID {
        root := uf.Find(id)
        rootToEmails[root] = append(rootToEmails[root], email)
    }

    // Build result
    result := [][]string{}
    for root, emails := range rootToEmails {
        sort.Strings(emails)
        name := accounts[root][0]
        merged := append([]string{name}, emails...)
        result = append(result, merged)
    }

    return result
}

def accounts_merge(accounts):
    email_to_id = {}  # email -> account index
    uf = UnionFind(len(accounts))

    # Build connections
    for i, account in enumerate(accounts):
        for email in account[1:]:
            if email in email_to_id:
                uf.union(i, email_to_id[email])
            else:
                email_to_id[email] = i

    # Group emails by root
    from collections import defaultdict
    root_to_emails = defaultdict(set)
    for email, id in email_to_id.items():
        root = uf.find(id)
        root_to_emails[root].add(email)

    # Build result
    result = []
    for root, emails in root_to_emails.items():
        name = accounts[root][0]
        result.append([name] + sorted(emails))

    return result

11) Union-Find vs DFS/BFS

Both can solve connectivity problems. When to use which?

Union-Find is better when:

Processing edges one at a time (online/streaming)
Need to check connectivity frequently
Building incrementally

DFS/BFS is better when:

Need to traverse paths
Want to find shortest path
Processing graph once

12) Common Variations

Union-Find with Size: Track component size instead of rank; useful for weighted quick-union.

type UnionFind struct {
    parent []int
    size   []int
}

func (uf *UnionFind) Union(x, y int) bool {
    rootX, rootY := uf.Find(x), uf.Find(y)
    if rootX == rootY {
        return false
    }
    // Attach smaller to larger
    if uf.size[rootX] < uf.size[rootY] {
        rootX, rootY = rootY, rootX
    }
    uf.parent[rootY] = rootX
    uf.size[rootX] += uf.size[rootY]
    return true
}

13) Common Mistakes

Forgetting path compression: Without it, trees can become O(n) tall
Not returning union status: Need to know if components were merged
1-indexed vs 0-indexed: Check problem constraints
Mapping strings to integers: For string-based problems, use a map

14) Complexity Summary

Operation	Naive	With Optimizations
Find	O(n)	O(α(n)) ≈ O(1)
Union	O(n)	O(α(n)) ≈ O(1)
n operations	O(n²)	O(n α(n)) ≈ O(n)

Outcomes

After this module, you should be able to:

Implement Union-Find with path compression and union by rank
Apply Union-Find to count connected components
Detect cycles in graphs using Union-Find
Merge groups based on shared elements (accounts merge pattern)

Practice Set

Number of Connected Components (Medium) — Count components using Union-Find
Redundant Connection (Medium) — Find cycle-creating edge
Accounts Merge (Medium) — Group by shared elements

Start with Connected Components to understand the basic structure. Redundant Connection shows how Union returning false indicates a cycle. Accounts Merge combines Union-Find with string mapping.

Union-Find (Disjoint Set Union)

Lesson

Union-Find (Disjoint Set Union)

Why this module exists

1) What is Union-Find?

2) Tree Representation

3) Basic Find

4) Path Compression

5) Basic Union

6) Union by Rank

7) Complexity with Optimizations

8) Application: Count Connected Components

9) Application: Redundant Connection

10) Application: Accounts Merge

11) Union-Find vs DFS/BFS

12) Common Variations

13) Common Mistakes

14) Complexity Summary

Outcomes

Practice Set

Module Items

Number of Connected Components

Redundant Connection

Accounts Merge

Union-Find Checkpoint