Union-Find (Disjoint Set Union)

1 / 12

What is Union-Find?

Tracks connected components with two operations:

  • Find(x): Which group is x in?
  • Union(x, y): Merge groups of x and y

Nearly O(1) per operation with optimizations

2 / 12

Tree Representation

Each element points to parent; root is representative

    1          3
   /          /
  2          4

Find(2) → 1 (root)
Find(4) → 3 (root)

Initially: each element is its own parent

3 / 12

Path Compression

While finding root, make nodes point directly to it

def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])
    return self.parent[x]

Flattens tree → future finds are O(1)

4 / 12

Union by Rank

Attach smaller tree under larger tree

if rank[root_x] < rank[root_y]:
    parent[root_x] = root_y
elif rank[root_x] > rank[root_y]:
    parent[root_y] = root_x
else:
    parent[root_y] = root_x
    rank[root_x] += 1

Keeps trees shallow

5 / 12

Complete Implementation

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return False
        # ... rank comparison ...
        return True
6 / 12

Counting Components

def count_components(n, edges):
    uf = UnionFind(n)
    components = n

    for a, b in edges:
        if uf.union(a, b):
            components -= 1

    return components

Each successful union merges two components

7 / 12

Finding Redundant Edge

def find_redundant(edges):
    uf = UnionFind(len(edges) + 1)

    for a, b in edges:
        if not uf.union(a, b):
            return [a, b]  # Creates cycle!

Union returns False = already connected = cycle

8 / 12

Accounts Merge Pattern

  1. Map emails to account indices
  2. Union accounts sharing an email
  3. Group all emails by root account
  4. Sort and format output

Key: email → first account that had it

9 / 12

Complexity

OperationNaiveOptimized
FindO(n)O(α(n)) ≈ O(1)
UnionO(n)O(α(n)) ≈ O(1)

α(n) = inverse Ackermann ≈ constant

10 / 12

Union-Find vs BFS/DFS

Union-Find: streaming edges, frequent connectivity queries

BFS/DFS: need paths, single-pass traversal

11 / 12

Common Mistakes

  1. Forgetting path compression (slow)
  2. Not returning union success/fail
  3. Wrong indexing (0 vs 1 based)
  4. Not mapping strings to integers
12 / 12
Use arrow keys or click edges to navigate. Press H to toggle help, F for fullscreen.