并查集

引入¶

并查集是一种用于管理元素所属集合的数据结构，实现为一个森林，其中每棵树表示一个集合，树中的节点表示对应集合中的元素。

顾名思义，并查集支持两种操作：

合并（Union）：合并两个元素所属集合（合并对应的树）
查询（Find）：查询某个元素所属集合（查询对应的树的根节点），这可以用于判断两个元素是否属于同一集合

并查集在经过修改后可以支持单个元素的删除、移动；使用动态开点线段树还可以实现可持久化并查集。

Warning

并查集无法以较低复杂度实现集合的分离。

初始化¶

初始时，每个元素都位于一个单独的集合，表示为一棵只有根节点的树。方便起见，我们将根节点的父亲设为自己。

实现

C++Python

struct dsu {
  vector<size_t> pa;

  explicit dsu(size_t size) : pa(size) { iota(pa.begin(), pa.end(), 0); }
};

class Dsu:
    def __init__(self, size):
        self.pa = list(range(size))

查询¶

我们需要沿着树向上移动，直至找到根节点。

实现

C++Python

size_t dsu::find(size_t x) { return pa[x] == x ? x : find(pa[x]); }

def find(self, x):
    return x if self.pa[x] == x else self.find(self.pa[x])

路径压缩¶

查询过程中经过的每个元素都属于该集合，我们可以将其直接连到根节点以加快后续查询。

实现

C++Python

size_t dsu::find(size_t x) { return pa[x] == x ? x : pa[x] = find(pa[x]); }

def find(self, x):
    if self.pa[x] != x:
        self.pa[x] = self.find(self.pa[x])
    return self.pa[x]

合并¶

要合并两棵树，我们只需要将一棵树的根节点连到另一棵树的根节点。

实现

C++Python

void dsu::unite(size_t x, size_t y) { pa[find(x)] = find(y); }

def union(self, x, y):
    self.pa[self.find(x)] = self.find(y)

启发式合并¶

合并时，选择哪棵树的根节点作为新树的根节点会影响未来操作的复杂度。我们可以将节点较少或深度较小的树连到另一棵，以免发生退化。

具体复杂度讨论

由于需要我们支持的只有集合的合并、查询操作，当我们需要将两个集合合二为一时，无论将哪一个集合连接到另一个集合的下面，都能得到正确的结果。但不同的连接方法存在时间复杂度的差异。具体来说，如果我们将一棵点数与深度都较小的集合树连接到一棵更大的集合树下，显然相比于另一种连接方案，接下来执行查找操作的用时更小（也会带来更优的最坏时间复杂度）。

当然，我们不总能遇到恰好如上所述的集合——点数与深度都更小。鉴于点数与深度这两个特征都很容易维护，我们常常从中择一，作为估价函数。而无论选择哪一个，时间复杂度都为 \(O (m\alpha(m,n))\)，具体的证明可参见 References 中引用的论文。

在算法竞赛的实际代码中，即便不使用启发式合并，代码也往往能够在规定时间内完成任务。在 Tarjan 的论文 [1] 中，证明了不使用启发式合并、只使用路径压缩的最坏时间复杂度是 \(O (m \log n)\)。在姚期智的论文 [2] 中，证明了不使用启发式合并、只使用路径压缩，在平均情况下，时间复杂度依然是 \(O (m\alpha(m,n))\)。

如果只使用启发式合并，而不使用路径压缩，时间复杂度为 \(O(m\log n)\)。由于路径压缩单次合并可能造成大量修改，有时路径压缩并不适合使用。例如，在可持久化并查集、线段树分治 + 并查集中，一般使用只启发式合并的并查集。

按节点数合并的参考实现：

实现

C++Python

struct dsu {
  vector<size_t> pa, size;

  explicit dsu(size_t size_) : pa(size_), size(size_, 1) {
    iota(pa.begin(), pa.end(), 0);
  }

  void unite(size_t x, size_t y) {
    x = find(x), y = find(y);
    if (x == y) return;
    if (size[x] < size[y]) swap(x, y);
    pa[y] = x;
    size[x] += size[y];
  }
};

class Dsu:
    def __init__(self, size):
        self.pa = list(range(size))
        self.size = [1] * size

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.size[x] < self.size[y]:
            x, y = y, x
        self.pa[y] = x
        self.size[x] += self.size[y]

删除¶

要删除一个叶子节点，我们可以将其父亲设为自己。为了保证要删除的元素都是叶子，我们可以预先为每个节点制作副本，并将其副本作为父亲。

实现

C++Python

struct dsu {
  vector<size_t> pa, size;

  explicit dsu(size_t size_) : pa(size_ * 2), size(size_ * 2, 1) {
    iota(pa.begin(), pa.begin() + size_, size_);
    iota(pa.begin() + size_, pa.end(), size_);
  }

  void erase(size_t x) {
    --size[find(x)];
    pa[x] = x;
  }
};

class Dsu:
    def __init__(self, size):
        self.pa = list(range(size, size * 2)) * 2
        self.size = [1] * size * 2

    def erase(self, x):
        self.size[self.find(x)] -= 1
        self.pa[x] = x

移动¶

与删除类似，通过以副本作为父亲，保证要移动的元素都是叶子。

实现

C++Python

void dsu::move(size_t x, size_t y) {
  auto fx = find(x), fy = find(y);
  if (fx == fy) return;
  pa[x] = fy;
  --size[fx], ++size[fy];
}

def move(self, x, y):
    fx, fy = self.find(x), self.find(y)
    if fx == fy:
        return
    self.pa[x] = fy
    self.size[fx] -= 1
    self.size[fy] += 1

复杂度¶

时间复杂度¶

同时使用路径压缩和启发式合并之后，并查集的每个操作平均时间仅为 \(O(\alpha(n))\)，其中 \(\alpha\) 为阿克曼函数的反函数，其增长极其缓慢，也就是说其单次操作的平均运行时间可以认为是一个很小的常数。

Ackermann 函数 \(A(m, n)\) 的定义是这样的：

\(A(m, n) = \begin{cases}n+1&\text{if }m=0\\A(m-1,1)&\text{if }m>0\text{ and }n=0\\A(m-1,A(m,n-1))&\text{otherwise}\end{cases}\)

而反 Ackermann 函数 \(\alpha(n)\) 的定义是阿克曼函数的反函数，即为最大的整数 \(m\) 使得 \(A(m, m) \leqslant n\)。

时间复杂度的证明在这个页面中。

空间复杂度¶

显然为 \(O(n)\)。

带权并查集¶

我们还可以在并查集的边上定义某种权值、以及这种权值在路径压缩时产生的运算，从而解决更多的问题。比如对于经典的「NOI2001」食物链，我们可以在边权上维护模 3 意义下的加法群。

例题¶

UVA11987 Almost Union-Find

实现类似并查集的数据结构，支持以下操作：

合并两个元素所属集合
移动单个元素
查询某个元素所属集合的大小及元素和

参考代码

C++Python

#include <bits/stdc++.h>

using namespace std;

struct dsu {
  vector<size_t> pa, size, sum;

  explicit dsu(size_t size_)
      : pa(size_ * 2), size(size_ * 2, 1), sum(size_ * 2) {
    // size 与 sum 的前半段其实没有使用，只是为了让下标计算更简单
    iota(pa.begin(), pa.begin() + size_, size_);
    iota(pa.begin() + size_, pa.end(), size_);
    iota(sum.begin() + size_, sum.end(), 0);
  }

  void unite(size_t x, size_t y) {
    x = find(x), y = find(y);
    if (x == y) return;
    if (size[x] < size[y]) swap(x, y);
    pa[y] = x;
    size[x] += size[y];
    sum[x] += sum[y];
  }

  void move(size_t x, size_t y) {
    auto fx = find(x), fy = find(y);
    if (fx == fy) return;
    pa[x] = fy;
    --size[fx], ++size[fy];
    sum[fx] -= x, sum[fy] += x;
  }

  size_t find(size_t x) { return pa[x] == x ? x : pa[x] = find(pa[x]); }
};

int main() {
  size_t n, m, op, x, y;
  while (cin >> n >> m) {
    dsu dsu(n + 1);  // 元素范围是 1..n
    while (m--) {
      cin >> op;
      switch (op) {
        case 1:
          cin >> x >> y;
          dsu.unite(x, y);
          break;
        case 2:
          cin >> x >> y;
          dsu.move(x, y);
          break;
        case 3:
          cin >> x;
          x = dsu.find(x);
          cout << dsu.size[x] << ' ' << dsu.sum[x] << '\n';
          break;
        default:
          assert(false);  // not reachable
      }
    }
  }
}

class Dsu:
    def __init__(self, size):
        # size 与 sum 的前半段其实没有使用，只是为了让下标计算更简单
        self.pa = list(range(size, size * 2)) * 2
        self.size = [1] * size * 2
        self.sum = list(range(size)) * 2

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.size[x] < self.size[y]:
            x, y = y, x
        self.pa[y] = x
        self.size[x] += self.size[y]
        self.sum[x] += self.sum[y]

    def move(self, x, y):
        fx, fy = self.find(x), self.find(y)
        if fx == fy:
            return
        self.pa[x] = fy
        self.size[fx] -= 1
        self.size[fy] += 1
        self.sum[fx] -= x
        self.sum[fy] += x

    def find(self, x):
        if self.pa[x] != x:
            self.pa[x] = self.find(self.pa[x])
        return self.pa[x]


if __name__ == "__main__":
    while True:
        try:
            n, m = map(int, input().split())
            dsu = Dsu(n + 1)  # 元素范围是 1..n
            for _ in range(m):
                op_x_y = list(map(int, input().split()))
                op = op_x_y[0]
                if op == 1:
                    dsu.union(op_x_y[1], op_x_y[2])
                elif op == 2:
                    dsu.move(op_x_y[1], op_x_y[2])
                elif op == 3:
                    x = dsu.find(op_x_y[1])
                    print(dsu.size[x], dsu.sum[x])
        except EOFError:
            break

习题¶

其他应用¶

最小生成树算法中的 Kruskal 和最近公共祖先中的 Tarjan 算法是基于并查集的算法。

参考资料与拓展阅读¶

Tarjan, R. E., & Van Leeuwen, J. (1984). Worst-case analysis of set union algorithms. Journal of the ACM (JACM), 31(2), 245-281.ResearchGate PDF
Yao, A. C. (1985). On the expected performance of path compression algorithms.SIAM Journal on Computing, 14(1), 129-133.
知乎回答：是否在并查集中真的有二分路径压缩优化？
Gabow, H. N., & Tarjan, R. E. (1985). A Linear-Time Algorithm for a Special Case of Disjoint Set Union. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 30, 209-221.PDF