Sunday, April 6, 2014

CS131: Disjoint Set

Disjoint set is a collection of sets that allows one key to be present only in one set.

All possible items that can be a member of a set is called "Universe of items".

Collection of disjoint sets are called partition.

Each set has a unique identifier that identifies the set.

Operations

We can perfoorm Union and Find operations on disjoint sets.

Union - Merges two sets into one set.

Find- It takes an item as parameter and returns its set.

Here is an example of a series of union and find operations:

[a], [b], [c], [d], [e], [f]

find(b) => b

union(a, b)
union(c,d)
union(e,f)

[a, b], [c, d], [e, f]

find(b) => a

union(a,c)

[a, b, c, d], [e, f]

find(d) => a

Disjoint sets can be implemented using list or using.

List based disjoint set and Quick find algorithm

Each set references list of items in the set and each item references the set that contains the item.

Fig: List based disjoint set and union operation

Running time

Find takes O(1) time but union is slow since it requires to reset set reference to all the items in one set which takes O(n) time.

Tree based disjoint set and Quick union algorithm

Each set is maintained as a tree. Therefore the data structure is a forest. Child or sibling reference is not maintained. So union operation can be performed in constant time by just setting one set's tree root to be parent of another set's tree root. The identity of the set is maintained at the root item. The root also maintains the size of the set to keep the depth of the tree lower while doing union operation.

Union operation

We make one set's root to be the child of other set's root.

If we have for items a, b,c and d each item is initially root of its own tree.

(a) (b) (c)  (d)

Union(a,b)

  (a)   (c)  (d)
 /
(b)

Union(a,c)

  (a)   (d)
 /   \
(b)  (c)

Union(a,d)

   (a)
 /  \ \
(b)(c)(d)

Union by size

Form above illustration note that (d) is made a child of (a). This is because (a) is a larger tree than (d) and this keeps the tree depth lower than if we make (a) to be child of (d). This way we get a tree with height n when we union two trees with at least (n-1) nodes each. We are able to double the number of nodes in the tree by increasing tree depth by one.

public class Node
{
    Node parent;
    int size;
}

public class DisjointSet
{
Dictionary<object, Node> set;

public void union(Node item1, Node item2)
{
    if(item1.size > item2.size)
    {
        item2.parent = item1;
        item1.size += item2.size;
    }
    else
    {
        item1.parent = item2;
        item2.size += item1.size;
    }
}
}

Find operation

For a given key we find the root of the tree which contains the key. We follow the parent reference until we reach the root node.

public Node find(Node node)
{
    if(node.parent == null)
    {
        return node;
    }
    else
    {
        Node parent_node = find(node.parent);
        node.parent = parent_node;
        return parent_node;
    }
}

Running time

Union is fast and takes O(1) time. Find is slower but it depends on the depth of the tree which grows slowly and is bound by the total number of unions. Also when we use path compression the node height is shorten on first find operation making consecutive find very fast. This way quick union algorithm based on a tree will be faster overall for any sequence of union and find operation.