Maruf Notes: 2014

Tuesday, April 15, 2014

CS131: Quicksort and quickselect

Quick sort is a divide and conquer inplace sorting algorithm. This is a preferable sorting algorithm for many practical applications because of its average case performnace is very good- O(n log n).

Quick sort takes an array and partitions around a pivot item X such that elements left of X are smaller than X and elements right of X is greater than or equal to X. It takes O(n) to partition any array.

It then recursively sorts subarrays left of X and right of X recursively. If partitions are done in the middle on average total runtime will be O(n log n).

partition(items, left, right)
    X = items[left]
    I = left
    FOR J = left+1 TO right 
        IF items[J] <= X
            I = I + 1 
            swap(items[I], items[J])

    swap(items[left], items[I]

    RETURN I

quicksort(items, left, right)
    if left < right
        r = partition(items, left, right)
        quicksort(items, left, r-1)
        quicksort(items, r+1, right)

We call the quicksort function passing the array and entire range initially.

quicksort(items, 1, n);

Lets take an array of numbers

items = [27, 54, 43, 56, 9, 23, 4, 13, 17]

We start by calling

quicksort(items, 0, 8)

Lets execute the partition statement of quicksort function since left < right or (0 < 8) is TRUE

r = partition(items, 0, 8)

Initially left = 0 right = 8

X=27 I=1

FOR J = 1

item[1] = 54 (54 <= 27) is FALSE so we move to next iteration

FOR J = 2

item[2] = 43 (43 <= 27) is FALSE so we move to next iteration

FOR J = 3

item[2] = 56 (56 <= 27) is FALSE so we move to next iteration

FOR J = 4

item[4] = 9 (9 <= 27) is TRUE I = 0 + 1 = 1 We swap items[1] and items[4], so 9 and 54 swaps their places and items becomes

items = [27, 9, 43, 56, 54, 23, 4, 13, 17]

FOR J = 5

item[5] = 23 (23 <= 27) is TRUE I = 1 + 1 = 2 We swap items[2] and items[5], so 43 and 23 swaps their places and items becomes

items = [27, 9, 23, 56, 54, 43, 4, 13, 17]

FOR J = 6

item[6] = 4 (4 <= 27) is TRUE I = 2 + 1 = 3 We swap items[3] and items[6], so 4 and 56 swaps their places and items becomes

items = [27, 9, 23, 4, 54, 43, 56, 13, 17]

FOR J = 7

item[7] = 13 (13 <= 27) is TRUE I = 3 + 1 = 4 We swap items[4] and items[7], so 13 and 54 swaps their places and items becomes

items = [27, 9, 23, 4, 13, 43, 56, 54, 17]

FOR J = 8

item[7] = 17 (17 <= 27) is TRUE I = 4 + 1 = 5 We swap items[5] and items[8], so 17 and 43 swaps their places and items becomes

items = [27, 9, 23, 4, 13, 17, 56, 54, 43]

FOR loop is now complete (J is now equal to right).

We now swap items[left] = items[0] and items[5] so 17 and 27 swap their places and items becomes

items = [17, 9, 23, 4, 13, 27, 56, 54, 43]

No if you look at the items array any number that is smaller than 27 ia on the left of 27 and any number greater is on the right side of 27.

Now if we recursively partition left of 27 and right of 27 we well have our sorted items.

Here r = 5. So we call quicksort with

quicksort(items, 0, 5)
quicksort(items,6, 8)

We continue this and recursion of any branch stops when left < right is true. Then the subarray is completely sorted and there is nothing to do.

Quickselect

The quick select algorithm finds the k-th smallest number given an array of numbers containing items greater than or equal to k (you cn not find 6th smallest item from an array of 5 items).

This algorithms is very similar to the quicksort algorithms and uses the same partition function. When one partition is made and position of partition r is greater than k we partition left subarray (r+1, right) and if r is smaller than k we partition roght subarray (left, r-1).

quickselect(items, left, right, k)    
    if(left == right)
        return items[left]

    r = partition(items, left right)

    if r == k
        return items[k]
    if r > k
        return quickselect(items, left, r-1)
    else
        return quickselect(items, r+1, right, k)

Please not that the algorithm does not check any error condition.

Sunday, April 6, 2014

CS131: Graph

A graph G is a set of of vertices V and a set of edges E that connects the vertices.
G = (V,E)
This notation means V is a set of vertices and E is a set of edges. For example
V = (A,B,C,D) E =

Directed and undirected graph

Graph can be either directed or undirected. For directed graph or digraph edges has a direction.

Fig: Undirected Graph
Directed graph or digraph

Fig: Directed graph
Multiple copies of and edge is not allowed in a graph. Digraphs may have two edges in opposite direction.

Weighted graph

A graph G(V,E), either directed or undirected, where each edge u in E has an associated weight with it.
Fig: Weighted graph

Weights can be either positive or negative.

Terminology

Path

A sequence of vertices with where each two adjacent vertices are connected by an edge. For a directed graph each edges that connects the adjacend vertices must have a direction aligned to the path.
The number of edges in the path is the length of the path.

Strongly connected

If a graph has edges such that is is a path from any vertex to any other vertex then the graph is strongly connected graph.

Degree of vertex

Number of edges that connects a vertex to other verteices including itself (self edge) is the degree of theat vertex. Self edge is counted once.

Indegree

For digraph number of edges that are directed toward a vertex is the indegree of that vertex.

Outdeegree

For digraph number of edges that are directed away a vertex is the outdegree of that vertex.

Planner graph

A graph that can be drawn on a plane without crossing edges.

Sparse graph

If a graph has fewer number of edges than maximum number of edges possible is called sparse graph.

Graph representation

Graphs, either directed or undirected whether weighted or not, are generally represented in memory as one of two standard ways to represent a graph- using adjacency list or using adjacency matrix.
If a graph is sparse adjacency list is used to save memory. For a dense graph adjacency matrix will save memory and will be fast access representation.
To illustrate the two types of representation we'll use the following graphs.

Fig: Undirected graph

Fig: Directed graph

Adjacency list

Uses array of lists, one list per vertex, where each list contains the items that is adjacent to the vertex that the list represents.

[A|*]-->[B|*]-->[C|*]-->[D|$]
[B|*]-->[A|*]-->[E|$]
[C|*]-->[A|*]-->[D|*]-->[E|$]
[D|*]-->[A|*]-->[C|*]-->[F|$]
[E|*]-->[B|*]-->[C|*]-->F|$]
[F|*]-->[D|*]-->[E|$]

Fig: Adjacency list representation for undirected graph

[A|*]-->[A|*]-->[B|*]-->[D|$]
[B|*]-->[A|$]
[C|*]-->[A|*]-->[E|$]
[D|*]-->[C|*]-->[F|$]
[E|*]-->[B|$]
[F|*]-->[E|$]

Fig: Adjacency list representation for digraph

Adjacency matrix

To represent a graph G(V,E) a matrix of size VxV, requiring O(V^2) memory, is used. Each item a[i,j] in the matrix is 1 (or actual weight for the edge in case of weighted graph) if the edge (i,j) exists in E and NIL (or any convenient value like zero) otherwise.

    A B C D E F
   _           _
A | 0 1 1 1 0 0 |
B | 1 0 0 0 1 0 |
C | 1 0 0 1 1 0 |
D | 1 0 1 0 0 1 |
E | 0 1 1 0 0 1 |
F | 0 0 0 1 1 0 |
   -           -

Fig: Adjacency matrix representation for undirected graph

    A B C D E F
   _           _
A | 1 1 0 1 0 0 |
B | 1 0 0 0 0 0 |
C | 1 0 0 0 1 0 |
D | 0 0 1 0 0 1 |
E | 0 1 0 0 0 0 |
F | 0 0 0 0 1 0 |
   -           -

Fig: Adjacency matrix representation for digraph

Adjacency matrix are simpler to use and faster compared to adjacency list. If a graph is small we may prefer to use adjacency matrix.

Graph traversals

The process of visiting each vertex of the graph once is called graph traversal.

Depth-first search (DFS)

Visits deeper level nodes first before visiting any sibling of the current node.
DFS node visit order

Uses stack or recursive method to keep track of adjacent nodes that are not visited yet.

DFS(G, u)
  for each vertex u in G
    if u.visited != true
        v.visited = true
        distance = distance + 1
        u.distance = distance  
        for each vertex v in Adjacent(u)
            if v.visited = false
                v.prev = u
                DFS(G, v)

BFS

Visits all current level nodes first before visiting any deeper level nodes.
BFS node visit order

Uses a queue to keep track of adjacent nodes.
Finds the distances from source vertex or the root.

Topological Sort

A directed acyclic graph that defines the order of edges. In a graph G each directed edge (u,v) indicates that e mush come before v in the ordering in G. In G nodes may indicate ordered events. If G has a cycle topological sorting is not possible.
Implementation: a. Using DFS. O(V+E) b. Repeat: Find a vertex of indegree = 0 and output it- remove the vertex and all its outgoing edges Until: No edge exists. O(V+E)

Minimum Spanning Tree

Kruskal's Algorithm

Kruskal's algorithm first creates a new graph G' with all the nodes of graph G. The hew graph initially has no edge. Then it takes edges in increasing order and adds two trees together to form a new bigger tree. An edge is only selected if the two nodes it is connecting is not already part of a tree. The process continues until there is no edge available to connect two trees.
We can use a disjoint set determine if two nodes are in same set.

The algorithm

Take all nodes and create disjoint sets those initially contains one node.
Create a sorted list of all the edges by their weight in increasing order.

Selected Edges = empty
While the list is not empty
    Remove the smallest weighted edge E from the list.
    If the nodes N1 and N2 those are connected by E is not in same set
       Put E into Selected Edges
       Union the sets that contains N1 and N2

After the algorithm completes we have a set of edges in Selected Edges list that forms the minimum spanning three if the graph is connected or otherwise a minimum spanning forest.

CS131: 2-3-4 Tree [Draft]

2-3-4 Tree is a perfectly balanced tree (all leafs are on the same level) with O(log n) worst case time for find, insert and remove operations. As the name suggests every node of the tree has 2,3 or 4 children.

Operations

Find

Insert

Find a node with key k. If a node X with key k is found proceed to X's left until a leaf node is found. If a 3 key node is found we move the middle key up to its parent. Parent has at most two keys.

Fig:

Insertion may increase the depth of tree by one level.

Remove

Find the node X with key k to remove If X is a leaf, remove it ans return from function. If X is an internal node search for a leaf node Y with smallest key k' such that k" > k and replace Replace X with Y so that X is removed and Y takes its place.

If we find a one key node on the path of a leaf node we want to remove we continuously do the following 3 steps (except the root node):

If X is an one key node that is not root move one key from its adjacent sibling by rotation.

Fig:

If there is no adjacent sibling with more than one key take one key down from the parent. Internal nodes always have two or three keys. Moving down one key may trigger further operations if the parent is left with one key (verify).

Fig:

If parent is root and there is no adjacent sibling with more than one key (1 and 2 fails) create a 3 key node to form the root. This is called a fusion operation. The tree level is decreased by one level.

Fig:

Remove may decrease the tree by one level.

CS131: Splay Tree

Splay tree is a balanced binary search tree. On average all operations take O(log n) time.
Any sequence of k operations starting from an empty tree and never exceeding n items in tree will have O(k log n) worst case time.
The data structure ensures fast access to recently accessed data and much easier to program that other balanced trees like 2-3-4 tree.
Splay tree is not kept perfectly balanced all the time.

Tree rotation

Splay operation

We do splay operation whenever an item is accessed to move an item up through root. Repeatedly doing the operation moves the item up to the root of the tree.
We continuously rotate the tree so that it takes the position of the item above (parent, grand parent, great grand parent etc.) until it becomes the root.
Given an item X to splay up there are three possible situations:

a. X is a left child of a right child or right child of a left child
For left child of a right child we rotate right first and then rotate left to move the item two level up. For right child of a left child we rotate left first and then rotate right to move the item two level up.
b. X is a left child of a left child or right child of a right child
For left child of a left child we rotate right twice to move the item two level up. For right child of a right child we rotate left twice to move the item two level up.
c. X is child of root
We rotate once as appropriate (left or right) to move the item to root.
In all cases one rotation moves the item one level in upper direction.

Splay tree operations

Find operation

find(k)
    Search for the key k as like a binary search tree or reached a leaf node
    Let X be the last node during the search whether it contains k or not
    Splay the node X up so that it becomes root through a sequence of tree rotation

    Splay X up to root using a sequence of rotation.

Insert Operation

insert(k)
    Insert k like binary search tree
    Splay the new node up to root through tree rotation

Remove operation

remove(k)
    Remove the node as ordinary binary search tree
    Splay the nodes parent up to root using tree rotation

CS131: Binary Search Tree

A binary search tree is a binary tree such that for any node n, if a node x is in n's left subtree, x's key is smaller than n's key nad if x is on the right subtree of n, x's key is bigger than n's key.

Fig: Binary Search Tree

From above figure let us consider the node with key 15. Its left subtree has nodes with keys 10,5 and 12, all of which are less than 15. Its right subtree has keys 20,17, and 22 all of which are bigger than 15. Node with same key can be in any subtree.

Binary search tree search for minimum or maximum key is fast and we can do inexact match like what is the largest key smaller than 57 and what is the smallest key bigger than 57. Or we can find a word with close but not exact spelling. While searching down the tree for a key k, if that key is not present we enclunter the node with smallest key greater that k and the node with largest key smaller than k.

If we do an inorder traversal we traverse the nodes in sorted order of the keys.

Operations

Min()

Start with root node and go to left child until no longer possible and return the last node encountered.

Max()

Start with root node and go to right child until no longer possible and return the last node encountered.

struct node
{
    int key;
    node *left, *right;
};

node* search(node root, int k, node* &parent)
{
    cur_node = root;
    while(cur_node != NULL)
    {
        if(cur_node->key == k)
        {
            return cur_node;
        }

        parent = cur_node;

        if(cur_node->key > k)
        {
            cur_node = cur_node->right;
        }
        else
        {            
            cur_node = cur_node->left;
        }
    }
    return NULL;
}

Insert(k):

Remove(k)

Find the node n with key k.
If none found return.
If n has no child remove it.
If n has one child replace n with its child.
If n has two children
  - Find node d with smallest key from n's right subtree
  - Remove remove d from its location and place at n's location

Example

Fig: Binary search tree

Remove(15)

Node 15 has two children and smallest key on the right subtree is 17. We replace node 15 with node 17 and the tree becomes:

Fig: Binary search tree after removing 15

remove(node root, int k)
{
    node* parent;
    node* target_node = search(root, k, parent)

    if(node == NULL)
    {
        return;
    }

    if(node->right == NULL && node->left == NULL)
    {
        if(parent->left == target_node)
        {
            delete target_node;
            parent->left = NULL;
        }

        if(parent->right == target_node)
        {
            delete target_node;
            parent->right = NULL;
        }
    }

    //todo
}

Running time

Running time for operations on a binary search tree depends on the depth of the tree. If the tree is balanced the depth of the tree oi O(log n) where n is the number of nodes in the tree. In that case the complexity of the operations is O(log n).

So, if the tree has O(log n) depth all operations takes O(log n)

But it is possible that the tree is not balanced. If we inser items in a binary search tree in sorted order the binary search tree becomes a linked list which will have a running time of O(n) for all operations. This is worst case running time.

Worst case complexity O(n)

CS131: Tree

A tree is a collection of nodes and edges. Edges connects the edges together. In a tree there is one and only one path between any two nodes.

Fig: A tree - here we have A,B,C,D,E,F nodes and l,m,n,o,p, edges.

In a tree we define a path using connected sequence of edges. Number of edges on the path is called the length of the path. In above figure from B to F there is exactly one path and that is (m,n,p).

Rooted tree

In a tree we can pick a node and call it to be the root of the tree. We call this tree to be a rooted tree.

We choose C to be the root and we redraw the tree to keeping the root on top and others in different level based on distance from the root.

Fig1: Original tree Fig2: Redrawn tree

If we look carefully we see that nothing has changed for the tree but we have chosen C to be root and placed the nodes differently on paper. We draw the like figure 2 for convenience of understanding levels and relations.

If we choose any node and find a path from that node to the root the first node we encounter is called the parent of that node. All node except the root has exactly one parent. Root has no parent.

On the path fopm node F to root node C we encounter D first. So D is the parent of F and F is called to be child of D. A node may have any number of children including zero child. A node without any child is called a leaf node. A,F,E are leaf nodes.

All the nodes on the path to root including F itself and the root is called to be ancestor of F. So F's ancestors are F,D and C and F is called to be their descendents.

Length of the path from an node to the root is the depth of that node. Depth of F is 2. Length of path from a node to its deepest descendent is the height of the node. height of D is 1 nad height of root C is 2. Height of the root is also height of the tree. So the tree has a height of 2.

Nodes having same parent is called to be siblings. B and D are siblings. Similarly E and F are siblings.

Binary tree

A binary tree is a special type of three where no node is allowed to have more than two children.

Fig: Binary tree - a node may hav zero, one or two children.

Each child of a node is either left child or a right child. Even there is only one child that child must be either left child or a right chi

Representing rooted tree in memory

-----------------------
|  data   | Parent ref|
-----------------------
|Children list        |
-----------------------

Fig:

-----------------------------
|  data   |     Parent ref  |
-----------------------------
|First child | Next sibling |
-----------------------------

Fig:

Traversal

The process of visiting each node in a tree once is called traversal. Depending on the order of visiting nodes in a tree we ma traverse in a few different ways:

Pre-order traversal
Post-order traversal
Binary tree inorder traversal
Level order traversal

We will discuss them in details.

Preorder traversal

We visit each node starting from root and then recursively visit its children from left to right.

Preorder(node):
    Visit(node)
    for each child c of node
        Preorder(c)

Fig: Preorder traversal - numbers show the order of nodes getting visited

Postorder traversal

We visit each node's children left to right and then visit the node itself.

Preorder(node):    
    for each child c of node
        Preorder(c)
    Visit(node)

Fig: Postorder traversal

Binary tree inorder traversal

We visit left child then the node itself and then right chhild.

Inorder(node)
    if node.left exists
        Inorder(node.left)

    Visit(node)

    if node.right exists
        Inorder(node.right)

Fig: Inorder traversal

Level order traversal

We visit each level of the tree left to right before visiting any deeper level.

Fig: Level order traversal

We use a queue to keep track of the visited nodes children.

LevelOrder(tree)
   Q = new empty Queue
   Q.enque(tree.root)

   while Q is not empty
       node = Q.deque()
       Visit(node)
       for each child c of node from left to right
           Q.enqueue(c)

CS131: Disjoint Set

Disjoint set is a collection of sets that allows one key to be present only in one set.

All possible items that can be a member of a set is called "Universe of items".

Collection of disjoint sets are called partition.

Each set has a unique identifier that identifies the set.

Operations

We can perfoorm Union and Find operations on disjoint sets.

Union - Merges two sets into one set.

Find- It takes an item as parameter and returns its set.

Here is an example of a series of union and find operations:

[a], [b], [c], [d], [e], [f]

find(b) => b

union(a, b)
union(c,d)
union(e,f)

[a, b], [c, d], [e, f]

find(b) => a

union(a,c)

[a, b, c, d], [e, f]

find(d) => a

Disjoint sets can be implemented using list or using.

List based disjoint set and Quick find algorithm

Each set references list of items in the set and each item references the set that contains the item.

Fig: List based disjoint set and union operation

Running time

Find takes O(1) time but union is slow since it requires to reset set reference to all the items in one set which takes O(n) time.

Tree based disjoint set and Quick union algorithm

Each set is maintained as a tree. Therefore the data structure is a forest. Child or sibling reference is not maintained. So union operation can be performed in constant time by just setting one set's tree root to be parent of another set's tree root. The identity of the set is maintained at the root item. The root also maintains the size of the set to keep the depth of the tree lower while doing union operation.

Union operation

We make one set's root to be the child of other set's root.

If we have for items a, b,c and d each item is initially root of its own tree.

(a) (b) (c)  (d)

Union(a,b)

  (a)   (c)  (d)
 /
(b)

Union(a,c)

  (a)   (d)
 /   \
(b)  (c)

Union(a,d)

   (a)
 /  \ \
(b)(c)(d)

Union by size

Form above illustration note that (d) is made a child of (a). This is because (a) is a larger tree than (d) and this keeps the tree depth lower than if we make (a) to be child of (d). This way we get a tree with height n when we union two trees with at least (n-1) nodes each. We are able to double the number of nodes in the tree by increasing tree depth by one.

public class Node
{
    Node parent;
    int size;
}

public class DisjointSet
{
Dictionary<object, Node> set;

public void union(Node item1, Node item2)
{
    if(item1.size > item2.size)
    {
        item2.parent = item1;
        item1.size += item2.size;
    }
    else
    {
        item1.parent = item2;
        item2.size += item1.size;
    }
}
}

Find operation

For a given key we find the root of the tree which contains the key. We follow the parent reference until we reach the root node.

public Node find(Node node)
{
    if(node.parent == null)
    {
        return node;
    }
    else
    {
        Node parent_node = find(node.parent);
        node.parent = parent_node;
        return parent_node;
    }
}

Running time

Union is fast and takes O(1) time. Find is slower but it depends on the depth of the tree which grows slowly and is bound by the total number of unions. Also when we use path compression the node height is shorten on first find operation making consecutive find very fast. This way quick union algorithm based on a tree will be faster overall for any sequence of union and find operation.

CS131: Hash Tables

A hash table (also called map or dictionary) is an abstract collection of items where any data item can be stored and accessed with an associated key. Both data and key can be of any type.

The word hash is used as a term for transformation of a given key.

Direct address tables

If total number of possible items are reasonably small we may use direct address table where key is the index of the data array. Array is a direct address table.

Search()

Insert()

Delete()

Hash function

When direct address table is not a feasible option, for example when possible keys are too big, we use a function that takes the key k and return the index i of the table.

        i = h(k)   [ 0 <= i <= M - where M is total number of buckets in the table]

If the possible number of keys are bigger than available buckets in the hash table two keys may map the same index. This situation is called collision.

Let us consider the following keys

5, 13, 15, 17, 21, 24

We define the hash function as

    h(k) = k mod M

                  5  13
+---+---+---+---+---+---+---+
|   |   |   |   |   |   |   |
+---+---+---+---+---+---+---+

Fig: The mapping of the keys for M=7

Now for the first five keys (5, 13, 15, 17, 21) we can put the keys in the hash table buckets without any problem. But if we try to insert 24 we find 24 mod 7 = 3 and we have already 17 mapped to the 3rd bucket. We will look at a few techniques to solve the problem.

It is possible to create a very big hash table to minimize the number of collision. But we will be wasting a lot of memory if the table is mostly empty. So we want the average number of elements per bucket, which is called load factor, to be larger.

Load factor = n/M

The bigger the load factor the less wasted space.

Choosing a hash method

A good hash function should minimize collision, be easy to compute and distribute keys evenly through the available buckets.

The division method

h(k) = k mod M

For a good choice of M take a prime number that is distant from the values those are power of 2.

If M = 2^p the function will map two keys with same last character/digit to the same bucket [verify]

If M = 2^p-1 the function maps keys with same set of characters / digits to the same bucket. [verify]

The multiplication method

For all k h(k) = floor(M(kA-(floor(kA)) - where A is a constant with value range from 0 to 1. The value of M is not critical here.

Collision resolution by chaining

When two keys maps to same index we can use a linked list for that key to store multiple values. If there are M buckets we can maintain M linked list each of which may contain zero or more items.

Head(1) [*]-->[Value 1 | * ]-->[Value 2 | $ ]
Head(2) [*]-->[Value 3 | $ ]
Head(3) [*]-->[Value 5 | * ]-->[Value 6 | * ]-->[Value 7 | $ ]
Head(4) [$]
Head(5) [*]-->[Value 8 | $ ]

Fig: Using linked list for collision resolution

I <-- H(k) +1   [1 <= I <= M]
If I is present in HashTable 
    Do
        If k = KEY[I] return I        
        I = Link[I]
    While (I != 0)

Open addressing with Linear probing and insertion

Open addressing with Double hashing

Deletion

We can not simply delete a key from hash table since there could be more than one keys that hashed to the same bucket.

Deletion with linear probing

Universal hashing

For any hash function it is possible that someone will be able to come up with a set of keys such that every key maps to the same bucket or a set of very small number of buckets making the hash table a linked list and accessing any item will take O(n) time instead of expected O(1) time.

To solve this problem we can define a set of different hash functions and pick one function randomly when we initialize the hash table before first use. This technique is called universal hashing and operation on any element is expected to take O(1) average time.

Once chosen the hash function is not changed for the lifetime of the hash table so that each key hashed to the same bucket.

Linear Hashing

The hash table grows or shrinks as items are inserted or deleted from the hash table. It is not related to linear probing.

If N is initially chooses as number of buckets the number of total bucket of the hash table is chosen as 2N, 4N, 8N, 16N etc. that is, power of 2 * N. The power used is called level. So total number of bucket is 2^level * N. Level 2 has 4N buckets.

Linear hashing is implemented using dynamic array, a variable size array that allows random access of keys. When the size of array is changed

Extendible Hashing

Growing the size of hash table requires rehashing the keys and could be big performance hit when it is done. To avoid this situation, extendible hashing technique uses a trie as a hierarchical storage so that the rehashing can be done incrementally, one bucket at a time, as the hash table grows.

CS131: Prioryty Queue

A priority queue is a data structure that stores a set of entries where each entry has an associated key and total order of the keys are maintained.

Prioroty key can be either min priority queue which maintains keys in increasing order or max priority queue which maintains keys in decreasing order. In this chapter we will use min priority queue.

Priority queue can identify and remove the smallest key very fast and an item can be inserted at any time.

Operations:

Priority queue supports following operations:

Insert - which inserts item into the priority queue Min - which returns the item with smallest key RemoveMin - which removes the item with smallest key and returns it

Here is a few example of the operations:

[ | | | | | ]

Insert(4)

[4| | | | | ]

Insert(8)

[4|8| | | | ]

Insert(2)

[2|4|8| | | ]

Min() -> returns 2 and the priority queue is unchanged

[2|4|8| | | ]

RemoveMin() -> returns 2 and remove it from the priority queue

[4|8| | | | ]

Priority tree can be implemented using a Binary Heap.

Complete Binary Tree A binary tree in which every level of the three is full except the bottom row- which is filled from left to right. A complete binary tree has 1+log(n) levels where n is the number of items.

Fig: Complete Binary Tree

Binary heap

A binary heap is a complete binary tree where entries must satisfy heap order property, that is, no child element has a key less than its parents key. Multiple copies of same keys are allowed.

                (1)
             /       \
         /               \
      (3)                 (4) 
     /   \               /   \
   /       \           /       \
  (9)       (11)      (12)      (17)
 /   \     /   \     /
(13) (14) (15) (16) (12)

          Fig: Binary heap

Storing binary heap in a array

We do a level order traversal and on each level we traverse from left to right and store the keys sequentially in an array. If we do this for the binary heap from above figure we get following array.

[x|1|3|4|9|11|12|17|13|14|15|16|12|---]

Notice that we have kept the first cell unused.

In the array any item i's children is located at 2i and 2i+1 indexes and item i's parent is located at index |i/2|

For example item at index 4 is 9. Its children are 8th ad 9th item and the keys are 13 and 14. Its parent is at 2nd index which is 3. From the above binary heap tree figure you may verify that these are correct keys.

Operations

Min():
Return the entry at the root. In the array the root is stored at index 1. 
This is a constant time operation.

Insert(k):
1. Place k at the first open position on the bottom level from left. In the array 
   put k at the first empty space at the end of existing item.
2. Bubble up the element untill the heap order property is satisfied.
        If the parent key is bigger than the items key swap the item with its 
        parent and repeat this process as long as parent's key is bigger or the item is at the root.

Fig: Insert example

RemoveMin():
    1. Remove the entry at root
    2. Take the last entry from bottom level of  the tree feom left and put it at root 
       position. In array this is the last item.
    3. Bubble the root entry down through the heap-
         until the items key is smaller than both of its children's key swap with smaller child

Fig: Remove example

Worst case performsnce Theta(log n) Best case performance Theta(1)

Bottom up heap construction

If we are given an array of items and we need to create a heap, we can insert each item in the tree one by one which will take Theta(n log n) time or we can use bottom up heap construction technique to do it in Theta(n)

BottomUpHesp(itemArray): [ref:]
1. Make a complete binary tree without considering order. For a given array of 
items no operation is required. 
2. For each nonleaf node starting from last one bubble the item 
down (swap with smaller child) until its key both of its children's key.

Fig: Bottom up heap construction example

CS131: Queue

Que is a linear list data structure. The insert and remove operations are performed at the opposite ends of the list which are called the rear and the front respectively.

enqueue   ---->   |  3 |  4 |  2 |  <------ dequeue
              rear              front

Fig: Queue showing rear and front

A que can be implemented using a circular array or a linked list that maintains the rear and front item reference.

enqueue   -----+                       +------ dequeue
               |                       |    
               v  rear           front v
     +---->  |  |  3 |  4 |  2 | 7 | 10 |  |  | >-----+
     |                                                |
     +------------------------------------------------+

Fig: Circular list implementation using array

enqueue   -----+                       +------ dequeue
               |                       |    
               v  rear           front v
   [head]---> [3|*]-->[4|*]-->[2|*]-->[9|*]--->$

Fig: Linked list implementation

Operations

    public interface Queue
    {
        /// <summary>
        /// Inserts an item to the rear of the list of items.
        /// </summary>
        /// <param name="item">item to be inserted</param>
        void enqueue(object item);

        /// <summary>
        /// Removes an item from the front of the list of items and returns it.
        /// </summary>
        /// <returns>The removed item</returns>
        object dequeue();

        /// <summary>
        /// Returns the item in front of the list of items. The list is not altered in any way.
        /// </summary>
        /// <returns>The item at the front of the queue</returns>
        object front();

        /// <summary>
        /// Check if the queue is empty or not
        /// </summary>
        /// <returns>true if queue is empty, false otherwise</returns>
        bool empty();

        /// <summary>
        /// Checks if the queue is full or not
        /// </summary>
        /// <returns>true if queue is full, false otherwise</returns>
        bool full();

        /// <summary>
        /// Calculates the number of items in the queue
        /// </summary>
        /// <returns>Number of items in teh queue</returns>
        int size();
    }

Implementation

    public class Node
    {
        public Node next, previous;
        public object data;
    }


    public class LinkedListQueue : Queue
    {
        Node rear_item, front_item;
        int item_count;

        public LinkedListQueue()
        {
            rear_item = front_item = null;
            item_count = 0;
        }

        public void enqueue(object data)
        {
            SNode new_item = new SNode();
            new_item.data = data;
            new_item.next = null;
            new_item.previous = front_item;

            if (front_item != null)
            {
                front_item.next = new_item;
            }

            front_item = new_item;

            if (item_count == 0)
            {
                rear_item = new_item;
            }

            item_count++;
        }

        public object dequeue()
        {
            if (front_item == null)
            {
                return null;
            }

            object data = front_item.data;

            item_count--;

            if (item_count == 0)
            {
                rear_item = null;
                front_item = null;
            }
            else
            {
                front_item = front_item.previous;
                front_item.next = null;
            }

            return data;
        }

        public object front()
        {
            if (front_item == null)
            {
                return null;
            }

            return front_item.data;
        }

        public bool empty()
        {
            return (item_count == 0);
        }

        public bool full()
        {
            return false;
        }

        public int size()
        {
            return item_count;
        }

        public static void Test()
        {
            Queue queue = new LinkedListQueue();
            object data = 10;
            queue.enqueue(data);
            object item = queue.dequeue();
            TestEngine.AssertEquals(item, data, "Dequeue does not return expected data");
        }
    }

Deque

A double ended queue where insertion and deletion are allowed on both end of the list.

Problem: Implement a deque