Saturday, July 30, 2016

DP - Longest increasing subsequence

The problem

Given a sequence of numbers S1, S2, ..., Sn
we have to find a sub-sequence

Si1, Si2, ..., Sik (with i1 < i2 < ... < ik)

such that

Si1 < Si2 < ... < Sik and the length k is as large as possible.
The solution

First we create a table by scanning the numbers from start to end. For each number we find the number, among those smaller than the current number, that ends the longest increasing sequence so far. We set the current item's sequence length to be one more than the found number's, and we set the predecessor of the current item to be the found number's index.
From the table we see that the maximum length is 7, at index 10. So we start at index 10, which holds the last item of the sub-sequence. The value is 16. Then we find its predecessor from the predecessor list at index 10, which is 9. So the next to last item will be the item at index 9, which is 12, which in turn has its predecessor at index 7, and so on.
The longest increasing sub-sequence is:
1, 3, 4, 5, 7, 12, 16

Here is the code in python:


[This is incomplete but I may not complete it]
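In the meantime, here is a minimal Python sketch of the O(n^2) table-based approach described above (the function and variable names are illustrative, not from the original post):

def longest_increasing_subsequence(seq):
    n = len(seq)
    if n == 0:
        return []
    length = [1] * n    # length[i]: longest increasing sub-sequence ending at i
    pred = [-1] * n     # pred[i]: predecessor index of i in that sub-sequence
    for i in range(1, n):
        for j in range(i):
            # found a smaller number whose sequence we can extend
            if seq[j] < seq[i] and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                pred[i] = j
    # start at the index with the maximum length and walk the predecessor list
    i = max(range(n), key=lambda k: length[k])
    result = []
    while i != -1:
        result.append(seq[i])
        i = pred[i]
    return result[::-1]

print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))  # [3, 4, 5, 9]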

Sunday, November 1, 2015

Basic data structures and algorithms

Table of contents

Basic data structures

  • Stack
  • Queue
  • Linked List
  • Priority Queue
  • Hash tables
  • Disjoint set


Trees

  • Tree data structure
  • Tree traversal
  • Binary Search Tree
  • Splay Tree
  • 2-3-4 Tree
  • B+ Tree
  • Red Black Tree
  • Trie


Graphs

  • Types of graph
  • Graph data structures
  • Traversing a graph
  • Breadth first search
  • Depth first search

Graph algorithms

  • Topological sorting
  • Minimum spanning tree
  • Kruskal's algorithm
  • Prim's algorithm
  • Shortest Path
  • Bellman Ford Algorithm
  • Dijkstra's Algorithm
  • A* Search
  • Floyd Warshall Algorithm

Types of algorithms

  • Greedy Method
  • Divide and conquer
  • Dynamic Programming
  • Reduction


Sorting algorithms

  • Counting sort
  • Insertion sort
  • Shell sort
  • Bubble sort
  • Quicksort and quick select
  • Selection sort
  • Heapsort
  • Merge sort
  • Bucket sort or radix sort - Sorting by distribution
  • Lower bound of comparison based search

Flow networks

  • Ford-Fulkerson method
  • Maximum bipartite matching

Basic Cryptography

  • Symmetric key cryptography
  • Public key cryptography
  • RSA Cryptosystem

Data compression

  • Huffman codes
  • LZW compression

NP Completeness

Tuesday, April 15, 2014

CS131: Quicksort and quickselect

Quick sort is a divide and conquer in-place sorting algorithm. It is a preferable sorting algorithm for many practical applications because its average case performance is very good: O(n log n).

Quick sort takes an array and partitions it around a pivot item X such that elements left of X are smaller than X and elements right of X are greater than or equal to X. Partitioning an array of n items takes O(n) time.

It then recursively sorts the subarrays left of X and right of X. If the partitions split near the middle on average, the total runtime will be O(n log n).

partition(items, left, right)
    X = items[left]
    I = left
    FOR J = left+1 TO right
        IF items[J] <= X
            I = I + 1
            swap(items[I], items[J])

    swap(items[left], items[I])
    RETURN I

quicksort(items, left, right)
    if left < right
        r = partition(items, left, right)
        quicksort(items, left, r-1)
        quicksort(items, r+1, right)
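As a sketch, the pseudocode above transcribes directly into Python (0-based indices, matching the trace below):

def partition(items, left, right):
    x = items[left]                  # the pivot is the first item of the range
    i = left
    for j in range(left + 1, right + 1):
        if items[j] <= x:
            i += 1
            items[i], items[j] = items[j], items[i]
    items[left], items[i] = items[i], items[left]   # pivot to its final place
    return i

def quicksort(items, left, right):
    if left < right:
        r = partition(items, left, right)
        quicksort(items, left, r - 1)
        quicksort(items, r + 1, right)

items = [27, 54, 43, 56, 9, 23, 4, 13, 17]
quicksort(items, 0, len(items) - 1)
print(items)  # [4, 9, 13, 17, 23, 27, 43, 54, 56]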

We call the quicksort function passing the array and the entire index range initially.

quicksort(items, 0, n-1)

Let's take an array of numbers

items = [27, 54, 43, 56, 9, 23, 4, 13, 17]

We start by calling

quicksort(items, 0, 8)

Let's execute the partition statement of the quicksort function, since left < right, or (0 < 8), is TRUE

r = partition(items, 0, 8)

Initially left = 0 right = 8

X=27 I=0

FOR J = 1

items[1] = 54; (54 <= 27) is FALSE, so we move to the next iteration

FOR J = 2

items[2] = 43; (43 <= 27) is FALSE, so we move to the next iteration

FOR J = 3

items[3] = 56; (56 <= 27) is FALSE, so we move to the next iteration

FOR J = 4

items[4] = 9; (9 <= 27) is TRUE, so I = 0 + 1 = 1. We swap items[1] and items[4], so 9 and 54 swap their places and items becomes

items = [27, 9, 43, 56, 54, 23, 4, 13, 17]

FOR J = 5

items[5] = 23; (23 <= 27) is TRUE, so I = 1 + 1 = 2. We swap items[2] and items[5], so 43 and 23 swap their places and items becomes

items = [27, 9, 23, 56, 54, 43, 4, 13, 17]

FOR J = 6

items[6] = 4; (4 <= 27) is TRUE, so I = 2 + 1 = 3. We swap items[3] and items[6], so 4 and 56 swap their places and items becomes

items = [27, 9, 23, 4, 54, 43, 56, 13, 17]

FOR J = 7

items[7] = 13; (13 <= 27) is TRUE, so I = 3 + 1 = 4. We swap items[4] and items[7], so 13 and 54 swap their places and items becomes

items = [27, 9, 23, 4, 13, 43, 56, 54, 17]

FOR J = 8

items[8] = 17; (17 <= 27) is TRUE, so I = 4 + 1 = 5. We swap items[5] and items[8], so 17 and 43 swap their places and items becomes

items = [27, 9, 23, 4, 13, 17, 56, 54, 43]

The FOR loop is now complete (J has reached right).

We now swap items[left] = items[0] and items[5] so 17 and 27 swap their places and items becomes

items = [17, 9, 23, 4, 13, 27, 56, 54, 43]

Now if you look at the items array, any number that is smaller than 27 is on the left of 27 and any number greater is on the right side of 27.

Now if we recursively partition left of 27 and right of 27 we will have our sorted items.

Here r = 5. So we call quicksort with

quicksort(items, 0, 4)
quicksort(items, 6, 8)

We continue this, and the recursion of any branch stops when left < right is no longer true. At that point the subarray is completely sorted and there is nothing to do.


The quick select algorithm finds the k-th smallest number given an array of numbers containing at least k items (you can not find the 6th smallest item in an array of 5 items).

This algorithm is very similar to the quicksort algorithm and uses the same partition function. After a partition is made, if the position of the partition r is greater than k we partition the left subarray (left, r-1), and if r is smaller than k we partition the right subarray (r+1, right).

quickselect(items, left, right, k)
    if left == right
        return items[left]

    r = partition(items, left, right)

    if r == k
        return items[k]
    if r > k
        return quickselect(items, left, r-1, k)
    return quickselect(items, r+1, right, k)

Please note that the algorithm does not check any error conditions.
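A matching Python sketch, reusing the partition function from the quicksort sketch above (k is a 0-based rank, and there is no error checking, as noted):

def quickselect(items, left, right, k):
    if left == right:
        return items[left]
    r = partition(items, left, right)
    if r == k:
        return items[k]
    if r > k:
        return quickselect(items, left, r - 1, k)
    return quickselect(items, r + 1, right, k)

items = [27, 54, 43, 56, 9, 23, 4, 13, 17]
print(quickselect(items, 0, len(items) - 1, 2))  # 13, the 3rd smallest item

Note that quickselect, like quicksort, rearranges the array in place as it partitions.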

Sunday, April 6, 2014

CS131: Graph

A graph G is a set of vertices V and a set of edges E that connect the vertices.
G = (V,E)
This notation means V is a set of vertices and E is a set of edges. For example
V = {A, B, C, D} and E is a set of edges connecting them.

Directed and undirected graph

A graph can be either directed or undirected. In a directed graph, or digraph, the edges have a direction.
Undirected graph
Fig: Undirected Graph
Directed graph or digraph
Fig: Directed graph
Multiple copies of an edge are not allowed in a graph. A digraph may have two edges in opposite directions between the same pair of vertices.

Weighted graph

A graph G(V,E), either directed or undirected, where each edge e in E has an associated weight.
Fig: Weighted graph
Weights can be either positive or negative.



Path

A path is a sequence of vertices where each pair of adjacent vertices in the sequence is connected by an edge. For a directed graph, each edge that connects the adjacent vertices must have a direction aligned with the path.
The number of edges in the path is the length of the path.

Strongly connected

If a graph has edges such that there is a path from any vertex to any other vertex, then the graph is strongly connected.

Degree of vertex

The number of edges that connect a vertex to other vertices, or to itself (self edge), is the degree of that vertex. A self edge is counted once.


Indegree

For a digraph, the number of edges that are directed toward a vertex is the indegree of that vertex.


Outdegree

For a digraph, the number of edges that are directed away from a vertex is the outdegree of that vertex.

Planar graph

A graph that can be drawn on a plane without crossing edges.

Sparse graph

A graph that has far fewer edges than the maximum possible number of edges is called a sparse graph.

Graph representation

Graphs, directed or undirected, weighted or not, are generally represented in memory in one of two standard ways: using an adjacency list or using an adjacency matrix.
If a graph is sparse an adjacency list is used to save memory. For a dense graph an adjacency matrix saves memory and gives fast access.
To illustrate the two types of representation we'll use the following graphs.
Fig: Undirected graph
Fig: Directed graph

Adjacency list

Uses an array of lists, one list per vertex, where each list contains the vertices that are adjacent to the vertex the list represents.

Fig: Adjacency list representation for undirected graph

Fig: Adjacency list representation for digraph
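In Python, for example, the undirected graph from the figures (the same one whose adjacency matrix appears in the next section) can be stored as a dictionary mapping each vertex to a list of its adjacent vertices:

adjacency = {
    'A': ['B', 'C', 'D'],
    'B': ['A', 'E'],
    'C': ['A', 'D', 'E'],
    'D': ['A', 'C', 'F'],
    'E': ['B', 'C', 'F'],
    'F': ['D', 'E'],
}

Each undirected edge appears in two lists, once for each endpoint.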

Adjacency matrix

To represent a graph G(V,E) a matrix of size VxV, requiring O(V^2) memory, is used. Each item a[i,j] in the matrix is 1 (or actual weight for the edge in case of weighted graph) if the edge (i,j) exists in E and NIL (or any convenient value like zero) otherwise.
    A B C D E F
   _           _
A | 0 1 1 1 0 0 |
B | 1 0 0 0 1 0 |
C | 1 0 0 1 1 0 |
D | 1 0 1 0 0 1 |
E | 0 1 1 0 0 1 |
F | 0 0 0 1 1 0 |
   -           -

Fig: Adjacency matrix representation for undirected graph
    A B C D E F
   _           _
A | 1 1 0 1 0 0 |
B | 1 0 0 0 0 0 |
C | 1 0 0 0 1 0 |
D | 0 0 1 0 0 1 |
E | 0 1 0 0 0 0 |
F | 0 0 0 0 1 0 |
   -           -

Fig: Adjacency matrix representation for digraph
An adjacency matrix is simpler to use and gives faster edge lookups than an adjacency list. If a graph is small we may prefer to use an adjacency matrix.

Graph traversals

The process of visiting each vertex of the graph once is called graph traversal.

Depth-first search (DFS)

Visits deeper level nodes first before visiting any sibling of the current node.
Fig: DFS node visit order
Uses a stack or recursion to keep track of adjacent nodes that are not yet visited.
DFS(G)
  for each vertex u in G
    if u.visited != true
        DFS-VISIT(G, u)

DFS-VISIT(G, u)
    u.visited = true
    distance = distance + 1
    u.distance = distance
    for each vertex v in Adjacent(u)
        if v.visited == false
            v.prev = u
            DFS-VISIT(G, v)
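The pseudocode translates almost directly into Python; this sketch records the order in which vertices are visited instead of per-vertex fields, reusing the adjacency dictionary from the representation section:

def dfs(adjacency, u, visited=None, order=None):
    # recursive depth-first search starting from vertex u
    if visited is None:
        visited, order = set(), []
    visited.add(u)
    order.append(u)
    for v in adjacency[u]:
        if v not in visited:
            dfs(adjacency, v, visited, order)
    return order

print(dfs(adjacency, 'A'))  # ['A', 'B', 'E', 'C', 'D', 'F']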


Breadth-first search (BFS)

Visits all current level nodes first before visiting any deeper level nodes.
Fig: BFS node visit order
Uses a queue to keep track of adjacent nodes.
Finds the distances from the source vertex or the root.
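A minimal Python sketch of BFS computing distances from a source vertex, again reusing the adjacency dictionary from the representation section:

from collections import deque

def bfs(adjacency, source):
    distance = {source: 0}         # visited vertices and their distances
    queue = deque([source])        # discovered but not yet processed vertices
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distance:  # not visited yet
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance

print(bfs(adjacency, 'A'))  # {'A': 0, 'B': 1, 'C': 1, 'D': 1, 'E': 2, 'F': 2}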

Topological Sort

A topological sort is an ordering of the vertices of a directed acyclic graph G such that for each directed edge (u,v), u comes before v in the ordering. The nodes of G may represent ordered events. If G has a cycle, topological sorting is not possible.
Implementation:
  a. Using DFS. O(V+E)
  b. Repeat: find a vertex with indegree = 0 and output it, then remove the vertex and all its outgoing edges. Until no vertex remains. O(V+E)
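A Python sketch of implementation b, often called Kahn's algorithm, using the adjacency dictionary format from earlier:

from collections import deque

def topological_sort(adjacency):
    # count incoming edges for every vertex
    indegree = {u: 0 for u in adjacency}
    for u in adjacency:
        for v in adjacency[u]:
            indegree[v] += 1
    queue = deque(u for u in adjacency if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adjacency[u]:     # "remove" u and its outgoing edges
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(adjacency):
        raise ValueError("graph has a cycle; no topological order exists")
    return order

print(topological_sort({'a': ['b', 'c'], 'b': ['c'], 'c': []}))  # ['a', 'b', 'c']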

Minimum Spanning Tree

Kruskal's Algorithm

Kruskal's algorithm first creates a new graph G' with all the nodes of graph G. The new graph initially has no edges. Then it takes edges in increasing order of weight and joins two trees together to form a new bigger tree. An edge is only selected if the two nodes it connects are not already part of the same tree. The process continues until there is no edge available to connect two trees.
We can use a disjoint set to determine if two nodes are in the same set.

The algorithm

Take all the nodes and create disjoint sets that initially contain one node each.
Create a list of all the edges sorted by their weight in increasing order.

Selected Edges = empty
While the list is not empty
    Remove the smallest weighted edge E from the list.
    If the nodes N1 and N2 that are connected by E are not in the same set
       Put E into Selected Edges
       Union the sets that contain N1 and N2

After the algorithm completes we have a set of edges in the Selected Edges list that forms the minimum spanning tree if the graph is connected, or otherwise a minimum spanning forest.
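A compact Python sketch of the algorithm above, keeping the disjoint sets as a parent-pointer dictionary with path compression (the (weight, n1, n2) edge format is illustrative):

def kruskal(nodes, edges):
    parent = {n: n for n in nodes}     # every node starts in its own set

    def find(n):
        # follow parent pointers to the set representative, compressing the path
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    selected = []
    for weight, n1, n2 in sorted(edges):   # increasing order of weight
        r1, r2 = find(n1), find(n2)
        if r1 != r2:                       # N1 and N2 are not in the same set
            selected.append((weight, n1, n2))
            parent[r1] = r2                # union the two sets
    return selected

edges = [(1, 'A', 'B'), (4, 'B', 'C'), (3, 'A', 'C')]
print(kruskal(['A', 'B', 'C'], edges))  # [(1, 'A', 'B'), (3, 'A', 'C')]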

CS131: 2-3-4 Tree [Draft]

A 2-3-4 tree is a perfectly balanced tree (all leaves are on the same level) with O(log n) worst case time for find, insert, and remove operations. As the name suggests, every node of the tree has 2, 3, or 4 children.




Insert operation

Find a node with key k. If a node X with key k is found, proceed to X's left subtree until a leaf node is found. On the way down, if a 3 key node is found we move its middle key up to its parent; the parent has at most two keys.


Insertion may increase the depth of the tree by one level.


Remove operation

Find the node X with key k to remove. If X is a leaf, remove it and return from the function. If X is an internal node, search for the leaf node Y with the smallest key k' such that k' > k, and replace X with Y so that X is removed and Y takes its place.

If we find a one key node on the path to the leaf node we want to remove, we repeatedly apply the following 3 steps (except for the root node):

  1. If X is a one key node that is not the root, move one key from its adjacent sibling by rotation.

  2. If there is no adjacent sibling with more than one key, take one key down from the parent. Internal nodes always have two or three keys. Moving down one key may trigger further operations if the parent is left with one key (verify).

  3. If the parent is the root and there is no adjacent sibling with more than one key (steps 1 and 2 fail), create a 3 key node to form the root. This is called a fusion operation. The tree level is decreased by one.


Remove may decrease the tree height by one level.

CS131: Splay Tree

Splay tree is a balanced binary search tree. On average all operations take O(log n) time.
Any sequence of k operations, starting from an empty tree and never exceeding n items in the tree, will take O(k log n) worst case time.
The data structure ensures fast access to recently accessed data and is much easier to program than other balanced trees like the 2-3-4 tree.
A splay tree is not kept perfectly balanced at all times.

Tree rotation
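A rotation restructures a parent and child while preserving the binary search tree order; splay operations are built entirely from rotations. A minimal Python sketch (the Node class and function names are illustrative):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    # lift y's left child above y; y becomes the right child
    x = y.left
    y.left = x.right      # x's old right subtree moves under y
    x.right = y
    return x              # x is the new root of this subtree

def rotate_left(x):
    # mirror image: lift x's right child above x
    y = x.right
    x.right = y.left
    y.left = x
    return y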

Splay operation

We do a splay operation whenever an item is accessed, to move that item up toward the root. Repeatedly doing the operation moves the item up to the root of the tree.
We continuously rotate the tree so that the item takes the position of the node above it (parent, grandparent, great grandparent, etc.) until it becomes the root.
Given an item X to splay up there are three possible situations:

a. X is a left child of a right child or a right child of a left child
For a left child of a right child we rotate right first and then rotate left to move the item two levels up. For a right child of a left child we rotate left first and then rotate right to move the item two levels up.
b. X is a left child of a left child or a right child of a right child
For a left child of a left child we rotate right twice to move the item two levels up. For a right child of a right child we rotate left twice to move the item two levels up.
c. X is a child of the root
We rotate once as appropriate (left or right) to move the item to the root.
In all cases each rotation moves the item one level up.

Splay tree operations

Find operation

    Search for the key k as in a binary search tree, until k is found or a leaf node is reached
    Let X be the last node visited during the search, whether it contains k or not
    Splay the node X up through a sequence of tree rotations so that it becomes the root

Insert Operation

    Insert k as in a binary search tree
    Splay the new node up to the root through tree rotations

Remove operation

    Remove the node as in an ordinary binary search tree
    Splay the node's parent up to the root using tree rotations

CS131: Binary Search Tree

A binary search tree is a binary tree such that for any node n, if a node x is in n's left subtree, x's key is smaller than n's key, and if x is in n's right subtree, x's key is bigger than n's key.

Fig: Binary Search Tree

From the figure above, let us consider the node with key 15. Its left subtree has nodes with keys 10, 5 and 12, all of which are less than 15. Its right subtree has keys 20, 17, and 22, all of which are bigger than 15. A node with an equal key can be in either subtree.

With a binary search tree, searching for the minimum or maximum key is fast, and we can do inexact matches like "what is the largest key smaller than 57" and "what is the smallest key bigger than 57". Or we can find a word with a close but not exact spelling. While searching down the tree for a key k, if that key is not present we encounter the node with the smallest key greater than k and the node with the largest key smaller than k.

If we do an inorder traversal we traverse the nodes in sorted order of the keys.
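For example, a short recursive inorder traversal in Python over the tree from the figure (the Node layout is illustrative):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(node, keys=None):
    # left subtree, then the node itself, then the right subtree
    if keys is None:
        keys = []
    if node is not None:
        inorder(node.left, keys)
        keys.append(node.key)
        inorder(node.right, keys)
    return keys

root = Node(15, Node(10, Node(5), Node(12)), Node(20, Node(17), Node(22)))
print(inorder(root))  # [5, 10, 12, 15, 17, 20, 22]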



Find minimum

Start with the root node and go to the left child until no longer possible, and return the last node encountered.


Find maximum

Start with the root node and go to the right child until no longer possible, and return the last node encountered.

struct node
    int key;
    node *left, *right;

node* search(node* root, int k, node* &parent)
    cur_node = root;
    while(cur_node != NULL)
        if(cur_node->key == k)
            return cur_node;

        parent = cur_node;

        if(cur_node->key > k)
            cur_node = cur_node->left;
        else
            cur_node = cur_node->right;
    return NULL;

Remove operation

Find the node n with key k.
If none is found, return.
If n has no child, remove it.
If n has one child, replace n with its child.
If n has two children
  - Find the node d with the smallest key in n's right subtree
  - Remove d from its location and place it at n's location


Fig: Binary search tree


Node 15 has two children and the smallest key in its right subtree is 17. We replace node 15 with node 17 and the tree becomes:

Fig: Binary search tree after removing 15

remove(node* root, int k)
    node* parent;
    node* target_node = search(root, k, parent);

    if(target_node == NULL)
        return;

    if(target_node->right == NULL && target_node->left == NULL)
        if(parent->left == target_node)
            delete target_node;
            parent->left = NULL;

        else if(parent->right == target_node)
            delete target_node;
            parent->right = NULL;

    // the one-child and two-children cases follow the steps described above


Running time

Running time for operations on a binary search tree depends on the depth of the tree. If the tree is balanced, the depth of the tree is O(log n), where n is the number of nodes in the tree. In that case the complexity of the operations is O(log n).

So, if the tree has O(log n) depth, all operations take O(log n) time.

But it is possible that the tree is not balanced. If we insert items into a binary search tree in sorted order, the binary search tree becomes a linked list, which has a running time of O(n) for all operations. This is the worst case running time.

Worst case complexity O(n)