Palindrome

A palindrome is a word, phrase, number, or other sequence of characters which reads the same backward or forward, such as madam or kayak.

To check whether a given string is a palindrome, use the two-pointer approach: place the first pointer at the start and the second pointer at the end, then compare the characters at those indices while incrementing the first index and decrementing the second, until a mismatch is found or the pointers meet at the centre. If NO mismatch is found, the string is a palindrome; otherwise it is not. The complexity of this algorithm is linear, O(n).

public static boolean isPalindrome(final String str) {
    int start = 0;
    int end = str.length() - 1;
    while (start < end) {
        if (str.charAt(start) != str.charAt(end)) {
            return false;
        }
        start++;
        end--;
    }
    return true;
}


Now, let us modify the problem statement: check whether a given string, or any of its substrings, is a palindrome.
The hasPalindrome() method returns the length of a substring which is a palindrome; in the worst case it returns 1 (remember, every single character is a palindrome).

public static int hasPalindrome(final String str) {
    for (int i = 0; i < str.length() - 1; i++) {
        for (int j = str.length() - 1; j > i; j--) {
            if (isPalindrome(str, i, j)) {
                return j - i + 1;
            }
        }
    }
    return 1;
}

Here we use the isPalindrome(str, start, end) subroutine to check whether the substring of str from index start to end is a palindrome.
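That index-based overload is not shown here (the full class is in the Git link below); a minimal sketch of what it is assumed to look like:

public static boolean isPalindrome(final String str, int start, int end) {
    //assumed helper: checks str between indices start and end (inclusive).
    while (start < end) {
        if (str.charAt(start) != str.charAt(end)) {
            return false;
        }
        start++;
        end--;
    }
    return true;
}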

[Image: substring-palindrome.png]

If we update the problem statement again to return the length of the longest substring which is a palindrome, the above algorithm will NOT work, since it returns the length of the first palindrome found, which is not necessarily the longest.

For ex: If we modify our original string madam and prefix it with ‘a’, it becomes amadam.

The above algorithm will return 3 for ‘amadam’ (it finds ‘ama’), which is a valid palindrome but not the longest one; the longest is still ‘madam’, of length 5.

The reason the above algorithm does not work for the longest palindrome is that we terminate the moment we find the first palindrome. We can modify it to maintain a max-length variable and keep updating it whenever we find a longer palindrome; this is nothing but a brute-force approach in which we evaluate all substrings.
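A minimal sketch of that brute-force modification, reusing the index-based isPalindrome helper from above (the method name longestPalindromeBruteForce is illustrative, not from the linked source):

public static int longestPalindromeBruteForce(final String str) {
    int maxLength = 1; //every single character is a palindrome.
    for (int i = 0; i < str.length(); i++) {
        for (int j = i + 1; j < str.length(); j++) {
            if (isPalindrome(str, i, j) && (j - i + 1) > maxLength) {
                maxLength = j - i + 1;
            }
        }
    }
    return maxLength;
}

There are O(n^2) substrings and each palindrome check costs O(n), so this brute force is O(n^3).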

A slightly better approach is to evaluate longer substrings first: the moment we find a substring which is a palindrome, its length is guaranteed to be greater than that of any palindrome contained within it, so we do not need to evaluate any of its own substrings further.
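A sketch of that idea (again illustrative, not the linked implementation): iterate over substring lengths from longest to shortest and return on the first palindrome found.

public static int longestPalindromeLongestFirst(final String str) {
    for (int len = str.length(); len >= 2; len--) {
        for (int i = 0; i + len <= str.length(); i++) {
            if (isPalindrome(str, i, i + len - 1)) {
                return len;
            }
        }
    }
    return 1; //no palindrome of length >= 2; every single character is one.
}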

A naive recursive approach is to check whether the given string is a palindrome; if not, find the maximum-length palindrome in the left and the right substrings.

public static int maxPalindromeLength(final String str, int start, int end){
    //Condition to break recursion
    if (start >= end) {
        return 1;
    }

    if (isPalindrome(str, start, end)) {
        return end - start + 1; 
    }

    //return the longer palindrome length found in the left and right substrings.
    return Math.max(
        //longest palindrome in left substring.
        maxPalindromeLength(str, start, end - 1),
        //longest palindrome in right substring.  
        maxPalindromeLength(str, start + 1, end)
    );
}

[Image: recursive-palindrome.png]

The complexity of this algorithm is exponential, O(2^n): every call on a range that is not a palindrome spawns two recursive calls, so the total number of comparisons roughly doubles with every increase in n.

The recursion tree below shows the total number of comparisons done for n = 2, 3 and 4.

[Image: recursionTree-palindrome.png]

Notice that the same substring “bc” is compared twice; hence this problem has the overlapping-subproblems property required for a dynamic programming approach.

A bottom-up approach is to evaluate smaller substrings first and reuse those results during further evaluation. In other words, a string is a palindrome provided its first and last characters match and the substring between them is also a palindrome. Hence a string S(0, n) is a palindrome provided the characters at 0 and n match and S(1, n-1) is also a palindrome. We can therefore reuse the result of S(1, n-1) while evaluating S(0, n), so this problem has the optimal-substructure property of dynamic programming as well.

Re-computation of overlapping subproblems can be avoided by using a lookup table L[][].

The substring from i to j is a palindrome if the characters at i and j match and the substring from i+1 to j-1 is also a palindrome; L[i][j] stores the palindrome's length, or -1 if it is not a palindrome.

if (character at i and j matches && L[i+1][j-1] is a palindrome) {
    L[i][j] = L[i+1][j-1] + 2; // length of inner palindrome + 2
}
else {
    L[i][j] = -1; // not a palindrome
}

We start analyzing all substrings of length 2 and use those results while analyzing all substrings of length 3, and so on and so forth.

public static int longestPalindrome(final String str) {
    int length = str.length();
    int [][] L = new int[length][length];

    for (int i = 0; i < length; i++) {
        for (int j = 0; j < length; j++) {
            //Every character is a palindrome of length 1.
            L[i][j] = (i == j) ? 1 : -1;
        }
    }

    int subStrLength = 2;
    while (subStrLength <= length) {
        for (int i = 0; i < length - subStrLength + 1; i++) {
            int j = i + subStrLength - 1; //end index of substring.
            if ((subStrLength == 2 || L[i+1][j-1] != -1) &&
                str.charAt(i) == str.charAt(j)) {
                //a length-2 palindrome has no inner palindrome; otherwise add the two matching end characters.
                L[i][j] = (subStrLength == 2) ? 2 : L[i+1][j-1] + 2;
            }
            else {
                L[i][j] = -1;
            }
        }
        subStrLength++;
    }
    return max(L);
}
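The max(L) call is assumed to scan the lookup table for its largest entry; a minimal sketch:

private static int max(final int[][] L) {
    int maxLength = 1; //every single character is a palindrome of length 1.
    for (int i = 0; i < L.length; i++) {
        for (int j = 0; j < L[i].length; j++) {
            if (L[i][j] > maxLength) {
                maxLength = L[i][j];
            }
        }
    }
    return maxLength;
}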

The time complexity of the above algorithm is O(n^2), and it uses O(n^2) extra space for the lookup table.

Git:
algorithms/src/main/java/org/nmn/notes/algorithms/dp/Palindrome.java

References:
https://en.wikipedia.org/wiki/Palindrome


Knapsack Problem

The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

The brute-force approach is to form all possible combinations of items, keep only those subsets whose total weight is less than or equal to the knapsack capacity W, and among those pick the subset whose value is maximum.

Given two integer arrays, wt: [1, 2, 3] and val: [20, 5, 30], representing weights and values, below are the possible combinations of items along with their weight and value:

{1} -> wt: 1, value: 20
{2} -> wt: 2, value: 5
{3} -> wt: 3, value: 30
{1, 2} -> wt: 1 + 2 = 3, value: 20 + 5 = 25
{1, 3} -> wt: 1 + 3 = 4, value: 20 + 30 = 50
{2, 3} -> wt: 2 + 3 = 5, value: 5 + 30 = 35
{1, 2, 3} -> wt: 1 + 2 + 3 = 6, value: 20 + 5 + 30 = 55

Now, depending on the total knapsack capacity, we can pick the appropriate subset.
For a knapsack capacity of 1, only the first subset {1} can be chosen.
However, for a knapsack capacity of 4, there are 5 subsets whose total weight is less than or equal to 4: {1}, {2}, {3}, {1, 2}, {1, 3}; out of these, {1, 3} has the maximum value of 50, hence we pick that subset.
The brute-force algorithm evaluates all subsets, and the total number of subsets is 2^numOfItems.
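A minimal sketch of that brute force using a bitmask over the items (the method name knapsackBruteForce is illustrative, not from the linked source):

public static int knapsackBruteForce(int W, int[] wt, int[] val) {
    int best = 0;
    //each bit of mask decides whether the corresponding item is in the subset.
    for (int mask = 0; mask < (1 << wt.length); mask++) {
        int totalWt = 0;
        int totalVal = 0;
        for (int i = 0; i < wt.length; i++) {
            if ((mask & (1 << i)) != 0) {
                totalWt += wt[i];
                totalVal += val[i];
            }
        }
        if (totalWt <= W && totalVal > best) {
            best = totalVal;
        }
    }
    return best;
}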

To exploit the optimal substructure and find the maximum value, we consider each item in turn, and for every item there are only two possible cases:
case 1: the item is included.
case 2: the item is NOT included.

We pick whichever case results in the higher value:
case 1: if the item is included, the total value increases by that item's value and the knapsack capacity reduces by that item's weight.
case 2: if the item is NOT included, the value and the knapsack capacity do NOT change.

After deciding on that item, we move on to the next item. The recursive solution for this approach is:

public static int knapsack(int W, int[] wt, int[] val, int numOfWts) {
    //base case: no items left or no remaining capacity.
    if (numOfWts == 0 || W <= 0) {
        return 0;
    }
    //item does not fit in the remaining capacity, skip it.
    if (wt[numOfWts - 1] > W) {
        return knapsack(W, wt, val, numOfWts - 1);
    }
    //max of: including the item vs. excluding it.
    return Math.max(
        val[numOfWts - 1] + knapsack(W - wt[numOfWts - 1], wt, val, numOfWts - 1),
        knapsack(W, wt, val, numOfWts - 1)
    );
}

Let us draw the recursion tree for the above recursive approach for a knapsack problem K(n, W), where n is the number of items and W is the knapsack capacity:

wt: {1, 1, 1} val: {1, 1, 1}, W=2
                         K(3, 2) 
                     /            \ 
                   /                \               
             K(2,2)                   K(2,1)
            /      \                  /    \ 
          /          \              /        \
       K(1,2)      K(1,1)        K(1,1)     K(1,0)
       /  \         /   \          /   \
     /      \     /       \      /       \
K(0,2)  K(0,1)  K(0,1)  K(0,0)  K(0,1)   K(0,0)

We evaluate K(1, 1) twice, which shows that the subproblems overlap. Since the knapsack problem has both optimal substructure and overlapping subproblems, it can be solved with the DP approach of using a memoization/lookup table.

The lookup table K[n][W] can be constructed bottom-up, where n is the number of items and W is the total knapsack capacity.

To build the table, we start with the 1st item and compute the maximum value for increasing knapsack capacities. We then introduce a new item and recalculate the maximum values again.

public static int knapsackDP(int W, int[] wt, int[] val) {
    int[][] K = new int[wt.length + 1][W + 1];

    for (int i = 0; i <= wt.length; i++) {
        for (int w = 0; w <= W; w++) {
            //base case: either 0 items or no knapsack capacity.
            if (i == 0 || w == 0) {
                K[i][w] = 0;
            }
            //this item fits and can be considered.
            else if (wt[i - 1] <= w) {
                K[i][w] = Math.max(
                    //case 1: value of this item + best of remaining items with reduced capacity.
                    val[i - 1] + K[i - 1][w - wt[i - 1]],
                    //case 2: without this item, value and capacity remain the same.
                    K[i - 1][w]
                );
            }
            //weight of this item is higher than the current knapsack capacity.
            else {
                K[i][w] = K[i - 1][w];
            }
        }
    }
    return K[wt.length][W];
}
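A quick usage sketch with the inputs used in the table below (illustrative only):

int[] wt  = {1, 1, 1, 1, 1};
int[] val = {1, 1, 1, 1, 1};
System.out.println(knapsackDP(5, wt, val)); //prints 5: all five items fit.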

Knapsack lookup table for wt: {1, 1, 1, 1, 1}, val: {1, 1, 1, 1, 1} and W=5 is:
Knapsack table:
[0 0 0 0 0 0 ]
[0 1 1 1 1 1 ]
[0 1 2 2 2 2 ]
[0 1 2 3 3 3 ]
[0 1 2 3 4 4 ]
[0 1 2 3 4 5 ]

Rows correspond to items and columns to knapsack capacity. When there are 0 items (row = 0) or the knapsack capacity is 0 (column = 0), the maximum knapsack value is 0.

Hence the values in the 1st row and 1st column are all 0s.
2nd row: there is only 1 item to consider, so increasing the knapsack capacity does not increase the total value, which stays constant at 1.
3rd row: now we are considering 2 items, each of weight 1; from the 3rd column onwards, when the knapsack capacity is 2, both items fit and the total value increases from 1 to 2, where it stays since both items have already been taken. Hence the 3rd row is: [0 1 2 2 2 2]
Finally, in the last row, when all 5 items are available, each additional unit of knapsack capacity (column) lets one more item be added to the knapsack, increasing its value: [0 1 2 3 4 5]

The above DP approach has a time complexity of O(nW).

Git:
algorithms/src/main/java/org/nmn/notes/algorithms/dp/Knapsack.java

Reference:
https://en.wikipedia.org/wiki/Knapsack_problem

Analysis of Algorithms

The theoretical study of computer-program performance and resource usage.

The objective is to make programs fast and to understand the factors that influence their performance, including resources (RAM, SSDs).

But before analysing these factors, it is worth asking: what is more important than performance?

  1. User friendliness
  2. Security
  3. Correctness
  4. Simplicity
  5. Maintainability
  6. Scalability
  7. Precision
  8. Robustness
  9. Features
  10. Modularity

Then why study algorithms and performance?

  1. Performance is correlated with user friendliness; nothing is more frustrating than a slow-loading web page.
  2. Real-time constraints: some programs are simply not usable if they do not respond quickly enough.
  3. Performance draws the line between what is feasible and what is not.
  4. Algorithms give us a language for talking about program behaviour.

Performance is like a currency to buy other things.

If we want to analyse the running time of an algorithm such as insertion sort, it depends on several things.

  • Depends on the input (e.g. for an already sorted array, insertion sort has very little work to do)
  • Depends on the input size (6 elements vs 6*10^9 elements)
    • Parameterise the input size and express time as a function of size to see the behaviour as the size grows.
  • We want upper bounds, to guarantee the user the maximum amount of time an algorithm will take.

Kinds of analysis:

There are usually three kinds of analysis we can do for an algorithm.

Worst-case (usually):

T(n) = maximum time on any input of size n

Average-case (sometimes):

T(n) = expected time over all inputs of size n (a weighted average)

To compute the average we need to know the probability distribution of the inputs; since we usually do not, we need an assumption of statistical distribution (e.g. every input of size n is equally likely).

Best-case (never):

This is not useful, since the analysis can be cheated by taking a poor algorithm that performs well only on one specific (best-case) input, which tells us nothing about the vast majority of inputs: a slow algorithm that happens to work fast on some input.

So what does the running time of an algorithm depend on?

It depends on the computer, whether it is a supercomputer or a wristwatch.

We can compare relative speed: run two algorithms on the same machine and compare their performance.

We can also look at absolute speed: one algorithm always performs better than the other, no matter which machine we run it on.

It may seem confusing to talk about the worst case of an algorithm while discussing only the software and not the hardware, since a program's performance depends on the hardware: the same algorithm will always run faster on a faster machine than on a slower one.

BIG IDEA! Asymptotic analysis

Asymptotic analysis helps us analyse algorithms and make things simpler.

  1. Ignore machine dependent constants.
  2. Look at the growth of the running time instead of the running time itself: T(n) as n -> ∞

Asymptotic notation:

Θ-notation (Theta notation):

  • Drop low order terms
  • Ignore leading constants

Ex: 3n³ + 90n² – 5n + 6046 = Θ(n³)

As n -> ∞, a Θ(n²) algorithm always beats a Θ(n³) algorithm, even if the Θ(n²) algorithm runs on a slower computer and the Θ(n³) algorithm on a faster one.
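To make this concrete with made-up constants: suppose the Θ(n²) algorithm takes 100·n² steps on the slower machine while the Θ(n³) algorithm takes n³/100 steps on the faster one. For n below 10,000 the Θ(n³) algorithm is ahead, but 100·n² < n³/100 for every n above 10,000, so from that point on the Θ(n²) algorithm wins no matter what the machine constants are.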

So asymptotic notation helps us to resolve the issue of both relative and absolute speed.

On different machines we may get different constants, but as the input size grows, the asymptotic growth does not change.


References

  1. Introduction to Algorithms
  2. Introduction to Algorithms video lecture – Charles E. Leiserson

Insertion Sort

Insertion sort is a comparison sort algorithm with worst case complexity of O(n^2).

Insertion sort can be compared with the way cards are sorted manually in hand.

We first pick a card in hand. The 2nd card is picked and compared with the first; if it is smaller, it is placed ahead of it. The 3rd card is picked and compared with the cards already in hand to find its final position.

At any point in time the cards already in hand are sorted, and a new card is picked and “inserted” into its final position.

for (int j = 1; j < arr.length; j++) {
    int key = arr[j]; //card to be inserted.
    int i = j - 1;
    //shift elements greater than key one position to the right.
    while (i >= 0 && arr[i] > key) {
        arr[i + 1] = arr[i];
        i--;
    }
    arr[i + 1] = key; //insert key into its final position.
}
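A quick usage sketch, assuming the loop above is wrapped in a method named insertionSort(int[] arr) (name illustrative, not necessarily that of the linked class):

int[] cards = {9, 3, 7, 8, 1};
insertionSort(cards);
//cards is now sorted in place: [1, 3, 7, 8, 9]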

Complexity:

The worst case is when the input is reverse sorted.

T(n) = total work done across all iterations.
In the first iteration there is one comparison,
in the second iteration there are two comparisons,
and so on and so forth.
T(n) = 1 + 2 + 3 + … + (n-1) = n(n-1)/2 = O(n^2) (arithmetic series)

The best case is an already sorted array, where the complexity is O(n).

Git:
algorithms/src/main/java/org/nmn/notes/algorithms/sorting/InsertionSort.java

Bubble Sort

Bubble sort is a comparison sort algorithm in which each pair of adjacent items is compared and swapped if they are in the wrong order, until the array is sorted.

At the end of each iteration, the largest remaining element reaches its final position.

For ex in array: [9, 3, 7, 8, 1]

At the end of the 1st iteration, the largest element 9 reaches its final position, the end of the array: [x, x, x, x, 9]
At the end of the 2nd iteration, the 2nd largest element 8 reaches its final position, the second-last slot in the array: [x, x, x, 8, 9]
And so on and so forth; at the end of all iterations the array is sorted.

Algorithm logic:

Total number of iterations = total number of elements in the array, since each iteration places only the largest remaining element in its final position.

In each iteration:

  1. Start from index 0, compare adjacent elements, if left is greater than right, swap.
  2. Move right, compare again.

Lets sort array: [9, 3, 7, 8, 1]

Iteration 1:
[9, 3, 7, 8, 1]: 9 is greater than 3, swap elements:  [3, 9, 7, 8, 1]
[3, 9, 7, 8, 1]: 9 is greater than 7, swap elements:  [3, 7, 9, 8, 1]
[3, 7, 9, 8, 1]: 9 is greater than 8, swap elements:  [3, 7, 8, 9, 1]
[3, 7, 8, 9, 1]: 9 is greater than 1, swap elements: [3, 7, 8, 1, 9]
At the end of iteration 1, 9 has reached at the end of array.
Array at end of iteration: [3, 7, 8, 1, 9]

Iteration 2:
[3, 7, 8, 1, 9]
[3, 7, 8, 1, 9]
[3, 7, 8, 1, 9] ->  [3, 7, 1, 8, 9]
[3, 7, 1, 8, 9]
Array at end of 2nd iteration: [3, 7, 1, 8, 9]

Iteration 3:
[3, 7, 1, 8, 9]
[3, 7, 1, 8, 9] ->  [3, 1, 7, 8, 9]
[3, 1, 7, 8, 9]
[3, 1, 7, 8, 9]
Array at end of 3rd iteration: [3, 1, 7, 8, 9]

Iteration 4:
[3, 1, 7, 8, 9] -> [1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
Array at end of 4th iteration: [1, 3, 7, 8, 9]

Iteration 5:
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
Array at end of 5th iteration: [1, 3, 7, 8, 9]

Complexity:

Bubble sort is a comparison sort, since decisions are made by comparing two elements; no comparison sort algorithm can have a worst-case complexity better than O(n log n).

Let's find out the complexity of bubble sort.
Complexity = (Total number of iterations) * (Work done in each iteration)

Total number of iterations = Total number of elements in an array = n.

Work done in each iteration = We start from index 0, compare adjacent elements and move right till the end of array, so in each iteration we do n-1 comparisons.

Complexity = n * (n-1) = n^2 – n = O(n^2) (ignoring lower-order terms while computing complexity)

Worst-case T(n) = O(n^2)
Best-case T(n) = O(n) (with the early-exit optimisation described below)
Avg case T(n) = O(n^2)

Applications:

Bubble sort can be used to find the i-th largest element in the array. For example, to find the 3rd largest element, at the end of the 3rd iteration the 3rd largest element reaches its final position, which is the 3rd position from the end; a sketch of this follows after the basic implementation below.

    for (int i = 0; i < arr.length; i++) {
        for (int j = 0; j < arr.length - 1; j++) {
            if (arr[j] > arr[j+1]) {
                swap(arr, j, j+1); //swap is an assumed helper that exchanges the two elements.
            }
        }
    }
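A minimal sketch of that application (the method name kthLargest is illustrative; swap is the same assumed helper as above): run only k passes, after which the k-th largest element sits k positions from the end.

public static int kthLargest(int[] arr, int k) {
    //run only k bubble passes; pass i places the (i+1)-th largest element at the end.
    for (int i = 0; i < k; i++) {
        for (int j = 0; j < arr.length - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                swap(arr, j, j + 1);
            }
        }
    }
    return arr[arr.length - k]; //k-th largest sits k positions from the end.
}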

Bubble sort can be improved further by reducing the inner loop bound (to arr.length - 1 - i), since the largest elements have already reached the end.

Also, a swapped flag can be set to false at the start of each iteration and set to true only when elements are swapped. If swapped is still false at the end of an iteration, the array is already sorted and no further comparisons are required. Hence the best case, for an already sorted array, is O(n).
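A sketch combining both improvements, the reduced inner bound and the swapped flag (again illustrative, swap being the same assumed helper):

public static void bubbleSort(int[] arr) {
    for (int i = 0; i < arr.length; i++) {
        boolean swapped = false;
        //the largest i elements are already in place at the end, skip them.
        for (int j = 0; j < arr.length - 1 - i; j++) {
            if (arr[j] > arr[j + 1]) {
                swap(arr, j, j + 1);
                swapped = true;
            }
        }
        if (!swapped) {
            return; //no swaps in this pass: the array is already sorted.
        }
    }
}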

Bubble sort does not require extra space for sorting, just one temporary variable for the swap, which can also be avoided.

Bubble sort requires twice as many writes as insertion sort, is about 5 times slower than insertion sort, and about 40 times slower than selection sort.

Git:
algorithms/src/main/java/org/nmn/notes/algorithms/sorting/BubbleSort.java