Analysis of Algorithms

The theoretical study of computer-program performance and resource usage.

The objective is make programs fast and the factors that influence them including resources (RAM, SSDs).

But before analysing the factors, we need to figure out what is more important than performance?

  1. User friendliness
  2. Security
  3. Correctness
  4. Simplicity
  5. Maintainability
  6. Scalability
  7. Precision
  8. Robustness
  9. Features
  10. Modularity

Then, why study algorithms and performance ?

  1. Performance is co-related with user friendliness, nothing is more frustrating than a slow loading web-page.
  2. Real time constraints, sometime they do not work if they do not function adequately.
  3. Divides the line between feasible and non-feasible.
  4. Algorithms give language for program behaviour.

Performance is like a currency to buy other things.

If we want to analyse running time of an algorithm like Insertion Sort, it depends on a lot of things.

  • Depends on input (Eg. for an already sorted array, a sort like insertion sort has very less work to do)
  • Depends on the input size (6 elements vs 6*10^9 elements)
    • Parameterise the input size, time as a function of size to see the behaviour with increasing size.
  • Want upper bounds to guarantee the user the maximum amount of time an algorithm is going to run.

Kinds of analysis:

There are usually three kinds of analysis we can do for an algorithm.

Worst-case (usually):

T(n) = Maximum time on any input size of n

Average-case (sometimes):

T(n) = Expected time over all input of size n (weighted average)

To calculate average we need to know the probability of input, since we do not know that we need assumption of statistical distribution (Every input of size n is equally likely)

Best-case (never):

This is no good since we can cheat the analysis by taking a poor algorithm which performs better only for a specific input which is the best case but does not tell anything about other vast majority of inputs. A slow algorithm that works fast on some input.

So what is the worst-case of an algorithm?

Depends on computer, whether it is a super computer or a wrist watch.

We also check for relative speed, compare two algorithms on same machine to check the performance.

We can also check for absolute speed, one algorithm always perform better than the other no matter what machine we run on.

This gets confusing about how we can talk about the worst-case of an algorithm by only discussing the software and not discussing hardware since software’s performance is dependent on hardware. An algorithm on faster machine will always run faster than compared to slow machine.

BIG IDEA ! Asymptotic Notation Analysis

Asymptotic analysis helps us analyse algorithms and make things simpler.

  1. Ignore machine dependent constants.
  2. Look at growth of the running time instead of running time itself. T(n) as n -> ∞

Asymptotic notation:

Θ – notation (Theta notation) :

  • Drop low order terms
  • Ignore leading constants

Ex: 3n³ + 90n² – 5n + 6046 = Θ(n³)

As n -> ∞, Θ(n²) algorithm always beats a Θ(n³) algorithm. Even if we run Θ(n²) algorithm on a slower computer and Θ(n³) on fast computer.

So asymptotic notation helps us to resolve the issue of both relative and absolute speed.

On different machines, we may get different machine constants, but the growth as the size of the input gets larger, the asymptotic wont change.

Screen Shot 2016-08-12 at 10.06.11 pm

References

  1. Introduction to Algorithms
  2. Introduction to Algorithms video lecture – Charles E. Leiserson

Insertion Sort

Insertion sort is a comparison sort algorithm with worst case complexity of O(n^2).

Insertion can be compared with the way cards are sorted manually in hand.

We first pick a card in hand. The 2nd card is picked and compared with first, if it is smaller it is placed ahead of it. The 3rd card is picked and compared with remaining cards in hand to find its final position.

At a time all the cards in hand are sorted and a new card is picked and “inserted” in its final position.

for (int j = 1; j < arr.length; j++) { 
    int key = arr[j]; 
    int i = j-1; 
    while (i >= 0 && arr[i] > key) {
        arr[i+1] = arr[i];
        i--;
    }
    arr[i+1] = key;
}

Complexity:

Worst case is when input is reverse sorted

T(n) = Total work done in each iteration.
In first iteration there is only one comparison
In second iteration there are two comparisons
and so on and so forth.
T(n) = 1 + 2 + 3 + … n = n^2 (arithmetic series)

Best case is already sorted array where complexity is O(n)

Git:
algorithms/src/main/java/org/nmn/notes/algorithms/sorting/InsertionSort.java

Bubble Sort

Bubble sort is a comparison sort algorithm where each pair of adjacent items are compared and swapping them if they are in wrong order to sort the array.

At the end of each iteration the largest element of that iteration will reach its final position.

For ex in array: [9, 3, 7, 8, 1]

At the end of 1st iteration, the 1st largest element 9 will reach its final position which is end of the array : [x, x, x, x, 9]
At the end of 2nd iteration, the 2nd largest element 8 will reach its final position which is 2nd last element in the array : [x, x, x, 8, 9]
So on and so forth, at the end of all iterations the array is going to be sorted.

Algorithm logic:

Total number of iterations = Total number of elements in the array, since at the end of one iteration, we will find largest element of that iteration only.

In each iteration:

  1. Start from index 0, compare adjacent elements, if left is greater than right, swap.
  2. Move right, compare again.

Lets sort array: [9, 3, 7, 8, 1]

Iteration 1:
[9, 3, 7, 8, 1]: 9 is greater than 3, swap elements:  [3, 9, 7, 8, 1]
[3, 9, 7, 8, 1]: 9 is greater than 7, swap elements:  [3, 7, 9, 8, 1]
[3, 7, 9, 8, 1]: 9 is greater than 8, swap elements:  [3, 7, 8, 9, 1]
[3, 7, 8, 9, 1]: 9 is greater than 1, swap elements: [3, 7, 8, 1, 9]
At the end of iteration 1, 9 has reached at the end of array.
Array at end of iteration: [3, 7, 8, 1, 9]

Iteration 2:
[3, 7, 8, 1, 9]
[3, 7, 8, 1, 9] ->  [3, 7, 1, 8, 9]
[3, 7, 1, 8, 9]
[3, 7, 1, 8, 9]
Array at end of 2nd iteration: [3, 7, 1, 8, 9]

Iteration 3:
[3, 7, 1, 8, 9]
[3, 7, 1, 8, 9] ->  [3, 1, 7, 8, 9]
[3, 1, 7, 8, 9]
[3, 1, 7, 8, 9]
Array at end of 3rd iteration: [3, 1, 7, 8, 9]

Iteration 4:
[3, 1, 7, 8, 9] -> [1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
Array at end of 4th iteration: [1, 3, 7, 8, 9]

Iteration 5:
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
[1, 3, 7, 8, 9]
Array at end of 4th iteration: [1, 3, 7, 8, 9]

Complexity:

Bubble sort algorithm is a comparison sort algorithm since we compare two elements to take a decision, any comparison sort algorithm cannot have complexity better than nlogn.

Lets find out complexity of bubble sort.
Complexity = (Total number of iterations) * (Work done in each iteration)

Total number of iterations = Total number of elements in an array = n.

Work done in each iteration = We start from index 0, compare adjacent elements and move right till the end of array, so in each iteration we do n-1 comparisons.

Complexity = n * (n-1) = n^2 – n = n^2 (ignore lower terms while computing complexity)

Worst-case T(n) = O(n^2)
Best-case T(n) = O(n)
Avg case T(n) = O(n^2)

Applications:

Bubble sort can be used to find ith largest element in the array. For ex, if you want to find 3rd largest element of the array, at the end of 3rd iteration the 3rd largest element will reach its final position which is 3rd last element from last.

    for (int i = 0; i < arr.length; i++) {
        for (int j = 0; j < arr.length - 1; j++) {
            if (arr[j] > arr[j+1]) {
                swap(arr, j, j+1)
            }
        }
    }

Bubble sort can be improved further by reducing the inner index since the largest element has already reached at the end.

Also, a swapped variable which is set to false at the start of the iteration and set to true only if elements are swapped. At the end of the iteration if swapped is still false, it means the array is sorted and does not require further comparisons. Hence, the best case for already sorted array is O(n).

Bubble sort does not require extra space to do the sorting, just one variable to do the swap which can also be avoided.

Bubble requires twice as many writes as insertion sort, 5 times slower than insertion sort and 40 times slower than selection sort.

Git:
algorithms/src/main/java/org/nmn/notes/algorithms/sorting/BubbleSort.java