Heap Sort : Algorithm and Pseudo Code

Heap sort is a comparison-based sorting algorithm that produces a sorted array (or list) and belongs to the selection sort family. Although usually somewhat slower in practice on most machines than a well-implemented quick sort, it has the advantage of a more favorable worst-case running time of O(n log n). Heap sort is an in-place algorithm, but it is not a stable sort.

For an array A to be a (max-)heap, the following heap property must be satisfied: A[Parent(i)] >= A[i], for every element at index i in the array A.

The height of any heap consisting of n elements is h = O(log n).
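Concretely, a heap stored as a complete binary tree on n nodes has height floor(log2 n). A small Python sketch (the helper name heap_height is illustrative, not from the text):

```python
import math

def heap_height(n):
    """Height of a complete binary tree (heap) with n >= 1 nodes."""
    return math.floor(math.log2(n))

# A 7-element heap is a perfect tree of height 2;
# the 8th element starts a new level, raising the height to 3.
```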

Heap sort begins by building a heap out of the data set, and then removing the largest item and placing it at the end of the partially sorted array. After removing the largest item, it reconstructs the heap, removes the largest remaining item, and places it in the next open position from the end of the partially sorted array. This is repeated until there are no items left in the heap and the sorted array is full. Elementary implementations require two arrays - one to hold the heap and the other to hold the sorted elements.

Heap sort inserts the input list elements into a binary heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) is extracted repeatedly until none remain, and the extracted values come out in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.
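This extract-until-empty behavior can be demonstrated with Python's standard-library heapq module, which implements a binary min-heap, so the values emerge in ascending order:

```python
import heapq

data = [9, 4, 7, 1, 8, 2]
heapq.heapify(data)        # O(n) bottom-up heap construction
sorted_out = []
while data:                # repeatedly extract the minimum
    sorted_out.append(heapq.heappop(data))
print(sorted_out)          # [1, 2, 4, 7, 8, 9]
```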

During extraction, the only space required is that needed to store the heap. To achieve constant space overhead, the heap is stored in the part of the input array not yet sorted.

Heap sort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array. The remaining prefix of the array stores the unsorted elements.

Because of the O(n log n) upper bound on heap sort's running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heap sort. Heap sort also competes with merge sort, which has the same time bounds, but requires Ω(n) auxiliary space, whereas heap sort requires only a constant amount. Heap sort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heap sort:

1) Like quick sort, merge sort on arrays has considerably better data cache performance, often outperforming heap sort on a modern desktop PC, because it accesses the elements in order.

2) Merge sort is a stable sort.

3) Merge sort parallelizes better; the most trivial way of parallelizing merge sort achieves close to linear speedup, while there is no obvious way to parallelize heap sort at all.

4) Merge sort can be easily adapted to operate on linked lists (with O(1) extra space) and on very large lists stored on slow-to-access media such as disk storage or network-attached storage. Heap sort relies strongly on random access, and its poor locality of reference makes it very slow on media with long access times. (Note: heap sort can also be applied to doubly linked lists with only O(1) extra space overhead.)

Assumptions Made : 

  • Parent(i) = floor(i/2);
  • Left(i) = 2i;
  • Right(i) = 2i + 1;
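These 1-based index relations can be checked directly; a tiny sketch (the lowercase helper names are illustrative):

```python
def parent(i): return i // 2      # Parent(i) = floor(i/2)
def left(i):   return 2 * i       # Left(i)   = 2i
def right(i):  return 2 * i + 1   # Right(i)  = 2i + 1

# Node 5's parent is node 2, whose children are nodes 4 and 5.
```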

Pseudo Code :


Heapify(A, i){
    l = Left(i);
    r = Right(i);
    if( l <= heap-size[A] && A[l] > A[i]){
        largest = l;
    } else {
        largest = i;
    }
    if(r <= heap-size[A] && A[r] > A[largest]){
        largest = r;
    }
    if(largest != i){
        exchange(A[i], A[largest]);
        Heapify(A, largest);
    }
}
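A runnable Python translation of Heapify, sketched with 0-based indexing (so Left(i) = 2i + 1 and Right(i) = 2i + 2) and with the heap size passed explicitly rather than stored on the array:

```python
def heapify(A, i, heap_size):
    """Sift A[i] down until the subtree rooted at i satisfies the
    max-heap property. Assumes the subtrees rooted at Left(i) and
    Right(i) are already max-heaps."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and A[l] > A[i] else i
    if r < heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]   # exchange
        heapify(A, largest, heap_size)        # recurse into the disturbed subtree
```

For example, calling heapify([1, 14, 10, 8, 7, 9, 3], 0, 7) sinks the 1 down two levels, yielding [14, 8, 10, 1, 7, 9, 3].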

BuildHeap(A){
    heap-size[A] = length[A];
    for i = floor(length[A]/2) downto 1{
        Heapify(A, i);
    }
}

HeapSort(A){
    BuildHeap(A);
    for i = length[A] downto 2{
        exchange(A[1], A[i]);
        heap-size[A]--;
        Heapify(A, 1);
    }
}
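The three procedures above can be combined into one self-contained, runnable Python sketch (0-based indexing; the iterative sift_down plays the role of Heapify, and the two loops mirror BuildHeap and HeapSort):

```python
def heap_sort(A):
    """Sort list A in place in O(n log n) time with O(1) auxiliary space."""
    def sift_down(i, heap_size):
        # Same logic as Heapify: float A[i] down to its proper place.
        while True:
            l, r = 2 * i + 1, 2 * i + 2
            largest = i
            if l < heap_size and A[l] > A[largest]:
                largest = l
            if r < heap_size and A[r] > A[largest]:
                largest = r
            if largest == i:
                return
            A[i], A[largest] = A[largest], A[i]
            i = largest

    n = len(A)
    # BuildHeap: sift down every internal node, last parent first.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Repeatedly move the maximum into the sorted suffix and shrink the heap.
    for end in range(n - 1, 0, -1):
        A[0], A[end] = A[end], A[0]
        sift_down(0, end)
    return A
```

For example, heap_sort([5, 3, 8, 1]) returns [1, 3, 5, 8]; the array is rearranged in place, matching the constant auxiliary space claimed above.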

Run-time for Heapify : O(h) = O(log n), from the recurrence T(n) <= T(2n/3) + O(1)
Run-time for BuildHeap : O(n)
Run-time for HeapSort : O(n log n)

Worst case performance : O(n log n)
Best case performance : O(n log n) (distinct keys); O(n) (equal keys)
Average case performance : O(n log n)
Worst case space complexity : O(n) total, O(1) auxiliary
