The time needed for a sort with BubbleSort increases quadratically with the number of records. The algorithm works in place and is stable. In the best case, when the records are already sorted, the time increases only linearly with the number of records.

Short Description

The main idea behind BubbleSort is to find the record with the biggest key value and to move it to the end of the list. This moving is done by comparing the key value of the current record with the key value of the next record. If the key value of the next record is smaller, the two records are swapped; if it is bigger, the next record becomes the new candidate for having the biggest key value.

The algorithm works with an outer and an inner loop. After the first repetition of the outer loop, the record with the biggest key value is definitely at the end of the list. When searching for the second biggest key value, it is not necessary to compare against the last element, since we already know the result of such a compare; therefore the number of repetitions of the inner loop decreases by one with every repetition of the outer loop.

Since every record with the currently biggest key value is moved up, it is normally not just one record that moves upwards during a repetition of the outer loop. This behavior is compared to bubbles rising in a drink, which is where the algorithm got its name.
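The two loops described above can be sketched as follows (a minimal sketch in Python; the function name is illustrative):

```python
def bubble_sort(records):
    """Sort a list in place by repeatedly moving the record with the
    biggest remaining key value to the end of the list."""
    n = len(records)
    for end in range(n - 1, 0, -1):      # outer loop: unsorted part shrinks
        for i in range(end):             # inner loop: compare neighbors
            if records[i] > records[i + 1]:
                # the next record is smaller: swap the two records
                records[i], records[i + 1] = records[i + 1], records[i]
            # otherwise the next record becomes the new candidate
    return records
```

After the first pass the biggest key value sits at the end, so the inner range shrinks by one with every pass.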

Stability

If there are records with identical key values and they become neighbors during the sort, the upper record becomes the new candidate for having the biggest key value. If the comparison is realized with a strict bigger-than clause, and not a bigger-or-same clause, the algorithm is stable, because records with identical key values never change their position relative to each other.
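The effect of the strict comparison can be demonstrated with records that carry identical key values (a small sketch; the tuple layout and names are illustrative):

```python
def bubble_sort_stable(records, key):
    """BubbleSort with a strict bigger-than comparison on the key."""
    n = len(records)
    for end in range(n - 1, 0, -1):
        for i in range(end):
            # strictly "bigger", never "bigger-or-same": records with
            # equal keys are never swapped, so their relative order stays
            if key(records[i]) > key(records[i + 1]):
                records[i], records[i + 1] = records[i + 1], records[i]
    return records

# records as (key, payload) pairs; two pairs share key 1, two share key 2
pairs = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
bubble_sort_stable(pairs, key=lambda r: r[0])
# the records with key 1 keep the order "b" before "d",
# the records with key 2 keep the order "a" before "c"
```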

Timing

For an estimate of the runtime we visualize the number of compare operations. In the vertical direction we have the repetitions of the outer loop and in the horizontal direction the number of compare operations.

   1  2  3  4  5  6  7  8  9 10   |   compares
   x  x  x  x  x  x  x  x  x  -   |      9
   x  x  x  x  x  x  x  x  -  -   |      8
   x  x  x  x  x  x  x  -  -  -   |      7
   x  x  x  x  x  x  -  -  -  -   |      6
   x  x  x  x  x  -  -  -  -  -   |      5
   x  x  x  x  -  -  -  -  -  -   |      4
   x  x  x  -  -  -  -  -  -  -   |      3
   x  x  -  -  -  -  -  -  -  -   |      2
   x  -  -  -  -  -  -  -  -  -   |      1
   -  -  -  -  -  -  -  -  -  -   |      0

Half of the n × n matrix is filled, except for the diagonal; that makes ½ (n² − n) compares. Since we do not need the exact number but only the rough behavior, we calculate the number of compare operations as

NC = ½ n²
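The exact count can be checked by summing the passes of the outer loop (a sketch; the helper name is illustrative):

```python
def count_compares(n):
    """Number of compares done by the plain BubbleSort on n records."""
    compares = 0
    for end in range(n - 1, 0, -1):   # one pass of the outer loop
        compares += end               # the inner loop does 'end' compares
    return compares

# for n = 10 records: 9 + 8 + ... + 1 = 45 = (10*10 - 10) / 2
```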

Worst Case & Normal Case

In order to get the worst case concerning the swap operations, we assume that the list is sorted in reverse order. Then every compare operation is followed by a swap.

NS = ½ n²

Every compare and swap operation needs time. We evaluate the time for a complete sort as

TSort = ½ C n² + ½ S n²
TSort = n² (½ C + ½ S)
TSort = B n²

where C and S are the constant times for a single compare and a single swap, B = ½ (C + S) is a constant, and n is the number of records to be sorted.

With an even distribution of the key values, the calculation of the necessary number of swaps becomes a bit more complicated; the exact evaluation can be found in [1]. The number of swaps increases with ¼ n², so on average a sort needs only half as many swaps as in the worst case. But the timing still follows the quadratic characteristic; only the actual value of B changes.
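The ¼ n² figure can be checked on small inputs: the number of swaps equals the number of inversions in the list, and averaging over all permutations of n records gives n(n − 1)/4 (a sketch; exhaustive enumeration is only practical for small n):

```python
from itertools import permutations

def count_swaps(records):
    """Number of swaps done by the plain BubbleSort on a copy of records."""
    a = list(records)
    swaps = 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
    return swaps

def average_swaps(n):
    """Average swap count over all n! permutations of n distinct keys."""
    perms = list(permutations(range(n)))
    return sum(count_swaps(p) for p in perms) / len(perms)

# the average equals n*(n-1)/4, half of the worst case n*(n-1)/2
```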

Correctness

Note that the proof below argues about a variant of the algorithm that moves the smallest remaining key value to the front of the list; the argument for the variant described above is symmetric.

Lemma: The inner loop terminates at j = i + 1. At that point A[i] ≤ A[i+1], A[i] ≤ A[i+2], …, A[i] ≤ A[n]. So A[i] is the smallest element among A[i] through A[n]. QED

Theorem: BubbleSort correctly sorts the input array A.

The loop invariant for lines 1 through 4 is that A[i] ≤ A[i+1].

Initialization: The loop starts with A[1].

Maintenance: After the iteration i = p, for some 1 < p ≤ n, the above lemma for lines 2 through 4 guarantees that A[p] is the smallest element from A[p] through A[n]. So A[p] ≤ A[p+1].

Termination: The loop of lines 1 through 4 terminates at i = n. At that point A[1] ≤ A[2] ≤ … ≤ A[n]. Hence the correctness theorem is proved. QED

Complexity: Σi=1..n Σj=i+1..n 1 = Σi=1..n (n − i) = n² − n(n+1)/2 = n²/2 − n/2 = Θ(n²)
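The proof refers to line numbers of a four-line pseudocode that is not reproduced here. A CLRS-style variant that matches the lemma (it moves the smallest remaining key value to the front) could look like this in Python (an assumption; the 1-based line numbers appear only in the comments):

```python
def bubble_sort_front(A):
    """Variant matching the proof: each outer pass moves the smallest
    remaining key value to the front of the unsorted part."""
    n = len(A)
    for i in range(n - 1):                        # line 1: for i = 1 to n
        for j in range(n - 1, i, -1):             # line 2: for j = n downto i+1
            if A[j] < A[j - 1]:                   # line 3: if A[j] < A[j-1]
                A[j], A[j - 1] = A[j - 1], A[j]   # line 4: swap them
    return A
```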

Best Case

Now we come to the best case: the records are already sorted. Without any additions to the algorithm we still have the same number of compare operations, but no swaps. This information about "no swaps" can be used to speed up the sort when the records are sorted or nearly sorted. At the beginning of the outer loop we set a marker to NO_SWAPS_DONE. In the inner loop this marker is set to SWAPS_DONE whenever two records are swapped. At the end of the outer loop we check the marker: if there were no swaps, the records are sorted and the algorithm can terminate. This gives a best-case behavior of

TBest = n * C
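The marker described above can be sketched like this (names such as swaps_done are illustrative):

```python
def bubble_sort_early_exit(records):
    """BubbleSort that terminates as soon as a pass does no swaps."""
    n = len(records)
    for end in range(n - 1, 0, -1):
        swaps_done = False               # marker: NO_SWAPS_DONE
        for i in range(end):
            if records[i] > records[i + 1]:
                records[i], records[i + 1] = records[i + 1], records[i]
                swaps_done = True        # marker: SWAPS_DONE
        if not swaps_done:
            break                        # already sorted: terminate early
    return records
```

On an already sorted list the first pass does n − 1 compares, finds no swaps, and the sort terminates after a single pass.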

Nearly Sorted

Now we have a look at a nearly sorted situation: a new record is appended at the end of the list, and its key value is the smallest of all records. The number of compare operations will still be

NC = ½ n²

and the number of swaps will be

NS = n
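Both counts can be verified by instrumenting the plain algorithm (a sketch; the exact swap count here is n − 1, which grows linearly as stated):

```python
def count_operations(records):
    """Return (compares, swaps) of the plain BubbleSort, no early exit."""
    a = list(records)
    compares = swaps = 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            compares += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
    return compares, swaps

# a sorted list of 10 records with the smallest key appended at the end:
nearly = list(range(1, 10)) + [0]
# compares stay at (10*10 - 10) / 2 = 45, swaps drop to 10 - 1 = 9
```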

Compared with algorithms like InsertionSort, this timing behavior is pretty bad.

Example

The list to be sorted contains 10 records. A blue background indicates that two data records have to be swapped. A green background indicates that the upper record becomes the new candidate for the biggest key value.


First repetition of outer loop

387 310 529 174 011 227 020 901 522 771
310 387 529 174 011 227 020 901 522 771
310 387 529 174 011 227 020 901 522 771
310 387 174 529 011 227 020 901 522 771
310 387 174 011 529 227 020 901 522 771
310 387 174 011 227 529 020 901 522 771
310 387 174 011 227 020 529 901 522 771
310 387 174 011 227 020 529 901 522 771
310 387 174 011 227 020 529 522 901 771
310 387 174 011 227 020 529 522 771 901


Second repetition of outer loop

310 387 174 011 227 020 529 522 771 901
310 387 174 011 227 020 529 522 771 901
310 174 387 011 227 020 529 522 771 901
310 174 011 387 227 020 529 522 771 901
310 174 011 227 387 020 529 522 771 901
310 174 011 227 020 387 529 522 771 901
310 174 011 227 020 387 529 522 771 901
310 174 011 227 020 387 522 529 771 901
310 174 011 227 020 387 522 529 771 901


Third repetition of outer loop

310 174 011 227 020 387 522 529 771 901
174 310 011 227 020 387 522 529 771 901
174 011 310 227 020 387 522 529 771 901
174 011 227 310 020 387 522 529 771 901
174 011 227 020 310 387 522 529 771 901
174 011 227 020 310 387 522 529 771 901
174 011 227 020 310 387 522 529 771 901
174 011 227 020 310 387 522 529 771 901


Fourth repetition of outer loop

174 011 227 020 310 387 522 529 771 901
011 174 227 020 310 387 522 529 771 901
011 174 227 020 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901


Fifth repetition of outer loop

011 174 020 227 310 387 522 529 771 901
011 174 020 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901


Sixth repetition of outer loop

011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901
011 020 174 227 310 387 522 529 771 901


The algorithm terminates after this repetition because no swaps were done.


  1. Knuth, Donald E.: The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd Edition, Addison-Wesley, 1981
This article is issued from Wikiversity. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.