OJ's rants

It's not about you, it's about the software

Sorting Algorithms: The Bubble Sort

| Comments

BubblesThis is the first of many posts covering the fascinating topic of sorting.

I chose the Bubble Sort algorithm as the first to cover because of its simplicity. This algorithm tends to be the first sorting algorithm that is taught to students, and hence is a rather apt starting point.

Let’s break it down.

Common Terms

  • Data set - the array or list of items that is to be sorted.
  • n - the number of elements in the data set

The Algorithm

The algorithm consists of a repeated iteration over the elements of the data set. In each iteration, adjacent elements are compared. If those two adjacent items are in the wrong order, they are swapped. The result of each iteration is that the next highest value is placed in the appropriate location in the data set. After repeating the iteration n - 1 times, the entire data set will be sorted.

The Bubble Sort fits in the category of Exchange Sorts due to the manner in which elements are moved inside the data set during the sorting process.

The Example

We start with an unsorted data set of 10 elements which we want to sort in ascending order:

33 98 74 13 55 20 77 45 64 83

The start of the first iteration looks at the first two elements (marked in red):

33 98 74 13 55 20 77 45 64 83

As per the algorithm we compare the two values, swapping them if the first item is bigger than the second. In this case, 98 is bigger than 33, so no swap is required.

Next we look at the two adjacent items, starting at the second item in the list:

33 98 74 13 55 20 77 45 64 83

In this case 98 is greater than 74, so the items must be swapped (marked in blue):

33 74 98 13 55 20 77 45 64 83

We do the same again, this time starting at the third item:

33 74 98 13 55 20 77 45 64 83

Again, we need to swap the items since they’re not in order:

33 74 13 98 55 20 77 45 64 83

We repeat this process until we get to the end of the list (marked in green):

33 74 13 55 20 77 45 64 83 98

As we can see, the result is that the largest number is placed at the end of the list. This is the end of the first iteration.

We then iterate again but only cover the section of numbers that haven’t already been sorted.

33 74 13 55 20 77 45 64 83 98

No swap required, move to the next pair.

33 74 13 55 20 77 45 64 83 98

Swap required:

33 13 74 55 20 77 45 64 83 98

Move to the next pair:

33 13 74 55 20 77 45 64 83 98

Again, swap required:

33 13 55 74 20 77 45 64 83 98

We continue this trend for another iteration before hitting the following case:

33 13 55 20 74 77 45 64 83 98

Here, there is no swap required, so we leave behind the number we’ve been “carrying” (74) and pick up number 77.

33 13 55 20 74 77 45 64 83 98

The result of the iteration is as follows:

33 13 55 20 74 45 64 77 83 98

As we can see, this “bubbles” each of the numbers to the top of the list in the order of highest to lowest. Here is how the rest of the numbers are sorted at the end of each following iteration:

13 33 20 55 45 64 74 77 83 98 13 20 33 45 55 64 74 77 83 98 13 20 33 45 55 64 74 77 83 98 13 20 33 45 55 64 74 77 83 98 13 20 33 45 55 64 74 77 83 98 13 20 33 45 55 64 74 77 83 98 13 20 33 45 55 64 74 77 83 98

Note that the last iteration results in two numbers being locked in due to the fact we no longer have any numbers to sort.

The Implementation

Here’s a commented sample in C# which is easily translatable to many other languages:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
/// <summary>
/// This is the basic bubble sort algorithm, hard-coded to work
/// with integer values.
/// </summary>
/// <param name="dataSet">Array of integers to sort.</param>
static void BubbleSortBasic(int[] dataSet)
{
    // loop n-1 times.
    for (int i = dataSet.Length - 1; i > 0 ; --i)
    {
        // for each loop, iterate through the first i
        // items (ie. the unsorted ones)
        for (int j = 0; j < i; ++j)
        {
            // if adjacent items need to be swapped
            if (dataSet[j] > dataSet[j + 1])
            {
                // swap them
                Swap(dataSet, j, j + 1);
            }
        }
    }
}

A general form of this algorithm which applies to any comparable type can be found in the source repository listed below.

A Minor Optimsation

The Bubble Sort can be optimised very slightly, though it’s not guaranteed to provide much benefit depending on the structure of the data set that is to be sorted.

If, for any iteration, there are no items swapped then all of the items in the data set must be in the correct order. As a result, subsequent iterations are unnecessary. The optimised version is listed below, again in C#:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
/// <summary>
/// This is the basic bubble sort algorithm, hard-coded to work
/// with integer values but with a slight difference - a minor
/// optimisation.
/// </summary>
/// <param name="dataSet">Array of integers to sort.</param>
static void BubbleSortBasicOptimised(int[] dataSet)
{
    // loop n-1 times.
    for (int i = dataSet.Length - 1; i > 0 ; --i)
    {
        // keep track of whether items were swapped
        // for this iteration
        bool swapped = false;

        // for each loop, iterate through the first i
        // items (ie. the unsorted ones)
        for (int j = 0; j < i; ++j)
        {
            // if adjacent items need to be swapped
            if (dataSet[j] > dataSet[j + 1])
            {
                // swap them
                Swap(dataSet, j, j + 1);

                // indicate that we found a swap
                swapped = true;
            }
        }

        // if nothing was swapped, then we should
        // already have everything in order
        if (!swapped)
        {
            break;
        }
    }
}

Caveat: This “optimisation” may actually result in lower performance, particularly in the case where every iteration results in a swap.

The Complexity

Space Complexity

Bubble Sorts are extremely efficient in terms of memory usage, due to the fact that all sorting and swapping operations are done on the original data set. Regardless of the size of the original data set, the amount of memory overhead is constant as we don’t allocate any memory for each of the items. Hence, the space complexity is O(1) (in Big-O notation). This, of course, doesn’t include the O(n) space complexity taken up by the original data set.

Time Complexity

This is where the Bubble Sort fails to shine. For each iteration i starting at n, we must loop through the previous i - 1 elements and swap if required.

The first iteration loops through n - 1 elements. The second iteration loops through n - 2 elements. The third iteration loops through n - 3 elements. And so on.. which means the number of compares/swaps that we do is equal to: (n - 1) + (n - 2) + (n - 3) … + 2 + 1 This indicates that there n lots of n - i, or n2 - ni (where ni varies for each iteration).

Give that the highest power of n is 2 this indicates a time complexity of O(n2).

Or in layman’s terms: it’s f**king slow :)

Bubble Sorts should only be used on data sets that are rather small. If you’re dealing with medium-sized or larger sets of data, then the Bubble Sort is not the right algorithm to choose. So what would be a better option? You’ll have to read the rest of this series and answer that yourself ;)

Other Implementations

I’m slowly gathering a collection of implementations of all the sorting algorithms, including the ones listed above, that I’m covering in this series and posting them up on BitBucket. The Bubble Sort implementations can be found here.

Note: You’ll need Mercurial if you want to pull directly from the repository, otherwise you’ll have to copy and paste from the web.

Closing Thoughts

I haven’t included information on multithreading because the post would be insanely big. I will put up a follow-up post covering multithreading another day. The code stored in the BitBucket repository contains a sample of how you might use multithreading in conjunction with this sorting algorithm.

To wrap up, Bubble Sorts are easy to understand and are a great place to start when learning sorting algorithms. Unfortunately, this simplicity results in a fairly expensive and slow sorting implementation which really isn’t an option when dealing with anything other than small data sets.

References and Other Information

  1. Wikipedia - Bubble sort

Next up, we’ll be looking at the Cocktail Sort.

Disclaimer

I’m no expert, nor an authority. The post above is based on my experience and a bit of research. If you find something wrong with anything I’ve said please let me know. Comments and feedback are greatly appreciated.

Comments