OJ's rants

It's not about you, it's about the software

Sorting Algorithms: The Comb Sort

| Comments

Comb SortWelcome to this, the third post in the series on sorting algorithms. Next up, we’re going to cover the Comb Sort.

Like the Cocktail Sort, the Comb Sort isn’t particularly well known. Most people manage to make their way through tertiary studies without ever hearing of it. This post is designed to change that!

The Comb Sort is another comparison and exchange sort which builds on the idea of the Bubble Sort and adds a potential optimisation or two.

Make sure you read the articles on the Bubble Sort and the Cocktail Sort before you read this article. Doing so will make it much easier to understand.

Right, enough with the introductions, let’s break it down!

Common Terms

  • Data set - the array or list of items that is to be sorted.
  • n - the number of elements in the data set.
  • Turtle - a name given to small numbers that appear towards the end of a data set. When used in a Bubble Sort, small numbers at the end of the set take a very long time to get to the front of the list, and hence are called turtles because of the lack of speed.
  • Hare - a name given to large numbers that appear towards the start of a data set. When used in a Bubble Sort, large numbers move to the end of the set very quickly, and hence are called hares due to their high speed.
  • Killing the turtles - a phrase that implies quickly moving turtles to the front of the data set.
  • Gap - the size of the gap between areas of the data set that are being sorted during a given iteration of the sort.
  • Shrink Factor - a factor which is applied to the gap prior to each iteration to reduce its size.

A Bit of Background Information

The Comb Sort, along with the Cocktail Sort, was developed as an improvement to the Bubble Sort using the idea of killing the turtles. Turtles drastically reduce the efficiency of the Bubble Sort and hence are an obvious place to attempt optimisation.

The Bubble Sort algorithm always compares values that are adjacent to each other in the data set. The Comb Sort improves on this by adding a gap which allows non-adjacent numbers to be compared. After each iteration, the gap is reduced by a shrink factor until it reaches the value of 1. Hence, at the very end, the Comb Sort behaves exactly like the Bubble Sort.

One interesting thing to note about this algorithm is that it’s almost as fast as a Quick Sort!

The Algorithm

Each iteration of the algorithm consists of three stages:

  1. Calculation of the gap value.
  2. Iterating over the data set comparing each item with the item that is “gap” elements further down the list and swapping them if required.
  3. Checking to see if the gap value has reached one and no swaps have occurred. If so, then the set has been sorted.

1. Calculating the “gap”

The gap, which is essentially an offset used when comparing elements, is something that changes for each iteration. The value needs to change in a uniform manner in such a way as to provide optimum comparisons during the sorting phases.

Stephen Lacey and Richard Box, who popularised the algorithm, suggested that in each iteration the gap should be reduced by a factor of approximately 1.3 (see the wikipedia article for more information). Hence, calculation of the gap is as simple as starting with the size of the data set and dividing by a shrink factor of 1.3 each iteration.

2. Iterating and Swapping

This phase is the same as the Bubble Sort. The only difference in the Comb Sort is that the items which are compared are i and i + gap as opposed to i and i + 1.

For each iteration, we start at the beginning of the data set and loop until i + gap is greater-than or equal to the size of the data . For each sub-iteration in this loop, we compare i and i + gap and swap the values if required.

At this point the gap is recalculated and we go again.

3. Terminating the Loop

When the gap value reaches one, we know that we’re now behaving the same as the Bubble Sort algorithm (since we’re comparing i and i + 1), so we know that if we don’t perform any swaps during the iteration then the set must be sorted.

The Example

As per usual we will use the same initial data set that we used for the Bubble Sort and Cocktail Sort to aid in highlighting the differences.

The initial set looks like this:

33 98 74 13 55 20 77 45 64 83

We start by setting our gap value to 10, which is the size of the set. We then shrink the gap by the shrink factor, 1.3, which results in the value of 7 (when rounded down).

The first iteration starts at the beginning of the set, comparing the 1st item with the item that is ”gap” elements further up the set (the 8th item). These two items are shown in red:

33 98 74 13 55 20 77 45 64 83

33 is less than 45, so no swap is required. We then move to the 2nd item, and compare that item with the 9th item:

33 98 74 13 55 20 77 45 64 83

We perform a swap at this point because 98 is bigger than 64. The swap is shown in blue:

33 64 74 13 55 20 77 45 98 83

We then move on to compare the 3rd and 10th elements:

33 64 74 13 55 20 77 45 98 83

Again, no swap is required here as the numbers are in the right order.

At this point we have completed our first iteration as we’ve reached the end of the set. Given that the gap value is not 1, we know we still have more iterations to go. So we need to recalculate our gap value by dividing it again by the shrink factor, resulting in a value of 5 (after rounding down).

We then start another iteration from the beginning of the set, using the gap value of 5 as the offset. We start with the 1st and 6th items:

33 64 74 13 55 20 77 45 98 83

Given 33 is bigger than 20, we swap:

20 64 74 13 55 33 77 45 98 83

Then we compare the 2nd and 7th items:

20 64 74 13 55 33 77 45 98 83

No swap required, move to the 3rd and 8th items:

20 64 74 13 55 33 77 45 98 83

Perform the swap since 74 is bigger than 45:

20 64 45 13 55 33 77 74 98 83

We then compare the 4th and 9th items:

20 64 45 13 55 33 77 74 98 83

No swap required, compare the 5th and 10th items:

20 64 45 13 55 33 77 74 98 83

Again, no swap required. We’ve hit the end of this next iteration, and since gap doesn’t equal 1, we recalculate (resulting in the value of 3) and go again.

This time I’ll just show a summary of the steps:

20 64 45 13 55 33 77 74 98 83 13 64 45 20 55 33 77 74 98 83 13 64 45 20 55 33 77 74 98 83 13 55 45 20 64 33 77 74 98 83 13 55 45 20 64 33 77 74 98 83 13 55 33 20 64 45 77 74 98 83 13 55 33 20 64 45 77 74 98 83 13 55 33 20 64 45 77 74 98 83 13 55 33 20 64 45 77 74 98 83 13 55 33 20 64 45 77 74 98 83

Again, we iterate the value of gap, which becomes 2, and we go again:

13 55 33 20 64 45 77 74 98 83 13 55 33 20 64 45 77 74 98 83 13 20 33 55 64 45 77 74 98 83 13 20 33 55 64 45 77 74 98 83 13 20 33 55 64 45 77 74 98 83 13 20 33 45 64 55 77 74 98 83 13 20 33 45 64 55 77 74 98 83 13 20 33 45 64 55 77 74 98 83 13 20 33 45 64 55 77 74 98 83 13 20 33 45 64 55 77 74 98 83

Finally, we reduce the value of gap one more time, resulting in the value of 1. We’re finally at the Bubble Sort stage, and hence the last iteration looks like this:

13 20 33 45 64 55 77 74 98 83 13 20 33 45 64 55 77 74 98 83 13 20 33 45 64 55 77 74 98 83 13 20 33 45 64 55 77 74 98 83 13 20 33 45 64 55 77 74 98 83 13 20 33 45 55 64 77 74 98 83 13 20 33 45 55 64 77 74 98 83 13 20 33 45 55 64 77 74 98 83 13 20 33 45 55 64 74 77 98 83 13 20 33 45 55 64 74 77 98 83 13 20 33 45 55 64 74 77 83 98 13 20 33 45 55 64 74 77 83 98

At this point we know, due to observation, that the list is sorted. Unfortunately, the algorithm is yet to know. It looks at the value of gap and sees that it is set to 1, but during the last iteration there were swaps – so the code will iterate again.

At the end of this iteration, given that there were no swaps, it will know that the list is sorted and the code will terminate.

The Implementation

Here is a sample implementation written in C#. It is heavily commented in the hope that it will aid in understanding how the algorithm works:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
/// <summary>
/// The implementation of the standard comb sort algorithm which sorts
/// an array of integers.
/// </summary>
/// <param name="dataSet">Reference to the dataset that is to be soroted.</param>
static void CombSort(int[] dataSet)
{
    // start by using the length/size of the set as the gap.
    int gap = dataSet.Length;

    // loop indefinately.
    while (true)
    {
        // update the gap value so that it shrinks
        // towards 1.
        if (gap > 1)
        {
            gap = (int)((double)gap / SHRINK_FACTOR);
        }

        // do the comb sort with the current gap
        bool swapped = false;
        for (int i = 0; i + gap < dataSet.Length; ++i)
        {
            if (dataSet[i] > dataSet[i + gap])
            {
                Swap(dataSet, i, i + gap);
                swapped = true;
            }
        }

        // if we're down to a gap of 1, and we haven't swapped
        // anything, then we're sorted.
        if (gap == 1 && !swapped)
        {
            break;
        }
    }
}

The Complexity

This is quite surprising. Despite being based on the idea of a Bubble Sort the time complexity is just O(n log n), and space complexity for in-place sorting is O(1). Amazing for such a simple sorting algorithm!

A Note on Performance

What I’m about to say is almost a direct rip-off of Wikipedia ;) But at least I’m being honest!

When you use a shrink factor of 1.3, as recommended, it turns out that there are only 3 ways for the gap sizes to reduce to 1. They are:

1. - 9, 6, 4, 3, 2, 1 2. - 10, 7, 5, 3, 2, 1 3. - 11, 8, 6, 4, 3, 2, 1

According to Wikipedia, it turns out that the 3rd option is the only one which manages to kill all turtles before the gap becomes 1. I haven’t been able to find anything that backs this up though.

Assuming that this is true, we can create a modified version of the Comb Sort which has this extra little check:

1
2
3
4
if(gap == 9 || gap == 10)
{
    gap = 11;
}

This is a little hack which makes sure that we get the third option before we reduce the gap to 1. Of course, this would only have an effect on sets of data which have more than 11 elements.

Other Implementations

I’m slowly gathering a collection of implementations of all the sorting algorithms, including the ones listed above, that I’m covering in this series and posting them up on BitBucket. The Comb Sort implementations can be found here.

Note: You’ll need Mercurial if you want to pull directly from the repository, otherwise you’ll have to copy and paste from the web.

References and Other Information

  1. Wikipedia - Comb sort

Next up, we’ll be looking at the Gnome Sort.

Disclaimer

I’m no expert, nor an authority. The post above is based on my experience and a bit of research. If you find something wrong with anything I’ve said please let me know. Comments and feedback are greatly appreciated.

Note: For those reading this article in an RSS reader, you may find the colours do not appear in the examples properly. For some reason the feed is stripping out some of the formatting. I will do my best to fix this up soon.

Comments