Sort Home | Count Sort, Quick Sort, SQL |
Source Code | History |
Since computer antiquity, the Quick-sort has been the fastest single-buffered sorting method. Now there is a Count-sort that uses a single buffer like the Quick-sort, and is faster than the Quick-sort. Now programmers have a choice. These comments are intended to help evaluate the two methods.
Let's start with the Quick-sort's biggest advantage, flexibility. One small piece of code applies to many situations. All the application programmer needs to do is supply a comparison function and the quick sort code does the rest. It's not clear how that would work for the Count-sort. Since the Count-sort doesn't use comparison, such things as levels of indirection and selection of keys can't be built into the comparison routine. Another way is needed to supply this information. I'm not saying this is a difficult problem. It just hasn't been done. The convention hasn't been established.
On the other hand, there are situations where the sort order needs to be changed from ASCII to some other order. That is much easier to do with the Count-sort than the Quick-sort. A case insensitive Count-sort runs faster than a case sensitive Count-sort. The overhead of the extra level of indirection is trumped by the smaller histograms. An arbitrary sort order can be implemented in the Count-sort simply by switching in a different 256 byte table.
Another Quick-sort advantage is that it is the only algorithm needed to complete the sort. On existing hardware, the Count-sort must be combined with some other technique. It is possible to design hardware where that wouldn't be true, but that has not been done. Existing hardware has already been optimized for the Quick-sort, but further optimization is possible for the Count-sort.
The count sort is not effecient on sparse data. The first pass of the sort makes a histogram of the keys in the data. If there are a lot of zero entries in the histogram, the count sourt wastes time searching for example, in sorting 10,000,000 32 bit integers, the count sort searches for histogram with data in it 3,151,377 times and on average it has to search 112.04 histogram entries before it finds one with data in it. Of course, that is because the agorithm is dealing with more and more sparse entries as it proceeds into the sort.
But a hardware accelated verions of the count sort would use a 256 bit priority encoder, which should speed this search process up by about a factor of 100, since it would be able to find the next non-empty histogram entry in about one clock pulse. Interestingly, a 65,536 bit priority encorder should only take about 2 clock pulse, and would probaly result in an least a 2x speed inprovement. Also the hardware accelerated version of this algorithm would eliminate the need for a compare and branch as each histogram entry was processed, thus providing an algorithm with no branches.
The big Count-sort advantage is that is is faster. My benchmark showed it to be 60% to 300% faster, but then my code could certainly be improved. The Quick-sort code is already established and well optimized.
Note that from here on we are assuming a Count-sort using a 256 entry histogram. That number is not written in stone, but we are using it for illustration purposes. It is also the number that would probably be the most used in practice.
Aside from speed another major Count-sort advantage is that there are many situations where results can be produced without completely sorting the data set. As an example, let's use this SQL query:
SELECT field1 FROM table ORDER BY field1 LIMIT 500000,10
Suppose the table has a million records. We are asking the database engine to sort all those records and give us the ten records following the five hundred thousandth record in that sort order.
Using the Quick-sort, all these records have to be completely sorted before the result can be given. Using the Count-sort we can do two passes through the data and narrow down the leading digit of the result to one candidate in most situations. Then we can just sort the records that start with that digit and further narrow the candidate field by reiterating the procedure.
Obviously, determining how much faster this is going to be than sorting all the records completely is not possible without knowing the exact details of the type of data and its distribution, but it seems it would be a lot faster. If the key is numerical, the first byte of the data is evenly distributed, and the selected records do not have two or more different first bytes then 255/256 of the records only have to be processed twice. If the selected records have two different first bytes, the above ratio falls to 127/128 which is still pretty good odds. Note that in this case, the larger the data set, the fewer times on average each record needs to be processed.
Actually, we can build the first pass of the Count-sort right into the SELECT query. As the records are pulled in from main memory or disk, it takes little code and almost no time to increment the appropriate bin of a 256 word table sitting in a processor cache.
In addition to not having to sort all the data for the SELECT query with a small LIMIT clause, we don't have to pull all the data into ram so it can be sorted. This kind of query can use less than 100th of the memory as the same query done with the Quick-sort.
In most cases when data needs to be sorted, it is either because we want to select a subset of the data, or we are preparing the data for iterative searches. If we are want to select all records and process each one, then there is no need for sorting.
We have already pointed out the advantage of the "subset" case above. The Count-sort also offers advantages for repetitive searching of a data set. Even at worst case, finding a particular record in a set of sorted data is relative fast. Using a binary search, finding any particular piece of data in 2 to N records requires, at most, N operations. Finding a particular set of data in 1,048,756 records requires 20 operations.
But the Count-sort provides a table of locations for us to search as a byproduct of its operation. So if we save only 256 words of data from the sort, we can narrow our search to 1/256th of the data in just one operation in contrast to the ½ of the data eliminated for each binary operation. If we want to save 65,792 words of data from the sort, we can narrow our search to 1/65,536th of the data in just two operations in contrast to the ¼ of the data eliminated by two binary operations. In our case of searching
For 1,048,756 records, if we save 65,792 words of data from the Count-sort, we can find any individual record in 6 operations as opposed to 20 operations for the quick-sorted data. This iterative searching requirement often comes up in database work where one table needs to reference another.
In many cases we don't know at the time of the SELECT query whether subsequent queries will call for different subsets of data from the same data set. It would be possible save the histogram(s) from the query and mark which ones were sorted then reuse the information for subsequent queries. In this way, we could sort only sections of the data that were called for by the user.
Modern computer systems often have more than one processor, so the efficiency of a sort method on a multi-processor system is an issue. The Count-sort might have an advantage. Using the Quick-sort, the data can be immediately split into chunks by a master processor. Each processor in the system then sorts its chunk of data. The chunks are merged using a merge sort.
Using the Count-sort, a master processor makes two complete passes through the data, then partitions the data into chunks for processing by independent processors. The chunks are aligned so that no two chunks have data starting with the same byte. When each independent processor has finished its sort, the data is sorted. The merge sort step can be eliminated. I have not been able to benchmark this procedure, but I think it would be instructive to do so. Splitting up the data is certainly simpler in the Quick-sort case, but the Count-sort is able to eliminate the merge sort part of the process.
Finally, we need to consider the ill-ordered data problem of the Quick-sort. If the Quick-sort is fed a statistically unlikely set of data, it's speed falls dramatically and becomes comparable to that of much slower sorting methods. A user who encountered this type of data would likely think his computer crashed. Thisdoes not appear to be much of a practical problem, though I don't know of any data on how much of a problem it is in real situations. The Count-sort does not suffer from this problem at all.
Even if the ill-ordered data problem is not statistically significant, it is enough to persuade some programmers not to use the Quick-sort, but use the heap sort instead, even though it is slower. The ill-ordered data problem is an aesthetic problem for the Quick-sort, and using a sort that did not suffer from this aesthetic defect could increase customer confidence in software.