A Card Game

In this class, if you aren't writing code you are playing cards. That's because most of these algorithms, even complex ones, are often used subconsciously in everyday life. Computers are not the only ones who would like to finish their work early!

Consider sorting half a deck of cards. We use the half of the deck containing all cards from the red diamond suit and the black club suit. Let's order them: any red comes before any black, the ace is one, and the picture cards Jack, Queen, King are 11, 12, 13, as is typical.

How I would sort these cards is to first make two piles, one of red and the other of black; sort the piles individually; and complete the task by placing the pile of sorted red cards on top of the pile of sorted black cards. Let's look mathematically at what has happened.

To sort 26 cards by an n²/2 sort would cost 338 units of activity. The grand total for the above process is 196 units of activity: two sorts of 13 cards each, plus a pass through the deck to split it and a single move to join the piles. A significant reduction in work. The reason is that a square function is convex upwards, so cutting the size of the problem in half reduces the amount of work by strictly more than half. It takes less effort to solve twice as many problems of half the size than to solve the entire problem at once. However, are two separate small solutions good enough? In our case they were, because first we had a cheap method to separate the deck into two problems, and second we had a cheap method to join the separate solutions into the solution of the original problem.
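
A quick check of the arithmetic in a few lines of Python; the split and join costs used here, one unit per card plus one unit to stack the piles, are one reading of the description above:

    # Cost model: sorting n cards by the simple method costs n^2 / 2 units.
    whole  = 26**2 / 2                  # sort all 26 cards at once: 338 units
    halves = 2 * (13**2 / 2) + 26 + 1   # two sorts of 13 (169 units), one
                                        # splitting pass over 26 cards, one join
    print(whole, halves)                # 338.0 196.0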

PSEUDOCODE FOR SORTING WITH SUITS

Split into red and black suits
Sort red suit
Sort black suit
Deck is sorted
[Figure: the deck at each stage: Shuffle, Split, Sort red, Sort black, Finished]
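
A minimal sketch of the procedure in Python, assuming for illustration that cards are represented as (value, suit) pairs, and using a simple quadratic insertion sort on each pile:

    def insertion_sort(pile):
        # The plain n^2/2-style sort applied to each pile.
        for i in range(1, len(pile)):
            card = pile[i]
            j = i - 1
            while j >= 0 and pile[j] > card:
                pile[j + 1] = pile[j]
                j -= 1
            pile[j + 1] = card
        return pile

    def sort_by_suits(deck):
        # Split: one pass through the deck, one decision per card.
        reds   = [card for card in deck if card[1] == "diamonds"]
        blacks = [card for card in deck if card[1] == "clubs"]
        # Sort each pile, then join: sorted reds on top of sorted blacks.
        return insertion_sort(reds) + insertion_sort(blacks)

For instance, sort_by_suits([(13, "clubs"), (1, "diamonds"), (12, "diamonds")]) returns the sorted diamonds followed by the sorted clubs.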

The principle is: two problems of half size are, even in total, easier than the one original problem; and the two half solutions can be brought together quickly enough that the savings gained are not all spent away.

If splitting once is good, then maybe splitting again, and again, is better. In these recursive n log n sorts, each sub-problem is split into further sub-problems, until a sub-problem is small enough to be solved (sorted) with very little effort, a constant amount of effort. Often the splitting continues until there is only one card to sort.

So how quick is quick when it comes to the job of separating into subproblems and joining the subproblems back into a full solution? Work proportional to the size of the problem will do. In our example, the separation was achieved by passing through the n cards and performing a constant amount of work on each card, for total work proportional to n. In our example the job of bringing the cards together was almost no work. In mergesort the job of bringing the subproblems together will cost work proportional to n, where n is the number of cards. The question then becomes:

Is 2((n/2)²/2) + c₁n + c₂n < n²/2? Or equivalently, are there values of n for which 0 < n/4 - c₁ - c₂?
The answer is yes: for large enough n the inequality is true. Such schemes to divide a problem down and reassemble it will be advantageous, no matter how complicated the separation and reassembly procedures might be, provided they are in essence linear and the problem size is large enough.
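
A small numeric check of that inequality, with hypothetical constants c₁ and c₂ standing in for the per-card costs of separation and reassembly:

    c1, c2 = 3.0, 2.0                 # hypothetical per-card split/join costs
    for n in (8, 16, 24, 64, 256):
        divided = 2 * ((n / 2) ** 2 / 2) + c1 * n + c2 * n
        whole   = n ** 2 / 2
        print(n, divided < whole)     # becomes True once n/4 > c1 + c2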

Quicksort and mergesort are sorts that work according to the cheaper-by-halves principle. In quicksort, the dataset is split into small and large values, and each is separately sorted. The entirety is sorted by simply bringing together the two sorted halves. Here we placed the reds (small) separate from the blacks (large). In quicksort, finding a place to split is not as clear. A pivot item is selected at random from the set of values to sort, and its value becomes the boundary between values considered large and those considered small. Often the split is then not exactly in the middle, since the pivot need not be the median value of the set.
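
A minimal quicksort sketch in Python along these lines, with the pivot drawn at random:

    import random

    def quicksort(values):
        if len(values) <= 1:             # one value is already sorted
            return values
        pivot = random.choice(values)    # boundary between small and large
        small = [v for v in values if v < pivot]
        equal = [v for v in values if v == pivot]
        large = [v for v in values if v > pivot]
        # Joining the sorted halves is just placing them side by side.
        return quicksort(small) + equal + quicksort(large)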

Mergesort operates by:

  1. Splitting the dataset exactly down the middle
  2. Sorting each half
  3. Merging the two sorted halves into a sorted whole

Step one takes no time, but step three takes linear time. The same as quicksort, except that the linear time is spent at the end of the procedure rather than at the beginning.
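
The same three steps as a minimal Python sketch; note that the linear work sits in the merge, after the recursive calls return:

    def merge(left, right):
        # Step 3: linear time, one comparison per card placed.
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        return out + left[i:] + right[j:]

    def mergesort(values):
        if len(values) <= 1:          # a single card is already sorted
            return values
        mid = len(values) // 2        # step 1: split exactly down the middle
        return merge(mergesort(values[:mid]),   # step 2: sort each half
                     mergesort(values[mid:]))   # then step 3: merge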

Addendum

Added August 2009.
What sort of algorithm benefits from dividing the problem into several smaller fragments? It must be that one gains more by solving the two halves than one loses to the business of dividing into and recombining the halves. In the simplest case:

2(n/2)² + n < n²
is true for large enough n, so the scheme works. Would it work for, say, finding the minimum? Finding the minimum takes linear time. One would divide into halves, find the minimum in each half, and with one unit of time compare the two minima to find the overall minimum:
2(n/2) + 1 = n + 1 > n
If anything, the divided problem is slower.
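
Seen as code, a sketch of minimum-by-halves makes the same point; it performs the same n - 1 comparisons as a single linear scan, just arranged recursively:

    def min_by_halves(values):
        if len(values) == 1:
            return values[0]
        mid = len(values) // 2
        left  = min_by_halves(values[:mid])   # minimum of the first half
        right = min_by_halves(values[mid:])   # minimum of the second half
        return left if left <= right else right   # the one extra comparison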

How about n log n? We know that n log n is very good for a sorting algorithm; at least, we are presented with such algorithms and no one goes on to do better. What would dividing into smaller problems do for a program with growth n log n?

2((n/2) log(n/2)) + n = n(log n - log 2) + n = n log n - n + n = n log n,
taking logarithms base 2, so that log 2 = 1.
Ok, that's interesting. For this mathematical model of dividing and solving, n log n is a sort of boundary. For n², there is a gain by dividing; for n, there is a small loss. But for n log n it is precisely balanced.