A Card Game

In this class, if you aren't writing code you are playing cards. That's because most of these algorithms, even complex ones, are often used subconsciously in everyday life. Computers are not the only ones who would like to finish their work early!

Consider sorting half a deck of cards. We use the half of the deck containing all cards from the red diamond suit and the black club suit. Let's order them: any red comes before any black, the ace is one, and the picture cards Jack, Queen, King are 11, 12, 13, as is typical.

How I would sort these cards is to first make two piles, one of red and the other of black; sort the piles individually; and complete the task by placing the pile of sorted red cards on top of the pile of sorted black cards. Let's look mathematically at what has happened.

To sort 26 cards by an n²/2 sort would cost 338 units of activity. The grand total for the above process is 196 units of activity: two sorts of 13 cards each, plus a pass through the deck to split it and a single move to join the piles. A significant reduction in work. The reason is that a square function is convex upwards, so cutting the size of the problem in half reduces the amount of work by strictly more than half. It takes less effort to solve twice as many problems of half the size than to solve the entire problem at once. However, are two separate small solutions good enough? In our case they were, because first we had a cheap method to separate the deck into two problems, and second we had a cheap method to join the separate solutions into the solution of the original problem.
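
A quick check of the arithmetic in a few lines of Python; the split and join costs used here, one unit per card plus one unit to stack the piles, are one reading of the description above:

    # Cost model: sorting n cards by the simple method costs n^2 / 2 units.
    whole  = 26**2 / 2                  # sort all 26 cards at once: 338 units
    halves = 2 * (13**2 / 2) + 26 + 1   # two sorts of 13 (169 units), one
                                        # splitting pass over 26 cards, one join
    print(whole, halves)                # 338.0 196.0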

PSEUDOCODE FOR SORTING WITH SUITS

Split into red and black suits
Sort red suit
Sort black suit
Deck is sorted
[Figure: the deck at each stage: Shuffle, Split, Sort red, Sort black, Finished]
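
A minimal sketch of the procedure in Python, assuming for illustration that cards are represented as (value, suit) pairs, and using a simple quadratic insertion sort on each pile:

    def insertion_sort(pile):
        # The plain n^2/2-style sort applied to each pile.
        for i in range(1, len(pile)):
            card = pile[i]
            j = i - 1
            while j >= 0 and pile[j] > card:
                pile[j + 1] = pile[j]
                j -= 1
            pile[j + 1] = card
        return pile

    def sort_by_suits(deck):
        # Split: one pass through the deck, one decision per card.
        reds   = [card for card in deck if card[1] == "diamonds"]
        blacks = [card for card in deck if card[1] == "clubs"]
        # Sort each pile, then join: sorted reds on top of sorted blacks.
        return insertion_sort(reds) + insertion_sort(blacks)

For instance, sort_by_suits([(13, "clubs"), (1, "diamonds"), (12, "diamonds")]) returns the sorted diamonds followed by the sorted clubs.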

The principle is: two problems of half size are, even in total, easier than the one original problem; and the two half solutions can be brought together quickly enough that the savings gained are not all spent away.

If splitting once is good, then maybe splitting again, and again, is better. In these recursive n log n sorts, each sub-problem is split into further sub-problems, until a sub-problem is small enough to be solved (sorted) with very little effort, a constant amount of effort. Often the splitting continues until there is only one card to sort.

So how quick is quick when it comes to the job of separating into subproblems and joining the subproblems back into a full solution? Work proportional to the size of the problem will do. In our example, the separation was achieved by passing through the n cards and performing a constant amount of work on each card, for total work proportional to n. In our example the job of bringing the cards together was almost no work. In mergesort the job of bringing the subproblems together will cost work proportional to n, where n is the number of cards. The question then becomes:

Is 2((n/2)²/2) + c₁n + c₂n < n²/2? Or equivalently, are there values of n for which 0 < n/4 - c₁ - c₂?
The answer is yes: for large enough n the inequality is true. Such schemes to divide a problem down and reassemble it will be advantageous, no matter how complicated the separation and reassembly procedures might be, provided they are in essence linear and the problem size is large enough.
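
A small numeric check of that inequality, with hypothetical constants c₁ and c₂ standing in for the per-card costs of separation and reassembly:

    c1, c2 = 3.0, 2.0                 # hypothetical per-card split/join costs
    for n in (8, 16, 24, 64, 256):
        divided = 2 * ((n / 2) ** 2 / 2) + c1 * n + c2 * n
        whole   = n ** 2 / 2
        print(n, divided < whole)     # becomes True once n/4 > c1 + c2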

Quicksort and mergesort are sorts that work according to the cheaper-by-halves principle. In quicksort, the dataset is split into small and large values, and each is separately sorted. The entirety is sorted by simply bringing together the two sorted halves. Here we placed the reds (small) separate from the blacks (large). In quicksort, finding a place to split is not as clear. A pivot item is selected at random from the set of values to sort, and its value becomes the boundary between values considered large and those considered small. Often the split is then not exactly in the middle, since the pivot need not be the median value of the set.
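
A minimal quicksort sketch in Python along these lines, with the pivot drawn at random:

    import random

    def quicksort(values):
        if len(values) <= 1:             # one value is already sorted
            return values
        pivot = random.choice(values)    # boundary between small and large
        small = [v for v in values if v < pivot]
        equal = [v for v in values if v == pivot]
        large = [v for v in values if v > pivot]
        # Joining the sorted halves is just placing them side by side.
        return quicksort(small) + equal + quicksort(large)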

Mergesort operates by:

  1. Splitting the dataset exactly down the middle
  2. Sorting each half
  3. Merging the two sorted halves into a sorted whole

Step one takes no time, but step three takes linear time. The same as quicksort, except that the linear time is spent at the end of the procedure rather than at the beginning.
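
The same three steps as a minimal Python sketch; note that the linear work sits in the merge, after the recursive calls return:

    def merge(left, right):
        # Step 3: linear time, one comparison per card placed.
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        return out + left[i:] + right[j:]

    def mergesort(values):
        if len(values) <= 1:          # a single card is already sorted
            return values
        mid = len(values) // 2        # step 1: split exactly down the middle
        return merge(mergesort(values[:mid]),   # step 2: sort each half
                     mergesort(values[mid:]))   # then step 3: merge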

Addendum

Added August 2009.
What sort of algorithm benefits from dividing the problem into several smaller fragments? It must be that one gains more by solving the two halves than one loses to the business of dividing into and recombining the halves. In the simplest case:

2(n/2)² + n < n²
is true for large enough n, so the scheme works. Would it work for, say, finding the minimum? Finding the minimum takes linear time. One would divide into halves, find the minimum in each half, and with one unit of time compare the two minima to find the overall minimum:
2(n/2) + 1 = n + 1 > n
If anything, the divided problem is slower.
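
Seen as code, a sketch of minimum-by-halves makes the same point; it performs the same n - 1 comparisons as a single linear scan, just arranged recursively:

    def min_by_halves(values):
        if len(values) == 1:
            return values[0]
        mid = len(values) // 2
        left  = min_by_halves(values[:mid])   # minimum of the first half
        right = min_by_halves(values[mid:])   # minimum of the second half
        return left if left <= right else right   # the one extra comparison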

How about n log n? We know that n log n is very good for a sorting algorithm; at least, we are presented with such algorithms and no one goes on to do better. What would dividing into smaller problems do for a program with growth n log n?

2((n/2) log(n/2)) + n = n(log n - log 2) + n = n log n - n + n = n log n,
taking logarithms base 2, so that log 2 = 1.
Ok, that's interesting. For this mathematical model of dividing and solving, n log n is a sort of boundary. For n², there is a gain by dividing; for n, there is a small loss. But for n log n it is precisely balanced.