next up previous contents index Search
Next: 0.3.1.2 Partitioning Up: 0.3.1 The Quicksort Previous: 0.3.1 The Quicksort

0.3.1.1 Picking a Pivot Value

As you might imagine, selecting a good pivot value is crucial to the success and performance of this sort. In the ideal case we want to pick the statistical median key in a partition as the pivot value and, thus, split the partition into equal halves.  

The simplist way to pick a pivot value is to use the value of first data item. This method has the advantage of being a very fast but, unfortunately, operates on a sometimes faulty supposition. If the data being sorted is in near-sorted order then the first item in a given partition is very likely to have the lowest value of all items in said partition. This leads to unbalanced partitioning of the data set. To avoid infinite recursion usually Quicksort is implemented to partition into a set containing the pivot element and a set containing the n-1 others. However, repeated unbalanced partitioning causes Quicksort to perform very poorly.

In order to eliminate this risk, often the higher of the two first distinct key values in a partition is selected to be the pivot. Other pivot selection schemes add the first and last values in the partition and divide by two. Still others take the middle-of-three approach. While methods such as these must examine slightly more data than one which blindly chooses the first element, normally they speed up the overall algorithm by choosing a value more likely to be near the median of the partition.

Below is an example pivot selecting routine written in C. It uses the greater of the first two distinct values in a partition as the pivot point and returns NONE if the partition does not need to be sorted any further.


/* NONE must be a value that cannot occur as a key */
#define NONE -1

/* these are arbitrary */
typedef key int;
typedef data struct {
  key thekey;
  char *therest;
};

/*
 *  Return the pivot value or NONE if the partition does
 *  not need to be sorted any further.
 *
 */

key selectpivot(data *array, int left, int right) {

  key first = value(array[left]);         /* the first key */
  int lcv;                                 /* loop control */

  for (lcv = left + 1; lcv <= right; lcv++) {
    if (value(array[lcv]) > first) return (value(lcv));
    else if (value(array[lcv]) < first) return first;
  }

  /* if we get here the partition is homogeneous */
  return (NONE);
}

key value(data *item) {
  return (item->thekey);
}

Above, the routine selects the value of a pivot point and returns it, or, if all the elements in the partition are equal in value, returns NONE to indicate that further sorting of this partition is not necessary. NONE must be a value that will not appear in the array. If you cannot predict which values will appear in your data set, it would be better to write the above routine to return the index of the pivot value rather than the value itself. Then it could return an invalid index number to specify a homogeneous partition.

Scott Gasch
1999-07-09