next up previous contents index Search
Next: Source Code Up: 0.2 Data Searching and Previous: References

0.2.9 Tries

A trie   is a special kind of tree data structure in which items are stored and accessed based upon their key value alone, not based upon their key value in relation to others in the tree. In contrast, recall that a binary search tree   uses a greater-than or less-than relationship between keys in the tree to determine where a new insertion would be placed. In tries, the new insertion's place is predetermined based on its key value. For this reason a trie is somewhat of a cross between a tree and a hash table.  

Tries can only be used when the range of valid keys to be stored is known up-front. To store a new item in a trie, that item's key value is somehow broken down into components. If, for example, the item to be stored is a string, a logical way to break its value down is by letter. For a number, perhaps a good way to break down the key value is by digits in its binary representation. Every internal node in a trie can have as many child nodes as the number of items in the alphabet you are storing. That is, if you are traversing on English letters each node can have up to 26 children. If you are traversing on binary digits, each node can have two children, 0 or 1.

Imagine we have a trie in which we are storing strings. We wish to store the new item ``dog'' in the trie. From the root node we follow the ``d'' edge and arrive at a child node. The only data stored in leaves under this child node are words that begin with the letter ``d.''

Next, we traverse along the ``o'' link from the ``d'' node. We reach yet another internal node under which only words which start with the prefix ``do'' are stored. Finally suppose we find that there is no ``g'' link off the ``do'' node. We create one and store ``dog'' at this new leaf node.

Likewise, if we want to store the number six (6) in a numeric trie, we might convert six to it's binary representation (110). From the first node we follow the ``1'' edge. Next we traverse down the ``1'' edge again. And finally we store the value six in a leaf node that is lies along the ``110'' edges from the root node.

Any path from the root to a leaf in a trie corresponds to the value of the item stored at the leaf node reached.

Huffman trees,       the data structures behind Huffman data compression algorithms, are a popular use of tries. Huffman coding   is discussed in the data compression section of this document.

Scott Gasch