next up previous contents index Search
Next: Source Code Up: 0.2 Data Searching and Previous: References

0.2.5 B+-Trees

A B+-tree is a data structure to store vast amounts of information. Typically B+-trees are used to store amounts of data that will not fit in main system memory. To do this, secondary storage (usually disk) is used to store the leaf nodes of the tree. Only the internal nodes of the tree are stored in computer memory. In a B+-tree the leaf nodes are the only ones that actually store data items. All other nodes are called index nodes or i-nodes       and simply store ``guide'' values which allow us to traverse the tree structure from the root down and arrive at the leaf node containing the data item we seek. Because disk I/O is very slow in comparison to memory access these leaf nodes store more than one data item each. In fact, the data structure will perform best when the size of the leaf nodes closely approximates the size of a disk sector under most operating systems. Thus, when we search a B+-tree (by traversing from the root node down to the proper data node) we still must read that data node from the disk and search its contents. Another way to improve the speed of a query operation is to keep a memory cache of recently read leaf nodes.

The ancestor of the B+-tree is a structure known as a B-tree in which data items can be stored in any node on the tree. A more complicated and slightly more robust varient of the B-tree is called a B*-tree.    

While conceptually simple in nature, the challenging aspect of B+-trees is effecting their growth and collapse. When data nodes in the tree become too full they split into two nodes. This process demands that a new guide value be added to the parent index node so that future queries can arbitrate between the two new nodes. However, adding this new guide value to the parent may cause it, in turn, to split. In fact, all the nodes on a path from a leaf to the root may split when you add a new value to a leaf node. If the root node splits, a new leaf node is created and your tree grows by one level.

Conversely, the deletion of data items in a leaf node may cause that node to become too empty. When a data node becomes too empty, neighboring nodes are examined and merged into underfull node. This causes the deletion of a guide value in the parent index node which, in turn, may cause it to become too empty. Much like the ripple caused by an insertion operation, a key deletion may cause a merge-delete wave to run from a leaf node all the way up to the root. When the children of the root are merged into one and the root node's only value is deleted the root index node is deallocated, its child node becomes the new root, and the tree shrinks by one level.

As you can see, insertion and deletion processes are recursive in nature and can cascade up or down the B+-tree affecting its shape dramatically.

Scott Gasch