0.2.6 Boyer-Moore String Searching

Search
Next: 0.2.6.1 Source Code Up: 0.2 Data Searching and Previous: 0.2.5.1 Source Code

0.2.6 Boyer-Moore String Searching

The Boyer-Moore searching algorithm, described in R. S. Boyer and J. S. Moore's 1977 paper A Fast String Searching Algorithm is among the best ways known for finding a substring in a search space. Using their method it is possible to search a data space for a known pattern without having to examine all the characters in the search space. Boyer-Moore search algorithms are based on two search heuristics.

The first of these rule tells us how to search for substrings without repeats in a data space. Keep a pointer into the data space at the current search location; initialize this pointer to the start of the space plus n - 1 characters where n is the number of characters in the target string.

Compare the character in the data space pointed to by this pointer with the characters in the target string. If this character does not occur in the target string, advance the pointer by n places.

If the character does occur in the target string, advance the pointer by n - p places where p is the position that the character in question first occurs in the target string.

This process repeats until either a match is found or we have shifted over past the end of the search space.

The second search heuristic applies to searching for targets with repeating patterns. Using only the rules set forth in the first heuristic will work for targets with repeating patterns but the search will not be as efficient as possible. By examining partial matches and repeats in the target string, though, it is possible to make more drastic pointer jumps and arrive at the match more rapidly. This type of jump is based on a table which is computed before the search begins.

Scott Gasch
1999-07-09