string matching with errors


Definition: Searching for approximate (e.g., up to a predefined number of symbol mismatches, insertions, and deletions) occurrences of a pattern string in a text string. Preprocessing, e.g., building an index, may or may not be allowed.

Also known as approximate string matching.

Specialization (... is a kind of me.)
Ratcliff/Obershelp pattern recognition, phonetic coding, Jaro-Winkler, Levenshtein distance, string matching with mismatches.

See also string matching, inverted index.

Note: From Algorithms and Theory of Computation Handbook, page 13-18, Copyright © 1999 by CRC Press LLC. Appearing in the Dictionary of Computer Science, Engineering and Technology, Copyright © 2000 CRC Press LLC.

For large collections that are searched often, it is far faster, though more complicated, to start with an inverted index or a suffix tree.

Author: CRC-A


See implementations at string matching. (C and Pascal)

More information

Qi Xiao Yang, Sung Sam Yuan, Lu Chun, Li Zhao, and Sun Peng, Faster Algorithm of String Comparison 2001.

Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 2 September 2014.
HTML page formatted Mon Feb 2 13:10:40 2015.

Cite this as:
Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999, "string matching with errors", in Dictionary of Algorithms and Data Structures [online], Vreda Pieterse and Paul E. Black, eds. 2 September 2014. (accessed TODAY) Available from: