NIST

string matching with errors

(algorithm)

Definition: Searching for approximate (e.g., up to a predefined number of symbol mismatches, insertions, and deletions) occurrences of a pattern string in a text string. Preprocessing, e.g., building an index, may or may not be allowed.

Also known as approximate string matching.

Specialization (... is a kind of me.)
Ratcliff/Obershelp pattern recognition, phonetic coding, Levenshtein distance, string matching with mismatches.

See also string matching, inverted index.

Note: From Algorithms and Theory of Computation Handbook, page 13-18, Copyright © 1999 by CRC Press LLC. Appearing in the Dictionary of Computer Science, Engineering and Technology, Copyright © 2000 CRC Press LLC.

For large collections that are searched often, it is far faster, though more complicated, to start with an inverted index or a suffix tree.

Author: CRC-A

Implementation

See implementations at string matching. (C and Pascal)

More information

Faster Algorithm of String Comparison 2001.


Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul E. Black.

Entry modified Wed Feb 2 11:33:09 2005.
HTML page formatted Wed Oct 26 09:48:13 2005.

Cite this as:
Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999, "string matching with errors", from Dictionary of Algorithms and Data Structures, Paul E. Black, ed., NIST.
http://www.nist.gov/dads/HTML/stringMatchwError.html

to NIST home page