NIST

Aho-Corasick

(algorithm)

Definition: A multiple string matching algorithm that constructs a finite state machine from a pattern (list of keywords), then uses the machine to locate all occurrences of the keywords in a body of text.

Generalization (I am a kind of ...)
string matching.

See also Commentz-Walter works from the back, like Boyer-Moore.

Note: Construction of the machine takes time proportional to the sum of the lengths of the keywords. The machine is used to process the text string in a single pass. The number of transitions made by the machine is independent of the number of keywords.

Author: VO

Implementation

Bruce Watson's SPARE Parts, a string pattern recognition toolkit (C++).

More information

Alfred V. Aho and Margaret J. Corasick, Efficient string matching: an aid to bibliographic search, CACM, 18(6):333-340, June 1975.


Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 2 September 2014.
HTML page formatted Mon Feb 2 13:10:39 2015.

Cite this as:
Vadim Okun, "Aho-Corasick", in Dictionary of Algorithms and Data Structures [online], Vreda Pieterse and Paul E. Black, eds. 2 September 2014. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/ahoCorasick.html