
Jaro-Winkler

(algorithm)

Definition: A measure of similarity between two strings. The Jaro measure is a weighted sum of three terms: the fraction of characters matched in each of the two strings, and the fraction of matched characters that are not transposed. Winkler increased this measure for matching initial characters, then rescaled it by a piecewise function whose intervals and weights depend on the type of string (first name, last name, street, etc.).
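Example: The following Python sketch illustrates the Jaro measure and Winkler's increase for matching initial characters. It rests on assumptions that are not part of this entry: the three Jaro terms are weighted equally (1/3 each), two characters count as matched only if they are equal and at most max(|s1|, |s2|)/2 - 1 positions apart, and the prefix increase uses the commonly quoted scaling factor 0.1 over at most 4 initial characters. The function and parameter names (jaro, jaro_winkler, prefix_scale, max_prefix) are hypothetical. The piecewise rescaling by type of string is not implemented.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1], assuming equal weights of 1/3 per term."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Assumed matching window: equal characters match only if they are
    # no farther apart than this many positions.
    window = max(max(len1, len2) // 2 - 1, 0)

    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters, in order of appearance in each string.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    # Each transposition shows up as two out-of-order positions.
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro measure increased for matching initial characters.

    prefix_scale and max_prefix are assumed, commonly quoted defaults.
    """
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Under these assumptions, "MARTHA" and "MARHTA" have six matched characters and one transposition (TH versus HT), so the Jaro measure is (1 + 1 + 5/6)/3. The three matching initial characters then move the score 30% of the remaining distance toward 1.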

Generalization (I am a kind of ...)
string matching with errors.

See also Levenshtein distance, phonetic coding.

Note: For "piecewise function", see the definition in MathWorld or answers from Dr. Math.

Author: PEB

Implementation

Cohen, Ravikumar, and Fienberg have an implementation in their SecondString (Java) package.

More information

William E. Winkler and Yves Thibaudeau, An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census, Statistical Research Report Series RR91/09, U.S. Bureau of the Census, Washington, D.C., 1991.
Matthew A. Jaro, Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, Journal of the American Statistical Association, 84(406):414-420, June 1989.
Matthew A. Jaro, UNIMATCH: A Record Linkage System: User's Manual, Technical Report, U.S. Bureau of the Census, Washington, D.C., 1976.



Entry modified 27 May 2014.
HTML page formatted Mon Feb 2 13:10:39 2015.

Cite this as:
Paul E. Black, "Jaro-Winkler", in Dictionary of Algorithms and Data Structures [online], Vreda Pieterse and Paul E. Black, eds. 27 May 2014. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/jaroWinkler.html