locality-sensitive hashing

(algorithm)

Definition: A probabilistic algorithm to quickly find points in a high dimensional space near a query point. Preprocessing: put every point in multiple hash tables. Each table has its own locality-sensitive hash function and uses buckets (or chaining) since many collisions are expected. The hash functions come from a family of functions. Finding: look up the query point in each hash table, and compute the distance from the query point of every point in the bucket.

Generalization (I am a kind of ...)
point access method.

Aggregate child (... is a part of or used in me.)
hash function, bucket.

See also Bloom filter.

Note: This algorithm is for high dimensional spaces. For instance, a 1000 x 1000 image can be characterized by a vector in 1,000,000-dimensional space, one dimension for each pixel.

To understand the algorithm, let the "hash functions" be random sets of the coordinates. If many coordinates of a point match the query point, it is likely to be close to the query point, at least closer than those that don't match.

Author: PEB

More information

Alexandr Andoni's Locality-Sensitive Hashing home page. Locality sensitive hashing in Wikipedia.

Alexandr Andoni and Piotr Indyk, Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, CACM, 51(1):117-122, January 2008.

Go to the Dictionary of Algorithms and Data Structures home page.

If you have suggestions, corrections, or comments, please get in touch with Paul Black.

Entry modified 16 May 2008.
HTML page formatted Mon Feb 2 13:10:39 2015.

Cite this as:
Paul E. Black, "locality-sensitive hashing", in Dictionary of Algorithms and Data Structures [online], Vreda Pieterse and Paul E. Black, eds. 16 May 2008. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/localitySensitiveHashing.html