#include <lsh.h>
Public Types | |
typedef const float * | Domain |
Public Member Functions | |
template<typename RNG> | |
void | reset (const Parameter ¶m, RNG &rng) |
template<typename RNG> | |
ThresholdingLsh (const Parameter ¶m, RNG &rng) | |
unsigned | getRange () const |
unsigned | operator() (Domain obj) const |
unsigned | operator() (Domain obj, float *delta) const |
template<class Archive> | |
void | serialize (Archive &ar, const unsigned int version) |
Classes | |
struct | Parameter |
This LSH can be used to approximate L1 distance for closed D-dimensional space [min, max]^D. It hashes each input vector into a 0-1 value, so its range is 2. The hash function is the following: a random dimension is chosen, and a threshold T is sampled uniformly in [min, max]. For each input vector, the value at that dimension is check. If the value is larger than T, 1 is returned and otherwise 0 is returned. Following is the parameter.
struct Parameter { unsigned dim; // Dimension of domain. float min; // Each dimension is limited in [min, max]. float max; };
The method is discussed in the following papers:
Zhe Wang, Wei Dong, William Josephson, Qin Lv, Moses Charikar, Kai Li. Sizing Sketches: A Rank-Based Analysis for Similarity Search. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems . San Diego, CA, USA. June 2007.
Qin Lv, Moses Charikar, Kai Li. Image Similarity Search with Compact Data Structures. In Proceedings of ACM 13th Conference on Information and Knowledge Management (CIKM), Washington D.C., USA. November 2004.
Note that the original method allows the range of each dimension to be different and also allow each dimensioin to carry a weight. The implementation here is simplified.