#include <lshkit/common.h>
#include <lshkit/lsh.h>
#include <lshkit/composite.h>
#include <lshkit/metric.h>
#include <lshkit/lsh-index.h>
#include <lshkit/mplsh-model.h>
#include <lshkit/topk.h>
Go to the source code of this file.
Namespaces | |
namespace | lshkit |
Classes | |
struct | lshkit::Probe |
Probe vector. More... | |
class | lshkit::ProbeSequenceTemplates |
Probe sequence template. More... | |
class | lshkit::MultiProbeLsh |
Multi-Probe LSH class. More... | |
struct | lshkit::MultiProbeLsh::Parameter |
class | lshkit::MultiProbeLshIndex< KEY > |
Multi-Probe LSH index. More... | |
Typedefs | |
typedef std::vector< Probe > | lshkit::ProbeSequence |
Probe sequence. | |
Functions | |
void | lshkit::GenProbeSequenceTemplate (ProbeSequence &seq, unsigned M, unsigned T) |
Generate a template probe sequence. | |
Variables | |
ProbeSequenceTemplates | lshkit::__probeSequenceTemplates |
Multi-Probe LSH (MPLSH) uses the same data structure as LshIndex, except that it probes more than one buckets in each hash table to generate more accurate results. Equivalently, less hash tables are needed to achieve the same accuracy. The limitation is that the current implementation only works for L2 distance.
Follow the following 4 steps to use the MPLSH API.
MPLSH uses the scanner as a unary function taking a key as argument.
The default scanner implementation is TopkScanner.
class Scanner { ... void operator () (unsigned key); };
typedef MultiProbeLshIndex<KEY> Index;
Index index;
The index can not be used yet.
Index::Parameter param; //Setup the parameters. Note that L is not provided here. param.W = W; param.H = H; // See H in the program parameters. You can just use the default value. param.M = M; param.dim = DIMENSION_OF_THE_DATA DefaultRng rng; // random number generator. index.init(param, rng, L); for (each possible key, value pair) { index.insert(key, value); } // You can now save the index for future use. ofstream os(index_file.c_str(), std::ios::binary); index.save(os);
Or you can load from a previously saved file
ifstream is(index_file.c_str(), std::ios::binary); index.load(is);
float *query;
...
index.query(query, T, scanner); cnt is the number of points actually scanned.
See the source file lshkit/tools/mplsh-run.cpp for a full example of using MPLSH.
For adaptive probing, I hard coded the sensitive range of KNN distance to [0.0001W, 100W] and logarithmically quantized the range to 200 levels. If you find that your KNN distances fall outside this range, or want more refined quantization, you'll have to modify the code in lshkit::MultiProbeLshIndex::init().
Qin Lv, William Josephson, Zhe Wang, Moses Charikar, Kai Li. Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search. Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). Vienna, Austria. September 2007.