tools/fitdata.cpp File Reference

Gather statistics from dataset for MPLSH tuning.

#include <cstdlib>
#include <gsl/gsl_multifit.h>
#include <boost/program_options.hpp>
#include <boost/progress.hpp>
#include <lshkit.h>


Namespaces

namespace tr1

Functions

bool is_good_value (double v)
int main (int argc, char *argv[])

Detailed Description

Gather statistics from dataset for MPLSH tuning.

This program gathers statistical data from a small sample dataset for automatic MPLSH parameter tuning. It carries out the following steps:

  1. Sample N points from the dataset. Only those N points are used in the subsequent computation.
  2. Sample P pairs of points from the sample and calculate the distance for each pair.
  3. Sample Q points from the sample as query points.
  4. Divide the sample into F folds.
  5. For i = 1 to F, take i folds and run K-NN search, so the query points are searched against sample datasets of N/F, 2N/F, ..., N points.
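The steps above can be illustrated with a minimal, standard-library-only sketch. It is not the code in fitdata.cpp: the names `Point`, `l2_distance`, `sample_indices`, `sample_pair_distances`, `fold_prefix_sizes` and `knn_distances` are hypothetical, the distance is assumed to be Euclidean (L2), and the K-NN search is plain brute force rather than anything taken from LSHKIT. The sketch also does not exclude a query point from the set it is searched against, and it stops before any statistics are fitted or printed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical point type for this sketch: a dense vector of coordinates.
using Point = std::vector<double>;

// Euclidean (L2) distance between two points of equal dimension (assumed metric).
double l2_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Steps 1 and 3: pick `count` distinct indices from [0, total) in random order.
std::vector<std::size_t> sample_indices(std::size_t total, std::size_t count,
                                        std::mt19937& rng) {
    assert(count <= total);
    std::vector<std::size_t> idx(total);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(count);  // keep the first `count` shuffled indices
    return idx;
}

// Step 2: distances of `pairs` randomly chosen pairs of distinct points.
std::vector<double> sample_pair_distances(const std::vector<Point>& sample,
                                          std::size_t pairs, std::mt19937& rng) {
    assert(sample.size() >= 2);
    std::uniform_int_distribution<std::size_t> pick(0, sample.size() - 1);
    std::vector<double> out;
    out.reserve(pairs);
    while (out.size() < pairs) {
        const std::size_t i = pick(rng);
        const std::size_t j = pick(rng);
        if (i != j) out.push_back(l2_distance(sample[i], sample[j]));
    }
    return out;
}

// Steps 4 and 5: sizes of the growing datasets N/F, 2N/F, ..., N.
std::vector<std::size_t> fold_prefix_sizes(std::size_t n, std::size_t folds) {
    assert(folds > 0);
    std::vector<std::size_t> sizes;
    for (std::size_t i = 1; i <= folds; ++i) sizes.push_back(i * n / folds);
    return sizes;
}

// Step 5: brute-force K-NN. Returns the ascending distances from `query`
// to its k nearest neighbors among the first `prefix` points of `sample`.
std::vector<double> knn_distances(const std::vector<Point>& sample,
                                  std::size_t prefix, const Point& query,
                                  std::size_t k) {
    assert(prefix <= sample.size());
    std::vector<double> dist;
    dist.reserve(prefix);
    for (std::size_t i = 0; i < prefix; ++i)
        dist.push_back(l2_distance(sample[i], query));
    const std::size_t keep = std::min(k, dist.size());
    std::partial_sort(dist.begin(), dist.begin() + keep, dist.end());
    dist.resize(keep);
    return dist;
}
```

Because the sample is shuffled once in step 1, taking the first i*N/F points is one simple way to obtain "i folds" without storing the folds separately. Calling `knn_distances` for every query at every size returned by `fold_prefix_sizes` yields the per-size neighbor distances from which statistics could then be derived.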

The statistical data is printed to standard output after the progress display.

Allowed options:
  -h [ --help ]          produce help message.
  -N [ -- ] arg (=0)     number of points to use
  -P [ -- ] arg (=50000) number of pairs to sample
  -Q [ -- ] arg (=1000)  number of queries to sample
  -K [ -- ] arg (=100)   search for K nearest neighbors
  -F [ -- ] arg (=10)    divide the sample to F folds
  -D [ --data ] arg      data file
