`#include <cstdlib>`

`#include <gsl/gsl_multifit.h>`

`#include <boost/program_options.hpp>`

`#include <boost/progress.hpp>`

`#include <lshkit.h>`

## Namespaces

- namespace `tr1`

## Functions

- `bool is_good_value (double v)`
- `int main (int argc, char *argv[])`
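This page lists `is_good_value` without describing it. One plausible reading, offered only as an assumption, is a guard that rejects values that are not usable numbers (NaN or infinity) before they enter later computations. A minimal sketch under that assumption follows; the real lshkit function may use a different test.

```cpp
#include <cmath>

// Hypothetical sketch: a value is "good" if it is a finite number,
// i.e. neither NaN nor +/-infinity. The actual lshkit check may differ.
bool is_good_value(double v) {
    return std::isfinite(v);
}
```

For example, `is_good_value(1.5)` is `true`, while a NaN or `HUGE_VAL` input yields `false`.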

This program gathers statistics from a small sample of the dataset for automatic MPLSH parameter tuning. It carries out the following steps:

- Sample N points from the dataset. Only these N points are used in subsequent computation.
- Sample P pairs of points from the sample and compute the distance between each pair.
- Sample Q points from the sample as query points.
- Divide the sample into F folds.
- For i = 1 to F, take the first i folds and run a K-NN search, so that the query points are searched against sample datasets of N/F, 2N/F, ..., N points.
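The steps above can be sketched as follows. This is a minimal illustration, not the lshkit implementation: it assumes points are `std::vector<float>`, uses Euclidean distance and brute-force search, and the helper names (`sample_indices`, `sample_pair_distances`, `prefix_size`, `knn_distances`) are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Point = std::vector<float>;  // assumption: dense float vectors

// Euclidean (L2) distance; the metric is an assumption of this sketch.
double l2(const Point &a, const Point &b) {
    double s = 0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        double t = a[d] - b[d];
        s += t * t;
    }
    return std::sqrt(s);
}

// Steps 1 and 3: draw `count` distinct indices from [0, n) (count <= n).
std::vector<std::size_t> sample_indices(std::size_t n, std::size_t count,
                                        std::mt19937 &rng) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(count);
    return idx;
}

// Step 2: distances of P random pairs of distinct sample points.
// Requires sample.size() >= 2, otherwise no distinct pair exists.
std::vector<double> sample_pair_distances(const std::vector<Point> &sample,
                                          std::size_t P, std::mt19937 &rng) {
    std::uniform_int_distribution<std::size_t> pick(0, sample.size() - 1);
    std::vector<double> dist;
    dist.reserve(P);
    while (dist.size() < P) {
        std::size_t i = pick(rng), j = pick(rng);
        if (i != j) dist.push_back(l2(sample[i], sample[j]));
    }
    return dist;
}

// Steps 4-5: number of points in the first i folds (1 <= i <= F).
// For i == F this is exactly N, even when F does not divide N.
std::size_t prefix_size(std::size_t N, std::size_t F, std::size_t i) {
    return i * N / F;
}

// Step 5: sorted distances from q to its K nearest neighbors among the
// first m sample points. The query itself is not excluded here.
std::vector<double> knn_distances(const std::vector<Point> &sample,
                                  std::size_t m, const Point &q,
                                  std::size_t K) {
    std::vector<double> d;
    d.reserve(m);
    for (std::size_t j = 0; j < m; ++j) d.push_back(l2(sample[j], q));
    K = std::min(K, d.size());
    std::partial_sort(d.begin(), d.begin() + K, d.end());
    d.resize(K);
    return d;
}
```

Because the sample indices come out of a shuffle, "the first i folds" can simply be the first `prefix_size(N, F, i)` sample points, so a driver would loop i from 1 to F and call `knn_distances` for each query with that prefix length.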

The statistical data is printed to standard output after the progress display.

Allowed options:

- `-h [ --help ]`: produce help message.
- `-N [ -- ] arg (=0)`: number of points to use
- `-P [ -- ] arg (=50000)`: number of pairs to sample
- `-Q [ -- ] arg (=1000)`: number of queries to sample
- `-K [ -- ] arg (=100)`: search for K nearest neighbors
- `-F [ -- ] arg (=10)`: divide the sample to F folds
- `-D [ --data ] arg`: data file