tools/embed.cpp File Reference

Example program of set embedding with random histograms.

#include <boost/program_options.hpp>
#include <boost/foreach.hpp>
#include <lshkit.h>

Namespaces

namespace  std

Classes

class  Embedder
 The abstract embedder.
class  StripeEmbedder
 Wrapper of the histogram embedder.
struct  StripeEmbedder::Parameter
class  HyperPlaneEmbedder
 Wrapper of the histogram embedder.
struct  HyperPlaneEmbedder::Parameter

Defines

#define EMBEDDER_STRIP   1
#define EMBEDDER_RP   2

Typedefs

typedef lshkit::Repeat< lshkit::LSB< lshkit::GaussianLsh > > StripeLsh
 The Stripe LSH. See Section 4.1 of the MM08 paper.
typedef lshkit::Histogram< StripeLsh > StripeEmbedderBase
typedef lshkit::Repeat< lshkit::HyperPlaneLsh > HyperPlaneLsh
 Random hyperplane LSH. See Section 4.2 of the MM08 paper.
typedef lshkit::Histogram< HyperPlaneLsh > HyperPlaneEmbedderBase

Functions

int main (int argc, char *argv[])


Detailed Description

Example program of set embedding with random histograms.

This program implements the two random histogram embedding methods proposed in the random histogram paper by W. Dong et al.

The program reads feature sets from an input text file. The file has the following format:

 ID   N                   // ID is the identifier of the set, N is the number of features in the set
 weight   D1  D2  ...     // a weight followed by D dimensions, 1st feature
 weight   D1  D2  ...     // 2nd feature
 ...
 weight   D1  D2  ...     // Nth feature
 ID   N                   // another set
 weight   D1  D2  ...
 
ID is a string that cannot contain whitespace characters; N is a positive integer; the weight and the dimension values are floats.
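
The input format above can be read with plain stream extraction. The following is a minimal sketch, not the code in embed.cpp: the `Feature` and `FeatureSet` types and the `readSet` function are hypothetical names introduced for illustration, and `dim` is assumed to be the value passed with `-D`. Because `operator>>` skips any whitespace, the sketch does not enforce the line breaks shown in the format.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration; they are not part of LSHKIT.
struct Feature {
    float weight;
    std::vector<float> values;  // the D dimension values
};

struct FeatureSet {
    std::string id;
    std::vector<Feature> features;
};

// Reads one set from `in`, expecting `dim` values per feature.
// Returns false at end of input or on malformed data.
bool readSet(std::istream &in, unsigned dim, FeatureSet &set) {
    long n = 0;
    if (!(in >> set.id >> n) || n <= 0) return false;
    set.features.assign(static_cast<std::size_t>(n), Feature());
    for (Feature &f : set.features) {
        if (!(in >> f.weight)) return false;
        f.values.resize(dim);
        for (float &v : f.values) {
            if (!(in >> v)) return false;
        }
    }
    return true;
}
```

Calling `readSet` in a loop until it returns false would visit every set in the file in order.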

The program embeds each input set into a single feature vector and outputs the vectors in the following format:

 ID D1  D2 ...              // The input ID is copied to the output.
 ID D1  D2 ...              // The histogram follows; its dimensionality is
 ...                        // determined by the input parameters.
 

The user is encouraged to modify this program to customize the input and output format.
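
As a starting point for customizing the output, one output line could be produced as sketched below. `writeEmbedding` is a hypothetical helper, not a function from embed.cpp, and the single-space separator is an assumption; this page only shows the layout `ID D1 D2 ...`.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes one output line: the set ID followed by the histogram values.
// The single-space separator is an assumption made for this sketch.
void writeEmbedding(std::ostream &out, const std::string &id,
                    const std::vector<float> &histogram) {
    out << id;
    for (float v : histogram) out << ' ' << v;
    out << '\n';
}
```

Changing the separator or the numeric formatting (for example with `std::setprecision`) only requires touching this one function.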

Usage:

Allowed options:
  -h [ --help ]            produce help message.
  -t [ --type ] arg (=1)   embedding algorithm:
                           1 - stripe embedding [-B, -M, -N, -W],
                           2 - random hyperplane [-B, -M, -N].

  --norm                   normalize the output vector to unit length.
  -I [ --input ] arg (=-)  input file.
  -O [ --output ] arg (=-) output file.
  -D [ --dim ] arg         input dimension.
  -B [ -- ] arg (=8)       #bits per projection.
  -M [ -- ] arg (=1)       take the sum of M.
  -N [ -- ] arg (=10)      repeat N times.
  -W [ -- ] arg (=1)       for type 1 only, LSH window size.
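
This page does not spell out how -B, -M and -N combine, so the following is only one plausible reading for the random hyperplane type (2), written as a self-contained sketch rather than with LSHKIT's classes. It assumes that each hash concatenates B hyperplane sign bits into a bucket in [0, 2^B), that M such hashes add into the same block of bins, and that N independent blocks are concatenated, giving N * 2^B output dimensions. `HyperPlaneHistogramSketch` and `normalizeL2` are hypothetical names, and treating `--norm` as Euclidean (L2) normalization is likewise an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Illustrative stand-in, not LSHKIT code. Assumes B < 32.
class HyperPlaneHistogramSketch {
    unsigned dim_, B_, M_, N_;
    std::vector<float> planes_;  // N*M*B hyperplane normals, each of length dim
public:
    HyperPlaneHistogramSketch(unsigned dim, unsigned B, unsigned M, unsigned N,
                              unsigned seed)
        : dim_(dim), B_(B), M_(M), N_(N),
          planes_(static_cast<std::size_t>(N) * M * B * dim) {
        std::mt19937 rng(seed);
        std::normal_distribution<float> gauss(0.0f, 1.0f);
        for (float &p : planes_) p = gauss(rng);
    }

    // Output dimensionality: one block of 2^B bins per repetition.
    std::size_t outputDim() const { return static_cast<std::size_t>(N_) << B_; }

    // Adds one weighted feature (dim values) to the histogram `out`,
    // which must already hold outputDim() entries.
    void add(std::vector<float> &out, const std::vector<float> &x,
             float weight) const {
        const float *plane = planes_.data();
        for (unsigned n = 0; n < N_; ++n) {
            std::size_t base = static_cast<std::size_t>(n) << B_;
            for (unsigned m = 0; m < M_; ++m) {
                unsigned bucket = 0;
                for (unsigned b = 0; b < B_; ++b, plane += dim_) {
                    float dot = 0.0f;
                    for (unsigned d = 0; d < dim_; ++d) dot += plane[d] * x[d];
                    bucket = (bucket << 1) | (dot >= 0.0f ? 1u : 0u);
                }
                out[base + bucket] += weight;  // the M hashes share one block
            }
        }
    }
};

// Scales v to unit Euclidean length; leaves an all-zero vector unchanged.
void normalizeL2(std::vector<float> &v) {
    double sum = 0.0;
    for (float x : v) sum += static_cast<double>(x) * x;
    if (sum <= 0.0) return;
    float inv = static_cast<float>(1.0 / std::sqrt(sum));
    for (float &x : v) x *= inv;
}
```

Under these assumptions the defaults (-B 8, -N 10) would yield 2560 output dimensions per set, and every feature contributes a total mass of M times its weight to each of the N blocks.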
 
