#include <boost/program_options.hpp>
#include <boost/foreach.hpp>
#include <lshkit.h>
Namespaces
  namespace std
Classes
  class Embedder
    The abstract embedder.
  class StripeEmbedder
    Wrapper of the histogram embedder.
  struct StripeEmbedder::Parameter
  class HyperPlaneEmbedder
    Wrapper of the histogram embedder.
  struct HyperPlaneEmbedder::Parameter
Defines
  #define EMBEDDER_STRIP 1
  #define EMBEDDER_RP 2
Typedefs
  typedef lshkit::Repeat< lshkit::LSB< lshkit::GaussianLsh > > StripeLsh
    The Stripe LSH. See Section 4.1 of MM08 paper.
  typedef lshkit::Histogram< StripeLsh > StripeEmbedderBase
  typedef lshkit::Repeat< lshkit::HyperPlaneLsh > HyperPlaneLsh
    Random hyperplane LSH. See Section 4.2 of MM08 paper.
  typedef lshkit::Histogram< HyperPlaneLsh > HyperPlaneEmbedderBase
Functions
  int main (int argc, char *argv[])
This program implements the two random histogram embedding methods proposed in the random histogram paper by W. Dong et al.
The program reads feature sets from an input text file. The file has the following format:
    ID N                // ID is the identifier of the set, N is the number of features in the set
    weight D1 D2 ...    // a weight followed by D dimensions, 1st feature
    weight D1 D2 ...    // 2nd feature
    ...
    weight D1 D2 ...    // Nth feature
    ID N                // another set
    weight D1 D2 ...
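The input format above can be read with plain stream extraction. The following is a minimal sketch, not the program's actual reader: the `Feature` and `FeatureSet` types and the `read_set` function are hypothetical names introduced for illustration, the ID is assumed to be a whitespace-free token, and the dimension D is assumed to be supplied by the caller (as with the -D option).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration; the program's own structures may differ.
struct Feature {
    float weight;             // the leading weight on each feature line
    std::vector<float> desc;  // the D values that follow the weight
};

struct FeatureSet {
    std::string id;           // assumed to contain no whitespace
    std::vector<Feature> features;
};

// Reads one set ("ID N" followed by N feature lines) from `in`.
// Returns false when no further "ID N" header can be read (e.g. end of input).
// Throws std::runtime_error if a set ends before all N features are read.
bool read_set(std::istream &in, std::size_t dim, FeatureSet *set) {
    std::size_t n = 0;
    if (!(in >> set->id >> n)) return false;
    set->features.clear();
    for (std::size_t i = 0; i < n; ++i) {
        Feature f;
        f.desc.resize(dim);
        if (!(in >> f.weight)) throw std::runtime_error("truncated feature set");
        for (std::size_t d = 0; d < dim; ++d) {
            if (!(in >> f.desc[d])) throw std::runtime_error("truncated feature");
        }
        set->features.push_back(f);
    }
    assert(set->features.size() == n);
    return true;
}
```

Calling `read_set` in a loop until it returns false walks through every set in the file. Because extraction skips whitespace, the sketch does not require each feature to sit on its own line.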
The program embeds each input set into a single feature vector and outputs the vectors in the following format:
    ID D1 D2 ...    // The input IDs are copied to the output
    ID D1 D2 ...    // Following is the histogram, whose dimensionality is determined by the
    ...             // input parameters.
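The output record can be produced in a few lines. The sketch below is an illustration under stated assumptions rather than the program's own code: `normalize_l2` and `format_record` are hypothetical helpers, and "unit length" (the --norm option) is assumed to mean unit Euclidean (L2) norm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Scales v to unit Euclidean length (assumed meaning of "unit length").
// An all-zero vector is left unchanged to avoid dividing by zero.
void normalize_l2(std::vector<float> *v) {
    double sum = 0.0;
    for (std::size_t i = 0; i < v->size(); ++i) {
        sum += static_cast<double>((*v)[i]) * (*v)[i];
    }
    if (sum <= 0.0) return;
    const double inv = 1.0 / std::sqrt(sum);
    for (std::size_t i = 0; i < v->size(); ++i) {
        (*v)[i] = static_cast<float>((*v)[i] * inv);
    }
}

// Formats one output line: the ID followed by the histogram values,
// separated by single spaces and terminated by a newline.
std::string format_record(const std::string &id, const std::vector<float> &hist) {
    assert(!id.empty());  // every record is expected to carry its input ID
    std::ostringstream out;
    out << id;
    for (std::size_t i = 0; i < hist.size(); ++i) {
        out << ' ' << hist[i];
    }
    out << '\n';
    return out.str();
}
```

Writing to a file or to standard output is then a matter of streaming the returned string, for example `std::cout << format_record(id, hist);`.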
The user is encouraged to modify this program to customize the input and output format.
Usage:
Allowed options:
  -h [ --help ]              produce help message.
  -t [ --type ] arg (=1)     embedding algorithm:
                             1 - stripe embedding [-B, -M, -N, -W],
                             2 - random hyperplane [-B, -M, -N].
  --norm                     normalize the output vector to unit length.
  -I [ --input ] arg (=-)    input file.
  -O [ --output ] arg (=-)   output file.
  -D [ --dim ] arg           input dimension.
  -B [ -- ] arg (=8)         #bits per projection.
  -M [ -- ] arg (=1)         take the sum of M.
  -N [ -- ] arg (=10)        repeat N times.
  -W [ -- ] arg (=1)         for type 1 only, LSH window size.