InspireIndexing

INSPIRE Parallel Indexing

The parallel indexing module is designed for indexing very large collections of full-text documents (hundreds of thousands or millions). The implementation uses the Hadoop and Lucene libraries and is meant to be executed on a Hadoop facility. Both the input documents and the computed indexes are located on the Hadoop DFS. An indexing job takes the following arguments:

Input directory
Output directory
Number of workers
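
The page does not document how these arguments are passed to the job, so the following is only a minimal sketch of how a driver might read and validate them. It assumes they arrive as three positional command-line values in the order listed above. The class name `IndexingJobArgs`, its fields, and the argument order are hypothetical and are not part of the INSPIRE module. The sketch uses plain Java only and makes no Hadoop or Lucene calls.

```java
import java.util.Objects;

/**
 * Hypothetical holder for the three indexing-job arguments.
 * Illustrative only: not the actual API of the INSPIRE indexing module.
 */
public final class IndexingJobArgs {
    public final String inputDir;   // DFS directory holding the input documents
    public final String outputDir;  // DFS directory that will receive the indexes
    public final int workers;       // number of parallel workers

    private IndexingJobArgs(String inputDir, String outputDir, int workers) {
        this.inputDir = inputDir;
        this.outputDir = outputDir;
        this.workers = workers;
    }

    /** Parses {input directory, output directory, number of workers}, in that assumed order. */
    public static IndexingJobArgs parse(String[] args) {
        Objects.requireNonNull(args, "args");
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "expected: <inputDir> <outputDir> <numberOfWorkers>");
        }
        if (args[0].isEmpty() || args[1].isEmpty()) {
            throw new IllegalArgumentException("directories must not be empty");
        }
        int workers;
        try {
            workers = Integer.parseInt(args[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "number of workers must be an integer: " + args[2], e);
        }
        if (workers < 1) {
            throw new IllegalArgumentException("number of workers must be at least 1");
        }
        return new IndexingJobArgs(args[0], args[1], workers);
    }

    public static void main(String[] args) {
        IndexingJobArgs job = parse(args);
        System.out.println("input=" + job.inputDir
            + " output=" + job.outputDir
            + " workers=" + job.workers);
    }
}
```

Validating the worker count before submitting anything means a malformed invocation fails immediately rather than after the job has been scheduled.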
