Forward Index

From Gcube Wiki

THIS SECTION OF GCUBE DOCUMENTATION IS CURRENTLY UNDER UPDATE.

Introduction

The Forward Index provides storage and retrieval of key and value pairs. Values can be retrieved by specifying an interval of keys. The design of the Forward Index Service closely follows the design of the Full Text Index Service and the Geo Index Service. The forward index supports the following schemas for the key and value pair:

key; integer, value; string

key; float, value; string

key; string, value; string

key; date, value; string

The schema for an index is given as a parameter when the index is created. The schema must be known in order to instantiate a class capable of comparing two keys (one that implements java.util.Comparator). The objects stored in the database can be anything. There is no limit on the length of the keys or values (except for the typed keys).
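To illustrate why the key type in the schema has to be known, here is a minimal sketch of choosing a java.util.Comparator from the key type. The class name KeyComparators, the method forKeyType, the type names passed as strings, and the ISO-8601 date encoding are all assumptions made for this example; they are not taken from the Forward Index code.

```java
import java.time.LocalDate;
import java.util.Comparator;

// Hypothetical helper (not part of the Forward Index code base):
// maps a schema key type to a Comparator over string-encoded keys.
final class KeyComparators {
    private KeyComparators() {}

    // keyType is assumed to be one of "integer", "float", "string", "date".
    static Comparator<String> forKeyType(String keyType) {
        switch (keyType) {
            case "integer":
                // Compare numerically, so that "9" sorts before "10".
                return (a, b) -> Long.compare(Long.parseLong(a), Long.parseLong(b));
            case "float":
                return (a, b) -> Double.compare(Double.parseDouble(a), Double.parseDouble(b));
            case "string":
                // Plain lexicographic order.
                return Comparator.naturalOrder();
            case "date":
                // Assumption: dates are encoded as ISO-8601 (yyyy-MM-dd).
                return (a, b) -> LocalDate.parse(a).compareTo(LocalDate.parse(b));
            default:
                throw new IllegalArgumentException("Unsupported key type: " + keyType);
        }
    }
}
```

With such a helper, the keys "9" and "10" order differently depending on the schema: numerically under an integer key, lexicographically under a string key. This is the reason an interval query cannot be answered without knowing the key type.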

Implementation Overview

Services

The forward index is implemented through three services. They are all implemented according to the factory-instance pattern:

  • An instance of the ForwardIndexManagement Service represents an index and manages this index. The life-cycle of the index is the same as the life-cycle of the management instance; the index is created when the ForwardIndexManagement instance is created, and the index is terminated (deleted) when the ForwardIndexManagement instance resource is removed. The ForwardIndexManagement Service manages the life-cycle and properties of the forward index. It co-operates with instances of the ForwardIndexUpdater Service when feeding content into the index, and with instances of the ForwardIndexLookup Service for getting content from the index. The Content Management service is used for safe storage of an index. A logical file is established in Content Management when the index is created. The index is retrieved from Content Management and established on the local node when an existing forward index is dynamically deployed on a node. The logical file in Content Management is deleted when the ForwardIndexManagement instance is deleted.
  • The ForwardIndexUpdater Service is responsible for feeding content into the forward index. The content of the forward index consists of key and value pairs. A ForwardIndexUpdater Service resource updates a single index. One index may be updated by several ForwardIndexUpdater Service instances simultaneously. When feeding the index, a ForwardIndexUpdater Service instance is created with the EPR of the ForwardIndexManagement resource connected to the index to update. The ForwardIndexUpdater instance is connected to a ResultSet that contains the content to be fed to the index.
  • The ForwardIndexLookup Service is responsible for receiving queries for the index and returning responses that match the queries. When it is created, the ForwardIndexLookup instance gets a reference to the ForwardIndexManagement instance that manages the index. It can only query this index. Several ForwardIndexLookup instances may query the same index. A ForwardIndexLookup instance gets the index from Content Management and establishes a local copy of the index on the file system, which is the copy that is queried. The local copy is kept up to date by subscribing to index change notifications that are emitted by the ForwardIndexManagement instance.

It is important to note that none of the three services have to reside on the same node; they are only connected through web service calls and the DILIGENT Content Management System.

RowSet

The content to be fed into an Index must be served as a ResultSet containing XML documents conforming to the ROWSET schema. This is a simple schema containing key and value pairs. The following is an example of a ROWSET that can be fed into the Forward Index Updater:

The row set "schema"

<rowset>
  <insert>
    <tuple><key></key><value></value></tuple>
    <tuple><key></key><value></value></tuple>
  </insert>
  <delete>
    <key></key>
    <key></key>
  </delete>
</rowset>

The rowset may contain an insert section, a delete section, or both. The key and value pairs (tuples) in the insert section may be repeated one or more times. The key in the delete section may be repeated one or more times.
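As an illustration of the rowset layout, the following sketch assembles a rowset document from a map of tuples to insert and a list of keys to delete. The class RowSetBuilder and its methods are hypothetical names invented for this example; the actual client may build the rowset differently. An empty section is simply left out, in line with the rule that each section, when present, holds at least one entry.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the Forward Index code base):
// serialises tuples to insert and keys to delete as a rowset document.
final class RowSetBuilder {
    private RowSetBuilder() {}

    static String build(Map<String, String> inserts, List<String> deletes) {
        StringBuilder xml = new StringBuilder("<rowset>");
        if (!inserts.isEmpty()) {
            // The insert section holds one <tuple> per key and value pair.
            xml.append("<insert>");
            for (Map.Entry<String, String> e : inserts.entrySet()) {
                xml.append("<tuple><key>").append(escape(e.getKey()))
                   .append("</key><value>").append(escape(e.getValue()))
                   .append("</value></tuple>");
            }
            xml.append("</insert>");
        }
        if (!deletes.isEmpty()) {
            // The delete section holds one <key> per tuple to remove.
            xml.append("<delete>");
            for (String key : deletes) {
                xml.append("<key>").append(escape(key)).append("</key>");
            }
            xml.append("</delete>");
        }
        return xml.append("</rowset>").toString();
    }

    // Escapes the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Passing an insertion-ordered map (for example a LinkedHashMap) keeps the tuples in the order in which they were added.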

Test Client ForwardIndexClient

The org.diligentproject.indexservice.clients.ForwardIndexClient test client is used to test the ForwardIndex.

The ForwardIndexClient is in the SVN module test/client. It uses a property file, ForwardIndex.properties.

The property file contains the following properties:

ForwardIndexManagementFactoryResource=/wsrf/services/diligentproject/index/ForwardIndexManagementFactoryService
Host=dili02.osl.fast.no
ForwardIndexUpdaterFactoryResource=/wsrf/services/diligentproject/index/ForwardIndexUpdaterFactoryService
ForwardIndexLookupFactoryResource=/wsrf/services/diligentproject/index/ForwardIndexLookupFactoryService
geoManagementFactoryResource=/wsrf/services/diligentproject/index/GeoIndexManagementFactoryService
Port=8080
Create-ForwardIndexManagementFactory=true
Create-ForwardIndexLookupFactory=true
Create-ForwardIndexUpdaterFactory=true

The Host and Port properties must be edited to point to the VO of interest.
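The sketch below shows one way such a property file could be read and combined into a factory service address. It is not the test client's actual code: the class FactoryAddress, its methods, and the use of the plain "http" scheme are assumptions made for illustration. Only the property names come from the file described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of the test client):
// reads the properties and derives a factory service address.
final class FactoryAddress {
    private FactoryAddress() {}

    // Parses property-file text such as the content of ForwardIndex.properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Assumption: the services are reached over plain "http";
    // the documentation does not state the scheme.
    static String url(Properties props, String resourceKey) {
        String host = props.getProperty("Host");
        String port = props.getProperty("Port");
        String path = props.getProperty(resourceKey);
        if (host == null || port == null || path == null) {
            throw new IllegalArgumentException("Missing Host, Port or " + resourceKey);
        }
        return "http://" + host.trim() + ":" + port.trim() + path.trim();
    }
}
```

For example, url(props, "ForwardIndexLookupFactoryResource") would combine Host, Port and the lookup factory path into a single address string.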

The test client obtains the EPRs of the factory services and uses the factory services to create the stateful web services:

ForwardIndexManagementService - responsible for holding the list of delta files that together make up the index. The service also relays notifications from the ForwardIndexUpdaterService to the ForwardIndexLookupService when new delta files must be merged into the index.

ForwardIndexUpdaterService - responsible for creating new delta files with tuples that are to be deleted from the index or inserted into the index.

ForwardIndexLookupService - responsible for looking up queries and returning the answer.

The test client creates one WS-Resource of each type, inserts some data through the updater, and queries the data using the lookup WS-Resource.

Inserting data and deleting tuples

Tuples can be inserted and deleted by:

  • insertingPair(key,value) / deletingPair(key) - simple methods to insert / delete tuples.
  • process(rowSet) - method to insert / delete a series of tuples.
  • procesResultSet - method to insert / delete a series of tuples in a rowset inserted into a resultSet.

Lookup

Tuples can be queried by:

  • getEQ_int(key), getEQ_float(key), getEQ_string(key), getEQ_date(key)
  • getLT_int(key), getLT_float(key), getLT_string(key), getLT_date(key)
  • getLE_int(key), getLE_float(key), getLE_string(key), getLE_date(key)
  • getGT_int(key), getGT_float(key), getGT_string(key), getGT_date(key)
  • getGE_int(key), getGE_float(key), getGE_string(key), getGE_date(key)
  • getGTandLT_int(keyGT,keyLT), getGTandLT_float(keyGT,keyLT), getGTandLT_string(keyGT,keyLT), getGTandLT_date(keyGT,keyLT)
  • getGEandLT_int(keyGE,keyLT), getGEandLT_float(keyGE,keyLT), getGEandLT_string(keyGE,keyLT), getGEandLT_date(keyGE,keyLT)
  • getGTandLE_int(keyGT,keyLE), getGTandLE_float(keyGT,keyLE), getGTandLE_string(keyGT,keyLE), getGTandLE_date(keyGT,keyLE)
  • getGEandLE_int(keyGE,keyLE), getGEandLE_float(keyGE,keyLE), getGEandLE_string(keyGE,keyLE), getGEandLE_date(keyGE,keyLE)
  • getAll

The result is provided to the client by using the Result Set service.
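The naming of the lookup operations suggests the usual comparison semantics: EQ for equal, LT and LE for less than (or equal), GT and GE for greater than (or equal), and the combined forms for intervals with open or closed ends. The sketch below models a few of the integer variants on an in-memory java.util.TreeMap. It is only an illustration of these presumed semantics: the class RangeLookupSketch, the parameter and return types, and the restriction to one value per key are assumptions, and the real service returns its results through the Result Set service rather than as Java lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an index with schema
// key; integer, value; string. Method names mirror the operations
// listed above; signatures and return types are assumed.
final class RangeLookupSketch {
    // Assumption: at most one value per key.
    private final TreeMap<Integer, String> index = new TreeMap<>();

    void insertingPair(int key, String value) { index.put(key, value); }

    void deletingPair(int key) { index.remove(key); }

    // Keys strictly less than the given key.
    List<String> getLT_int(int key) {
        return new ArrayList<>(index.headMap(key, false).values());
    }

    // Keys greater than or equal to the given key.
    List<String> getGE_int(int key) {
        return new ArrayList<>(index.tailMap(key, true).values());
    }

    // Half-open interval [keyGE, keyLT). Requires keyGE <= keyLT.
    List<String> getGEandLT_int(int keyGE, int keyLT) {
        return new ArrayList<>(index.subMap(keyGE, true, keyLT, false).values());
    }

    // Half-open interval (keyGT, keyLE]. Requires keyGT <= keyLE.
    List<String> getGTandLE_int(int keyGT, int keyLE) {
        return new ArrayList<>(index.subMap(keyGT, false, keyLE, true).values());
    }

    // All values, in ascending key order.
    List<String> getAll() {
        return new ArrayList<>(index.values());
    }
}
```

The remaining variants follow the same pattern by switching the inclusive flags, and the float, string and date variants differ only in the key type and its comparator.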