Geographical - Spatial Index

From Gcube Wiki

Revision as of 17:22, 19 June 2007

Services

The geo index is implemented through three services, in the same manner as the full text index. They are all implemented according to the Factory pattern:

  • The GeoIndexManagement Service represents an index manager. There is a one-to-one relationship between an Index and a Management instance, and their life-cycles are closely related; an Index is created by creating an instance (resource) of the GeoIndexManagement Service, and an Index is removed by terminating the corresponding GeoIndexManagement resource. The GeoIndexManagement Service should be seen as an interface for managing the life-cycle and properties of an Index, but it is not responsible for feeding or querying its Index. In addition, a GeoIndexManagement Service resource does not store the content of its Index locally, but contains references to content stored in the Content Management Service.
  • The GeoIndexBatchUpdater Service is responsible for feeding an Index. One GeoIndexBatchUpdater Service resource can only update a single Index, but one Index can be updated by multiple GeoIndexBatchUpdater Service resources. Feeding is accomplished by instantiating a GeoIndexBatchUpdater Service resource with the EPR of the GeoIndexManagement resource connected to the Index to update, and connecting the updater resource to a ResultSet containing the content to be fed to the Index.
  • The GeoIndexLookup Service is responsible for creating a local copy of an Index, and exposing interfaces for querying and creating statistics for the Index. One GeoIndexLookup Service resource can only replicate and look up a single Index, but one Index can be replicated by any number of GeoIndexLookup Service resources. Updates to the Index will be propagated to all GeoIndexLookup Service resources replicating that Index.

It is important to note that none of the three services have to reside on the same node; they are only connected through WebService calls and the DILIGENT CMS. The following illustration shows the information flow and responsibilities for the different services used to implement the Geo Index:

[Illustration placeholder: information flow and responsibilities of the Geo Index services (to be improved)]
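The cardinalities described above can be made concrete with a toy, in-memory model of the three roles. This is only a sketch: the class and method names are hypothetical, and the real services are separate web-service resources addressed by EPRs and backed by the CMS, none of which is modelled here. It shows one management resource per Index, any number of updaters feeding it, and every update reaching every lookup replica.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a GeoIndexManagement resource: owns exactly one Index.
final class ManagementResource {
    // In the real system these are references to content stored in the CMS.
    private final List<String> indexContent = new ArrayList<>();
    private final List<LookupResource> replicas = new ArrayList<>();

    // A new lookup replica immediately receives a local copy of the current Index.
    void registerReplica(LookupResource lookup) {
        replicas.add(lookup);
        lookup.sync(indexContent);
    }

    // Called by updaters; every change is propagated to all replicas.
    void addContent(String rowId) {
        indexContent.add(rowId);
        for (LookupResource replica : replicas) {
            replica.sync(indexContent);
        }
    }
}

// Hypothetical stand-in for a GeoIndexBatchUpdater resource: bound to a single Index.
final class UpdaterResource {
    private final ManagementResource target;

    UpdaterResource(ManagementResource target) {
        this.target = target;
    }

    // Feeds every row id from a (simulated) ResultSet into the target Index.
    void feed(List<String> resultSet) {
        for (String rowId : resultSet) {
            target.addContent(rowId);
        }
    }
}

// Hypothetical stand-in for a GeoIndexLookup resource: queries its own local copy.
final class LookupResource {
    private List<String> localCopy = new ArrayList<>();

    void sync(List<String> content) {
        localCopy = new ArrayList<>(content);
    }

    boolean contains(String rowId) {
        return localCopy.contains(rowId);
    }

    int size() {
        return localCopy.size();
    }
}
```

In this model a lookup registered after some content has already been fed still ends up with the full Index, and any number of updaters can target the same management resource.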

RowSet

The content to be fed into a Geo Index must be served as a ResultSet containing XML documents conforming to the GeoROWSET schema. This is a very simple schema, declaring that an object (ROW element) should contain an id, start and end X coordinates (x1 is mandatory; x2 is set equal to x1 if not provided) as well as start and end Y coordinates (y1 is mandatory; y2 is set equal to y1 if not provided). In addition, a ROW may contain any number of FIELD elements, each with a name attribute and information to be stored and perhaps used for refinement of a query or ranking of results. As opposed to the ROWSETs used for full-text indices, all rows in a GeoROWSET must contain all fields specified in the IndexType. The following is a simple but valid GeoROWSET containing two objects:

<ROWSET>
    <ROW id="doc1" x1="4321" y1="1234">
        <FIELD name="StartTime">2001-05-27T14:35:25.523</FIELD>
        <FIELD name="EndTime">2001-05-27T14:38:03.764</FIELD>
    </ROW>
    <ROW id="doc2" x1="1337" x2="4123" y1="1337" y2="6534">
        <FIELD name="StartTime">2001-06-27</FIELD>
        <FIELD name="EndTime">2001-07-27</FIELD>
    </ROW>
</ROWSET>
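A minimal sketch of reading such a GeoROWSET with the JDK's DOM parser, applying the coordinate defaults described above (x2 = x1 and y2 = y1 when omitted, so a ROW with only x1 and y1 is a point). The GeoRow class and its parse method are hypothetical helpers for illustration; the index service performs its own parsing internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical in-memory form of one ROW element of a GeoROWSET.
final class GeoRow {
    final String id;
    final double x1, x2, y1, y2;
    final Map<String, String> fields;

    GeoRow(String id, double x1, double x2, double y1, double y2, Map<String, String> fields) {
        this.id = id;
        this.x1 = x1;
        this.x2 = x2;
        this.y1 = y1;
        this.y2 = y2;
        this.fields = fields;
    }

    // Parses a GeoROWSET document into rows, applying the x2/y2 defaults.
    static List<GeoRow> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed ROWSET", e);
        }
        List<GeoRow> rows = new ArrayList<>();
        NodeList rowNodes = doc.getElementsByTagName("ROW");
        for (int i = 0; i < rowNodes.getLength(); i++) {
            Element row = (Element) rowNodes.item(i);
            // x1 and y1 are mandatory: getAttribute returns "" when absent,
            // and Double.parseDouble("") throws NumberFormatException.
            double x1 = Double.parseDouble(row.getAttribute("x1"));
            double y1 = Double.parseDouble(row.getAttribute("y1"));
            // x2 and y2 fall back to x1 and y1 when not provided.
            double x2 = row.hasAttribute("x2") ? Double.parseDouble(row.getAttribute("x2")) : x1;
            double y2 = row.hasAttribute("y2") ? Double.parseDouble(row.getAttribute("y2")) : y1;
            // Collect FIELD elements by their name attribute, keeping document order.
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList fieldNodes = row.getElementsByTagName("FIELD");
            for (int j = 0; j < fieldNodes.getLength(); j++) {
                Element field = (Element) fieldNodes.item(j);
                fields.put(field.getAttribute("name"), field.getTextContent());
            }
            rows.add(new GeoRow(row.getAttribute("id"), x1, x2, y1, y2, fields));
        }
        return rows;
    }
}
```

Applied to the example above, the first ROW becomes the point (4321, 1234) while the second keeps its explicit rectangle.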

GeoIndexType

Which fields should be present in the RowSet, and how these fields are to be handled by the Geo Index, is specified through a GeoIndexType: an XML document conforming to the GeoIndexType schema. Which GeoIndexType to use for a specific GeoIndex instance is specified by supplying a GeoIndexType ID during initialization of the GeoIndexManagement resource. A GeoIndexType contains a field list which contains all the fields which should be stored in order to be presented in the query results or used for refinement. The following is a possible IndexType for the type of ROWSET shown above:

    <index-type>
        <field-list>
            <field name="StartTime">
                <type>date</type>
                <return>yes</return>
            </field>
            <field name="EndTime">
                <type>date</type>
                <return>yes</return>
            </field>
        </field-list>
    </index-type>

Fields present in the ROWSET but not in the IndexType will be skipped. Fields present in the IndexType but missing from a ROW in the ROWSET will cause an exception. The two elements under each "field" element define how that field should be handled. The meaning and expected content of each of them is explained below:

  • type specifies the data type of the field. Accepted values are:
    • SHORT - A number fitting into a Java "short"
    • INT - A number fitting into a Java "int"
    • LONG - A number fitting into a Java "long"
    • DATE - A date in the format yyyy-MM-dd'T'HH:mm:ss.s where only yyyy is mandatory
    • FLOAT - A decimal number fitting into a Java "float"
    • DOUBLE - A decimal number fitting into a Java "double"
    • STRING - A string with a maximum length of 40 (or so...)
  • return specifies whether the field should be returned in the results from a query. "yes" and "no" are the only accepted values.
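The rules above (undeclared ROWSET fields skipped, missing IndexType fields rejected, values checked against their declared type) can be sketched as follows. FieldType and RowValidator are hypothetical names, not part of the index service. The 40-character STRING limit is taken from the approximate figure above, and the DATE check simply accepts the yyyy-MM-dd'T'HH:mm:ss.s prefixes described, which is an assumption about what the index actually tolerates.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical mirror of the accepted <type> values.
enum FieldType {
    SHORT, INT, LONG, DATE, FLOAT, DOUBLE, STRING;

    // yyyy is mandatory; each later component is optional, but only if the previous one is present.
    private static final Pattern DATE_PATTERN = Pattern.compile(
            "\\d{4}(-\\d{2}(-\\d{2}(T\\d{2}(:\\d{2}(:\\d{2}(\\.\\d+)?)?)?)?)?)?");

    // Returns true if the textual value fits the declared type.
    boolean accepts(String value) {
        try {
            switch (this) {
                case SHORT:  Short.parseShort(value);   return true;
                case INT:    Integer.parseInt(value);   return true;
                case LONG:   Long.parseLong(value);     return true;
                // Overflowing values parse to Infinity, which does not "fit".
                case FLOAT:  return !Float.isInfinite(Float.parseFloat(value));
                case DOUBLE: return !Double.isInfinite(Double.parseDouble(value));
                case DATE:   return DATE_PATTERN.matcher(value).matches();
                default:     return value.length() <= 40; // STRING; the limit is approximate
            }
        } catch (NumberFormatException e) {
            return false;
        }
    }
}

final class RowValidator {
    // Keeps only the fields declared in the IndexType and rejects rows missing any of them.
    static Map<String, String> validate(Map<String, FieldType> indexType, Map<String, String> rowFields) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, FieldType> declared : indexType.entrySet()) {
            String value = rowFields.get(declared.getKey());
            if (value == null) {
                throw new IllegalArgumentException("Missing field: " + declared.getKey());
            }
            if (!declared.getValue().accepts(value)) {
                throw new IllegalArgumentException("Bad " + declared.getValue() + " value for " + declared.getKey());
            }
            kept.put(declared.getKey(), value);
        }
        return kept; // fields not declared in the IndexType are silently dropped
    }
}
```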

Plugin Framework

As explained in the GeoIndexType section, which fields a GeoIndex instance should contain can be dynamically specified through a GeoIndexType provided during GeoIndexManagement initialization. However, since new GeoIndexTypes can be added at any time with any number of new fields, there is no way for the GeoIndex itself to know how to use the information in such fields in any meaningful manner when processing a query; a static generic algorithm for processing such information would drastically limit the usefulness of the information. In order to allow for dynamic introduction of field evaluation algorithms capable of handling the dynamic nature of IndexTypes, a plugin framework was introduced. The framework allows for the creation of GeoIndexType-specific evaluators handling ranking and refinement.

DIS plugin information...

Ranking

The results of a query are sorted according to their rank, and their ranks are also returned to the caller. A RankEvaluator plugin is used to determine the rank of objects. It is provided with the query region, Object data, GeoIndexType and an optional set of plugin specific arguments, and is expected to use this information in order to return a meaningful rank of each object.
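The sort-by-rank behaviour can be illustrated in plain Java. This sketch assumes that a higher rank means a more relevant hit, so results are ordered in descending rank, which matches the SpanSizeRanker example later on. RankedResult and Ranking.rankAll are hypothetical names, and the ToDoubleFunction stands in for a RankEvaluator's rank() method. For simplicity every candidate is ranked here, whereas the index only ranks roughly a result page at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical pairing of a result with the rank that is returned to the caller.
final class RankedResult<T> {
    final T item;
    final double rank;

    RankedResult(T item, double rank) {
        this.item = item;
        this.rank = rank;
    }
}

final class Ranking {
    // Ranks each candidate once, then sorts by rank, highest first.
    static <T> List<RankedResult<T>> rankAll(List<T> candidates, ToDoubleFunction<T> evaluator) {
        List<RankedResult<T>> ranked = new ArrayList<>();
        for (T candidate : candidates) {
            ranked.add(new RankedResult<>(candidate, evaluator.applyAsDouble(candidate)));
        }
        ranked.sort(Comparator.comparingDouble((RankedResult<T> r) -> r.rank).reversed());
        return ranked;
    }
}
```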

Refinement

The GeoIndex uses two-step processing in order to process a query. First, a very efficient filtering step finds all possible hits (along with some false hits) using the minimum bounding rectangle (MBR) of the query region. Then, a more costly refinement step uses additional object and query information in order to eliminate the false hits. While the filtering step is handled internally by the index, the refinement step is handled by a refiner plugin. It is provided with the query region, Object data, GeoIndexType and an optional set of plugin-specific arguments, and is expected to use this information in order to determine whether an object is within the query region or not.
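The two-step idea can be illustrated with java.awt.geom, used here only as a convenient stand-in: the GeoIndex itself filters with an R-tree and delegates refinement to a refiner plugin. The sketch filters candidates by testing their MBR against the MBR of the query polygon, then refines by an exact polygon intersection test. TwoStepQuery and its methods are hypothetical names, and objects are assumed to have non-zero area (point objects would need a contains() test instead).

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

final class TwoStepQuery {
    // Builds a closed polygon from alternating x,y vertex coordinates (at least 3 vertices).
    static Path2D polygon(double... xy) {
        Path2D.Double p = new Path2D.Double();
        p.moveTo(xy[0], xy[1]);
        for (int i = 2; i + 1 < xy.length; i += 2) {
            p.lineTo(xy[i], xy[i + 1]);
        }
        p.closePath();
        return p;
    }

    static List<Rectangle2D> query(Path2D region, List<Rectangle2D> objects) {
        Rectangle2D queryMbr = region.getBounds2D();
        List<Rectangle2D> hits = new ArrayList<>();
        for (Rectangle2D obj : objects) {
            // Step 1 (filter): cheap MBR overlap test; false hits may pass.
            if (!queryMbr.intersects(obj)) {
                continue;
            }
            // Step 2 (refine): exact intersection with the real query polygon.
            Area overlap = new Area(region);
            overlap.intersect(new Area(obj));
            if (!overlap.isEmpty()) {
                hits.add(obj);
            }
        }
        return hits;
    }
}
```

For a triangular query region, a rectangle lying in the empty corner of the triangle's MBR passes the filter step but is removed by the refinement step.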

Creating a Rank Evaluator

A RankEvaluator plugin has to extend the abstract class org.diligentproject.indexservice.geo.ranking.RankEvaluator which contains three abstract methods:

  • abstract public void initialize(String args[]) -- a method called during the initialization of the RankEvaluator plugin, providing the plugin with any arguments supplied in the query's RankingRequest. All arguments are given as Strings, and it's up to the plugin to parse them into the datatypes it needs.
  • abstract public boolean isIndexTypeCompatible(GeoIndexType indexType) -- determines whether this plugin can be used by an index conforming to the GeoIndexType argument.
  • abstract public double rank(Object entry) -- the method that calculates the rank of an entry.


In addition, the RankEvaluator abstract class implements two other methods worth noting:

  • final public void init(Polygon polygon, GeoIndexType indexType, String args[]) -- initializes the protected variables Polygon polygon, Envelope envelope and GeoIndexType indexType, before calling initialize() with the last argument. This means that all three protected variables are available in the initialize() method.
  • protected Object getDataField(String field, Data data) -- a method used to retrieve the contents of a specific GeoIndexType field from an org.geotools.index.Data object conforming to the GeoIndexType used by the plugin.
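The contract between init() and initialize() is the template-method pattern: the final init() populates the protected state and then hands over to the plugin's initialize(). The mock below is self-contained so that it compiles on its own; it is not the real org.diligentproject class, and it replaces Polygon, Envelope and GeoIndexType with a double[] of vertex coordinates, a double[] bounding box and a String, purely as placeholders. WeightedAreaRanker is a hypothetical subclass.

```java
// Hypothetical, simplified stand-in for RankEvaluator (placeholder types, not the real API).
abstract class MockRankEvaluator {
    protected double[] polygon;   // stand-in for Polygon polygon (alternating x,y)
    protected double[] envelope;  // stand-in for Envelope envelope: {minX, minY, maxX, maxY}
    protected String indexType;   // stand-in for GeoIndexType indexType

    // final: subclasses cannot change the order in which state is set up.
    public final void init(double[] polygon, String indexType, String[] args) {
        this.polygon = polygon;
        this.envelope = boundingBox(polygon);
        this.indexType = indexType;
        initialize(args); // the protected fields are already populated here
    }

    public abstract void initialize(String[] args);

    public abstract double rank(Object entry);

    // Computes {minX, minY, maxX, maxY} from alternating x,y coordinates.
    private static double[] boundingBox(double[] xy) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (int i = 0; i + 1 < xy.length; i += 2) {
            minX = Math.min(minX, xy[i]);
            maxX = Math.max(maxX, xy[i]);
            minY = Math.min(minY, xy[i + 1]);
            maxY = Math.max(maxY, xy[i + 1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }
}

// Hypothetical plugin: parses a numeric weight and relies on the envelope set by init().
final class WeightedAreaRanker extends MockRankEvaluator {
    double weight = 1.0;
    double queryArea;

    @Override
    public void initialize(String[] args) {
        if (args != null && args.length > 0) {
            weight = Double.parseDouble(args[0]); // arguments arrive as Strings
        }
        queryArea = (envelope[2] - envelope[0]) * (envelope[3] - envelope[1]);
    }

    @Override
    public double rank(Object entry) {
        return weight / queryArea;
    }
}
```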


Ok, simple enough... So let's create a RankEvaluator plugin. We'll assume that for a certain use case, entries which span a long period of time are of less interest than entries which span a short period of time. Since we're dealing with time spans, we'll assume that the data stored in the index will have a "StartTime" field and an "EndTime" field, in accordance with the GeoIndexType shown earlier.

The first thing we need to do is to create a class which extends RankEvaluator:

package org.mojito.ranking;
import org.diligentproject.indexservice.geo.ranking.RankEvaluator;

public class SpanSizeRanker extends RankEvaluator{
    
}

Next, we'll implement the isIndexTypeCompatible method. To do this, we need a way of determining whether the fields we need are present in the GeoIndexType argument. Luckily, GeoIndexType contains a method called containsField which expects the name (a String) and the GeoIndexField.DataType (date, double, float, int, long, short or string) of the field in question as arguments. In addition, we'll implement the initialize() method, which we'll leave empty as the plugin we are creating doesn't need to handle any arguments.

package org.mojito.ranking;

import org.diligentproject.indexservice.common.GeoIndexField;
import org.diligentproject.indexservice.common.GeoIndexType;
import org.diligentproject.indexservice.geo.ranking.RankEvaluator;

public class SpanSizeRanker extends RankEvaluator{
    public void initialize(String[] args) {}

    public boolean isIndexTypeCompatible(GeoIndexType indexType) {
        return indexType.containsField("StartTime", GeoIndexField.DataType.DATE) && 
                indexType.containsField("EndTime", GeoIndexField.DataType.DATE);
    }    
}

Last, but not least... We need to implement the rank() method. This is the method which calculates a rank for an entry, based on the query polygon, any extra arguments and the different fields of the entry. In our implementation, we'll simply calculate the timespan and divide 1 by its base-10 logarithm in order to get a quick and dirty rank, so that shorter spans get higher ranks. Keep in mind that this method is not called for all the entries resulting from the R-Tree filtering step, but only for a subset roughly the size of the result set page. This means that somewhat computationally heavy operations can be performed (if needed) without drastically increasing response time. Please also note how the getDataField() method is used in order to retrieve the evaluated fields from the entry data, and how the result is cast to Long (even though we are dealing with dates). The reason for this is that the GeoIndex internally represents a date as a long containing the number of seconds from the Epoch. If we wanted to evaluate the Minimum Bounding Rectangle (MBR) of an entry, we could access it through entry.getBounds().

package org.mojito.ranking;

import org.diligentproject.indexservice.common.GeoIndexField;
import org.diligentproject.indexservice.common.GeoIndexType;
import org.diligentproject.indexservice.geo.ranking.RankEvaluator;
import org.geotools.index.Data;
import org.geotools.index.rtree.Entry;


public class SpanSizeRanker extends RankEvaluator{
    public void initialize(String[] args) {}

    public boolean isIndexTypeCompatible(GeoIndexType indexType) {
        return indexType.containsField("StartTime", GeoIndexField.DataType.DATE) && 
                indexType.containsField("EndTime", GeoIndexField.DataType.DATE);
    }
    
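    public double rank(Object obj){
        // The index passes R-tree entries; the stored fields live in the entry's Data object.
        Entry entry = (Entry)obj;
        Data data = (Data)entry.getData();
        // Dates are stored internally as longs, hence the casts to Long.
        Long entryStartTime = (Long) this.getDataField("StartTime", data);
        Long entryEndTime = (Long) this.getDataField("EndTime", data);
        long spanSize = entryEndTime - entryStartTime;

        // Guard degenerate spans: log10 is 0 for a span of 1 (rank = Infinity), -Infinity
        // for 0 (rank = -0.0) and NaN for negative spans; treat them all as most relevant.
        if (spanSize <= 1) {
            return Double.MAX_VALUE;
        }
        // Shorter spans give a larger rank.
        return 1/Math.log10(spanSize);
    }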
    
}
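To get a feel for the ranks this plugin produces, the formula from rank() can be evaluated on its own for the two example rows of the GeoROWSET shown earlier. This sketch does not use the GeoIndex classes; SpanRankDemo is a hypothetical name. It follows the statement above that dates are held as seconds since the Epoch, converts the example timestamps with java.time while assuming they are UTC (an assumption), and treats spans of one second or less as maximally relevant because log10 is zero, negative infinity or undefined there.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class SpanRankDemo {
    // Same idea as SpanSizeRanker.rank(): 1 / log10(span), with spans in seconds.
    static double spanRank(long startSeconds, long endSeconds) {
        long spanSize = endSeconds - startSeconds;
        if (spanSize <= 1) {
            return Double.MAX_VALUE; // log10 would be 0, -Infinity or NaN here
        }
        return 1 / Math.log10(spanSize);
    }

    // Seconds since the Epoch; fractional seconds are truncated.
    static long seconds(LocalDateTime t) {
        return t.toEpochSecond(ZoneOffset.UTC); // assumption: timestamps are UTC
    }
}
```

With the example data, the first row spans 158 seconds and the second spans 30 days (2,592,000 seconds), so the first row ranks higher (about 0.45 versus about 0.16).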


And there we are! Our first working RankEvaluator plugin.

Creating a Refiner

SpanSizeRefiner

Packaging plugins

Will be filled out shortly

loading of plugins

Query language

A query is specified through a SearchPolygon object, containing the points of the vertices of the query region, an optional RankingRequest object and an optional list of RefinementRequest objects. A RankingRequest object contains the String ID of the RankEvaluator to use, along with an optional String array of arguments to be used by the specified RankEvaluator. Similarly, a RefinementRequest contains the String ID of the Refiner to use, along with an optional String array of arguments to be used by the specified Refiner.

+ how to specify a rectangle
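The query structure described above can be modelled as plain data classes. The class names SearchPolygon, RankingRequest and RefinementRequest come from the text, but the fields, constructors and the rectangle helper below are assumptions for illustration, not the service's actual stub classes. A rectangle is expressed here as a four-vertex polygon, which is one plausible way to specify one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical shapes of the query objects; the real generated stubs may differ.
final class RankingRequest {
    final String rankEvaluatorId;
    final String[] args; // optional plugin arguments, possibly empty

    RankingRequest(String rankEvaluatorId, String... args) {
        this.rankEvaluatorId = rankEvaluatorId;
        this.args = args;
    }
}

final class RefinementRequest {
    final String refinerId;
    final String[] args;

    RefinementRequest(String refinerId, String... args) {
        this.refinerId = refinerId;
        this.args = args;
    }
}

final class SearchPolygon {
    final List<double[]> vertices;             // each vertex is {x, y}
    final RankingRequest ranking;              // optional, may be null
    final List<RefinementRequest> refinements; // optional, never null here

    SearchPolygon(List<double[]> vertices, RankingRequest ranking, List<RefinementRequest> refinements) {
        if (vertices.size() < 3) {
            throw new IllegalArgumentException("A polygon needs at least 3 vertices");
        }
        this.vertices = vertices;
        this.ranking = ranking;
        this.refinements = refinements == null
                ? Collections.<RefinementRequest>emptyList()
                : refinements;
    }

    // Hypothetical helper: an axis-aligned rectangle as four vertices, counter-clockwise.
    static SearchPolygon rectangle(double x1, double y1, double x2, double y2, RankingRequest ranking) {
        double minX = Math.min(x1, x2), maxX = Math.max(x1, x2);
        double minY = Math.min(y1, y2), maxY = Math.max(y1, y2);
        List<double[]> v = new ArrayList<>();
        v.add(new double[] {minX, minY});
        v.add(new double[] {maxX, minY});
        v.add(new double[] {maxX, maxY});
        v.add(new double[] {minX, maxY});
        return new SearchPolygon(v, ranking, null);
    }
}
```

A rectangular query ranked by the plugin above would then be built with something like SearchPolygon.rectangle(0, 0, 10, 5, new RankingRequest("SpanSizeRanker")), where the evaluator ID is itself a placeholder.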

Dependencies

Will be filled out shortly

Usage Example

Create a Management Resource

Create an Updater Resource and start feeding

Create a Lookup resource and perform a query