IS-Collector
Role
The IS-InformationCollector is a gCube service in charge of aggregating the information published by the gCube services belonging to an infrastructure or to a subset of it (depending on the infrastructure configuration).
Major features of the service are:
- storage, indexing, and management of gCube Resources profiles
- storage, indexing, and management of instances' states in the form of WS-ResourceProperties documents
- storage, indexing, and management of well-formed XML documents
- full XQuery 1.0 support over the collected information
- remote management of the underlying XMLStorage
It plays a crucial role in an infrastructure, since it provides clients with a continuously updated picture of the infrastructure and its state. Responsiveness is also a key requirement of the service: many queries are sent to the InformationCollector during online operations, and any delay introduced by query processing is immediately reflected in every aspect of the system.
Historically, the service was conceived as an aggregator service, able to create Aggregator Sinks that query remote Aggregator Sources (in particular QueryAggregatorSources, such as those created by the IS-Publisher) to harvest resource properties.
Version 3.0 features a new WS-DAIX compliant interface providing a more general approach to the feeding phase.
Documents
IS-InformationCollector is capable of indexing and making queryable:
- gCube Resources
- representations of a piece of state of a gCube service, in the form of WS-ResourceProperties documents
- any WS-DAIX compliant resource
For the first two classes of resources (gCube Resources and WS-ResourceProperties), each to-be-stored document has to come along with a metadata record reporting information about the publisher. The source document is then wrapped with this information and indexed.
For instance, if a client wants to store the following document as a gCube Resource:
<Resource xmlns="" version="0.4.x"> <ID>3bb6e850-94d2-11df-8d06-8e825c7c7b8d</ID> <Type>MetadataCollection</Type> <Scopes> <Scope>/d4science.research-infrastructures.eu/FARM/FCPPS</Scope> </Scopes> <Profile> <Description/> <Name>Annotations_Collection</Name> <IsUserCollection value="true"/> <IsIndexable value="true"/> <IsEditable value="true"/> <CreationTime>2010-07-21T15:14:13+01:00</CreationTime> <Creator>unknown</Creator> <NumberOfMembers>0</NumberOfMembers> <LastUpdateTime>2010-07-21T15:14:29+01:00</LastUpdateTime> <PreviousUpdateTime>2010-07-21T15:14:29+01:00</PreviousUpdateTime> <LastModifier>unknown</LastModifier> <OID>5ec18cb0-94d2-11df-b077-8e28f9a7444e</OID> <RelatedCollection> <CollectionID>d14baeb0-f162-11dd-96f7-b87cc0f0b075</CollectionID> <SecondaryRole>is-annotated-by</SecondaryRole> </RelatedCollection> <MetadataFormat> <SchemaURI>http://gcube.org/AnnotationFrontEnd/AFEAnnotationV2</SchemaURI> <Language>en</Language> <Name>Annotations</Name> </MetadataFormat> </Profile> </Resource>
it must also send the following metadata record related to the resource:
<?xml version="1.0" encoding="UTF-8"?>
<Metadata>
  <Type>Profile</Type>
  <Source>http://source</Source>
  <TerminationTime>600</TerminationTime>
  <GroupKey>MyGroupKey</GroupKey>
  <EntryKey>MyEntryKey</EntryKey>
  <Key>MyKey</Key>
</Metadata>
Once the Collector receives both documents (via the XMLCollectionAccess portType), it produces and indexes the following integrated document:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
  <ID>3bb6e850-94d2-11df-8d06-8e825c7c7b8d</ID>
  <Source>http://source</Source>
  <SourceKey>MyKey</SourceKey>
  <CompleteSourceKey/>
  <EntryKey>MyEntryKey</EntryKey>
  <GroupKey>MyGroupKey</GroupKey>
  <TerminationTime>1288103198680</TerminationTime>
  <TerminationTimeHuman>Tue Oct 26 15:26:38 GMT+01:00 2010</TerminationTimeHuman>
  <LastUpdateMs>1288102598680</LastUpdateMs>
  <LastUpdateHuman>Tue Oct 26 15:16:38 GMT+01:00 2010</LastUpdateHuman>
  <Data>
    <Resource xmlns="" version="0.4.x">
      <ID>3bb6e850-94d2-11df-8d06-8e825c7c7b8d</ID>
      <Type>MetadataCollection</Type>
      <Scopes>
        <Scope>/d4science.research-infrastructures.eu/FARM/FCPPS</Scope>
      </Scopes>
      <Profile>
        <Description/>
        <Name>Annotations_Collection</Name>
        <IsUserCollection value="true"/>
        <IsIndexable value="true"/>
        <IsEditable value="true"/>
        <CreationTime>2010-07-21T15:14:13+01:00</CreationTime>
        <Creator>unknown</Creator>
        <NumberOfMembers>0</NumberOfMembers>
        <LastUpdateTime>2010-07-21T15:14:29+01:00</LastUpdateTime>
        <PreviousUpdateTime>2010-07-21T15:14:29+01:00</PreviousUpdateTime>
        <LastModifier>unknown</LastModifier>
        <OID>5ec18cb0-94d2-11df-b077-8e28f9a7444e</OID>
        <RelatedCollection>
          <CollectionID>d14baeb0-f162-11dd-96f7-b87cc0f0b075</CollectionID>
          <SecondaryRole>is-annotated-by</SecondaryRole>
        </RelatedCollection>
        <MetadataFormat>
          <SchemaURI>http://gcube.org/AnnotationFrontEnd/AFEAnnotationV2</SchemaURI>
          <Language>en</Language>
          <Name>Annotations</Name>
        </MetadataFormat>
      </Profile>
    </Resource>
  </Data>
</Document>
This makes it possible to:
- enrich queries with conditions based on the publisher's data
- manage the termination time of the resource by relying on the clock of the node hosting the Collector instance (the TerminationTime in seconds is added to the last update time in order to obtain an absolute expiration time for the document), thus avoiding problems due to different timezones across the infrastructure.
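As a minimal sketch of this computation, using the values from the example above (the class and method are illustrative, not part of the service API):

// Illustrative sketch: the Collector derives an absolute expiration time by
// adding the TerminationTime of the metadata record (in seconds) to the last
// update time taken from the clock of the hosting node. Not the service API.
public class ExpirationSketch {

    static long expirationTimeMs(long lastUpdateMs, long terminationTimeSecs) {
        return lastUpdateMs + terminationTimeSecs * 1000L;
    }

    public static void main(String[] args) {
        long lastUpdateMs = 1288102598680L; // <LastUpdateMs> in the integrated document
        long terminationSecs = 600L;        // <TerminationTime> of the metadata record
        // prints 1288103198680, the <TerminationTime> of the integrated document
        System.out.println(expirationTimeMs(lastUpdateMs, terminationSecs));
    }
}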
About the target collection in which the document will be stored:
- the collection is automatically derived (and created, if needed) from the document type and the information included in the document itself
- the collection name reported in the invocation of the AddDocuments operation is therefore ignored
For instance, the MetadataCollection profile above ends up in the gcube://Profiles/MetadataCollection collection (see the XML Indexing section below), regardless of the collection name passed to AddDocuments.
About the metadata record, do note:
- the Type may assume one of the following two values:
  - Profile for a gCube Resource
  - InstanceState for a piece of state of a gCube service instance
- the TerminationTime's value must be expressed in seconds; the absolute expiration time is computed from the moment the Collector receives the document
If a document is not stored with an associated metadata record, it is treated as a WS-DAIX resource.
Generic DAIX resources
[TBC]
XML Indexing
The IS-InformationCollector uses an embedded instance of eXist 1.2 to index XML data according to the XML data model and offers efficient, index-based XQuery processing. The documents are stored in collections following this structure:
gcube://
 |
 |- Profiles
 |     |- <type1>
 |     |- <type2>
 |     |- ...
 |     |- <typeN>
 |
 |- Properties
 |
 |- <User defined collection(s)>
       |- <User defined sub-collection(s)>
The Profiles and Properties collections are reserved for storing gCube resource profiles and instance states, respectively. Under the Profiles collection, a sub-collection named after the resource type is created whenever a new type is detected in a to-be-stored profile. Moreover, a client may define its own structure of collections and sub-collections and store, index, and query its documents there via the XMLCollectionAccess port-type. This hierarchy of collections and sub-collections must be kept in mind when constructing a query expression to execute via the XQueryAccess port-type (see the sample query later in this page).
Periodic exports (as zipped archives) of the XML database content are performed for backup purposes. The behavior of this activity can be configured in the JNDI file. The following section of the JNDI file shows the parameters that define where backups are stored, how often they are taken, and how many are kept:
<service name="gcube/informationsystem/collector">
  <!-- ... -->
  <environment name="backupDir" value="existICBackups" type="java.lang.String" override="false" />
  <environment name="maxBackups" value="10" type="java.lang.String" override="false" />
  <environment name="scheduledBackupInHours" value="12" type="java.lang.String" override="false" />
</service>
In this example, backups are taken every 12 hours and stored under the $HOME/.gcore/... /existICBackups folder. Ten backups are maintained: when a new one becomes available, the oldest one is discarded. If an absolute path is indicated as backupDir, the backups are not stored under the gCore persistent folder.
Design
The functionalities delivered by the service are logically organized across five port-types, where:
- XMLCollectionAccess, Sink and SinkEntry are dedicated to the publishing phase
- XQueryAccess allows clients to execute XQuery expressions over the instance's content
- XMLStorageAccess is used to remotely manage the XML Storage underlying the service instance
XMLCollectionAccess portType
[TBD]
XQueryAccess portType
Querying the data
The XQueryExecute operation exposed by the XQueryAccess portType to query the IS-Collector can be invoked as follows:
package org.gcube.informationsystem.collector.testsuite;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.net.URL;
import java.rmi.RemoteException;

import org.gcube.common.core.contexts.GCUBERemotePortTypeContext;
import org.gcube.common.core.scope.GCUBEScope;
import org.gcube.common.core.utils.logging.GCUBEClientLog;
import org.gcube.informationsystem.collector.stubs.XQueryAccessPortType;
import org.gcube.informationsystem.collector.stubs.XQueryExecuteRequest;
import org.gcube.informationsystem.collector.stubs.XQueryExecuteResponse;
import org.gcube.informationsystem.collector.stubs.XQueryFaultType;
import org.gcube.informationsystem.collector.stubs.service.XQueryAccessServiceLocator;

/**
 * Tester for <em>XQueryExecute</em> operation of the
 * <em>gcube/informationsystem/collector/XQueryAccess</em> portType
 *
 * @author Manuele Simi (ISTI-CNR)
 */
public class XQueryExecuteTester {

    private static GCUBEClientLog logger = new GCUBEClientLog(XQueryExecuteTester.class);

    /**
     * @param args
     * <ul>
     * <li> IC host
     * <li> IC port
     * <li> Caller Scope
     * <li> File including the XQuery to submit
     * </ul>
     */
    public static void main(String[] args) {
        if (args.length != 4) {
            logger.fatal("Usage: XQueryExecuteTester <host> <port> <Scope> <XQueryExpressionFile>");
            return;
        }
        String portTypeURI = "http://" + args[0] + ":" + args[1]
            + "/wsrf/services/gcube/informationsystem/collector/XQueryAccess";
        XQueryAccessPortType port = null;
        try {
            port = new XQueryAccessServiceLocator().getXQueryAccessPortTypePort(new URL(portTypeURI));
            port = GCUBERemotePortTypeContext.getProxy(port, GCUBEScope.getScope(args[2]));
        } catch (Exception e) {
            logger.error(e);
            return; // cannot proceed without a valid port
        }
        XQueryExecuteRequest request = new XQueryExecuteRequest();
        request.setXQueryExpression(readQuery(args[3]));
        try {
            logger.info("Submitting query in scope " + GCUBEScope.getScope(args[2]).getName() + "....");
            XQueryExecuteResponse response = port.XQueryExecute(request);
            logger.info("Number of returned records: " + response.getSize());
            logger.info("Dataset: \n" + response.getDataset());
        } catch (XQueryFaultType e) {
            logger.error("XQuery Fault Error received", e);
        } catch (RemoteException e) {
            logger.error(e);
        }
    }

    /** Reads the query expression from the input filename. */
    private static String readQuery(final String filename) {
        String queryString = null;
        try {
            BufferedReader input = new BufferedReader(new FileReader(filename));
            StringBuilder contents = new StringBuilder();
            String line;
            while ((line = input.readLine()) != null) {
                contents.append(line);
                contents.append(System.getProperty("line.separator"));
            }
            input.close();
            queryString = contents.toString();
        } catch (FileNotFoundException e1) {
            logger.fatal("invalid file: " + filename);
        } catch (IOException e) {
            logger.fatal("an error occurred when reading " + filename);
        }
        return queryString;
    }
}
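The tester can then be launched as a standard Java client (the gCore and stubs libraries must be on the classpath), passing the four arguments documented above:

java org.gcube.informationsystem.collector.testsuite.XQueryExecuteTester <host> <port> <scope> <XQueryExpressionFile>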
Sample queries
...
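As a minimal sketch, the following expression returns the wrapped profiles of all MetadataCollection resources registered in a given scope, relying on the integrated Document format and on the gcube://Profiles/<type> hierarchy described in the XML Indexing section; the exact collection URI scheme accepted by the engine should be verified against a running instance:

(: minimal sketch: the collection path is assumed from the gcube:// hierarchy above :)
for $doc in collection("gcube://Profiles/MetadataCollection")//Document
where $doc/Data/Resource/Scopes/Scope = '/d4science.research-infrastructures.eu/FARM/FCPPS'
return $doc/Data/Resource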
Sink portType
Registering a new resource
A sample registration file:
<ServiceGroupRegistrationParameters
    xmlns:sgc="http://mds.globus.org/servicegroup/client"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"
    xmlns:agg="http://mds.globus.org/aggregator/types"
    xmlns="http://mds.globus.org/servicegroup/client">

  <!-- Specifies that the registration will be renewed every 60 seconds -->
  <RefreshIntervalSecs>60</RefreshIntervalSecs>

  <!-- <Content> specifies registration specific information -->
  <Content xsi:type="agg:AggregatorContent"
           xmlns:agg="http://mds.globus.org/aggregator/types">
    <agg:AggregatorConfig>
      <agg:GetMultipleResourcePropertiesPollType
          xmlns:metadatamanager="http://gcube-system.org/namespaces/metadatamanagement/metadatamanager">
        <!-- Polling interval -->
        <agg:PollIntervalMillis>60000</agg:PollIntervalMillis>
        <!-- Resource names -->
        <agg:ResourcePropertyNames>metadatamanager:MetadataCollectionID</agg:ResourcePropertyNames>
        <agg:ResourcePropertyNames>metadatamanager:LastUpdateTime</agg:ResourcePropertyNames>
      </agg:GetMultipleResourcePropertiesPollType>
    </agg:AggregatorConfig>
    <agg:AggregatorData/>
  </Content>
</ServiceGroupRegistrationParameters>
A sample Registration class:
import java.io.ByteArrayInputStream;

import org.apache.axis.message.addressing.EndpointReferenceType;
import org.gcube.common.core.utils.logging.GCUBEClientLog;
import org.globus.mds.servicegroup.client.ServiceGroupRegistrationParameters;
import org.globus.wsrf.encoding.ObjectDeserializer;
import org.globus.wsrf.impl.servicegroup.client.ServiceGroupRegistrationClient;
import org.globus.wsrf.impl.servicegroup.client.ServiceGroupRegistrationClientCallback;
import org.globus.wsrf.utils.XmlUtils;
import org.w3c.dom.Document;

import commonj.timers.Timer;

/**
 * Registration class for Aggregator Resources
 *
 * @author Manuele Simi (ISTI-CNR)
 */
public class Registration implements ServiceGroupRegistrationClientCallback {

    private static GCUBEClientLog logger = new GCUBEClientLog(Registration.class);

    private final EndpointReferenceType source; // the SOURCE endpoint
    private final EndpointReferenceType sink;   // the SINK endpoint, e.g. http://dlib27.isti.cnr.it:8000/wsrf/services/gcube/informationsystem/collector/Sink
    private String registrationFile = "...";    // the content of the registration file

    public Registration(EndpointReferenceType source, EndpointReferenceType sink) {
        this.source = source;
        this.sink = sink;
    }

    public void register() throws Exception {
        byte[] bArray = this.registrationFile.getBytes();
        ByteArrayInputStream bais = new ByteArrayInputStream(bArray);
        Document doc = XmlUtils.newDocument(bais);

        // setting parameters for this registration
        ServiceGroupRegistrationParameters params = (ServiceGroupRegistrationParameters)
            ObjectDeserializer.toObject(doc.getDocumentElement(), ServiceGroupRegistrationParameters.class);
        params.setRegistrantEPR(this.source);
        params.setServiceGroupEPR(this.sink);

        // create a ServiceGroupRegistrationClient for the registration
        ServiceGroupRegistrationClient client = new ServiceGroupRegistrationClient();

        // subscribe to receive registration status messages
        client.setClientCallback(this);

        /*
         * The client creates a SinkEntry in the appropriate target WS-ServiceGroup. Then, it
         * periodically attempts to renew the WS-ResourceLifetime lifetime extension on the
         * SinkEntry. If the client detects that the SinkEntry is no longer available, it
         * creates a new one.
         */
        Timer timer = client.register(params, 1000);
    }

    /**
     * {@inheritDoc}
     * @see org.globus.wsrf.impl.servicegroup.client.ServiceGroupRegistrationClientCallback#setRegistrationStatus(
     * org.globus.mds.servicegroup.client.ServiceGroupRegistrationParameters, boolean, boolean, java.lang.Exception)
     */
    public boolean setRegistrationStatus(ServiceGroupRegistrationParameters regParams,
            boolean wasSuccessful, boolean wasRenewalAttempt, Exception exception) {
        if (wasSuccessful) {
            if (wasRenewalAttempt) {
                logger.trace("Renewal of the existing registration completed successfully");
            } else {
                logger.trace("New registration completed successfully");
            }
        } else {
            logger.error("Registration failed", exception);
        }
        return true; // keep the registration process alive
    }
}
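A minimal sketch of how the class above might be driven, using the constructor introduced there; the source address is an illustrative placeholder (taken from the metadata record example), while the sink address follows the example given in the class comments:

import org.apache.axis.message.addressing.Address;
import org.apache.axis.message.addressing.EndpointReferenceType;

// Hypothetical driver for the Registration class above; both endpoint
// addresses are examples only, not real deployed instances.
public class RegistrationDriver {
    public static void main(String[] args) throws Exception {
        EndpointReferenceType source = new EndpointReferenceType();
        source.setAddress(new Address("http://source"));
        EndpointReferenceType sink = new EndpointReferenceType();
        sink.setAddress(new Address("http://dlib27.isti.cnr.it:8000/wsrf/services/gcube/informationsystem/collector/Sink"));
        // registers the source with the Collector's Sink and keeps renewing the entry
        new Registration(source, sink).register();
    }
}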
Sample Usage
TBP
Test-suite
The IS-Collector comes with a test-suite package that allows testing its administration functionalities (mainly the XMLStorageAccess portType). The test-suite is completely independent and does not require any other gCube package, except for a local gCore installation. The package is composed of a set of classes, sample configuration files, and scripts ready to be executed.
|- lib
|    |- org.gcube.informationsystem.collector.testsuite.jar
|
|- backup.sh
|- restore.sh
|- shutdown.sh
|- connect.sh
Each script tests a different service operation or a group of logically related operations. In the following, an explanation of each script and its usage is provided.
Storing new Documents
Storing a new gCube Document
Invoke the addDocument.sh tester with the following arguments:
addDocument.sh <host> <port> <scope> <ProfileID> <filename> Profile
Storing a new Instance State Document
Invoke the addDocument.sh tester with the following arguments:
addDocument.sh <host> <port> <scope> <StateID> <filename> InstanceState
Storing a new DAIX Resource Document
Invoke the addDAIXDocument.sh tester with the following arguments:
addDAIXDocument.sh <host> <port> <scope> <ProfileID> <filename> <targetCollectionName>
Retrieving Documents
Getting a gCube Document
Invoke the getDocument.sh tester with the following arguments:
getDocument.sh <host> <port> <scope> gcube://Profiles/<ProfileType> <ProfileID>
Getting an Instance State Document
Invoke the getDocument.sh tester with the following arguments:
getDocument.sh <host> <port> <scope> gcube://Properties <StateID>
Getting a DAIX Resource Document
Invoke the getDocument.sh tester with the following arguments:
getDocument.sh <host> <port> <scope> gcube://<collectionName> <ResourceID>
Managing the XML Storage
Requesting the XML Storage backup
The backup.sh script invokes the Backup operation of the XMLStorageAccess portType and requests an immediate backup of the XML Storage. It has to be executed as follows:
./backup.sh <IS-Collector host> <IS-Collector port> <scope>
Restoring the latest XML Storage backup
The restore.sh script invokes the Restore operation of the XMLStorageAccess portType and requests the restoration of the latest available backup of the XML Storage. It has to be executed as follows:
./restore.sh <IS-Collector host> <IS-Collector port> <scope>
Shutting down the XML Storage
The shutdown.sh script invokes the Shutdown operation of the XMLStorageAccess portType and requests the shutdown of all the open connections to the XML Storage. Calling it before stopping the container on the node can be helpful in order to achieve a gentle shutdown of the instance. It has to be executed as follows:
./shutdown.sh <IS-Collector host> <IS-Collector port> <scope>
Reconnecting the XML Storage
The connect.sh script invokes the Connect operation of the XMLStorageAccess portType and requests the reopening of the connections to the XML Storage that were previously closed with a Shutdown call. It has to be executed as follows:
./connect.sh <IS-Collector host> <IS-Collector port> <scope>