Difference between revisions of "OAITMPlugin"
From Gcube Wiki
Revision as of 15:23, 8 April 2013
OAI TM Plugin is a plugin that harvests the metadata descriptions of the records in an archive, using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
OAI TM Plugin transforms each OAI record into an edge-labelled tree.
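As an illustration of the protocol side, the sketch below builds the kind of request URLs an OAI-PMH harvester sends. The verbs and the `metadataPrefix` and `set` arguments come from the OAI-PMH specification; the function names are hypothetical, and nothing here is taken from the plugin's own code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Restrict harvesting to a single set of the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def list_sets_url(base_url):
    """Build an OAI-PMH ListSets request URL (hypothetical helper)."""
    return base_url + "?" + urlencode({"verb": "ListSets"})


if __name__ == "__main__":
    print(list_sets_url("http://aquacomm.fcla.edu/cgi/oai2"))
    print(list_records_url("http://aquacomm.fcla.edu/cgi/oai2"))
```

The sketch only assembles URLs and performs no network access, so it can be run without the repository being reachable.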
Plugin parameter fields
To obtain data from OAI repositories through OAI TM Plugin, a harvester (user) submits one of two request types:
- WrapSetsRequest: creates a collection for each set of the external repository.
  - The following information is mandatory:
    - base URL (e.g. http://aquacomm.fcla.edu/cgi/oai2)
    - contentXPath (e.g. //*[local-name()='identifier' and contains(.,'://')])
    - default metadata format (e.g. oai_dc)
  - The following information is optional, but we suggest providing it:
    - title XPath: the XPath that retrieves the title of a document (e.g. //*[local-name()='title'])
    - alternatives XPath: the XPath that identifies possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
    - repository name: the name of the repository (e.g. aquacomm)
    - setIdentifierList: the ID of the set to take into consideration
- WrapRepositoryRequest: creates a single collection containing all the data available in the external repository.
  - The following information is mandatory:
    - base URL (e.g. http://aquacomm.fcla.edu/cgi/oai2)
    - contentXPath (e.g. //*[local-name()='identifier' and contains(.,'://')])
    - default metadata format (e.g. oai_dc)
  - The following information is optional, but we suggest providing it:
    - title XPath: the XPath that retrieves the title of a document (e.g. //*[local-name()='title'])
    - alternatives XPath: the XPath that identifies possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
    - repository name: the name of the repository (e.g. aquacomm)
    - collection description: a description of the collection
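To make the example XPath expressions more concrete, the sketch below shows what they would select from a small, made-up oai_dc record. Python's standard `xml.etree.ElementTree` does not support the XPath functions `local-name()` and `contains()`, so the `select` helper emulates them in plain Python; the helper names and the sample record are assumptions for illustration, and this is not how the plugin itself evaluates the expressions.

```python
import xml.etree.ElementTree as ET

# Made-up oai_dc record used only for illustration.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample record</dc:title>
  <dc:identifier>http://example.org/doc/1</dc:identifier>
  <dc:identifier>ISSN 0000-0000</dc:identifier>
  <dc:relation>http://example.org/doc/1.pdf</dc:relation>
</oai_dc:dc>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def select(root, name, must_contain=None):
    """Emulate //*[local-name()='name' and contains(., must_contain)]."""
    values = []
    for element in root.iter():
        text = "".join(element.itertext())  # string value of the element
        if local_name(element.tag) != name:
            continue
        if must_contain is None or must_contain in text:
            values.append(text.strip())
    return values


if __name__ == "__main__":
    root = ET.fromstring(RECORD)
    print("content:", select(root, "identifier", "://"))
    print("title:", select(root, "title"))
    print("alternatives:", select(root, "relation", "://"))
```

The `contains(.,'://')` condition is what filters out identifiers that are not URLs, such as the ISSN in the sample record.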
Tree model
Conceptual Schema
Each collection item produced by this plugin is characterised by the following information:
- item metadata: global information on the item including
- title: the title of the record;
- collectionID: the collection this item belongs to;
- creationTime: the time the item was created;
- lastUpdateTime: the most recent time the item has been updated;
- provenance: It is characterised by the following information:
- statement: "This item has been created by "+ pluginName +" via OAI-PMH metadata harvesting from the metadata provider "+repositoryName+" at "+baseURL;
- setID: the repository set the object belongs to (optional and repeatable);
- metadata (repeatable): the metadata record harvested. It is characterised by the following information:
- schema: the metadata format of the metadata record;
- schemaLocation: the metadata format schema URI;
- record: the manifestation of the metadata record harvested;
- content (repeatable): any potential payload shipped with the metadata record. It is characterised by the following information:
- contentType: i.e. whether main or alternative content;
- mimeType: MIME type of the actual content;
- url: URL to the actual content;
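The conceptual schema above can be pictured as a small set of record types. The following is a minimal sketch using Python dataclasses; the class and field names are hypothetical renderings of the fields listed above, and placing `set_ids` directly on the item is an assumption of this sketch. Only the wording of the provenance statement is taken from the schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metadata:
    schema: str           # metadata format of the record, e.g. "oai_dc"
    schema_location: str  # URI of the metadata format schema
    record: str           # the harvested metadata record itself


@dataclass
class Content:
    content_type: str  # "main" or "alternative"
    mime_type: str     # MIME type of the actual content
    url: str           # URL to the actual content


@dataclass
class Item:
    title: str
    collection_id: str
    creation_time: str
    last_update_time: str
    provenance_statement: str
    set_ids: List[str] = field(default_factory=list)        # optional, repeatable
    metadata: List[Metadata] = field(default_factory=list)  # repeatable
    content: List[Content] = field(default_factory=list)    # repeatable


def provenance_statement(plugin_name, repository_name, base_url):
    """Assemble the provenance statement in the form given by the schema."""
    return ("This item has been created by " + plugin_name
            + " via OAI-PMH metadata harvesting from the metadata provider "
            + repository_name + " at " + base_url)


if __name__ == "__main__":
    print(provenance_statement("OAI TM Plugin", "aquacomm",
                               "http://aquacomm.fcla.edu/cgi/oai2"))
```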
XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <!-- XML Schema Generated from XML Document on Fri Apr 05 2013 16:24:22 GMT+0200 (CEST) -->
  <!-- with XmlGrid.net Free Online Service http://xmlgrid.net -->
  <xs:element name="t:root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="creationTime" type="xs:string" />
        <xs:element name="lastUpdateTime" type="xs:string" />
        <xs:element name="isDescribedBy">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="mimeType" type="xs:string" />
              <xs:element name="length" type="xs:int" />
              <xs:element name="language" type="xs:string" />
              <xs:element name="schemaName" type="xs:string" />
              <xs:element name="schemaURI" type="xs:string" />
              <xs:element name="bytestream" type="xs:string" />
              <xs:element name="creationTime" type="xs:string" />
              <xs:element name="lastUpdateTime" type="xs:string" />
            </xs:sequence>
            <xs:attribute name="t:id" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="mimeType" type="xs:string" />
        <xs:element name="length" type="xs:int" />
        <xs:element name="url" type="xs:string" />
        <xs:element name="name" type="xs:string" />
        <xs:element name="hasAlternative" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="mimeType" type="xs:string" />
              <xs:element name="length" type="xs:int" />
              <xs:element name="name" type="xs:string" />
              <xs:element name="url" type="xs:string" />
              <xs:element name="creationTime" type="xs:string" />
              <xs:element name="lastUpdateTime" type="xs:string" />
            </xs:sequence>
            <xs:attribute name="t:id" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="xmlns:t" type="xs:string" />
      <xs:attribute name="t:id" type="xs:string" />
    </xs:complexType>
  </xs:element>
</xs:schema>
Example
A tree generated by OAI TM Plugin looks like this:
<?xml version="1.0" ?>
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:dspace.mit.edu:1721.1/27225">
  <creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
  <lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
  <isDescribedBy t:id="oai:dspace.mit.edu:1721.1/27225-oai_dc">
    <mimeType>text/xml</mimeType>
    <length>2139</length>
    <language>unknown</language>
    <schemaName>oai_dc</schemaName>
    <schemaURI>http://www.openarchives.org/OAI/2.0/oai_dc/</schemaURI>
    <bytestream>[B@60de1b8a</bytestream>
    <creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
    <lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
  </isDescribedBy>
  <mimeType>text/url</mimeType>
  <length>34</length>
  <url>http://hdl.handle.net/1721.1/27225</url>
  <name>A methodology for the assessment of the proliferation resistance of nuclear power systems: topical report</name>
  <hasAlternative t:id="0-alternative-0">
    <mimeType>text/xml;charset=UTF-8</mimeType>
    <length>406</length>
    <name>A methodology for the assessment of the proliferation resistance of nuclear power systems</name>
    <url>http://hdl.handle.net/1721.1/27225</url>
    <creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
    <lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
  </hasAlternative>
  <hasAlternative t:id="0-alternative-1">
    <mimeType>text/xml;charset=UTF-8</mimeType>
    <length>406</length>
    <name>A methodology for the assessment of the proliferation resistance of nuclear power systems</name>
    <url>http://hdl.handle.net/1721.1/27225</url>
    <creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
    <lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
  </hasAlternative>
  <hasAlternative t:id="0-alternative-2">
    <mimeType>text/xml;charset=UTF-8</mimeType>
    <length>406</length>
    <name>A methodology for the assessment of the proliferation resistance of nuclear power systems</name>
    <url>http://hdl.handle.net/1721.1/27225</url>
    <creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
    <lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
  </hasAlternative>
</t:root>
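A consumer of such a tree might read it as sketched below, using Python's standard `xml.etree.ElementTree`. The `summarise` function and the shortened sample tree are assumptions for illustration; the namespace URI, element names, and `t:id` attribute are the ones that appear in the example tree.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "t" prefix in the example tree.
T_NS = "http://gcube-system.org/namespaces/data/trees"

# Shortened copy of the example tree, kept small for illustration.
TREE = """<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees"
        t:id="oai:dspace.mit.edu:1721.1/27225">
  <isDescribedBy t:id="oai:dspace.mit.edu:1721.1/27225-oai_dc">
    <schemaName>oai_dc</schemaName>
  </isDescribedBy>
  <url>http://hdl.handle.net/1721.1/27225</url>
  <name>A methodology for the assessment of the proliferation resistance of nuclear power systems: topical report</name>
  <hasAlternative t:id="0-alternative-0">
    <mimeType>text/xml;charset=UTF-8</mimeType>
    <url>http://hdl.handle.net/1721.1/27225</url>
  </hasAlternative>
</t:root>"""


def summarise(tree_xml):
    """Collect the identifier, name, URL, schemas, and alternatives of a tree."""
    root = ET.fromstring(tree_xml)
    id_attr = "{%s}id" % T_NS  # ElementTree spells t:id as {namespace}id
    return {
        "id": root.get(id_attr),
        "name": root.findtext("name"),
        "url": root.findtext("url"),
        "schemas": [d.findtext("schemaName") for d in root.findall("isDescribedBy")],
        "alternatives": [(a.get(id_attr), a.findtext("url"))
                         for a in root.findall("hasAlternative")],
    }


if __name__ == "__main__":
    for key, value in summarise(TREE).items():
        print(key, "=", value)
```

Only the root element and the `t:id` attributes are namespace-qualified in the example, which is why the child elements are looked up by their plain names.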
Maven coordinates
The Maven coordinates of the plugin's development version are:
<dependency>
  <groupId>org.gcube.data.oai.tmplugin</groupId>
  <artifactId>oai-tm-plugin</artifactId>
  <version>1.1.0-2.13.0</version>
</dependency>