OAITMPlugin

Revision as of 15:19, 8 April 2013 by Leonardo.candela (Talk | contribs) (Conceptual Schema)


OAI TM Plugin is a plugin that harvests the metadata descriptions of the records in an archive using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
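
As an illustration of what OAI-PMH harvesting looks like at the protocol level, the following minimal sketch builds a ListRecords request URL. The verb and the metadataPrefix and set arguments come from the OAI-PMH specification; the base URL, the set name and the helper function are hypothetical, and the sketch does not claim to reproduce how the plugin itself issues requests.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # The "set" argument is optional: it restricts harvesting to one set.
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint and set name, used only for illustration.
whole_repository = list_records_url("http://repository.example.org/oai")
single_set = list_records_url("http://repository.example.org/oai", set_spec="physics")
print(whole_repository)
print(single_set)
```

Omitting the set argument corresponds to harvesting the whole repository, while passing it selects the records of a single set.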

OAI TM Plugin transforms each OAI record into an edge-labelled tree.

Plugin parameter fields

To retrieve data from OAI repositories through OAI TM Plugin, a harvester (user) must submit one of the following two request types:

  • WrapSetsRequest: creates a collection for each set of the external repository.
    • In this case the mandatory information you have to provide is the following:
    • There is also other, non-mandatory information we suggest providing:
      • title Xpath: the XPath expression used to retrieve the title of a document (e.g. //*[local-name()='title'])
      • alternatives Xpath: the XPath expression used to identify possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
      • repository name: the name of the repository (e.g. aquacomm)
      • setIdentifierList: the ID of each set to take into consideration
  • WrapRepositoryRequest: creates a single collection containing all the data available in the external repository.
    • In this case the mandatory information you have to provide is the following:
    • There is also other, non-mandatory information we suggest providing:
      • title Xpath: the XPath expression used to retrieve the title of a document (e.g. //*[local-name()='title'])
      • alternatives Xpath: the XPath expression used to identify possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
      • repository name: the name of the repository (e.g. aquacomm)
      • collection description: a description of the collection
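
To show what the two example XPath expressions select, here is a small sketch applied to a made-up Dublin Core record. Python's standard ElementTree module does not support local-name(), so the sketch emulates both expressions by comparing tag names with the namespace stripped; the record content and the helper names are assumptions introduced for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up oai_dc record, used only to exercise the two expressions.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A sample title</dc:title>
  <dc:relation>http://example.org/fulltext.pdf</dc:relation>
  <dc:relation>Part of a printed series</dc:relation>
</oai_dc:dc>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def select(root, name, must_contain=None):
    """Emulate //*[local-name()='name'] with an optional contains(., text) test."""
    results = []
    for element in root.iter():  # the root and all of its descendants
        text = "".join(element.itertext())  # string value of the element
        if local_name(element.tag) != name:
            continue
        if must_contain is None or must_contain in text:
            results.append(text.strip())
    return results


root = ET.fromstring(RECORD)
titles = select(root, "title")                                # title Xpath
alternatives = select(root, "relation", must_contain="://")   # alternatives Xpath
print(titles)
print(alternatives)
```

In this sketch only the relation that contains "://" qualifies as an alternative; the textual relation is ignored.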

Tree model

Conceptual Schema

Each collection item produced by this plugin is characterised by the following information:

  • item metadata: global information on the item including
    • title: the title of the record;
    • collectionID: the collection this item belongs to;
    • creationTime: the time the item was created;
    • lastUpdateTime: the most recent time the item was updated;
    • provenance: characterised by the following information:
      • statement: "This item has been created by "+ pluginName +" via OAI-PMH metadata harvesting from the metadata provider "+repositoryName+" at "+baseURL;
      • setID: the repository set the object belongs to (optional and repeatable);
  • metadata (repeatable): the metadata record harvested. It is characterised by the following information:
    • schema: the metadata format of the metadata record;
    • schemaLocation: the metadata format schema URI;
    • record: the manifestation of the metadata record harvested;
  • content (repeatable): any potential payload shipped with the metadata record. It is characterised by the following information:
    • contentType: i.e. whether main or alternative content;
    • mimeType: MIME type of the actual content;
    • length: the size (in bytes) of the content;
    • url: URL to the actual content;
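
The conceptual schema above can be sketched as a set of plain data classes. This is only an illustrative model, not the plugin's actual classes: the class and field names, the collection identifier and the base URL are assumptions, while the provenance statement follows the concatenation pattern given in the list.

```python
from dataclasses import dataclass, field
from typing import List


def provenance_statement(plugin_name, repository_name, base_url):
    """Compose the provenance statement using the pattern from the schema."""
    return ("This item has been created by " + plugin_name +
            " via OAI-PMH metadata harvesting from the metadata provider " +
            repository_name + " at " + base_url)


@dataclass
class Metadata:
    schema: str           # metadata format of the record
    schema_location: str  # URI of the metadata format schema
    record: str           # the harvested metadata record


@dataclass
class Content:
    content_type: str  # "main" or "alternative"
    mime_type: str
    length: int        # size in bytes
    url: str


@dataclass
class Item:
    title: str
    collection_id: str
    creation_time: str
    last_update_time: str
    statement: str
    set_ids: List[str] = field(default_factory=list)        # optional, repeatable
    metadata: List[Metadata] = field(default_factory=list)  # repeatable
    content: List[Content] = field(default_factory=list)    # repeatable


# Hypothetical values: the collection ID and base URL are made up.
item = Item(
    title="A sample title",
    collection_id="collection-1",
    creation_time="2008-03-10T16:34:16.000+01:00",
    last_update_time="2008-03-10T16:34:16.000+01:00",
    statement=provenance_statement(
        "OAI TM Plugin", "aquacomm", "http://repository.example.org/oai"),
    content=[Content("main", "text/url", 34, "http://hdl.handle.net/1721.1/27225")],
)
print(item.statement)
```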

XML Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified" >
          <!-- XML Schema Generated from XML Document on Fri Apr 05 2013 16:24:22 GMT+0200 (CEST) -->
          <!-- with XmlGrid.net Free Online Service http://xmlgrid.net -->
       <xs:element name="t:root" >
              <xs:complexType>
                     <xs:sequence>
                            <xs:element name="creationTime" type="xs:string" />
                            <xs:element name="lastUpdateTime" type="xs:string" />
                            <xs:element name="isDescribedBy" >
                                   <xs:complexType>
                                          <xs:sequence>
                                                 <xs:element name="mimeType" type="xs:string" />
                                                 <xs:element name="length" type="xs:int" />
                                                 <xs:element name="language" type="xs:string" />
                                                 <xs:element name="schemaName" type="xs:string" />
                                                 <xs:element name="schemaURI" type="xs:string" />
                                                 <xs:element name="bytestream" type="xs:string" />
                                                 <xs:element name="creationTime" type="xs:string" />
                                                 <xs:element name="lastUpdateTime" type="xs:string" />
                                             </xs:sequence>
                                          <xs:attribute name="t:id" type="xs:string" />
                                      </xs:complexType>
                               </xs:element>
                            <xs:element name="mimeType" type="xs:string" />
                            <xs:element name="length" type="xs:int" />
                            <xs:element name="url" type="xs:string" />
                            <xs:element name="name" type="xs:string" />
                            <xs:element name="hasAlternative" maxOccurs="unbounded" >
                                   <xs:complexType>
                                          <xs:sequence>
                                                 <xs:element name="mimeType" type="xs:string" />
                                                 <xs:element name="length" type="xs:int" />
                                                 <xs:element name="name" type="xs:string" />
                                                 <xs:element name="url" type="xs:string" />
                                                 <xs:element name="creationTime" type="xs:string" />
                                                 <xs:element name="lastUpdateTime" type="xs:string" />
                                             </xs:sequence>
                                          <xs:attribute name="t:id" type="xs:string" />
                                      </xs:complexType>
                               </xs:element>
                        </xs:sequence>
                     <xs:attribute name="xmlns:t" type="xs:string" />
                     <xs:attribute name="t:id" type="xs:string" />
                 </xs:complexType>
          </xs:element>
   </xs:schema>

Example

A tree generated by OAI TM Plugin looks like this:

<?xml version="1.0" ?>
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:dspace.mit.edu:1721.1/27225">
<creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
<lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
 
<isDescribedBy t:id="oai:dspace.mit.edu:1721.1/27225-oai_dc">
	<mimeType>text/xml</mimeType>
	<length>2139</length>
	<language>unknown</language>
	<schemaName>oai_dc</schemaName>
	<schemaURI>http://www.openarchives.org/OAI/2.0/oai_dc/</schemaURI>
	<bytestream>[B@60de1b8a</bytestream>
	<creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
	<lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
</isDescribedBy>
 
<mimeType>text/url</mimeType>
<length>34</length>
<url>http://hdl.handle.net/1721.1/27225</url>
<name>A methodology for the assessment of the proliferation resistance of nuclear power systems: topical report</name>
 
<hasAlternative t:id="0-alternative-0">
	<mimeType>text/xml;charset=UTF-8</mimeType>
	<length>406</length>
	<name>A methodology for the assessment of the proliferation resistance of nuclear power systems</name>
	<url>http://hdl.handle.net/1721.1/27225</url>
	<creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
	<lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
</hasAlternative>
 
<hasAlternative t:id="0-alternative-1">
	<mimeType>text/xml;charset=UTF-8</mimeType>
	<length>406</length>
	<name>A methodology for the assessment of the proliferation resistance of nuclear power systems</name>
	<url>http://hdl.handle.net/1721.1/27225</url>
	<creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
	<lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
</hasAlternative>
 
<hasAlternative t:id="0-alternative-2">
	<mimeType>text/xml;charset=UTF-8</mimeType>
	<length>406</length>
	<name>A methodology for the assessment of the proliferation resistance of nuclear power systems</name>
	<url>http://hdl.handle.net/1721.1/27225</url>
	<creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
	<lastUpdateTime>2008-03-10T16:34:16.000+01:00</lastUpdateTime>
</hasAlternative>
 
</t:root>
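
A client could read such a tree with any namespace-aware XML parser. The sketch below uses Python's standard ElementTree module on an abridged copy of the example; the variable names are illustrative, and the only namespaced names are the root element and the t:id attributes.

```python
import xml.etree.ElementTree as ET

T_NS = "http://gcube-system.org/namespaces/data/trees"

# Abridged copy of the example tree above.
TREE = """<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:dspace.mit.edu:1721.1/27225">
  <creationTime>2008-03-10T16:34:16.000+01:00</creationTime>
  <isDescribedBy t:id="oai:dspace.mit.edu:1721.1/27225-oai_dc">
    <schemaName>oai_dc</schemaName>
  </isDescribedBy>
  <url>http://hdl.handle.net/1721.1/27225</url>
  <hasAlternative t:id="0-alternative-0">
    <url>http://hdl.handle.net/1721.1/27225</url>
  </hasAlternative>
  <hasAlternative t:id="0-alternative-1">
    <url>http://hdl.handle.net/1721.1/27225</url>
  </hasAlternative>
</t:root>"""

root = ET.fromstring(TREE)

# ElementTree addresses namespaced attributes as "{namespace}localname".
id_attr = "{%s}id" % T_NS

record_id = root.get(id_attr)
schema_name = root.findtext("isDescribedBy/schemaName")
main_url = root.findtext("url")  # direct child only, not the alternatives' URLs
alternative_ids = [alt.get(id_attr) for alt in root.findall("hasAlternative")]

print(record_id, schema_name, main_url, alternative_ids)
```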

Maven coordinates

The Maven coordinates of the development version of oai-tm-plugin are:

<dependency>
  <groupId>org.gcube.data.oai.tmplugin</groupId>
  <artifactId>oai-tm-plugin</artifactId>
  <version>1.1.0-2.13.0</version>
</dependency>