Difference between revisions of "OAITMPlugin"

From Gcube Wiki

Revision as of 12:11, 9 April 2013

OAI TM Plugin harvests the metadata descriptions of the records in an archive using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
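
For orientation, an OAI-PMH harvest is driven by plain HTTP requests carrying a verb and a few arguments. The sketch below builds a ListRecords request URL with the standard `verb`, `metadataPrefix` and `set` arguments. The helper name `list_records_url` is hypothetical, the base URL and set identifier are taken from the example tree on this page, and nothing here is meant to describe how the plugin issues its requests internally.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Selective harvesting: restrict the request to one repository set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Whole repository (assumed base URL, copied from the example provenance).
print(list_records_url("http://aquacomm.fcla.edu/cgi/oai2"))
# A single set (identifier copied from the example setID; an assumption).
print(list_records_url("http://aquacomm.fcla.edu/cgi/oai2",
                       set_spec="7375626A656374733D54"))
```

Omitting the `set` argument addresses the whole repository, while supplying it restricts the request to one set.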

OAI TM Plugin transforms each OAI record into an edge-labelled tree.

Plugin parameter fields

To retrieve data from an OAI repository through OAI TM Plugin, a harvester (user) issues one of two request types:

  • WrapSetsRequest: creates one collection for each set of the external repository.
    • In this case, the following information is mandatory:
    • The following optional information is also recommended:
      • title Xpath: the XPath expression that selects the title of a document (e.g. //*[local-name()='title'])
      • alternatives Xpath: the XPath expression that selects possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
      • repository name: the name of the repository (e.g. aquacomm)
      • setIdentifierList: the identifiers of the sets to take into consideration
  • WrapRepositoryRequest: creates a single collection containing all the data available in the external repository.
    • In this case, the following information is mandatory:
    • The following optional information is also recommended:
      • title Xpath: the XPath expression that selects the title of a document (e.g. //*[local-name()='title'])
      • alternatives Xpath: the XPath expression that selects possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
      • repository name: the name of the repository (e.g. aquacomm)
      • collection description: a description of the collection
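
To illustrate what the two example XPath expressions above select, here is a minimal sketch in Python. The standard library's ElementTree does not implement `local-name()` or `contains()`, so the sketch emulates both expressions by hand rather than evaluating them; the function names and the shortened record are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened Dublin Core record based on the example on this page.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Diablo Canyon power plant site ecological study</dc:title>
  <dc:identifier>Gotshall, Daniel W. (1979) Diablo Canyon ...</dc:identifier>
  <dc:relation>http://aquaticcommons.org/23/</dc:relation>
</oai_dc:dc>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def select_titles(root):
    """Emulates //*[local-name()='title']: any 'title' element, any namespace."""
    return ["".join(el.itertext()) for el in root.iter()
            if local_name(el.tag) == "title"]


def select_alternatives(root):
    """Emulates //*[local-name()='relation' and contains(.,'://')]."""
    return ["".join(el.itertext()) for el in root.iter()
            if local_name(el.tag) == "relation" and "://" in "".join(el.itertext())]


root = ET.fromstring(RECORD)
print(select_titles(root))
print(select_alternatives(root))
```

Matching on the local name makes the expressions independent of the namespace prefix used by a given repository, and the `contains(.,'://')` test keeps only relation values that look like URLs.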

Tree model

Conceptual Schema

Each collection item produced by this plugin is characterised by the following information:

  • item metadata: global information on the item including
    • title: the title of the record;
    • collectionID: the collection this item belongs to;
    • creationTime: the time the item was created;
    • lastUpdateTime: the most recent time the item was updated;
    • provenance: the origin of the item, characterised by the following information:
      • statement: "This item has been created by "+ pluginName +" via OAI-PMH metadata harvesting from the metadata provider "+repositoryName+" at "+baseURL;
      • setID: the repository set the object belongs to (optional and repeatable);
      • recordID: the identifier of the metadata record;
  • metadata (repeatable): the metadata record harvested. It is characterised by the following information:
    • schema: the metadata format of the metadata record;
    • schemaLocation: the metadata format schema URI;
    • record: the manifestation of the metadata record harvested;
  • content (repeatable): any potential payload shipped with the metadata record. It is characterised by the following information:
    • contentType: whether the content is the main content or an alternative;
    • mimeType: MIME type of the actual content;
    • url: URL to the actual content;
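
The conceptual schema above can be sketched as a small tree builder. This is an illustration under stated assumptions, not the plugin's code: the function `build_item` and its argument names are hypothetical, the timestamps are left out for brevity, and the namespace URI and element names are copied from the example tree on this page.

```python
import xml.etree.ElementTree as ET

# Namespace of the tree root, as it appears in the example on this page.
T_NS = "http://gcube-system.org/namespaces/data/trees"
ET.register_namespace("t", T_NS)


def build_item(record_id, title, collection_id, statement,
               schema, schema_location, record_xml, contents):
    """Assemble one collection item as an XML tree (hypothetical helper)."""
    root = ET.Element(f"{{{T_NS}}}root", {f"{{{T_NS}}}id": record_id})
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "collectionID").text = collection_id

    provenance = ET.SubElement(root, "provenance")
    ET.SubElement(provenance, "statement").text = statement

    metadata = ET.SubElement(root, "metadata")
    ET.SubElement(metadata, "schema").text = schema
    ET.SubElement(metadata, "schemaLocation").text = schema_location
    # The harvested record is stored as text, so it is escaped on output.
    ET.SubElement(metadata, "record").text = record_xml

    for content_type, mime_type, url in contents:  # repeatable branch
        content = ET.SubElement(root, "content")
        ET.SubElement(content, "contentType").text = content_type
        ET.SubElement(content, "mimeType").text = mime_type
        ET.SubElement(content, "url").text = url
    return root


item = build_item(
    "oai:generic.eprints.org:23", "Diablo Canyon power plant site ecological study",
    "7375626A656374733D54", "This item has been created by OAI-TM plugin ...",
    "oai_dc", "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "<dc:title>Diablo Canyon power plant site ecological study</dc:title>",
    [("main", "text/url", "http://aquaticcommons.org/23/1/report.pdf"),
     ("alternative", "text/html; charset=UTF-8", "http://aquaticcommons.org/23/")],
)
print(ET.tostring(item, encoding="unicode"))
```

Because the record is assigned as element text, the serialiser escapes its markup, which matches the escaped form of the record shown in the example tree.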

XML Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified" >
          <!-- XML Schema Generated from XML Document on Fri Apr 05 2013 16:24:22 GMT+0200 (CEST) -->
          <!-- with XmlGrid.net Free Online Service http://xmlgrid.net -->
       <xs:element name="t:root" >
              <xs:complexType>
                     <xs:sequence>
                            <xs:element name="creationTime" type="xs:string" />
                            <xs:element name="lastUpdateTime" type="xs:string" />
                            <xs:element name="isDescribedBy" >
                                   <xs:complexType>
                                          <xs:sequence>
                                                 <xs:element name="mimeType" type="xs:string" />
                                                 <xs:element name="length" type="xs:int" />
                                                 <xs:element name="language" type="xs:string" />
                                                 <xs:element name="schemaName" type="xs:string" />
                                                 <xs:element name="schemaURI" type="xs:string" />
                                                 <xs:element name="bytestream" type="xs:string" />
                                                 <xs:element name="creationTime" type="xs:string" />
                                                 <xs:element name="lastUpdateTime" type="xs:string" />
                                             </xs:sequence>
                                          <xs:attribute name="t:id" type="xs:string" />
                                      </xs:complexType>
                               </xs:element>
                            <xs:element name="mimeType" type="xs:string" />
                            <xs:element name="length" type="xs:int" />
                            <xs:element name="url" type="xs:string" />
                            <xs:element name="name" type="xs:string" />
                            <xs:element name="hasAlternative" maxOccurs="unbounded" >
                                   <xs:complexType>
                                          <xs:sequence>
                                                 <xs:element name="mimeType" type="xs:string" />
                                                 <xs:element name="length" type="xs:int" />
                                                 <xs:element name="name" type="xs:string" />
                                                 <xs:element name="url" type="xs:string" />
                                                 <xs:element name="creationTime" type="xs:string" />
                                                 <xs:element name="lastUpdateTime" type="xs:string" />
                                             </xs:sequence>
                                          <xs:attribute name="t:id" type="xs:string" />
                                      </xs:complexType>
                               </xs:element>
                        </xs:sequence>
                     <xs:attribute name="xmlns:t" type="xs:string" />
                     <xs:attribute name="t:id" type="xs:string" />
                 </xs:complexType>
          </xs:element>
   </xs:schema>

Example

A tree generated by OAI TM Plugin looks like this:

<?xml version="1.0" ?>
 
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:generic.eprints.org:23">
 
	<title>Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978</title>
	<collectionID>7375626A656374733D54</collectionID>
	<creationTime>2011-09-29T22:42:01.000+02:00</creationTime>
	<lastUpdateTime>2011-09-29T22:42:01.000+02:00</lastUpdateTime>
	<provenance>
		<statement>This item has been created by OAI-TM plugin via OAI-PMH metadata harvesting from the metadata provider aquacomm at http://aquacomm.fcla.edu/cgi/oai2</statement>
		<setID>7375626A656374733D54</setID>
	</provenance>
 
	<metadata>
		<schema>oai_dc</schema>
		<schemaLocation>http://www.openarchives.org/OAI/2.0/oai_dc/</schemaLocation>
		<record>&lt;oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"&gt;
			&lt;dc:title&gt;Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978&lt;/dc:title&gt;
			&lt;dc:creator&gt;Gotshall, Daniel W.&lt;/dc:creator&gt;
			&lt;dc:creator&gt;Laurent, Laurence L.&lt;/dc:creator&gt;
			&lt;dc:creator&gt;Grant, John J.&lt;/dc:creator&gt;
			&lt;dc:subject&gt;Ecology&lt;/dc:subject&gt;
			&lt;dc:subject&gt;Fisheries&lt;/dc:subject&gt;
			&lt;dc:subject&gt;Biology&lt;/dc:subject&gt;
			&lt;dc:description&gt;Although we continue to monitor permanent stations&amp;#13;
			on a regular basis, we have suspended our 30-m2&amp;#13;
			random subtidal and 1/4-m2 random intertidal studies&amp;#13;
			during this interim year. The 1/4-m2 random subtidal study is being continued and we have added a new subtidal method of determining fish abundance.&amp;#13;
			Giant red sea urchin, Strongylocentrotus franciscanus,&amp;#13;
			numbers continue to decline at their last "stronghold"&amp;#13;
			in our subtidal study area, permanent station&amp;#13;
			15. The recruitment of juvenile blue rockfish,&amp;#13;
			Sebastes mystinus, appears to be either late or&amp;#13;
			low this year in our study areas. The most abundant&amp;#13;
			fish, so far, from the new method of assessment,&amp;#13;
			are adult blue rockfish, kelp greenling,&amp;#13;
			Hexagrammos decagrammus, and gopher rockfish,&amp;#13;
			Sebastes carnatus.&amp;#13;
			Various trends of abalone abundance at the permanent&amp;#13;
			intertidal stations, increasing at some,&amp;#13;
			decreasing at others, were observed during this&amp;#13;
			quarter.&amp;#13;
			Sea otters, Enhydra lutris, seem to have reached&amp;#13;
			their annual springtime peak in abundance during&amp;#13;
			April and May. Several otters were seen rafting&amp;#13;
			and foraging around and near the intake cove&amp;#13;
			breakwaters, apparently becoming emboldened to&amp;#13;
			human presence.  (18pp.)&lt;/dc:description&gt;
			&lt;dc:publisher&gt;California Department of Fish and Game, Marine Resources Region&lt;/dc:publisher&gt;
			&lt;dc:date&gt;1979&lt;/dc:date&gt;
			&lt;dc:type&gt;Monograph or Serial issue&lt;/dc:type&gt;
			&lt;dc:type&gt;NonPeerReviewed&lt;/dc:type&gt;
			&lt;dc:format&gt;application/pdf&lt;/dc:format&gt;
			&lt;dc:identifier&gt;http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf&lt;/dc:identifier&gt;
			&lt;dc:identifier&gt;Gotshall, Daniel W. and Laurent, Laurence L. and Grant, John J. (1979) Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978. Avila Beach, CA, California Department of Fish and Game, Marine Resources Region, (Marine Resources Administrative Report, 79-4)&lt;/dc:identifier&gt;
			&lt;dc:relation&gt;http://aquaticcommons.org/23/&lt;/dc:relation&gt;&lt;/oai_dc:dc&gt;</record>
	</metadata>
 
	<content>
		<contentType>main</contentType>
		<mimeType>text/url</mimeType>
		<url>http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf</url>
	</content>
 
	<content>
		<contentType>alternative</contentType>
		<mimeType>text/html; charset=UTF-8</mimeType>
		<url>http://aquaticcommons.org/23/</url>
	</content>
 
</t:root>
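
Note that the record element carries the harvested Dublin Core record as escaped text, so a consumer has to parse twice: once for the tree and once for the record. A minimal sketch of reading such a tree follows; the function `read_item` and the shortened sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

T_NS = "http://gcube-system.org/namespaces/data/trees"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened version of the example tree above.
SAMPLE_TREE = """<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:generic.eprints.org:23">
  <metadata>
    <schema>oai_dc</schema>
    <record>&lt;oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/"&gt;
      &lt;dc:creator&gt;Gotshall, Daniel W.&lt;/dc:creator&gt;
      &lt;dc:creator&gt;Laurent, Laurence L.&lt;/dc:creator&gt;
      &lt;dc:date&gt;1979&lt;/dc:date&gt;&lt;/oai_dc:dc&gt;</record>
  </metadata>
  <content>
    <contentType>main</contentType>
    <mimeType>text/url</mimeType>
    <url>http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf</url>
  </content>
  <content>
    <contentType>alternative</contentType>
    <mimeType>text/html; charset=UTF-8</mimeType>
    <url>http://aquaticcommons.org/23/</url>
  </content>
</t:root>"""


def read_item(tree_xml):
    """Extract the id, the creators and the content URLs from one tree."""
    root = ET.fromstring(tree_xml)
    # First parse unescapes the record text; second parse reads it as XML.
    record = ET.fromstring(root.findtext("metadata/record"))
    return {
        "id": root.get(f"{{{T_NS}}}id"),
        "creators": [el.text for el in record.findall(f"{{{DC_NS}}}creator")],
        "urls": {c.findtext("contentType"): c.findtext("url")
                 for c in root.findall("content")},
    }


info = read_item(SAMPLE_TREE)
print(info["id"], info["creators"], info["urls"]["alternative"])
```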

Maven coordinates

The Maven coordinates of the development version of the plugin are:

<dependency>
  <groupId>org.gcube.data.oai.tmplugin</groupId>
  <artifactId>oai-tm-plugin</artifactId>
  <version>1.1.0-2.13.0</version>
</dependency>