OAITMPlugin

From Gcube Wiki
Revision as of 13:58, 9 April 2013

OAI TM Plugin is a plugin of the Tree Based Access Facilities that harvests the metadata descriptions of the records in an archive using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

Each OAI record is transformed into an edge-labelled tree by OAI TM Plugin.

Plugin parameter fields

Plugins lead to the creation of one or more collections. Thus, in addition to the information below, the user should specify a collection name and description.

To instruct the plugin how to perform the harvesting, the user must specify the following mandatory information:

  • repository name: the name of the repository to be harvested, e.g. "aquacomm";
  • base URL: the base URL of the repository, e.g. "http://aquacomm.fcla.edu/cgi/oai2";
  • default metadata format: the metadata format to be used for harvesting, e.g. "oai_dc";
  • title XPath: the expression for identifying the title of the harvested resource, e.g. "//*[local-name()='title']";
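
As an illustration of how these parameters relate to the protocol, the sketch below builds an OAI-PMH ListRecords request URL from the base URL and the metadata format. The verb and the `metadataPrefix` and `set` arguments come from the OAI-PMH specification; the function name `list_records_url` is hypothetical, and nothing here is taken from the plugin's own code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Restrict harvesting to a single set of the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Values taken from the parameter examples in the list above.
whole_repository = list_records_url("http://aquacomm.fcla.edu/cgi/oai2", "oai_dc")
single_set = list_records_url(
    "http://aquacomm.fcla.edu/cgi/oai2", "oai_dc", "7375626A656374733D54"
)
print(whole_repository)
print(single_set)
```

A real harvester would also have to follow the resumption tokens that a repository returns for long result lists; that is left out of this sketch.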

In addition, the user may specify the following optional information:

  • content XPath: the expression for identifying the content of the harvested resource, e.g. "//*[local-name()='identifier' and contains(.,'://')]";
  • alternatives XPath: the expression for identifying additional content of the harvested resource, e.g. "//*[local-name()='relation' and contains(.,'://')]";
  • set Identifiers List: the list of identifiers of the sets to consider during harvesting;
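
To show what the example XPath expressions select, the following sketch applies equivalent logic to an abridged Dublin Core record. Python's standard library does not evaluate `local-name()` or `contains()`, so the helper `select` (a hypothetical name) approximates the three expressions by comparing local element names and checking for a substring. It is not the plugin's XPath engine.

```python
import xml.etree.ElementTree as ET

# Abridged oai_dc record; the second identifier is shortened for illustration.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Southern California partyboat angler survey</dc:title>
  <dc:identifier>http://aquaticcommons.org/16/1/Marine_Resources_Administrative_Report_No._80%2D7.pdf</dc:identifier>
  <dc:identifier>Hartmann, A. Rucker (1980) Southern California partyboat angler survey.</dc:identifier>
  <dc:relation>http://aquaticcommons.org/16/</dc:relation>
</oai_dc:dc>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def select(root, name, must_contain=None):
    """Approximate //*[local-name()='name' and contains(., 'must_contain')]."""
    matches = []
    for element in root.iter():
        if local_name(element.tag) != name:
            continue
        text = "".join(element.itertext())
        if must_contain is None or must_contain in text:
            matches.append(text)
    return matches


root = ET.fromstring(RECORD)
titles = select(root, "title")                     # title XPath
contents = select(root, "identifier", "://")       # content XPath
alternatives = select(root, "relation", "://")     # alternatives XPath
print(titles, contents, alternatives)
```

The `contains(.,'://')` condition is what separates the identifier that is a URL from the one that is a bibliographic citation.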

Two types of plugin have been defined:

  • WrapSetsRequest: to create a collection for each set of the external repository or for each set specified in the setIdentifierList;
  • WrapRepositoryRequest: to create a single collection containing the whole content of the repository or the content of the sets specified in the setIdentifierList;
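
The difference between the two can be sketched as a function that decides which sets end up in which collection. This is only a reading of the two descriptions above: the function name `plan_collections`, its arguments, and the convention that an empty list means "no set filter" are all assumptions, and the sketch says nothing about how the plugin names or stores collections.

```python
def plan_collections(request_type, repository_sets, set_identifiers=None):
    """Return one list of set identifiers per collection to be created.

    An empty inner list stands for "harvest the whole repository,
    without filtering by set" (an assumption of this sketch).
    """
    if request_type == "WrapSetsRequest":
        # One collection per requested set, or per repository set if none given.
        chosen = set_identifiers or repository_sets
        return [[set_id] for set_id in chosen]
    if request_type == "WrapRepositoryRequest":
        # A single collection, limited to the requested sets if any are given.
        return [list(set_identifiers or [])]
    raise ValueError("unknown request type: " + request_type)


per_set = plan_collections("WrapSetsRequest", ["a", "b", "c"])
per_chosen_set = plan_collections("WrapSetsRequest", ["a", "b", "c"], ["b"])
whole = plan_collections("WrapRepositoryRequest", ["a", "b", "c"])
merged = plan_collections("WrapRepositoryRequest", ["a", "b", "c"], ["a", "b"])
print(per_set, per_chosen_set, whole, merged)
```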

Tree model

Conceptual Schema

Each collection item produced by this plugin is characterised by the following information:

  • item metadata: global information on the item including
    • title: the title of the record;
    • collectionID: the collection this item belongs to;
    • creationTime: the time the item was created;
    • lastUpdateTime: the most recent time the item was updated;
    • provenance: characterised by the following information:
      • statement: "This item has been created by the gCube "+ pluginName +" via OAI-PMH metadata harvesting from the metadata provider "+repositoryName+" at "+baseURL;
      • setID: the repository set the object belongs to (optional and repeatable);
      • recordID: the identifier of the metadata record;
  • metadata (repeatable): the metadata record harvested. It is characterised by the following information:
    • schema: the metadata format of the metadata record;
    • schemaLocation: the metadata format schema URI;
    • record: the manifestation of the metadata record harvested;
  • content (repeatable): any potential payload shipped with the metadata record. It is characterised by the following information:
    • contentType: i.e. whether main or alternative content;
    • mimeType: MIME type of the actual content;
    • url: URL to the actual content;
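
A minimal sketch of this structure, assuming the element names listed above and the tree namespace used in the example on this page, builds one item with Python's standard library. The function `build_item` and its argument layout are hypothetical, and using the record identifier for both the `t:id` attribute and `recordID` is an assumption based on that example.

```python
import xml.etree.ElementTree as ET

T_NS = "http://gcube-system.org/namespaces/data/trees"
ET.register_namespace("t", T_NS)


def add_children(parent, pairs):
    """Append one child element per (name, text) pair."""
    for name, text in pairs:
        ET.SubElement(parent, name).text = text


def build_item(record_id, title, collection_id, created, updated,
               statement, set_ids, metadata, contents):
    """Assemble the item tree: item metadata, provenance, metadata, content."""
    root = ET.Element("{%s}root" % T_NS, {"{%s}id" % T_NS: record_id})
    add_children(root, [("title", title), ("collectionID", collection_id),
                        ("creationTime", created), ("lastUpdateTime", updated)])
    provenance = ET.SubElement(root, "provenance")
    add_children(provenance, [("statement", statement)]
                 + [("setID", set_id) for set_id in set_ids]  # optional, repeatable
                 + [("recordID", record_id)])
    for schema, location, record in metadata:                 # repeatable
        add_children(ET.SubElement(root, "metadata"),
                     [("schema", schema), ("schemaLocation", location),
                      ("record", record)])
    for content_type, mime_type, url in contents:             # repeatable
        add_children(ET.SubElement(root, "content"),
                     [("contentType", content_type), ("mimeType", mime_type),
                      ("url", url)])
    return root


item = build_item(
    "oai:generic.eprints.org:16",
    "Southern California partyboat angler survey",
    "7375626A656374733D54",
    "2011-09-29T22:41:21.000+02:00", "2011-09-29T22:41:21.000+02:00",
    "This item has been created by OAI-TM plugin via OAI-PMH metadata harvesting",
    ["7375626A656374733D54"],
    [("oai_dc", "http://www.openarchives.org/OAI/2.0/oai_dc/",
      "<dc:title>Southern California partyboat angler survey</dc:title>")],
    [("main", "text/url", "http://aquaticcommons.org/16/1/Marine_Resources_Administrative_Report_No._80%2D7.pdf"),
     ("alternative", "text/html; charset=UTF-8", "http://aquaticcommons.org/16/")],
)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Because the harvested record is stored as element text, the serializer escapes its markup, which is why the record appears as `&lt;...&gt;` inside the `record` element.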

XML Schema

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="collectionID" type="xs:string"/>
  <xs:element name="creationTime" type="xs:dateTime"/>
  <xs:element name="lastUpdateTime" type="xs:dateTime"/>
  <xs:element name="provenance">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="statement"/>
        <xs:element type="xs:string" name="setID" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element type="xs:string" name="recordID"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="metadata">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="schema"/>
        <xs:element type="xs:anyURI" name="schemaLocation"/>
        <xs:element type="xs:string" name="record"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="content">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="contentType"/>
        <xs:element type="xs:string" name="mimeType"/>
        <xs:element type="xs:anyURI" name="url"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Example

A tree generated by OAI TM Plugin looks like this:

<?xml version="1.0" ?>
 
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:generic.eprints.org:16">
	<title>Southern California partyboat angler survey</title>
	<collectionID>7375626A656374733D54</collectionID>
	<creationTime>2011-09-29T22:41:21.000+02:00</creationTime>
	<lastUpdateTime>2011-09-29T22:41:21.000+02:00</lastUpdateTime>
 
	<provenance>
		<statement>This item has been created by OAI-TM plugin via OAI-PMH metadata harvesting from the metadata provider aquacomm at http://aquacomm.fcla.edu/cgi/oai2</statement>
		<setID>7375626A656374733D54</setID>
		<recordID>oai:generic.eprints.org:16</recordID>
	</provenance>
 
	<metadata>
		<schema>oai_dc</schema>
		<schemaLocation>http://www.openarchives.org/OAI/2.0/oai_dc/</schemaLocation>
		<record>&lt;oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"&gt;
		&lt;dc:title&gt;Southern California partyboat angler survey&lt;/dc:title&gt;
		&lt;dc:creator&gt;Hartmann, A. Rucker&lt;/dc:creator&gt;
		&lt;dc:subject&gt;Fisheries&lt;/dc:subject&gt;
		&lt;dc:subject&gt;Biology&lt;/dc:subject&gt;
		&lt;dc:description&gt;Previous studies suggest that ocean anglers are unable&amp;#13;
		to identify many common marine fishes and that they frequently use nondesignated common names for those fishes&amp;#13;
		with which they are familiar.&amp;#13;
		This paper discusses the ability of the anglers and&amp;#13;
		crew aboard commercial passenger fishing vessels (CPFV) to&amp;#13;
		identify 22 fishes caught off southern California and &amp;#13;
		relates this ability to fishing experience and frequency. Implications to resource management are also discussed.&amp;#13;
		Most CPFV anglers were inexperienced and could identify&amp;#13;
		only a few of the species. However, as experience increased,&amp;#13;
		the scores improved. Vessel crew members scored higher than&amp;#13;
		the most experienced anglers.&amp;#13;
		The inability of anglers to identify marine fishes and&amp;#13;
		the widespread use of nondesignated and often confusing&amp;#13;
		common names help to explain why some fishery management&amp;#13;
		regulations of the California Department of Fish and Game&amp;#13;
		are relatively ineffective.  (37pp.)&lt;/dc:description&gt;
		&lt;dc:publisher&gt;California Department of Fish and Game, Marine Resources Region&lt;/dc:publisher&gt;
		&lt;dc:date&gt;1980&lt;/dc:date&gt;
		&lt;dc:type&gt;Monograph or Serial issue&lt;/dc:type&gt;
		&lt;dc:type&gt;NonPeerReviewed&lt;/dc:type&gt;
		&lt;dc:format&gt;application/pdf&lt;/dc:format&gt;
		&lt;dc:identifier&gt;http://aquaticcommons.org/16/1/Marine_Resources_Administrative_Report_No._80%2D7.pdf&lt;/dc:identifier&gt;
		&lt;dc:identifier&gt;Hartmann, A. Rucker (1980) Southern California partyboat angler survey. Long Beach, CA, California Department of Fish and Game, Marine Resources Region, (Marine Resources Administrative Report, 80-7)&lt;/dc:identifier&gt;
		&lt;dc:relation&gt;http://aquaticcommons.org/16/&lt;/dc:relation&gt;&lt;/oai_dc:dc&gt;</record>
	</metadata>
 
	<content>
		<contentType>main</contentType>
		<mimeType>text/url</mimeType>
		<url>http://aquaticcommons.org/16/1/Marine_Resources_Administrative_Report_No._80%2D7.pdf</url>
	</content>
 
	<content>
		<contentType>alternative</contentType>
		<mimeType>text/html; charset=UTF-8</mimeType>
		<url>http://aquaticcommons.org/16/</url>
	</content>
 
</t:root>
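
A consumer of such a tree has to parse it twice: once for the tree itself and once for the escaped record held as text in the `record` element. The sketch below does this with Python's standard library on an abridged copy of the tree above; it is an illustration of reading the format, not part of the plugin.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example tree (description and some fields omitted).
TREE = """<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:generic.eprints.org:16">
  <title>Southern California partyboat angler survey</title>
  <metadata>
    <schema>oai_dc</schema>
    <record>&lt;oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/"&gt;
    &lt;dc:title&gt;Southern California partyboat angler survey&lt;/dc:title&gt;
    &lt;dc:creator&gt;Hartmann, A. Rucker&lt;/dc:creator&gt;
    &lt;dc:date&gt;1980&lt;/dc:date&gt;&lt;/oai_dc:dc&gt;</record>
  </metadata>
  <content>
    <contentType>main</contentType>
    <mimeType>text/url</mimeType>
    <url>http://aquaticcommons.org/16/1/Marine_Resources_Administrative_Report_No._80%2D7.pdf</url>
  </content>
  <content>
    <contentType>alternative</contentType>
    <mimeType>text/html; charset=UTF-8</mimeType>
    <url>http://aquaticcommons.org/16/</url>
  </content>
</t:root>"""

DC = {"dc": "http://purl.org/dc/elements/1.1/"}

root = ET.fromstring(TREE)

# First parse: the record comes back as a plain string of XML markup.
record_text = root.findtext("metadata/record")

# Second parse: turn that string into the Dublin Core record.
record = ET.fromstring(record_text)
dc_title = record.findtext("dc:title", namespaces=DC)
dc_creator = record.findtext("dc:creator", namespaces=DC)

# Map each content type to its URL.
urls = {c.findtext("contentType"): c.findtext("url") for c in root.findall("content")}
print(dc_title, dc_creator, urls)
```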

Maven coordinates

The Maven coordinates of the plugin's development version are:

<dependency>
  <groupId>org.gcube.data.oai.tmplugin</groupId>
  <artifactId>oai-tm-plugin</artifactId>
  <version>1.1.0-2.13.0</version>
</dependency>