Difference between revisions of "OAITMPlugin"

From Gcube Wiki
 
(42 intermediate revisions by 2 users not shown)
Line 2: Line 2:
 
||__TOC__
 
||__TOC__
 
|}
 
|}
OAI TM Plugin is a plugin that allows harvesting of metadata descriptions of the records in an archive, using [http://www.openarchives.org/pmh/ The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)].
+
OAI TM Plugin is a plugin of the [[Tree-Based_Access | Tree Based Access Facilities]] that allows harvesting of metadata descriptions of the records in an archive, using [http://www.openarchives.org/pmh/ The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)].
  
 
Each OAI Record is transformed in a edge-labelled tree by OAI TM Plugin.
 
Each OAI Record is transformed in a edge-labelled tree by OAI TM Plugin.
Line 8: Line 8:
 
==Plugin parameters fields==
 
==Plugin parameters fields==
  
Retrieving data on OAI TM Plugin requires a harvester (user) to apply one of the two request types to obtain data from OAI repositories:
+
Plugins lead to the creation of one or more collections. Thus, in addition to the information below, the user should specify collection name and description. 
  
* '''WrapSetsRequest''': to create a collection for each set of the external repository.
+
In order to instruct a Plugin on how to perform the harvesting, a user should specify the following mandatory information:
** In this case the mandatory information you have to provide are the following:
+
* '''''repository name''''': the name of the repository to be harvested, e.g. "aquacomm";
*** base URL (e.g. http://aquacomm.fcla.edu/cgi/oai2)
+
* '''''base URL''''': the base URL of the repository, e.g. "http://aquacomm.fcla.edu/cgi/oai2";
*** contentXPath (e.g. //*[local-name()='identifier' and contains(.,'://')])
+
* '''''default metadata format''''': the metadata format to be used for harvesting, e.g. "oai_dc";
*** default metadata format (e.g. oai_dc)
+
* '''''title XPath''''': the expression for identifying the title of the harvested resource, e.g. "//*[local-name()='title']";
** There are also other non mandatory information we suggest to provide:
+
*** title Xpath: the XPATH from which we can retrieve the title of a document (e.g. //*[local-name()='title'])
+
*** alternatives Xpath:  xpath to define possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
+
*** repository name: self explaining field (e.g. aquacomm)
+
*** setIdentifierList: the id of the set to take into consideration
+
  
* '''WrapRepositoryRequest''': it has to be used to create a collection containing the entire data available in the external repository.
+
In addition to that, the user might specify the following information:
** In this case the mandatory information you have to provide are the following:
+
* '''''content XPath''''': the expression for identifying the content of the harvested resource, e.g. "//*[local-name()='identifier' and contains(.,'://')]";
*** base URL (e.g. http://aquacomm.fcla.edu/cgi/oai2)
+
* '''''alternatives XPath''''': the expression for identifying additional content of the harvested resource, e.g. "//*[local-name()='relation' and contains(.,'://')]";
*** contentXPath (e.g. //*[local-name()='identifier' and contains(.,'://')])
+
* '''''set Identifiers List''''': the list of id of the sets to take into consideration during the harvesting phase;
*** default metadata format (e.g. oai_dc)
+
 
** There are also other non mandatory information we suggest to provide:
+
Two typologies of plugins have been defined:
*** title Xpath: the XPATH from which we can retrieve the title of a document (e.g. //*[local-name()='title'])
+
* '''''WrapSetsRequest''''': to create a collection for each set of the external repository or for each set specified in the setIdentifierList;
*** alternatives Xpath: xpath to define possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
+
* '''''WrapRepositoryRequest''''': to create a single collection containing the whole content of the repository or the content of the sets specified in the setIdentifierList;
*** repository name: self explaining field (e.g. aquacomm)
+
*** collection description: a description for a collection
+
  
 
== Tree model ==
 
== Tree model ==
Line 43: Line 36:
 
** ''lastUpdateTime'': the most recent time the item has been updated;  
 
** ''lastUpdateTime'': the most recent time the item has been updated;  
 
** ''provenance'': It is characterised by the following information:
 
** ''provenance'': It is characterised by the following information:
*** ''statement'': "This item has been created by "+ pluginName +" via OAI-PMH metadata harvesting from the metadata provider "+repositoryName+" at "+baseURL;  
+
*** ''statement'': "This item has been created by the gCube "+ pluginName +" via OAI-PMH metadata harvesting from the metadata provider "+repositoryName+" at "+baseURL;  
 
*** ''setID'': the repository set the object belongs to (optional and repeatable);
 
*** ''setID'': the repository set the object belongs to (optional and repeatable);
 
*** ''recordID'': the identifier of the metadata record;  
 
*** ''recordID'': the identifier of the metadata record;  
Line 56: Line 49:
  
 
=== XML Schema ===
 
=== XML Schema ===
 +
 +
The XML Schema of "record" element depends on the schema used to define record formats. This is a XML Schema of record in "oai_dc" metadataFormat.
 +
 
<source lang="xml">
 
<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
+
 
 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified" >
 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified" >
          <!-- XML Schema Generated from XML Document on Fri Apr 05 2013 16:24:22 GMT+0200 (CEST) -->
 
          <!-- with XmlGrid.net Free Online Service http://xmlgrid.net -->
 
 
       <xs:element name="t:root" >
 
       <xs:element name="t:root" >
 
               <xs:complexType>
 
               <xs:complexType>
 
                     <xs:sequence>
 
                     <xs:sequence>
 +
                            <xs:element name="title" type="xs:string" />
 +
                            <xs:element name="collectionID" type="xs:string" />
 
                             <xs:element name="creationTime" type="xs:string" />
 
                             <xs:element name="creationTime" type="xs:string" />
 
                             <xs:element name="lastUpdateTime" type="xs:string" />
 
                             <xs:element name="lastUpdateTime" type="xs:string" />
                             <xs:element name="isDescribedBy" >
+
                             <xs:element name="provenance" >
 
                                   <xs:complexType>
 
                                   <xs:complexType>
 
                                           <xs:sequence>
 
                                           <xs:sequence>
                                                 <xs:element name="mimeType" type="xs:string" />
+
                                                 <xs:element name="statement" type="xs:string" />
                                                 <xs:element name="length" type="xs:int" />
+
                                                 <xs:element name="setID" type="xs:string" maxOccurs="unbounded" />
                                                <xs:element name="language" type="xs:string" />
+
                                                 <xs:element name="recordID" type="xs:string" />
                                                <xs:element name="schemaName" type="xs:string" />
+
                                                <xs:element name="schemaURI" type="xs:string" />
+
                                                <xs:element name="bytestream" type="xs:string" />
+
                                                <xs:element name="creationTime" type="xs:string" />
+
                                                 <xs:element name="lastUpdateTime" type="xs:string" />
+
 
                                             </xs:sequence>
 
                                             </xs:sequence>
                                          <xs:attribute name="t:id" type="xs:string" />
 
 
                                       </xs:complexType>
 
                                       </xs:complexType>
 
                               </xs:element>
 
                               </xs:element>
                             <xs:element name="mimeType" type="xs:string" />
+
                             <xs:element name="metadata" >
                            <xs:element name="length" type="xs:int" />
+
                            <xs:element name="url" type="xs:string" />
+
                            <xs:element name="name" type="xs:string" />
+
                            <xs:element name="hasAlternative" maxOccurs="unbounded" >
+
 
                                   <xs:complexType>
 
                                   <xs:complexType>
 
                                           <xs:sequence>
 
                                           <xs:sequence>
 +
                                                <xs:element name="schema" type="xs:string" />
 +
                                                <xs:element name="schemaLocation" type="xs:string" />
 +
                                                  <xsd:element name="record">
 +
<xsd:complexType>
 +
  <xsd:sequence>
 +
    <xs:element name="oai_dc:dc" >
 +
                                                                            <xs:complexType>
 +
                                                                                    <xs:sequence>
 +
                                                                                          <xs:element name="dc:title" type="xs:string" >
 +
                                                                                                  <xs:complexType>
 +
                                                                                                        <xs:attribute name="xml:lang" type="xs:string" />
 +
                                                                                                    </xs:complexType>
 +
                                                                                              </xs:element>
 +
                                                                                          <xs:element name="dc:creator" maxOccurs="unbounded" type="xs:string" />
 +
                                                                                          <xs:element name="dc:subject" maxOccurs="unbounded" >
 +
                                                                                                  <xs:complexType>
 +
                                                                                                        <xs:attribute name="xml:lang" type="xs:string" />
 +
                                                                                                    </xs:complexType>
 +
                                                                                              </xs:element>
 +
                                                                                          <xs:element name="dc:description" type="xs:string" >
 +
                                                                                                  <xs:complexType>
 +
                                                                                                        <xs:attribute name="xml:lang" type="xs:string" />
 +
                                                                                                    </xs:complexType>
 +
                                                                                              </xs:element>
 +
                                                                                          <xs:element name="dc:publisher" type="xs:string" >
 +
                                                                                                  <xs:complexType>
 +
                                                                                                        <xs:attribute name="xml:lang" type="xs:string" />
 +
                                                                                                    </xs:complexType>
 +
                                                                                              </xs:element>
 +
                                                                                          <xs:element name="dc:contributor" >
 +
                                                                                                  <xs:complexType>
 +
                                                                                                        <xs:attribute name="xml:lang" type="xs:string" />
 +
                                                                                                    </xs:complexType>
 +
                                                                                              </xs:element>
 +
                                                                                          <xs:element name="dc:date" type="xs:date" />
 +
                                                                                          <xs:element name="dc:type" maxOccurs="unbounded" >
 +
                                                                                                  <xs:complexType>
 +
                                                                                                        <xs:attribute name="xml:lang" type="xs:string" />
 +
                                                                                                    </xs:complexType>
 +
                                                                                              </xs:element>
 +
                                                                                          <xs:element name="dc:format" type="xs:string" />
 +
                                                                                          <xs:element name="dc:identifier" type="xs:string" />
 +
                                                                                          <xs:element name="dc:source" type="xs:string" >
 +
                                                                                                  <xs:complexType>
 +
                                                                                                        <xs:attribute name="xml:lang" type="xs:string" />
 +
                                                                                                    </xs:complexType>
 +
                                                                                              </xs:element>
 +
                                                                                          <xs:element name="dc:language" type="xs:string" />
 +
                                                                                          <xs:element name="dc:relation" type="xs:string" />
 +
                                                                                          <xs:element name="dc:coverage" maxOccurs="unbounded" >
 +
                                                                                                  <xs:complexType>
 +
                                                                                                        <xs:attribute name="xml:lang" type="xs:string" />
 +
                                                                                                    </xs:complexType>
 +
                                                                                              </xs:element>
 +
                                                                                          <xs:element name="dc:rights" type="xs:string" />
 +
                                                                                      </xs:sequence>
 +
                                                                                    <xs:attribute name="xmlns:oai_dc" type="xs:string" />
 +
                                                                                    <xs:attribute name="xmlns:xsi" type="xs:string" />
 +
                                                                                    <xs:attribute name="xmlns:dc" type="xs:string" />
 +
                                                                                    <xs:attribute name="xsi:schemaLocation" type="xs:string" />
 +
                                                                                </xs:complexType>
 +
                                                                        </xs:element>
 +
  </xsd:sequence>
 +
</xsd:complexType>
 +
      </xsd:element>
 +
                                            </xs:sequence>
 +
                                      </xs:complexType>
 +
                              </xs:element>
 +
                            <xs:element name="content" maxOccurs="unbounded" minOccurs="0">
 +
                                  <xs:complexType>
 +
                                          <xs:sequence>
 +
                                                <xs:element name="contentType" type="xs:string" />
 
                                                 <xs:element name="mimeType" type="xs:string" />
 
                                                 <xs:element name="mimeType" type="xs:string" />
                                                <xs:element name="length" type="xs:int" />
 
                                                <xs:element name="name" type="xs:string" />
 
 
                                                 <xs:element name="url" type="xs:string" />
 
                                                 <xs:element name="url" type="xs:string" />
                                                <xs:element name="creationTime" type="xs:string" />
 
                                                <xs:element name="lastUpdateTime" type="xs:string" />
 
 
                                             </xs:sequence>
 
                                             </xs:sequence>
                                          <xs:attribute name="t:id" type="xs:string" />
 
 
                                       </xs:complexType>
 
                                       </xs:complexType>
 
                               </xs:element>
 
                               </xs:element>
Line 109: Line 162:
 
=== Example ===
 
=== Example ===
  
A tree generated by OAI TM Plugin looks like this:
+
A tree generated by OAI TM Plugin looks like the XML code below. The original record is available at this link: [http://aquaticcommons.org/cgi/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:generic.eprints.org:359 aquaticcommons.org/cgi/oai2].
  
 
<source lang="xml">
 
<source lang="xml">
 
  
 
<?xml version="1.0" ?>
 
<?xml version="1.0" ?>
 
+
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees"
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:generic.eprints.org:23">
+
t:id="oai:generic.eprints.org:359" t:source="7374617475733D756E707562">
 
+
<title>Association patterns and social dynamics of killer whales
<title>Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978</title>
+
(Orcinus orca) in Greater Puget Sound</title>
<collectionID>7375626A656374733D54</collectionID>
+
<collectionID>7374617475733D756E707562</collectionID>
<creationTime>2011-09-29T22:42:01.000+02:00</creationTime>
+
<creationTime>2011-09-29T22:10:02.000+02:00</creationTime>
<lastUpdateTime>2011-09-29T22:42:01.000+02:00</lastUpdateTime>
+
<lastUpdateTime>2011-09-29T22:10:02.000+02:00</lastUpdateTime>
 
<provenance>
 
<provenance>
<statement>This item has been created by OAI-TM plugin via OAI-PMH metadata harvesting from the metadata provider aquacomm at http://aquacomm.fcla.edu/cgi/oai2</statement>
+
<statement>This item has been created by the gCube OAI-TM plugin via
<setID>7375626A656374733D54</setID>
+
OAI-PMH metadata harvesting from the metadata provider aquacomm at
 +
http://aquacomm.fcla.edu/cgi/oai2</statement>
 +
<setID>7374617475733D756E707562</setID>
 +
<setID>7375626A656374733D48</setID>
 +
<setID>7375626A656374733D44</setID>
 +
<setID>74797065733D746865736973</setID>
 +
<recordID>oai:generic.eprints.org:359</recordID>
 
</provenance>
 
</provenance>
 
 
<metadata>
 
<metadata>
<schema>oai_dc</schema>
+
<schema></schema>
<schemaLocation>http://www.openarchives.org/OAI/2.0/oai_dc/</schemaLocation>
+
<schemaLocation>http://www.openarchives.org/OAI/2.0/oai_dc/
<record>&lt;oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"&gt;
+
</schemaLocation>
&lt;dc:title&gt;Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978&lt;/dc:title&gt;
+
<record>
&lt;dc:creator&gt;Gotshall, Daniel W.&lt;/dc:creator&gt;
+
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
&lt;dc:creator&gt;Laurent, Laurence L.&lt;/dc:creator&gt;
+
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
&lt;dc:creator&gt;Grant, John J.&lt;/dc:creator&gt;
+
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
&lt;dc:subject&gt;Ecology&lt;/dc:subject&gt;
+
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Association patterns and social dynamics of killer whales (Orcinus orca) in Greater
&lt;dc:subject&gt;Fisheries&lt;/dc:subject&gt;
+
Puget Sound</dc:title>
&lt;dc:subject&gt;Biology&lt;/dc:subject&gt;
+
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Heimlich-Boran, Sara Lou</dc:creator>
&lt;dc:description&gt;Although we continue to monitor permanent stations&amp;#13;
+
<dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Ecology</dc:subject>
on a regular basis, we have suspended our 30-m2&amp;#13;
+
<dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Biology</dc:subject>
random subtidal and 1/4-m2 random intertidal studies&amp;#13;
+
<dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">Killer whales were observed in the inland marine waters of  
during this interim year. The 1/4-m2 random subtidal study is being continued and we have added a new subtidal method of determining fish abundance.&amp;#13;
+
Washington and British Columbia from March to November 1982 and January to November 1983.
Giant red sea urchin, Strongylocentrotus franciscanus,&amp;#13;
+
The majority of the research occurred in Haro Strait in the San Juan Islands. 
numbers continue to decline at their last "stronghold"&amp;#13;
+
All whales were individually identifiable from naturally occurring marks and scars on the dorsal fin and back.  
in our subtidal study area, permanent station&amp;#13;
+
Many whales were identified visually in the field with the aid of a photographic guide to individuals (Biggs et al. 1987). 
15. The recruitment of juvenile blue rockfish,&amp;#13;
+
Seventy-two whales comprised the study population. Data collection concentrated on group composition and spacing,
Sebastes mystinus, appears to be either late or&amp;#13;
+
identification and associations of all whales present, and the recording of the dominant behavior occurring at that time.
low this year in our study areas. The most abundant&amp;#13;
+
Behaviors were categorized from combinations of quantifiable parameters of group composition, spacing of individuals,  
fish, so far, from the new method of assessment,&amp;#13;
+
speed and direction of travel, and the occurrence of specific behaviors such as leaps, tail slaps, penile erections, etc. (Osborne 1986). 
are adult blue rockfish, kelp greenling,&amp;#13;
+
Behaviors were pooled into four major groups: feeding, travel, rest and social/sexual behaviors.
Hexagrammos decagrammus, and gopher rockfish,&amp;#13;
+
The results suggest the following hypothesis about the social organization of the killer whales resident to Greater Puget Sound.
Sebastes carnatus.&amp;#13;
+
As a whale ages, it moves from an integrated position within the community, based on its relationship with its mother to a less
Various trends of abalone abundance at the permanent&amp;#13;
+
integrated period during adolescence in which social ties remain primarily through the older female generation.
intertidal stations, increasing at some,&amp;#13;
+
With full adulthood, dependency upon these “allo-mothers” (N.J. Haenel 1986) declines and direct affiliation with the mothers
decreasing at others, were observed during this&amp;#13;
+
are re-established.  Adult whales remain with the maternal sub-group. Close associations between adult whales appear to be based
quarter.&amp;#13;
+
on relationship between direct kinFission from the main material sub-group and the establishment of separate subgroups may be
Sea otters, Enhydra lutris, seem to have reached&amp;#13;
+
the result of several factors including the age of the older female and the number, ages, and sex of her offspring, including adult sons.
their annual springtime peak in abundance during&amp;#13;
+
When older females die out, siblings or cousins may separate more permanently, forming new lineages or pods.</dc:description>
April and May. Several otters were seen rafting&amp;#13;
+
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">1988</dc:date>
and foraging around and near the intake cove&amp;#13;
+
<dc:type xmlns:dc="http://purl.org/dc/elements/1.1/">Thesis</dc:type>
breakwaters, apparently becoming emboldened to&amp;#13;
+
<dc:type xmlns:dc="http://purl.org/dc/elements/1.1/">NonPeerReviewed</dc:type>
human presence(18pp.)&lt;/dc:description&gt;
+
<dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">application/pdf</dc:format>
&lt;dc:publisher&gt;California Department of Fish and Game, Marine Resources Region&lt;/dc:publisher&gt;
+
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://aquaticcommons.org/359/1/MS_Thesis_Social_Dynamics_Killer_Whales_Puget_Sound_1988.pdf</dc:identifier>
&lt;dc:date&gt;1979&lt;/dc:date&gt;
+
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">Heimlich-Boran, Sara Lou (1988) Association patterns and social dynamics of  
&lt;dc:type&gt;Monograph or Serial issue&lt;/dc:type&gt;
+
killer whales (Orcinus orca) in Greater Puget Sound. Masters thesis, San Jose State University.</dc:identifier>
&lt;dc:type&gt;NonPeerReviewed&lt;/dc:type&gt;
+
<dc:relation xmlns:dc="http://purl.org/dc/elements/1.1/">http://aquaticcommons.org/359/</dc:relation>
&lt;dc:format&gt;application/pdf&lt;/dc:format&gt;
+
</oai_dc:dc>
&lt;dc:identifier&gt;http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf&lt;/dc:identifier&gt;
+
</record>
&lt;dc:identifier&gt;Gotshall, Daniel W. and Laurent, Laurence L. and Grant, John J. (1979) Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978. Avila Beach, CA, California Department of Fish and Game, Marine Resources Region, (Marine Resources Administrative Report, 79-4)&lt;/dc:identifier&gt;
+
&lt;dc:relation&gt;http://aquaticcommons.org/23/&lt;/dc:relation&gt;&lt;/oai_dc:dc&gt;</record>
+
 
</metadata>
 
</metadata>
 
 
<content>
 
<content>
 
<contentType>main</contentType>
 
<contentType>main</contentType>
 
<mimeType>text/url</mimeType>
 
<mimeType>text/url</mimeType>
<url>http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf</url>
+
<url>http://aquaticcommons.org/359/1/MS_Thesis_Social_Dynamics_Killer_Whales_Puget_Sound_1988.pdf
 +
</url>
 
</content>
 
</content>
 
 
<content>
 
<content>
 
<contentType>alternative</contentType>
 
<contentType>alternative</contentType>
 
<mimeType>text/html; charset=UTF-8</mimeType>
 
<mimeType>text/html; charset=UTF-8</mimeType>
<url>http://aquaticcommons.org/23/</url>
+
<url>http://aquaticcommons.org/359/</url>
 
</content>
 
</content>
 
 
</t:root>
 
</t:root>
  
Line 198: Line 251:
 
   <groupId>org.gcube.data.oai.tmplugin</groupId>
 
   <groupId>org.gcube.data.oai.tmplugin</groupId>
 
   <artifactId>oai-tm-plugin</artifactId>
 
   <artifactId>oai-tm-plugin</artifactId>
   <version>1.1.0-2.13.0</version>
+
   <version>1.2.0-SNAPSHOT</version>
 +
</dependency>
 +
 
 +
</source>
 +
 
 +
The oai-harvester library is available here:
 +
 
 +
<source lang="xml">
 +
 
 +
<dependency>
 +
  <groupId>org.gcube.common</groupId>
 +
  <artifactId>oaiharvester</artifactId>
 +
  <version>1.1.0</version>
 
</dependency>
 
</dependency>
  
 
</source>
 
</source>

Latest revision as of 15:40, 27 June 2013

OAI TM Plugin is a plugin of the Tree Based Access Facilities that allows harvesting of metadata descriptions of the records in an archive, using The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

The OAI TM Plugin transforms each OAI record into an edge-labelled tree.

Plugin parameters fields

A plugin creates one or more collections. In addition to the information below, the user should therefore specify a collection name and description.

To instruct the plugin on how to perform the harvesting, the user must specify the following mandatory information:

  • repository name: the name of the repository to be harvested, e.g. "aquacomm";
  • base URL: the base URL of the repository, e.g. "http://aquacomm.fcla.edu/cgi/oai2";
  • default metadata format: the metadata format to be used for harvesting, e.g. "oai_dc";
  • title XPath: the expression for identifying the title of the harvested resource, e.g. "//*[local-name()='title']";

In addition, the user may specify the following optional information:

  • content XPath: the expression for identifying the content of the harvested resource, e.g. "//*[local-name()='identifier' and contains(.,'://')]";
  • alternatives XPath: the expression for identifying additional content of the harvested resource, e.g. "//*[local-name()='relation' and contains(.,'://')]";
  • set Identifiers List: the list of IDs of the sets to be considered during the harvesting phase;
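The XPath parameters above can be tried against a record before configuring the plugin. The following is a minimal sketch, not the plugin's own code: Python's standard `xml.etree.ElementTree` does not support `local-name()` or `contains()`, so the helper `select` (a hypothetical name) approximates expressions of the form `//*[local-name()='X' and contains(.,'S')]` by walking the tree by hand. The record is a shortened version of the aquaticcommons.org example on this page.

```python
import xml.etree.ElementTree as ET

# Shortened oai_dc record, adapted from the example record on this page.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Association patterns and social dynamics of killer whales (Orcinus orca) in Greater Puget Sound</dc:title>
  <dc:identifier>http://aquaticcommons.org/359/1/MS_Thesis_Social_Dynamics_Killer_Whales_Puget_Sound_1988.pdf</dc:identifier>
  <dc:identifier>Heimlich-Boran, Sara Lou (1988) Masters thesis, San Jose State University.</dc:identifier>
  <dc:relation>http://aquaticcommons.org/359/</dc:relation>
</oai_dc:dc>"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def select(record_xml, name, must_contain=None):
    """Approximate //*[local-name()='name' and contains(., 'must_contain')].

    Returns the string value of every matching element, in document order.
    """
    root = ET.fromstring(record_xml)
    hits = []
    for element in root.iter():
        if local_name(element.tag) != name:
            continue
        # The XPath string value of an element is all of its descendant text.
        value = "".join(element.itertext())
        if must_contain is None or must_contain in value:
            hits.append(value.strip())
    return hits


if __name__ == "__main__":
    print("title:       ", select(RECORD, "title"))
    print("content:     ", select(RECORD, "identifier", "://"))
    print("alternatives:", select(RECORD, "relation", "://"))
```

Note how the `contains(.,'://')` condition in the content XPath keeps only the identifier that is a URL and skips the bibliographic citation.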

Two types of plugin request have been defined:

  • WrapSetsRequest: to create a collection for each set of the external repository or for each set specified in the setIdentifierList;
  • WrapRepositoryRequest: to create a single collection containing the whole content of the repository or the content of the sets specified in the setIdentifierList;
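The difference between the two request types can be sketched in terms of the OAI-PMH requests a harvester would issue. This is an illustration under assumptions, not the plugin's implementation: the function names and the set identifiers `setA` and `setB` are hypothetical, while the `ListRecords` verb and the `metadataPrefix` and `set` arguments are those defined by OAI-PMH.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def plan_collections(request_type, repository_sets, set_ids=None):
    """Return one entry per collection to create.

    Each entry lists the sets harvested into that collection; an empty
    entry means "the whole repository, no set filter".
    """
    if request_type == "WrapSetsRequest":
        # One collection per set, restricted to set_ids when given.
        selected = list(set_ids) if set_ids else list(repository_sets)
        return [[set_id] for set_id in selected]
    if request_type == "WrapRepositoryRequest":
        # A single collection: the listed sets, or everything.
        return [list(set_ids)] if set_ids else [[]]
    raise ValueError("unknown request type: " + request_type)


if __name__ == "__main__":
    base = "http://aquacomm.fcla.edu/cgi/oai2"
    sets = ["setA", "setB"]  # hypothetical set identifiers
    for request_type in ("WrapSetsRequest", "WrapRepositoryRequest"):
        for collection in plan_collections(request_type, sets):
            urls = [list_records_url(base, "oai_dc", s) for s in collection]
            # No set filter: harvest the whole repository with one request.
            urls = urls or [list_records_url(base, "oai_dc")]
            print(request_type, collection, urls)
```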

Tree model

Conceptual Schema

Each collection item produced by this plugin is characterised by the following information:

  • item metadata: global information on the item including
    • title: the title of the record;
    • collectionID: the collection this item belongs to;
    • creationTime: the time the item was created;
    • lastUpdateTime: the most recent time the item was updated;
    • provenance: It is characterised by the following information:
      • statement: "This item has been created by the gCube "+ pluginName +" via OAI-PMH metadata harvesting from the metadata provider "+repositoryName+" at "+baseURL;
      • setID: the repository set the object belongs to (optional and repeatable);
      • recordID: the identifier of the metadata record;
  • metadata (repeatable): the metadata record harvested. It is characterised by the following information:
    • schema: the metadata format of the metadata record;
    • schemaLocation: the metadata format schema URI;
    • record: the manifestation of the metadata record harvested;
  • content (repeatable): any potential payload shipped with the metadata record. It is characterised by the following information:
    • contentType: i.e. whether main or alternative content;
    • mimeType: MIME type of the actual content;
    • url: URL to the actual content;
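
The conceptual schema can be mirrored by a few plain data classes. This is a minimal sketch: the class and field names are Python renderings chosen for illustration, and the plugin name "OAI-TM plugin" is taken from the statement in the example on this page rather than from the plugin code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    statement: str
    record_id: str
    set_ids: List[str] = field(default_factory=list)  # optional and repeatable


@dataclass
class Metadata:
    schema: str
    schema_location: str
    record: str  # serialized metadata record


@dataclass
class Content:
    content_type: str  # "main" or "alternative"
    mime_type: str
    url: str


@dataclass
class Item:
    title: str
    collection_id: str
    creation_time: str
    last_update_time: str
    provenance: Provenance
    metadata: List[Metadata] = field(default_factory=list)  # repeatable
    content: List[Content] = field(default_factory=list)    # repeatable


def provenance_statement(plugin_name, repository_name, base_url):
    """Assemble the statement from the template given in the list above."""
    return (
        "This item has been created by the gCube " + plugin_name
        + " via OAI-PMH metadata harvesting from the metadata provider "
        + repository_name + " at " + base_url
    )


statement = provenance_statement(
    "OAI-TM plugin", "aquacomm", "http://aquacomm.fcla.edu/cgi/oai2"
)
provenance = Provenance(statement=statement, record_id="oai:generic.eprints.org:359")
print(provenance.statement)
```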

XML Schema

The XML Schema of the "record" element depends on the metadata format of the harvested records. The following XML Schema describes a tree whose record is in the "oai_dc" metadata format.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="t:root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string" />
        <xs:element name="collectionID" type="xs:string" />
        <xs:element name="creationTime" type="xs:string" />
        <xs:element name="lastUpdateTime" type="xs:string" />
        <xs:element name="provenance">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="statement" type="xs:string" />
              <xs:element name="setID" type="xs:string" maxOccurs="unbounded" />
              <xs:element name="recordID" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="metadata">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="schema" type="xs:string" />
              <xs:element name="schemaLocation" type="xs:string" />
              <xs:element name="record">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="oai_dc:dc">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="dc:title" type="xs:string">
                            <xs:complexType>
                              <xs:attribute name="xml:lang" type="xs:string" />
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="dc:creator" maxOccurs="unbounded" type="xs:string" />
                          <xs:element name="dc:subject" maxOccurs="unbounded">
                            <xs:complexType>
                              <xs:attribute name="xml:lang" type="xs:string" />
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="dc:description" type="xs:string">
                            <xs:complexType>
                              <xs:attribute name="xml:lang" type="xs:string" />
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="dc:publisher" type="xs:string">
                            <xs:complexType>
                              <xs:attribute name="xml:lang" type="xs:string" />
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="dc:contributor">
                            <xs:complexType>
                              <xs:attribute name="xml:lang" type="xs:string" />
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="dc:date" type="xs:date" />
                          <xs:element name="dc:type" maxOccurs="unbounded">
                            <xs:complexType>
                              <xs:attribute name="xml:lang" type="xs:string" />
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="dc:format" type="xs:string" />
                          <xs:element name="dc:identifier" type="xs:string" />
                          <xs:element name="dc:source" type="xs:string">
                            <xs:complexType>
                              <xs:attribute name="xml:lang" type="xs:string" />
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="dc:language" type="xs:string" />
                          <xs:element name="dc:relation" type="xs:string" />
                          <xs:element name="dc:coverage" maxOccurs="unbounded">
                            <xs:complexType>
                              <xs:attribute name="xml:lang" type="xs:string" />
                            </xs:complexType>
                          </xs:element>
                          <xs:element name="dc:rights" type="xs:string" />
                        </xs:sequence>
                        <xs:attribute name="xmlns:oai_dc" type="xs:string" />
                        <xs:attribute name="xmlns:xsi" type="xs:string" />
                        <xs:attribute name="xmlns:dc" type="xs:string" />
                        <xs:attribute name="xsi:schemaLocation" type="xs:string" />
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="content" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="contentType" type="xs:string" />
              <xs:element name="mimeType" type="xs:string" />
              <xs:element name="url" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="xmlns:t" type="xs:string" />
      <xs:attribute name="t:id" type="xs:string" />
    </xs:complexType>
  </xs:element>
</xs:schema>

Example

A tree generated by OAI TM Plugin looks like the XML code below. The original record is available at this link: aquaticcommons.org/cgi/oai2.

<?xml version="1.0" ?>
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees"
	t:id="oai:generic.eprints.org:359" t:source="7374617475733D756E707562">
	<title>Association patterns and social dynamics of killer whales
		(Orcinus orca) in Greater Puget Sound</title>
	<collectionID>7374617475733D756E707562</collectionID>
	<creationTime>2011-09-29T22:10:02.000+02:00</creationTime>
	<lastUpdateTime>2011-09-29T22:10:02.000+02:00</lastUpdateTime>
	<provenance>
		<statement>This item has been created by the gCube OAI-TM plugin via
			OAI-PMH metadata harvesting from the metadata provider aquacomm at
			http://aquacomm.fcla.edu/cgi/oai2</statement>
		<setID>7374617475733D756E707562</setID>
		<setID>7375626A656374733D48</setID>
		<setID>7375626A656374733D44</setID>
		<setID>74797065733D746865736973</setID>
		<recordID>oai:generic.eprints.org:359</recordID>
	</provenance>
	<metadata>
		<schema></schema>
		<schemaLocation>http://www.openarchives.org/OAI/2.0/oai_dc/
		</schemaLocation>
		<record>
			<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
				xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
				xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
				<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Association patterns and social dynamics of killer whales (Orcinus orca) in Greater 
				Puget Sound</dc:title>
				<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Heimlich-Boran, Sara Lou</dc:creator>
				<dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Ecology</dc:subject>
				<dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Biology</dc:subject>
				<dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">Killer whales were observed in the inland marine waters of 
				Washington and British Columbia from March to November 1982 and January to November 1983. 
				The majority of the research occurred in Haro Strait in the San Juan Islands.  
				All whales were individually identifiable from naturally occurring marks and scars on the dorsal fin and back. 
				Many whales were identified visually in the field with the aid of a photographic guide to individuals (Biggs et al. 1987).  
				Seventy-two whales comprised the study population.  Data collection concentrated on group composition and spacing, 
				identification and associations of all whales present, and the recording of the dominant behavior occurring at that time. 
				Behaviors were categorized from combinations of quantifiable parameters of group composition, spacing of individuals, 
				speed and direction of travel, and the occurrence of specific behaviors such as leaps, tail slaps, penile erections, etc. (Osborne 1986).  
				Behaviors were pooled into four major groups: feeding, travel, rest and social/sexual behaviors. 
				The results suggest the following hypothesis about the social organization of the killer whales resident to Greater Puget Sound.  
				As a whale ages, it moves from an integrated position within the community, based on its relationship with its mother to a less 
				integrated period during adolescence in which social ties remain primarily through the older female generation.  
				With full adulthood, dependency upon these “allo-mothers” (N.J. Haenel 1986) declines and direct affiliation with the mothers 
				are re-established.  Adult whales remain with the maternal sub-group. Close associations between adult whales appear to be based 
				on relationship between direct kin.  Fission from the main material sub-group and the establishment of separate subgroups may be 
				the result of several factors including the age of the older female and the number, ages, and sex of her offspring, including adult sons. 
				 When older females die out, siblings or cousins may separate more permanently, forming new lineages or pods.</dc:description>
				<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">1988</dc:date>
				<dc:type xmlns:dc="http://purl.org/dc/elements/1.1/">Thesis</dc:type>
				<dc:type xmlns:dc="http://purl.org/dc/elements/1.1/">NonPeerReviewed</dc:type>
				<dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">application/pdf</dc:format>
				<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://aquaticcommons.org/359/1/MS_Thesis_Social_Dynamics_Killer_Whales_Puget_Sound_1988.pdf</dc:identifier>
				<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">Heimlich-Boran, Sara Lou (1988) Association patterns and social dynamics of 
				killer whales (Orcinus orca) in Greater Puget Sound. Masters thesis, San Jose State University.</dc:identifier>
				<dc:relation xmlns:dc="http://purl.org/dc/elements/1.1/">http://aquaticcommons.org/359/</dc:relation>
			</oai_dc:dc>
		</record>
	</metadata>
	<content>
		<contentType>main</contentType>
		<mimeType>text/url</mimeType>
		<url>http://aquaticcommons.org/359/1/MS_Thesis_Social_Dynamics_Killer_Whales_Puget_Sound_1988.pdf
		</url>
	</content>
	<content>
		<contentType>alternative</contentType>
		<mimeType>text/html; charset=UTF-8</mimeType>
		<url>http://aquaticcommons.org/359/</url>
	</content>
</t:root>
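
A consumer could read such a tree with the Python standard library, as in the sketch below, which works on a trimmed copy of the example. Decoding the setID values as hexadecimal ASCII is an observation drawn from this sample (for instance, 7374617475733D756E707562 decodes to "status=unpub"); it is an assumption, not documented plugin behaviour.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example tree above.
TREE = """\
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees"
        t:id="oai:generic.eprints.org:359" t:source="7374617475733D756E707562">
  <provenance>
    <setID>7374617475733D756E707562</setID>
    <setID>74797065733D746865736973</setID>
    <recordID>oai:generic.eprints.org:359</recordID>
  </provenance>
  <content>
    <contentType>main</contentType>
    <mimeType>text/url</mimeType>
    <url>http://aquaticcommons.org/359/1/MS_Thesis_Social_Dynamics_Killer_Whales_Puget_Sound_1988.pdf
    </url>
  </content>
  <content>
    <contentType>alternative</contentType>
    <mimeType>text/html; charset=UTF-8</mimeType>
    <url>http://aquaticcommons.org/359/</url>
  </content>
</t:root>
"""

T_NS = "http://gcube-system.org/namespaces/data/trees"

root = ET.fromstring(TREE)

# Namespaced attributes are keyed as "{namespace}name" in ElementTree.
record_id = root.get(f"{{{T_NS}}}id")

# Assumption: set identifiers are hex-encoded ASCII strings.
set_ids = [element.text.strip() for element in root.findall("provenance/setID")]
decoded_sets = [bytes.fromhex(value).decode("ascii") for value in set_ids]

# Pair each content entry's type with its URL, trimming stray whitespace.
contents = [
    (entry.findtext("contentType").strip(), entry.findtext("url").strip())
    for entry in root.findall("content")
]

print(record_id)
print(decoded_sets)
print(contents)
```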

Maven coordinates

The Maven coordinates of the development version of oai-tree-plugin are:

<dependency>
  <groupId>org.gcube.data.oai.tmplugin</groupId>
  <artifactId>oai-tm-plugin</artifactId>
  <version>1.2.0-SNAPSHOT</version>
</dependency>

The oai-harvester library is available at the following coordinates:

<dependency>
  <groupId>org.gcube.common</groupId>
  <artifactId>oaiharvester</artifactId>
  <version>1.1.0</version>
</dependency>