OAITMPlugin
OAI TM Plugin is a plugin that harvests the metadata descriptions of the records in an archive using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
The plugin transforms each OAI record into an edge-labelled tree.
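For orientation, an OAI-PMH harvest is a series of HTTP requests against the base URL of a repository, each carrying a verb argument. The sketch below builds a ListRecords request URL with the metadataPrefix and optional set arguments defined by OAI-PMH 2.0. The class and method names are hypothetical, and this page does not document which requests the plugin actually issues, so read it as an illustration of the protocol rather than of the plugin internals. The base URL and the oai_dc format are the example values used elsewhere on this page.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: not part of the plugin API.
class OaiPmhRequestSketch {

    /**
     * Builds an OAI-PMH ListRecords request URL.
     * setSpec may be null, in which case records from the whole repository are requested.
     */
    public static String listRecordsUrl(String baseUrl, String metadataPrefix, String setSpec) {
        StringBuilder url = new StringBuilder(baseUrl)
            .append("?verb=ListRecords")
            .append("&metadataPrefix=").append(encode(metadataPrefix));
        if (setSpec != null) {
            url.append("&set=").append(encode(setSpec));
        }
        return url.toString();
    }

    // Percent-encodes a query argument value.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Whole repository
        System.out.println(listRecordsUrl("http://aquacomm.fcla.edu/cgi/oai2", "oai_dc", null));
        // A single set (the set identifier is the one shown in the example tree on this page)
        System.out.println(listRecordsUrl("http://aquacomm.fcla.edu/cgi/oai2", "oai_dc", "7375626A656374733D54"));
    }
}
```

The two calls in main loosely mirror the two request types: one harvest over the whole repository, and one harvest restricted to a set.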
Plugin parameter fields
To retrieve data through the OAI TM Plugin, a harvester (user) submits one of two request types to obtain data from OAI repositories:
- WrapSetsRequest: creates one collection for each set of the external repository.
  - The following information is mandatory:
    - base URL (e.g. http://aquacomm.fcla.edu/cgi/oai2)
    - contentXPath (e.g. //*[local-name()='identifier' and contains(.,'://')])
    - default metadata format (e.g. oai_dc)
  - The following information is optional, but we suggest providing it:
    - title XPath: the XPath expression from which the title of a document is retrieved (e.g. //*[local-name()='title'])
    - alternatives XPath: the XPath expression that selects possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
    - repository name: a name for the repository (e.g. aquacomm)
    - setIdentifierList: the identifiers of the sets to take into consideration
- WrapRepositoryRequest: creates a single collection containing all the data available in the external repository.
  - The following information is mandatory:
    - base URL (e.g. http://aquacomm.fcla.edu/cgi/oai2)
    - contentXPath (e.g. //*[local-name()='identifier' and contains(.,'://')])
    - default metadata format (e.g. oai_dc)
  - The following information is optional, but we suggest providing it:
    - title XPath: the XPath expression from which the title of a document is retrieved (e.g. //*[local-name()='title'])
    - alternatives XPath: the XPath expression that selects possible alternatives (e.g. //*[local-name()='relation' and contains(.,'://')])
    - repository name: a name for the repository (e.g. aquacomm)
    - collection description: a description of the collection
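To make the XPath parameters concrete, the following sketch evaluates the three example expressions listed above against an abridged oai_dc record taken from the example tree on this page. It uses only the XPath 1.0 engine shipped with the JDK. The class and method names are hypothetical, and how the plugin itself evaluates these expressions is not documented here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of the plugin API.
class XPathParameterSketch {

    // Abridged oai_dc record (title and citation shortened for brevity).
    static final String RECORD =
        "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
        + "<dc:title>Diablo Canyon power plant site ecological study</dc:title>"
        + "<dc:identifier>http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf</dc:identifier>"
        + "<dc:identifier>Gotshall, Daniel W. and Laurent, Laurence L. and Grant, John J. (1979)</dc:identifier>"
        + "<dc:relation>http://aquaticcommons.org/23/</dc:relation>"
        + "</oai_dc:dc>";

    /** Returns the text content of every node selected by the XPath expression. */
    public static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so that local-name() sees "identifier", not "dc:identifier"
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // contentXPath: only the identifier that contains "://" is selected
        System.out.println(select(RECORD, "//*[local-name()='identifier' and contains(.,'://')]"));
        // title XPath
        System.out.println(select(RECORD, "//*[local-name()='title']"));
        // alternatives XPath
        System.out.println(select(RECORD, "//*[local-name()='relation' and contains(.,'://')]"));
    }
}
```

Matching on local-name() keeps the expressions independent of the namespace prefix used by the repository, and the contains(.,'://') test filters out identifiers that are citations rather than URLs.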
Tree model
Conceptual Schema
Each collection item produced by this plugin is characterised by the following information:
- item metadata: global information on the item, including:
  - title: the title of the record;
  - collectionID: the collection this item belongs to;
  - creationTime: the time the item was created;
  - lastUpdateTime: the most recent time the item was updated;
- provenance: characterised by the following information:
  - statement: "This item has been created by " + pluginName + " via OAI-PMH metadata harvesting from the metadata provider " + repositoryName + " at " + baseURL;
  - setID: the repository set the object belongs to (optional and repeatable);
  - recordID: the identifier of the metadata record;
- metadata (repeatable): the harvested metadata record, characterised by the following information:
  - schema: the metadata format of the metadata record;
  - schemaLocation: the URI of the metadata format schema;
  - record: the manifestation of the harvested metadata record;
- content (repeatable): any payload shipped with the metadata record, characterised by the following information:
  - contentType: whether the content is main or alternative;
  - mimeType: the MIME type of the actual content;
  - url: the URL of the actual content.
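The provenance statement in the list above is a plain string concatenation of three values. A minimal sketch of that formula follows. The class and method names are hypothetical, and the plugin name value "OAI-TM plugin" is taken from the example tree on this page.

```java
// Hypothetical helper: not part of the plugin API.
class ProvenanceSketch {

    /** Assembles the provenance statement exactly as the formula in the conceptual schema describes. */
    public static String statement(String pluginName, String repositoryName, String baseURL) {
        return "This item has been created by " + pluginName
            + " via OAI-PMH metadata harvesting from the metadata provider " + repositoryName
            + " at " + baseURL;
    }

    public static void main(String[] args) {
        System.out.println(statement("OAI-TM plugin", "aquacomm", "http://aquacomm.fcla.edu/cgi/oai2"));
    }
}
```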
XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <!-- XML Schema Generated from XML Document on Fri Apr 05 2013 16:24:22 GMT+0200 (CEST) -->
  <!-- with XmlGrid.net Free Online Service http://xmlgrid.net -->
  <xs:element name="t:root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="creationTime" type="xs:string" />
        <xs:element name="lastUpdateTime" type="xs:string" />
        <xs:element name="isDescribedBy">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="mimeType" type="xs:string" />
              <xs:element name="length" type="xs:int" />
              <xs:element name="language" type="xs:string" />
              <xs:element name="schemaName" type="xs:string" />
              <xs:element name="schemaURI" type="xs:string" />
              <xs:element name="bytestream" type="xs:string" />
              <xs:element name="creationTime" type="xs:string" />
              <xs:element name="lastUpdateTime" type="xs:string" />
            </xs:sequence>
            <xs:attribute name="t:id" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="mimeType" type="xs:string" />
        <xs:element name="length" type="xs:int" />
        <xs:element name="url" type="xs:string" />
        <xs:element name="name" type="xs:string" />
        <xs:element name="hasAlternative" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="mimeType" type="xs:string" />
              <xs:element name="length" type="xs:int" />
              <xs:element name="name" type="xs:string" />
              <xs:element name="url" type="xs:string" />
              <xs:element name="creationTime" type="xs:string" />
              <xs:element name="lastUpdateTime" type="xs:string" />
            </xs:sequence>
            <xs:attribute name="t:id" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="xmlns:t" type="xs:string" />
      <xs:attribute name="t:id" type="xs:string" />
    </xs:complexType>
  </xs:element>
</xs:schema>
Example
A tree generated by OAI TM Plugin looks like this:
<?xml version="1.0" ?>
<t:root xmlns:t="http://gcube-system.org/namespaces/data/trees" t:id="oai:generic.eprints.org:23">
  <title>Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978</title>
  <collectionID>7375626A656374733D54</collectionID>
  <creationTime>2011-09-29T22:42:01.000+02:00</creationTime>
  <lastUpdateTime>2011-09-29T22:42:01.000+02:00</lastUpdateTime>
  <provenance>
    <statement>This item has been created by OAI-TM plugin via OAI-PMH metadata harvesting from the metadata provider aquacomm at http://aquacomm.fcla.edu/cgi/oai2</statement>
    <setID>7375626A656374733D54</setID>
  </provenance>
  <metadata>
    <schema>oai_dc</schema>
    <schemaLocation>http://www.openarchives.org/OAI/2.0/oai_dc/</schemaLocation>
    <record>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
        <dc:title>Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978</dc:title>
        <dc:creator>Gotshall, Daniel W.</dc:creator>
        <dc:creator>Laurent, Laurence L.</dc:creator>
        <dc:creator>Grant, John J.</dc:creator>
        <dc:subject>Ecology</dc:subject>
        <dc:subject>Fisheries</dc:subject>
        <dc:subject>Biology</dc:subject>
        <dc:description>Although we continue to monitor permanent stations&#13; on a regular basis, we have suspended our 30-m2&#13; random subtidal and 1/4-m2 random intertidal studies&#13; during this interim year. The 1/4-m2 random subtidal study is being continued and we have added a new subtidal method of determining fish abundance.&#13; Giant red sea urchin, Strongylocentrotus franciscanus,&#13; numbers continue to decline at their last "stronghold"&#13; in our subtidal study area, permanent station&#13; 15. The recruitment of juvenile blue rockfish,&#13; Sebastes mystinus, appears to be either late or&#13; low this year in our study areas. The most abundant&#13; fish, so far, from the new method of assessment,&#13; are adult blue rockfish, kelp greenling,&#13; Hexagrammos decagrammus, and gopher rockfish,&#13; Sebastes carnatus.&#13; Various trends of abalone abundance at the permanent&#13; intertidal stations, increasing at some,&#13; decreasing at others, were observed during this&#13; quarter.&#13; Sea otters, Enhydra lutris, seem to have reached&#13; their annual springtime peak in abundance during&#13; April and May. Several otters were seen rafting&#13; and foraging around and near the intake cove&#13; breakwaters, apparently becoming emboldened to&#13; human presence. (18pp.)</dc:description>
        <dc:publisher>California Department of Fish and Game, Marine Resources Region</dc:publisher>
        <dc:date>1979</dc:date>
        <dc:type>Monograph or Serial issue</dc:type>
        <dc:type>NonPeerReviewed</dc:type>
        <dc:format>application/pdf</dc:format>
        <dc:identifier>http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf</dc:identifier>
        <dc:identifier>Gotshall, Daniel W. and Laurent, Laurence L. and Grant, John J. (1979) Diablo Canyon power plant site ecological study Quarterly Report no. 20: April 1 - June 30, 1978. Avila Beach, CA, California Department of Fish and Game, Marine Resources Region, (Marine Resources Administrative Report, 79-4)</dc:identifier>
        <dc:relation>http://aquaticcommons.org/23/</dc:relation>
      </oai_dc:dc>
    </record>
  </metadata>
  <content>
    <contentType>main</contentType>
    <mimeType>text/url</mimeType>
    <url>http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf</url>
  </content>
  <content>
    <contentType>alternative</contentType>
    <mimeType>text/html; charset=UTF-8</mimeType>
    <url>http://aquaticcommons.org/23/</url>
  </content>
</t:root>
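A consumer of such a tree can read it with any XML parser. The sketch below uses the DOM and XPath APIs shipped with the JDK to list the content URLs of a given contentType from an abridged copy of the tree above. The class and method names are hypothetical, and the sketch does not use the gCube tree library, whose API is not described on this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of the plugin API.
class TreeReaderSketch {

    // Abridged copy of the example tree: only the content entries are kept.
    static final String TREE =
        "<t:root xmlns:t=\"http://gcube-system.org/namespaces/data/trees\" t:id=\"oai:generic.eprints.org:23\">"
        + "<content><contentType>main</contentType><mimeType>text/url</mimeType>"
        + "<url>http://aquaticcommons.org/23/1/Marine_Resources_Administrative_Report_No._79%2D4.pdf</url></content>"
        + "<content><contentType>alternative</contentType><mimeType>text/html; charset=UTF-8</mimeType>"
        + "<url>http://aquaticcommons.org/23/</url></content>"
        + "</t:root>";

    /** Returns the url of every content entry whose contentType equals the given value. */
    public static List<String> contentUrls(String treeXml, String contentType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(treeXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // content elements are direct children of the root and carry no namespace
            NodeList contents = (NodeList) xpath.evaluate("/*/content", doc, XPathConstants.NODESET);
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < contents.getLength(); i++) {
                Node content = contents.item(i);
                if (contentType.equals(xpath.evaluate("contentType", content))) {
                    urls.add(xpath.evaluate("url", content));
                }
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read tree", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contentUrls(TREE, "main"));
        System.out.println(contentUrls(TREE, "alternative"));
    }
}
```

Returning a list rather than a single value reflects that content is repeatable, so a tree may carry several alternative entries.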
Maven coordinates
The Maven coordinates of the plugin's development version are:
<dependency>
  <groupId>org.gcube.data.oai.tmplugin</groupId>
  <artifactId>oai-tm-plugin</artifactId>
  <version>1.1.0-2.13.0</version>
</dependency>