HTTP Front End: OAI-ORE Implementation

OAI-ORE

Introduction

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video. The goal of these standards is to expose the rich content in these aggregations to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. Although a motivating use case for the work is the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, the intent of the effort is to develop standards that generalize across all web-based information.

In the physical world we create, use, and refer to aggregations of things all the time. We collect pictures in a photo album, read journals that are collections of articles, and burn CDs of our favorite songs. This practice of aggregating extends to the Web. We accumulate URLs in bookmarks or favorites lists in our browser, collect photos into sets on popular sites, browse multi-page documents that are linked together through "prev" and "next" tags, and talk about Web sites as if they had some real existence beyond the set of pages of which they consist. Despite our frequent use of these aggregations, their existence on the Web is quite ephemeral. One reason for this is that there is no standard way to identify an aggregation. We often use the URI of one page of an aggregation to identify the whole aggregation. For example, we use the URI of the first page of a multi-page Web document to identify the whole document, or we use the URI of the HTML page that provides access to a collection of images to identify the entire set of images. But those URIs really just identify those specific pages, and not the union of pages that makes up the whole document, or the union of all images in a Flickr set, respectively. In essence, the problem is that there is no standard way to describe the constituents or boundary of an aggregation, and this is what OAI-ORE aims to provide.

Protocol Foundations

World Wide Web Architecture

The foundations of the Web as we know it are detailed in the Architecture of the World Wide Web. This architecture defines the following core notions:

  • Resource, an item of interest.
  • URI, a global identifier for a Resource.
  • Representation, a datastream corresponding to the state of a Resource at the time its URI is dereferenced via some protocol (e.g. HTTP).
  • Link, a directed connection between two Resources.

Semantic Web

On the Web that we use on a daily basis, URIs are used primarily to identify Web documents. They are identifiers that, when dereferenced, return a human-readable Representation. However, on the Semantic Web, URIs are introduced to identify so-called real-world entities, such as people or cars, or even abstract entities, such as ideas or classes. Since these things are not documents, they have no Representation to indicate what these Resources mean. The Linked Data effort (see the Linked Data Tutorial) describes an approach for obtaining information about those Resources despite the fact that they have no Representation. To summarize, the approach consists of:

  • Using HTTP URIs to identify those special Resources;
  • Publishing a document that provides information about the special Semantic Web Resource at an HTTP URI other than the HTTP URI of the Semantic Web Resource;
  • Using Cool URIs for the Semantic Web to allow discovering the HTTP URI of that document from the HTTP URI of the special Semantic Web Resource.

RDF (Resource Description Framework)

The documents that are proposed by the Linked Data effort to describe these abstract Resources are typically expressed in RDF/XML, which is an XML-based serialization for the Resource Description Framework (RDF; see RDF Concepts), the foundational data model of the Semantic Web. This model consists of subject-predicate-object statements called triples. Triples express relationships pertaining to a subject Resource denoted by a URI. The predicate Resource, also denoted by a URI, indicates the nature of the relationship. The object expresses the actual value for the relationship expressed by the predicate; the object can be denoted by a URI or can be a literal value, such as a string or a number. When multiple triples are expressed, or asserted, they may share subjects and objects and, as a result, they conceptually join together in what is called a graph in mathematical terms. This graph consists of nodes that are the Resources denoted by the subject and object URIs, and edges that are the relationship predicates.
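
As a rough illustration of how triples join into a graph, the following Python sketch stores triples as plain tuples and groups them by subject. The `example.org` URIs and the literal values are hypothetical and chosen only for this illustration; the Dublin Core and FOAF predicate URIs are real vocabulary terms, but nothing here is taken from gCube itself.

```python
from collections import defaultdict

# Hypothetical resources, invented for this illustration only.
DOC = "http://example.org/doc/1"
AUTHOR = "http://example.org/person/alice"

# Predicates are URIs too (Dublin Core terms and FOAF).
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Each triple is (subject, predicate, object); an object is a URI or a literal.
triples = [
    (DOC, DC_TITLE, "An example document"),
    (DOC, DC_CREATOR, AUTHOR),
    (AUTHOR, FOAF_NAME, "Alice"),
]


def build_graph(statements):
    """Group triples by subject: each node maps to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)

# AUTHOR is the object of one triple and the subject of another,
# which is what joins the individual statements into one graph.
for predicate, obj in graph[DOC]:
    print(predicate, "->", obj)
```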

ORE Solution

ORE leverages the foundations described above to arrive at a solution to handle aggregations of Web resources. The essence of the ORE solution can be summarized as follows:

  • In order to be able to unambiguously refer to an aggregation of Web resources, a new Resource is introduced that stands for a set or collection of other Resources. This new Resource, named an Aggregation, has a URI just like any other Resource on the Web does. And, since an Aggregation is a conceptual construct, it qualifies as one of those Semantic Web Resources that does not have a Representation.
  • Following the Linked Data guidelines, another Resource is introduced to make information about the Aggregation available. This new Resource, named a Resource Map, has a URI and a machine-readable Representation that provides details about the Aggregation. In essence, a Resource Map expresses which Aggregation it describes, and it lists the resources that are part of the Aggregation. A Resource Map can also express relationships and properties pertaining to all these Resources, as well as metadata pertaining to the Resource Map itself, e.g. who published it and when it was most recently modified. Resource Maps can be expressed in different formats including Atom XML, RDF/XML, RDFa, N3, Turtle, and other RDF serialization formats.
  • In order to make ORE work in the HTTP-based Web, both the Aggregation and the Resource Map are assigned HTTP URIs, and the Cool URIs for the Semantic Web guidelines are adopted to support discovery of the HTTP URI of a Resource Map given the HTTP URI of an Aggregation.
  • ORE also introduces the notion of a Proxy resource, which stands for an Aggregated Resource in the context of a specific Aggregation. The URI of a Proxy resource provides a mechanism for denoting a resource in context.

Mapping to gCube Model

The gCube Content Model aims to provide high-level functionality for the manipulation of content in Grid-based environments. Content in gCube is stored and organized following a graph-based data model, the Information Object Model, which allows finer control of content by making it possible to annotate content with arbitrary properties and to relate different content units via arbitrary relationships.

Starting from this model, a document model has been built in which complex documents, composed of various, possibly nested subparts, are represented as chains of Information Objects linked via appropriate relationships. For instance, an HTML document that includes a number of images may be modelled as a complex object that provides references to the Information Objects containing the images. In this respect, gCube documents are managed as compound objects comprising metadata, annotations, alternative representations and multiple parts. The notion of gCube documents is implemented and managed by the gCube Information Organisation Services family of subsystems, which includes storage services, access services, plugins and a number of distinguished clients that can be internal or external to the system.

The aggregated information that makes up a gCube document can be transferred through the solution provided by OAI-ORE, without the need for clients to rely on the APIs of the individual system architectures and their definitions of document boundaries. The gCube ORE Provider allows the dissemination of the digital objects stored in a gCube repository as OAI-ORE Resource Maps.

gCube ORE Resource Maps

A gCube document or a gCube collection is introduced as an aggregation that denotes the entire document or collection, and a machine-readable document is published to describe that aggregation. For example, this document describes which resources are part of the aggregation and which are merely related to it.

To this end, gCube documents and gCube collections are handled as OAI-ORE Aggregations and are disseminated as OAI-ORE Resource Maps. Each ORE Resource Map for a gCube document or collection can be reached at

http://<host>:<port>/aslHttpOREProvider/gOREProvider/rem?id=<gdocuri>&scope=<gCubescope>

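where <host> and <port> give the address of the server where the gCube ORE-Provider application has been deployed, <gdocuri> is the URI of the gCube document or collection, and <gCubescope> is the gCube scope in which it is accessed.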
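
A minimal Python sketch of how a client might assemble this URL is given below. The function name `resource_map_url` is hypothetical, and percent-encoding the `id` and `scope` values is an assumption made here because both contain reserved characters such as `/` and `:`; the page itself does not state whether gOREProvider requires it. The host, port, document URI and scope are the values that appear in the examples on this page.

```python
from urllib.parse import urlencode


def resource_map_url(host, port, doc_uri, scope):
    """Build the Resource Map URL for a gCube document or collection.

    Hypothetical helper: the query values are percent-encoded, which is
    an assumption about what the provider accepts.
    """
    query = urlencode({"id": doc_uri, "scope": scope})
    return f"http://{host}:{port}/aslHttpOREProvider/gOREProvider/rem?{query}"


url = resource_map_url(
    "dl09.di.uoa.gr",
    8285,
    "cms://7f78c200-f877-11dd-8103-acc6e633ea9e/cfefd160-f877-11dd-8103-acc6e633ea9e",
    "/d4science.research-infrastructures.eu/FARM/FCPPS",
)
print(url)
```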

gOREProvider supports Resource Map serializations in Atom XML. The information registered in the Resource Map for each gCube complex object is expressed as an Atom entry and covers the ORE concepts for:

  • the ORE Aggregation described by the Atom entry
  • the Resource Map that describes the Aggregation
  • the Aggregated Resources of the Aggregation
  • metadata about Aggregated Resources

The Aggregated Resources registered in the Resource Map of a gCube document cover the main data structures that describe the document, using broad and well-known semantics. An Aggregated Resource record is therefore added for each of the following:

  • document: the self-contained unit of content within the collection of related units;
  • metadata: a description of the document in a specific schema;
  • annotations: subjective observations or records about a document;
  • parts: components of a document;
  • alternative representations: secondary manifestations of a document.

An example of a Resource Map for a gCube document is presented below:

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:oreatom="http://www.openarchives.org/ore/atom/"
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:ore="http://www.openarchives.org/ore/terms/"
       xmlns:atom="http://www.w3.org/2005/Atom">
 
 
	   	   <id>cms://7f78c200-f877-11dd-8103-acc6e633ea9e/cfefd160-f877-11dd-8103-acc6e633ea9e</id>
 
 
	   	   <link rel="self" type="application/atom+xml" href="cms://7f78c200-f877-11dd-8103-acc6e633ea9e/cfefd160-f877-11dd-8103-acc6e633ea9e"/>
	   <atom:link rel="describes" href="http://dl09.di.uoa.gr:8285/aslHttpOREProvider/gOREProvider/agg?id=cms://7f78c200-f877-11dd-8103-acc6e633ea9e/cfefd160-f877-11dd-8103-acc6e633ea9e"/>
	   <atom:author>
			<atom:name>gCube ORE-Provider</atom:name>
	   </atom:author>
       <atom:updated>2012/07/25 08:06:16</atom:updated>
	   <atom:category scheme="http://www.openarchives.org/ore/terms/" term="http://www.openarchives.org/ore/terms/ResourceMap" label="Resource Map"/>
 
 
	   	   	   <link rel="http://www.openarchives.org/ore/terms/aggregates" 
  href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FContentViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  title="Programa de informacion de especies acuaticas  - Anguilla anguilla -" 
  type="text/uri-list" hreflang="en"/>
 
  		  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FMetadataViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2F05e54b60-f878-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  title="MetadataObject" 
  type="text/xml"/>
  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FMetadataViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2F0226e3d0-f878-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  title="MetadataObject" 
  type="text/xml"/>
  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FMetadataViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2Ff8d18420-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  title="MetadataObject" 
  type="text/xml"/>
 
 
  		  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FContentViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2Fd0eaac20-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS%26elementType%3Dalternative" 
  title="Programa de informacion de especies acuaticas  - Anguilla anguilla -" 
  type="text/uri-list"/>
  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FContentViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2Fd092ef30-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS%26elementType%3Dalternative" 
  title="Programa de informacion de especies acuaticas  - Anguilla anguilla -" 
  type="text/uri-list"/>
  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FContentViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2Fd03abd10-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS%26elementType%3Dalternative" 
  title="Programa de informacion de especies acuaticas  - Anguilla anguilla -" 
  type="text/uri-list"/>
 
 
				<oreatom:triples>
									<rdf:Description
				rdf:about="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FMetadataViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2F05e54b60-f878-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS">
				<dcterms:conformsTo rdf:resource="http://193.43.36.238:8282/fi/figis/devcon/schema/dc/qualifieddc.xsd"/>
				<rdf:type rdf:resource="info:eu-repo/semantics/descriptiveMetadata"/>
			</rdf:Description>
						<rdf:Description
				rdf:about="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FMetadataViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2F0226e3d0-f878-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS">
				<dcterms:conformsTo rdf:resource="http://193.43.36.238:8282/fi/figis/devcon/schema/dc/qualifieddc.xsd"/>
				<rdf:type rdf:resource="info:eu-repo/semantics/descriptiveMetadata"/>
			</rdf:Description>
						<rdf:Description
				rdf:about="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpContentAccess%2FMetadataViewer%3FdocumentURI%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%2Ff8d18420-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS">
				<dcterms:conformsTo rdf:resource="http://193.43.36.238:8282/fi/figis/devcon/schema/dc/qualifieddc.xsd"/>
				<rdf:type rdf:resource="info:eu-repo/semantics/descriptiveMetadata"/>
			</rdf:Description>
 
 
 
		</oreatom:triples>
</entry>
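
A client could list the Aggregated Resources of such an entry roughly as follows. This is a sketch using only the Python standard library; the function name `aggregated_resources` and the shortened sample entry (with `example.org` addresses) are hypothetical. It assumes, as in the example above, that the `href` values are percent-encoded and need decoding before use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

ATOM = "{http://www.w3.org/2005/Atom}"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def aggregated_resources(atom_xml):
    """Return one dict per ore:aggregates link found in an Atom Resource Map."""
    root = ET.fromstring(atom_xml)
    resources = []
    for link in root.iter(ATOM + "link"):
        if link.get("rel") == ORE_AGGREGATES:
            resources.append({
                # hrefs are percent-encoded in the entry, so decode them
                "href": unquote(link.get("href", "")),
                "title": link.get("title"),
                "type": link.get("type"),
            })
    return resources


# Shortened, hypothetical entry in the same shape as the example above.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="describes" href="http://example.org/agg?id=doc1"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http%3A%2F%2Fexample.org%2FContentViewer%3FdocumentURI%3Ddoc1"
        title="Main document" type="text/uri-list"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http%3A%2F%2Fexample.org%2FMetadataViewer%3FdocumentURI%3Ddoc1%2Fmeta1"
        title="MetadataObject" type="text/xml"/>
</entry>"""

resources = aggregated_resources(SAMPLE)
for res in resources:
    print(res["type"], res["href"])
```

The `describes` link is skipped because only links whose `rel` is the ORE `aggregates` term identify Aggregated Resources.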

The links for the Aggregated Resources point to the corresponding data structures, which can be reached through the HTTP Front End applications of the gCube system. The addresses of the nodes on which each application is deployed are configurable per gOREProvider installation. To this end, once gOREProvider is installed, a set of initialization parameters must be configured in its web.xml file to define the following information:

  • metadataViewerURL: address of the MetadataViewer application used for rendering gCube Metadata
  • contentViewerURL: address of the ContentViewer application used for rendering gCube Content, Annotations, Parts and Alternative Representations
  • collectionViewerURL: address of the CollectionViewer application used for rendering information about a gCube collection
  • author: the author of the disseminated Resource Maps
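
The sketch below shows one way such parameters might look and be read back, using Python's standard XML parser. The web.xml fragment is hypothetical: this page does not say whether the parameters are declared as servlet `init-param` or as `context-param` elements, so the helper `read_params` (also hypothetical) accepts either, and the `example.org` addresses are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical web.xml fragment; element layout and values are assumptions.
WEB_XML = """<web-app>
  <servlet>
    <servlet-name>gOREProvider</servlet-name>
    <init-param>
      <param-name>metadataViewerURL</param-name>
      <param-value>http://example.org:8080/aslHttpContentAccess/MetadataViewer</param-value>
    </init-param>
    <init-param>
      <param-name>contentViewerURL</param-name>
      <param-value>http://example.org:8080/aslHttpContentAccess/ContentViewer</param-value>
    </init-param>
    <init-param>
      <param-name>collectionViewerURL</param-name>
      <param-value>http://example.org:8080/aslHttpInformationRetrieval/CollectionInfos</param-value>
    </init-param>
    <init-param>
      <param-name>author</param-name>
      <param-value>gCube ORE-Provider</param-value>
    </init-param>
  </servlet>
</web-app>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_params(web_xml):
    """Collect param-name/param-value pairs from init-param or context-param elements."""
    params = {}
    for elem in ET.fromstring(web_xml).iter():
        if local_name(elem.tag) in ("init-param", "context-param"):
            fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
            if "param-name" in fields:
                params[fields["param-name"]] = fields.get("param-value", "")
    return params


params = read_params(WEB_XML)
print(sorted(params))
```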

For each Aggregated Resource, a triple indicating the kind of the corresponding entity (metadata, annotation, part) is added to the metadata section for the Aggregated Resources (inside the oreatom:triples element).
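
These type triples can be read back with a few lines of Python. The sketch below is an illustration under assumptions: the function name `resource_types` and the shortened sample entry are hypothetical, and it assumes the `rdf:Description` elements sit directly inside `oreatom:triples` with percent-encoded `rdf:about` values, as in the example above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

NS = {
    "oreatom": "http://www.openarchives.org/ore/atom/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def resource_types(atom_xml):
    """Map each described resource URI to the list of its rdf:type URIs."""
    root = ET.fromstring(atom_xml)
    types = {}
    for desc in root.findall("oreatom:triples/rdf:Description", NS):
        about = unquote(desc.get(RDF + "about", ""))
        for type_elem in desc.findall("rdf:type", NS):
            types.setdefault(about, []).append(type_elem.get(RDF + "resource"))
    return types


# Shortened, hypothetical entry in the same shape as the example above.
SAMPLE_TRIPLES = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:oreatom="http://www.openarchives.org/ore/atom/"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <oreatom:triples>
    <rdf:Description rdf:about="http%3A%2F%2Fexample.org%2FMetadataViewer%3FdocumentURI%3Ddoc1%2Fmeta1">
      <rdf:type rdf:resource="info:eu-repo/semantics/descriptiveMetadata"/>
    </rdf:Description>
  </oreatom:triples>
</entry>"""

types = resource_types(SAMPLE_TRIPLES)
print(types)
```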

The Aggregated Resources registered in the Resource Map of a gCube Collection consist of data structures for each gCube Document belonging to the collection. These are themselves denoted as Aggregations, following the nested Aggregations recommendation of the ORE protocol. An example of a Resource Map for a gCube collection is shown below:

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:oreatom="http://www.openarchives.org/ore/atom/"
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:ore="http://www.openarchives.org/ore/terms/"
       xmlns:atom="http://www.w3.org/2005/Atom">
 
        	   <id>7f78c200-f877-11dd-8103-acc6e633ea9e</id>
 
 
	   	   <link rel="self" type="application/atom+xml" href="7f78c200-f877-11dd-8103-acc6e633ea9e"/>
	   <atom:link rel="describes" href="http://dl09.di.uoa.gr:8285/aslHttpOREProvider/gOREProvider/agg?id=7f78c200-f877-11dd-8103-acc6e633ea9e"/>
	   <atom:author>
			<atom:name>gCube ORE-Provider</atom:name>
	   </atom:author>
       <atom:updated>2012/07/25 08:05:48</atom:updated>
	   <atom:category scheme="http://www.openarchives.org/ore/terms/" term="http://www.openarchives.org/ore/terms/ResourceMap" label="Resource Map"/>
 
 
	   	   	   <link rel="http://www.openarchives.org/ore/terms/aggregates" 
  href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpInformationRetrieval%2FCollectionInfos%3FselectedCollections%3D7f78c200-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  type="text/xml" hreflang="en"/>
 
  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  		href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpOREProvider%2FgOREProvider%2Fagg%3Fid%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  		title="$item.name" 
  		type="$item.mime"/>
  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  		href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpOREProvider%2FgOREProvider%2Fagg%3Fid%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  		title="$item.name" 
  		type="$item.mime"/>
  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  		href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpOREProvider%2FgOREProvider%2Fagg%3Fid%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  		title="$item.name" 
  		type="$item.mime"/>
  		  		<link rel="http://www.openarchives.org/ore/terms/aggregates" 
  		href="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpOREProvider%2FgOREProvider%2Fagg%3Fid%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS" 
  		title="$item.name" 
  		type="$item.mime"/>
 
  				<oreatom:triples>
			  			<rdf:Description rdf:about="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpOREProvider%2FgOREProvider%2Fagg%3Fid%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS">
				<rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
				</rdf:Description>
			  			<rdf:Description rdf:about="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpOREProvider%2FgOREProvider%2Fagg%3Fid%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS">
				<rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
				</rdf:Description>
			  			<rdf:Description rdf:about="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpOREProvider%2FgOREProvider%2Fagg%3Fid%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS">
				<rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
				</rdf:Description>
			  			<rdf:Description rdf:about="http%3A%2F%2Fdl09.di.uoa.gr%3A8285%2FaslHttpOREProvider%2FgOREProvider%2Fagg%3Fid%3Dcms%3A%2F%2F7f78c200-f877-11dd-8103-acc6e633ea9e%2Fcfefd160-f877-11dd-8103-acc6e633ea9e%26scope%3D%2Fd4science.research-infrastructures.eu%2FFARM%2FFCPPS">
				<rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
				</rdf:Description>
			  		</oreatom:triples>
 
</entry>

gCube ORE Aggregations

Since each compound object is viewed conceptually as an Aggregation in the context of the protocol, a URI for the conceptual construct is also introduced, and can similarly be reached at

http://<host>:<port>/aslHttpOREProvider/gOREProvider/agg?id=<gdocuri>&scope=<gCubescope>

An Aggregation is one of those special Semantic Web resources for which dereferencing the URI via an HTTP request does not yield a Representation. Therefore, when given the HTTP URI of an Aggregation, gORE-Provider follows the ORE recommendation of HTTP 303 forwarding from the Aggregation URI to the Resource Map URI, providing access to the Resource Map that describes the particular Aggregation.
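
The redirect can be pictured with the following Python sketch, which computes the status code and Location header for a request to an Aggregation URI. It is not the gORE-Provider implementation: the function name `see_other` is hypothetical, and the sketch assumes that the Resource Map URI differs from the Aggregation URI only in the final path segment (`rem` instead of `agg`) while the query string is kept unchanged, as the two URL patterns on this page suggest.

```python
from urllib.parse import urlsplit, urlunsplit


def see_other(aggregation_url):
    """Return the (status, headers) pair for a 303 redirect to the Resource Map.

    Assumption: only the last path segment changes from 'agg' to 'rem'.
    """
    parts = urlsplit(aggregation_url)
    if not parts.path.endswith("/agg"):
        raise ValueError("not an Aggregation URL: " + aggregation_url)
    rem_path = parts.path[: -len("agg")] + "rem"
    location = urlunsplit(
        (parts.scheme, parts.netloc, rem_path, parts.query, parts.fragment)
    )
    return 303, {"Location": location}


status, headers = see_other(
    "http://dl09.di.uoa.gr:8285/aslHttpOREProvider/gOREProvider/agg"
    "?id=7f78c200-f877-11dd-8103-acc6e633ea9e"
)
print(status, headers["Location"])
```

A client that follows redirects automatically would therefore receive the Atom Resource Map when it requests the Aggregation URI.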

gCube ORE Proxies

Where an Aggregated Resource represents a part of the main document, information about the order of the part inside the compound object is added as metadata of the introduced record. This is achieved with the use of Proxies, as defined by the ORE data model to enable the establishment of lineage: the assertion that an Aggregated Resource originated or was sourced from another Aggregation.