GCube Web Specifications and Standards Compliance

From Gcube Wiki
Revision as of 20:55, 13 October 2012 by Manuele.simi (Talk | contribs) (Functional Category)


This area collects the standard specifications supported by the gCube system APIs. The work is part of the WP11 activities and serves the integration and interoperability objectives that promote the openness of the e-Infrastructure to neighbouring and external infrastructures. The collection focuses on widely used, HTTP-based specifications and generic interchange protocols (data/content standards, metadata standards, Web interface standards, security standards, data sharing protocols, data transfer protocols) that serve the system's needs both for disseminating and for consuming data. The analysis is conducted per functional category and addresses the use, need and relevance of the standards that fall under each gCube functional area.

Table of Protocols

Specification label | Functional area | Direction | Adoption Status
OAI-PMH (Producer) | Data Consumption | Producer | Completed
OAI-ORE (Producer) | Data Consumption | Producer | Completed
OpenSearch (Consumer) | Data Consumption | Consumer | Completed
OpenSearch (Producer) | Data Consumption | Producer | Completed
SRU (Consumer) | Data Consumption | Consumer | Planned
FTP/FTPS/SFTP | Data Transfer | Consumer | Ongoing
HTTP/HTTPS | Data Transfer | Consumer | Ongoing
Web Services Data Access and Integration – The XML Realization (WS-DAIX) Specification | Information System | Producer/Consumer | Completed


  • Functional areas: Data Consumption / Data Production / Computation Consumption / Infrastructure Management / Data Transfer
  • Direction: Producer / Consumer
  • Adoption Status: Completed / Ongoing / Planned

OAI-PMH

Specification Description

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a well-established standard in the content management and library science worlds that is gaining in importance. It provides a low-barrier mechanism for repository interoperability and defines the following parties and software components:

  • Data Providers are repositories that expose structured metadata via OAI-PMH. A 'Data Provider' such as an academic library runs a Repository that supports OAI-PMH as a means of exposing metadata information about resources, for instance academic publications.
  • Service Providers then make OAI-PMH service requests to harvest that metadata. A 'Service Provider' uses Harvester software to harvest metadata from Data Providers. The harvested metadata can then be used to provide value-added services, such as a website that allows browsing and searching through the catalog.

OAI-PMH is a set of six verbs or services that are invoked within HTTP. An implementation of OAI-PMH must support representing metadata in Dublin Core, but may also support additional representations.
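As a rough illustration of how the six verbs are invoked within HTTP, the sketch below builds OAI-PMH request URLs. The verb names and the oai_dc metadata prefix come from the OAI-PMH 2.0 specification; the endpoint URL and the build_request helper are assumptions made for this example and do not describe the gCube front end.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical endpoint, for illustration only.
BASE_URL = "http://example.org/oai"

# Ask the repository to describe itself.
print(build_request(BASE_URL, "Identify"))

# Harvest all records in Dublin Core, the mandatory metadata format.
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

A harvester would issue these URLs as plain HTTP GET requests and parse the XML responses.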

gCube Use/Need/Relevance

Through the OAI-PMH protocol, the gCube infrastructure acts as a 'Data Provider' and disseminates the hosted metadata records in a standard fashion, thus allowing interoperation with other data e-Infrastructures that run autonomously. Other infrastructures can harvest the metadata descriptions of gCube content into their archives so that their services can exploit the collections. The protocol provides an application-independent interoperability framework for metadata exchange between the online parties.

Functional Category

Data Consumption

Direction

  • Producer

gCube Adoption Status

The protocol has already been integrated in the gCube system, from the 'Data Provider' perspective. The methodology adopted for the integration is described here.

Related gCube components

  • aslHttp OAI_PMH: the http front end for the protocol
  • applicationSupportLayer_OAI_PMH: business logic back-end component for the protocol

References


OAI-ORE

Specification Description

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video. The goal of these standards is to expose the rich content in these aggregations to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. Although a motivating use case for the work is the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, the intent of the effort is to develop standards that generalize across all web-based information.

In the physical world we create, use, and refer to aggregations of things all the time. We collect pictures in a photo album, read journals that are collections of articles, and burn CDs of our favorite songs. This practice of aggregating extends to the Web. We accumulate URLs in bookmarks or favorites lists in our browser, collect photos into sets on popular sites, browse multi-page documents that are linked together through "prev" and "next" tags, and talk about Web sites as if they had some real existence beyond the set of pages of which they consist. Despite our frequent use of these aggregations, their existence on the Web is quite ephemeral. One reason for this is that there is no standard way to identify an aggregation. We often use the URI of one page of an aggregation to identify the whole aggregation. For example, we use the URI of the first page of a multi-page Web document to identify the whole document, or we use the URI of the HTML page that provides access to a collection of images to identify the entire set of images. But those URIs really just identify those specific pages, and not the union of pages that makes up the whole document, or the union of all images in a Flickr set, respectively. In essence, the problem is that there is no standard way to describe the constituents or boundary of an aggregation, and this is what OAI-ORE aims to provide.

gCube Use/Need/Relevance

The gCube Content Model aims to provide high-level functionality for the manipulation of content in Grid-based environments. Content in gCube is stored and organized following a graph-based data model, the Information Object Model, which allows finer control of content by making it possible to annotate content with arbitrary properties and to relate different content units via arbitrary relationships.

Starting from this model a document model has been built, in which complex documents, composed of various, possibly nested subparts, are represented as chains of Information Objects linked via appropriate relationships. For instance, an HTML document that includes a number of images may be modelled as a complex object that provides references to Information Objects (containing the images). In this respect, gCube documents are managed as compound objects comprising metadata, annotations, alternative representations and multiple parts. The notion of gCube documents is implemented and managed by the gCube Information Organisation Services family of subsystems, which includes storage services, access services, plugins and a number of distinguished clients that can be internal or external to the system.

The aggregated information that makes up a gCube document can be transferred through the solution provided by OAI-ORE, without the need for clients to rely on the APIs of the individual system architectures and their definition of document boundaries. The gCube ORE Provider allows the dissemination of the digital objects stored in the gCube repository as OAI-ORE Resource Maps.
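To make the notion of a Resource Map more concrete, here is a minimal sketch that expresses a compound document as RDF-style triples. The ore:describes and ore:aggregates predicates belong to the OAI-ORE vocabulary; the URIs and helper functions are hypothetical, and the sketch leaves out the descriptive metadata that a complete Resource Map carries. It is not the output format of the gCube ORE Provider.

```python
# Namespace of the OAI-ORE vocabulary terms.
ORE = "http://www.openarchives.org/ore/terms/"


def resource_map(rem_uri, aggregation_uri, part_uris):
    """Return (subject, predicate, object) triples for a minimal Resource Map."""
    # The Resource Map describes exactly one aggregation.
    triples = [(rem_uri, ORE + "describes", aggregation_uri)]
    # The aggregation lists each of its constituent resources.
    triples += [(aggregation_uri, ORE + "aggregates", part) for part in part_uris]
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical URIs: an HTML document aggregated with two of its images.
triples = resource_map(
    "http://example.org/rem/doc1",
    "http://example.org/aggregation/doc1",
    [
        "http://example.org/doc1/index.html",
        "http://example.org/doc1/figure1.png",
        "http://example.org/doc1/figure2.png",
    ],
)
print(to_ntriples(triples))
```

The aggregation URI identifies the document as a whole, which is the identity that the URI of a single page cannot provide.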

Functional Category

Data Consumption

Direction

  • Producer

gCube Adoption Status

The protocol has recently been integrated in the gCube System, from the producer perspective. The methodology adopted for the implementation is described here.

Components affected / relevant

  • aslHttp ORE_Provider: the http front end for the protocol
  • applicationSupportLayer_OAI_ORE: business logic back-end component for the protocol

References

OpenSearch

Specification Description

OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format. OpenSearch helps search engines and search clients communicate by introducing a common set of formats to perform search requests and syndicate search results. The five basic pieces of information that a search client needs in order to communicate effectively with a search engine supporting the protocol are:

  • local search engine location
  • the query-grammar expected
  • the request encoding
  • the response encoding
  • the record encoding

The OpenSearch protocols define what a description document looks like, but not how it is retrieved. The location of the description document is discovered by some means outside the protocol (a priori knowledge). The description document specifies the location of the local search engine, how to formulate the search URL, and the query language with which submitted queries must comply.
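The following sketch shows how a client might turn a description document into a concrete search URL. The namespace and the {searchTerms} and {startPage?} template parameters come from the OpenSearch 1.1 specification; the description document, the provider URL and the search_url helper are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

OS_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document of an external provider.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Description>Hypothetical search provider</Description>
  <Url type="application/rss+xml"
       template="http://example.org/search?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>
"""


def search_url(description_xml, response_type, **params):
    """Fill the URL template that produces the requested response type."""
    root = ET.fromstring(description_xml)
    for url in root.findall("os:Url", OS_NS):
        if url.get("type") == response_type:
            template = url.get("template")
            break
    else:
        raise LookupError(f"no Url element for type {response_type}")

    def fill(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{(\w+)(\?)?\}", fill, template)


print(search_url(DESCRIPTION, "application/rss+xml", searchTerms="tuna catch", startPage=2))
```

The regular expression only covers unprefixed parameter names; extension parameters with a namespace prefix would need additional handling.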

gCube Use/Need/Relevance

As a producer, gCube can publish search results in a standard and accessible format, so that metasearch engines know how to send a search to the gCube search engine and how to interpret the results. As a consumer, gCube can access external providers that publish their results through search engines conforming to the OpenSearch Specification. Therefore, it can act as a metasearch engine that combines results coming from a gCube search with the results from searching many other sites simultaneously, providing a high level of integration.

Functional Category

Data Consumption

Direction

  • Producer: OpenSearch interface over gCube search
  • Consumer: OpenSearch Framework accessing external OpenSearch providers

gCube Adoption Status

The protocol has recently been integrated in the gCube System, from both the producer and the consumer perspective. The methodology adopted for the producer-side implementation is described here. The consumer-side functionality of the gCube OpenSearch implementation is concentrated in the OpenSearch framework, whose description and feature analysis can be found here.

Components affected / relevant

  • aslHttp Information Retrieval - OpenSearch: the http front end for the protocol that allows the gCube search engine to act as an OpenSearch Provider
  • OpenSearch Library: framework that includes a core library providing general-purpose OpenSearch functionality, and the OpenSearch Operator which utilizes functionality provided by the former
  • OpenSearch Service: the web service responsible for the invocation of the OpenSearch Operator in the context of the provider to be queried

References

SRU

Specification Description

SRU (Search/Retrieve via URL) is a standard XML-focused search protocol for Internet search queries. It uses CQL (Contextual Query Language), a standard syntax for representing queries. As in OpenSearch, the protocol gives a client five basic pieces of information for communicating with the search engine: (1) the location of the local search engine, (2) the expected query grammar, (3) the request encoding, (4) the response encoding, and (5) the record encoding. SRU expects the content provider to have a description record that describes the search service. The protocol defines what a description record looks like and specifies that it can be obtained from the local search engine. The location of the local search engine is provided by means outside the protocol (a priori knowledge). SRU also defines how to formulate the search URL and specifies a standard query grammar, CQL. This means that clients only have to write one translator for all SRU local search engines, but also that all SRU local search engines have to support the CQL query grammar.
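As a hedged sketch of how the search URL is formulated, the helpers below build the two request types a client typically needs: an explain request, which returns the description record, and a searchRetrieve request carrying a CQL query. The parameter names (operation, version, query, startRecord, maximumRecords) are those of SRU 1.1/1.2; the base URL, the index name in the query and the function names are assumptions for this example.

```python
from urllib.parse import urlencode


def explain_url(base_url, version="1.2"):
    """URL that asks the server for its description (explain) record."""
    return f"{base_url}?{urlencode({'operation': 'explain', 'version': version})}"


def search_retrieve_url(base_url, cql_query, start_record=1,
                        maximum_records=10, version="1.2"):
    """URL for an SRU searchRetrieve request with a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,             # CQL, the grammar every SRU server accepts
        "startRecord": start_record,    # 1-based position of the first record
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and index name, for illustration only.
BASE_URL = "http://example.org/sru"
print(explain_url(BASE_URL))
print(search_retrieve_url(BASE_URL, 'dc.title = "fish"'))
```

Because the query grammar is fixed by the protocol, the same helper works against any SRU server; only the base URL and the supported indexes differ.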

gCube Use/Need/Relevance

The gCube Search engine supports CQL as its native query grammar, thus complying fully with the SRU requirements for query formulation. Providing an interface and the description mechanisms defined by the protocol would allow all SRU metasearch engines to access gCube results in a standard way and with a high level of integration.

On the consuming side of the protocol, gCube would act as a metasearch engine for external search engines. An SRU provider integrated in the gCube system would have its results disseminated together with gCube results, within a single search in the gCube system. The mechanism for registering the information that describes the SRU search engines in the gCube system can be integrated with the one already implemented for OpenSearch providers, since the requirements for effective communication with external search engines are common to both protocols.

Functional Category

Data Consumption

Direction

  • Producer: SRU interface over gCube search
  • Consumer: gCube search accessing external SRU providers

gCube Adoption Status

The protocol is planned to be integrated within the system, covering both the consumer and the producer side and exploiting the mechanisms already implemented for registering OpenSearch providers in the system.

Components affected / relevant

  • aslHttp Information Retrieval - SRU: the http front end for the protocol that allows the gCube search engine to act as an SRU provider
  • SRU - OpenSearch Convertor: the mechanism that will be converting the response of an 'explain' request in SRU protocol to the equivalent OpenSearch Description document, to register the provider information within the gCube System
  • To be defined.

References


FTP/FTPS/SFTP

Specification Description

File Transfer Protocol (FTP) is a standard network protocol used to transfer files from one host to another over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and uses separate control and data connections between the client and the server. FTP users may authenticate themselves using a clear-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that encrypts the username, the password and the content, FTP is often secured with SSL/TLS (FTPS). The SSH File Transfer Protocol (SFTP) is sometimes used instead.
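From a client's point of view, the three variants differ mainly in the default port and in whether the channel is encrypted. The sketch below derives these properties from a transfer URL. The port numbers are the conventional defaults (21 for FTP, 990 for implicit FTPS, 22 for SFTP); the TransferPlan type, the plan_transfer function and the host names are hypothetical and do not reflect the Data Transfer Agent Service API.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Conventional default port and channel security for each URL scheme.
_SCHEMES = {
    "ftp": (21, False),   # clear-text control and data connections
    "ftps": (990, True),  # implicit FTP over SSL/TLS; explicit FTPS upgrades a port-21 session
    "sftp": (22, True),   # file transfer over an SSH connection
}


@dataclass(frozen=True)
class TransferPlan:
    scheme: str
    host: str
    port: int
    path: str
    encrypted: bool
    username: Optional[str]  # None means an anonymous sign-in would be attempted


def plan_transfer(url):
    """Work out how a file at `url` would be fetched (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in _SCHEMES:
        raise ValueError(f"unsupported transfer scheme: {scheme}")
    default_port, encrypted = _SCHEMES[scheme]
    return TransferPlan(
        scheme=scheme,
        host=parts.hostname,
        port=parts.port or default_port,
        path=parts.path,
        encrypted=encrypted,
        username=parts.username,
    )


print(plan_transfer("ftp://example.org/pub/data.csv"))
print(plan_transfer("sftp://alice@example.org:2222/home/alice/data.csv"))
```

An actual transfer would then hand the plan to a protocol client, for example ftplib from the Python standard library for FTP and FTPS, or a third-party SSH library for SFTP.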

gCube Use/Need/Relevance

Access to servers for file transfer over standard protocols has to be provided in gCube in the context of the Data Transfer activity. In this particular case, the FTP protocol family is foreseen both for accessing external file repositories and as an internal transfer protocol between nodes of the infrastructure.

Functional Category

Data Transfer

Direction

Consumer

gCube Adoption Status

The FTP protocol has already been integrated in the Data Transfer Agent Service. The secure versions of the protocol (FTPS, SFTP) are also integrated, but they require the deployment of a secure infrastructure before they can be exploited (ongoing activity).

Components affected / relevant

  • Data Transfer Agent Service: The component responsible for transferring data using standard protocols

References

HTTP/HTTPS

Specification Description

HTTP functions as a request-response protocol in the client-server computing model. A web browser, for example, may be the client and an application running on a computer hosting a web site may be the server. The client submits an HTTP request message to the server. The server, which provides resources such as HTML files and other content, or performs other functions on behalf of the client, returns a response message to the client. The response contains completion status information about the request and may also contain requested content in its message body.
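The request-response exchange can be made visible without any network access. The sketch below builds the bytes of a request message and parses a canned response into its status information and message body, using the parser in Python's standard http.client module. The host name, the canned response and the helper names are illustrative assumptions.

```python
import http.client
import io


class _ReplaySocket:
    """Minimal socket stand-in that replays canned bytes to http.client."""

    def __init__(self, payload):
        self._payload = payload

    def makefile(self, *args, **kwargs):
        return io.BytesIO(self._payload)


def build_request(method, path, host):
    """Return the bytes of a minimal HTTP/1.1 request message."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, reason phrase, body)."""
    response = http.client.HTTPResponse(_ReplaySocket(raw))
    response.begin()  # reads the status line and the headers
    return response.status, response.reason, response.read()


# Canned server reply: completion status, headers, then the message body.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

print(build_request("GET", "/index.html", "example.org").decode("ascii"))
print(parse_response(RAW_RESPONSE))
```

In real use the same parsing happens inside http.client.HTTPConnection, or HTTPSConnection when the exchange is layered on SSL/TLS.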

Hypertext Transfer Protocol Secure (HTTPS) is a widely used communications protocol for secure communication over a computer network, with especially wide deployment on the Internet. Technically, it is not a protocol in itself; rather, it is the result of simply layering the Hypertext Transfer Protocol (HTTP) on top of the SSL/TLS protocol, thus adding the security capabilities of SSL/TLS to standard HTTP communications.

gCube Use/Need/Relevance

Access to servers for file transfer over standard protocols has to be provided in gCube in the context of the Data Transfer activity. In this particular case, the HTTP protocol family is foreseen for accessing external repositories containing archives of environmental data.

Functional Category

Data Transfer

Direction

Consumer

gCube Adoption Status

The HTTP protocol has already been integrated in the Data Transfer Agent Service. The secure version of the protocol (HTTPS) is also integrated, but it requires the deployment of a secure infrastructure before it can be exploited (ongoing activity).

Components affected / relevant

  • Data Transfer Agent Service: The component responsible for transferring data using standard protocols

References

Web Services Data Access and Integration – The XML Realization (WS-DAIX) Specification

Specification Description

The WS-DAI (Web Services Data Access and Integration) family of specifications defines web service interfaces to data resources, such as relational or XML databases. The specifications have been developed by the Database Access and Integration Services Working Group of the Open Grid Forum, and can be used independently or as part of a wider service-based grid architecture. The specifications include properties that can be used to describe a data service or the resource to which access is being provided, and define message patterns that support access to (query and update) data resources. The family consists of a data model-independent specification (WS-DAI), which is extended by two realizations that add model-dependent properties and operations: the relational (WS-DAIR) and the XML (WS-DAIX) specifications.

The XML realization extends the properties defined in WS-DAI, instantiates the templates for direct and indirect data access, and defines interfaces for accessing the results of requests directed at an XML data service. Key features include: properties and operations for manipulating collections of documents, direct and indirect access using both XQuery and XPath, and data modification using XUpdate.
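The idea of a document collection that can be fed and queried can be illustrated with an in-memory stand-in. This is only a conceptual sketch: the XMLCollection class and its method names are hypothetical and are not the WS-DAIX operations, no web service is involved, and Python's ElementTree supports only a subset of XPath (and neither XQuery nor XUpdate).

```python
import xml.etree.ElementTree as ET


class XMLCollection:
    """In-memory stand-in for a named collection of XML documents."""

    def __init__(self):
        self._documents = {}

    def add_document(self, name, xml_text):
        """Store a document under `name`, replacing any previous version."""
        self._documents[name] = ET.fromstring(xml_text)

    def get_document(self, name):
        """Return the stored document serialized as text."""
        return ET.tostring(self._documents[name], encoding="unicode")

    def remove_document(self, name):
        del self._documents[name]

    def xpath(self, expression):
        """Evaluate an XPath expression against every document.

        Returns (document name, element text) pairs in document-name order.
        """
        return [
            (name, element.text)
            for name, root in sorted(self._documents.items())
            for element in root.findall(expression)
        ]


# Hypothetical resource profiles, for illustration only.
collection = XMLCollection()
collection.add_document("node1", "<resource><host status='up'>a.example.org</host></resource>")
collection.add_document("node2", "<resource><host status='down'>b.example.org</host></resource>")

# Direct access: the query result is returned in the response itself.
print(collection.xpath(".//host[@status='up']"))
```

With indirect access, the service would instead return a reference to a new data resource holding the result, which the client retrieves in a later request.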

gCube Use/Need/Relevance

gCube has a general-purpose Information System whose information may be consumed by clients outside the gCube infrastructure. WS-DAIX allows such information to be consumed in a standard and widely adopted way.

In addition, the Information System may act as a generic indexer of raw XML documents, and a service can use it to index and query its own data through the exposed WS-DAIX operations.

Functional Category

Web interfaces

Direction

Producer and consumer

gCube Adoption Status

The WS-DAIX XMLCollectionAccess portType (feeding and retrieval of XML documents) has been fully implemented by the Information Collector Service. The same service exposes an XQueryExecute portType that is nearly compliant with the analogous portType defined by the standard.

The IS-Publisher library is a producer and consumer of the XMLCollectionAccess portType, while the IS-Client library is a consumer of the XQueryExecute portType.

Components affected / relevant

  • IS-InformationCollector: producer
  • IS-Publisher: producer/consumer
  • IS-Client: consumer

References

Protocol XX

Specification Description

Description and useful information about the Specification.

gCube Use/Need/Relevance

Describe the use/need/relevance of the specification in respect to the functional area of the system.

Functional Category

The functional category under which the services underlying the protocol fall.

Direction

The direction towards the system (Producer/consumer), along with any information to clarify the perspective of the interpretation as one or the other or both, if needed

gCube Adoption Status

Information about the adoption status of the Specification within the system: whether the specification has already been integrated and is supported, is under implementation, or is soon to be implemented.

Components affected / relevant

  • component X: role

References

Tentative Compliance

Add here specifications that are not listed above and that the project has not yet committed to supporting, along with their need and relevance.

  • LDAP: Support the integration of the infrastructure's structure with other systems (e.g. harvesting external infrastructure resources, or publishing D4Science infrastructure resources).
  • WSDM: Provide a standards-compliant web API for infrastructure management.