GCube Web Specifications and Standards Compliance
This area collects the standard specifications supported by the gCube system APIs. The work is part of the WP11 activities and contributes to the integration and interoperability objectives that promote the openness of the e-Infrastructure to neighbouring and external infrastructures. The collection focuses on widely used, HTTP-based specifications and generic interchange protocols (data/content standards, metadata standards, Web interface standards, security standards, data sharing protocols, data transfer protocols) that serve both the disseminating and the consuming needs of the system. The analysis is conducted per functional category and addresses the use, need and relevance of the standards that fall under each gCube functional area.
Table of Protocols
Specification label | Functional area | Direction | Adoption Status |
OAI-PMH (Producer) | Data Consumption | Producer | Completed |
OAI-ORE (Producer) | Data Consumption | Producer | Completed |
OpenSearch (Consumer) | Data Consumption | Consumer | Completed |
OpenSearch (Producer) | Data Consumption | Producer | Completed |
SRU (Consumer) | Data Consumption | Consumer | Planned |
FTP/FTPS/SFTP (Consumer) | Data Transfer | Consumer | Ongoing |
HTTP/HTTPS (Consumer) | Data Transfer | Consumer | Ongoing |
- Functional areas: Data Consumption / Data Production / Computation Consumption / Infrastructure Management / Data Transfer
- Direction: Producer / Consumer
- Adoption Status: Completed / Ongoing / Planned
OAI-PMH
Specification Description
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a well-established standard in the content management and library science worlds that is gaining in importance. It provides a low-barrier mechanism for repository interoperability and defines the following parties and software components:
- Data Providers are repositories that expose structured metadata via OAI-PMH. A 'Data Provider' such as an academic library runs a Repository that supports OAI-PMH as a means of exposing metadata information about resources, for instance academic publications.
- Service Providers then make OAI-PMH service requests to harvest that metadata. A 'Service Provider' uses Harvester software to harvest metadata from Data Providers. The harvested metadata can then be used to provide value-added services, such as a website that allows browsing and searching through the harvested catalog.
OAI-PMH is a set of six verbs or services that are invoked within HTTP. An implementation of OAI-PMH must support representing metadata in Dublin Core, but may also support additional representations.
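As an illustration of how the six verbs are invoked over HTTP, the following minimal sketch builds request URLs for a harvester. The base URL and the helper name `build_oai_request` are hypothetical and do not refer to a gCube endpoint or API; `oai_dc` is the metadata prefix the protocol assigns to the mandatory Dublin Core format.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb is passed as an ordinary query parameter next to its arguments.
    params = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(params)}"


# Assumed example repository; replace with a real Data Provider base URL.
EXAMPLE_BASE_URL = "http://repository.example.org/oai"

# Harvest all records in Dublin Core.
list_records_url = build_oai_request(
    EXAMPLE_BASE_URL, "ListRecords", metadataPrefix="oai_dc"
)
```

A harvester would issue an HTTP GET on `list_records_url` and parse the XML response; that step is omitted here.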
gCube Use/Need/Relevance
Through the OAI-PMH protocol, the gCube infrastructure acts as a 'Data Provider' and disseminates its hosted metadata records in a standard fashion, allowing interoperation with other data e-Infrastructures that run autonomously. Other infrastructures can harvest the metadata descriptions of gCube content in archives so that their services can exploit the collections. The protocol provides an application-independent interoperability framework for metadata exchange between the online parties.
Functional Category
Data Consumption
Direction
- Producer
gCube Adoption Status
The protocol has already been integrated in the gCube system from the 'Data Provider' perspective. The methodology adopted for the integration is described here.
Related gCube components
- aslHttp OAI_PMH: the http front end for the protocol
- applicationSupportLayer_OAI_PMH: business logic back-end component for the protocol
References
OAI-ORE
Specification Description
Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video. The goal of these standards is to expose the rich content in these aggregations to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. Although a motivating use case for the work is the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, the intent of the effort is to develop standards that generalize across all web-based information.
In the physical world we create, use, and refer to aggregations of things all the time. We collect pictures in a photo album, read journals that are collections of articles, and burn CDs of our favorite songs. This practice of aggregating extends to the Web. We accumulate URLs in bookmarks or favorites lists in our browser, collect photos into sets on popular sites, browse multi-page documents that are linked together through "prev" and "next" tags, and talk about Web sites as if they had some real existence beyond the set of pages of which they consist. Despite our frequent use of these aggregations, their existence on the Web is quite ephemeral. One reason for this is that there is no standard way to identify an aggregation. We often use the URI of one page of an aggregation to identify the whole aggregation. For example, we use the URI of the first page of a multi-page Web document to identify the whole document, or we use the URI of the HTML page that provides access to a collection of images to identify the entire set of images. But those URIs really just identify those specific pages, and not the union of pages that makes up the whole document, or the union of all images in a Flickr set, respectively. In essence, the problem is that there is no standard way to describe the constituents or boundary of an aggregation, and this is what OAI-ORE aims to provide.
gCube Use/Need/Relevance
The gCube Content Model aims to provide high-level functionality for the manipulation of content over Grid-based environments. Content in gCube is stored and organized following a graph-based data model, the Information Object Model, that allows finer control of content by making it possible to annotate content with arbitrary properties and to relate different content units via arbitrary relationships.
Starting from this model, a document model has been built in which complex documents, composed of various, possibly nested subparts, are represented as chains of Information Objects linked via appropriate relationships. For instance, an HTML document that includes a number of images may be modelled as a complex object that provides references to Information Objects (containing the images). In this respect, gCube documents are managed as compound objects comprising metadata, annotations, alternative representations and multiple parts. The notion of gCube documents is implemented and managed by the gCube Information Organisation Services family of subsystems, which includes storage services, access services, plugins and a number of distinguished clients that can be internal or external to the system.
The aggregated information that makes up a gCube document can be transferred through the solution provided by OAI-ORE, without the need for clients to rely on the APIs of the individual system architectures and their definitions of document boundaries. The gCube ORE Provider allows the dissemination of the digital objects stored in the gCube repository as OAI-ORE Resource Maps.
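To make the idea of a Resource Map concrete, the sketch below lists the core statements such a map carries for a compound document: the map describes an aggregation, and the aggregation aggregates its parts. The `ore:describes` and `ore:aggregates` terms come from the ORE vocabulary; the URIs and the function name are hypothetical and do not reflect how the gCube ORE Provider is implemented or which serialization it emits.

```python
# Namespace of the OAI-ORE vocabulary terms.
ORE = "http://www.openarchives.org/ore/terms/"


def resource_map_triples(rem_uri, aggregation_uri, part_uris):
    """Return the core (subject, predicate, object) statements of a Resource Map.

    Hypothetical helper: a real Resource Map also carries metadata such as
    creator and modification date, and is serialized as RDF/XML or Atom.
    """
    # The Resource Map describes exactly one Aggregation.
    triples = [(rem_uri, ORE + "describes", aggregation_uri)]
    # The Aggregation lists each constituent resource explicitly,
    # which is what fixes the boundary of the compound object.
    triples += [(aggregation_uri, ORE + "aggregates", part) for part in part_uris]
    return triples


# Assumed example: an HTML document with one embedded image.
example_triples = resource_map_triples(
    "http://example.org/doc/42/rem",
    "http://example.org/doc/42/aggregation",
    ["http://example.org/doc/42/page.html", "http://example.org/doc/42/figure1.png"],
)
```

A client that understands these statements can collect every part of the document without knowing anything about the storage layout behind the URIs.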
Functional Category
Data Consumption
Direction
- Producer
gCube Adoption Status
The protocol has recently been integrated in the gCube system from the producer perspective. The methodology adopted for the implementation is described here.
Components affected / relevant
- aslHttp ORE_Provider: the http front end for the protocol
- applicationSupportLayer_OAI_ORE: business logic back-end component for the protocol
References
OpenSearch
Specification Description
OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format. OpenSearch helps search engines and search clients communicate by introducing a common set of formats to perform search requests and syndicate search results. The five basic pieces of information that a search client needs in order to communicate effectively with a search engine supporting the protocol are:
- local search engine location
- the query-grammar expected
- the request encoding
- the response encoding
- the record encoding
The OpenSearch protocol defines what a description document looks like, but not how it is retrieved. The location of the description document is discovered by some means outside the protocol (a priori knowledge). The description document specifies the location of the local search engine, how to formulate the search URL, and the query language with which submitted queries must comply.
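The way a client formulates a search URL from a description document can be sketched as follows. In OpenSearch 1.1 the description document carries a URL template in which parameters appear in braces (for example `{searchTerms}`) and a trailing `?` marks a parameter as optional. The template URL and the function name `fill_template` below are assumptions made for illustration and are not taken from the gCube OpenSearch Library.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; names may carry a namespace prefix such as geo:box.
_PARAM = re.compile(r"\{([A-Za-z][A-Za-z0-9:]*)(\?)?\}")


def fill_template(template: str, values: dict) -> str:
    """Substitute OpenSearch template parameters (hypothetical helper)."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Values are percent-encoded before being placed in the URL.
            return quote(str(values[name]), safe="")
        if optional:
            # Optional parameters without a value become the empty string.
            return ""
        raise ValueError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)


# Assumed template, as it might appear in a provider's description document.
EXAMPLE_TEMPLATE = "http://search.example.org/?q={searchTerms}&pw={startPage?}&format=atom"

search_url = fill_template(EXAMPLE_TEMPLATE, {"searchTerms": "marine species"})
```

Because the template comes from the provider, the same client code can query any OpenSearch-compliant engine once its description document has been obtained.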
gCube Use/Need/Relevance
As a producer, gCube can publish search results in a standard and accessible format, allowing metasearch engines to know how to send a search to the gCube search engine and how to interpret the results. As a consumer, gCube can access external providers that publish their results through search engines conforming to the OpenSearch specification. It can therefore act as a metasearch engine that combines results coming from a gCube search with the results from searching many other sites simultaneously, providing a high level of integration.
Functional Category
Data Consumption
Direction
- Producer: OpenSearch interface over gCube search
- Consumer: OpenSearch Framework accessing external OpenSearch providers
gCube Adoption Status
The protocol has recently been integrated in the gCube system from both the producer and the consumer perspective. The methodology adopted for the producer-side implementation is described here. The consumer-side functionality of the gCube OpenSearch implementation is concentrated in the OpenSearch framework, whose description and feature analysis can be found here.
Components affected / relevant
- aslHttp Information Retrieval - OpenSearch: the http front end for the protocol that allows the gCube search engine to act as an OpenSearch Provider
- OpenSearch Library: framework that includes a core library providing general-purpose OpenSearch functionality, and the OpenSearch Operator which utilizes functionality provided by the former
- OpenSearch Service: the web service responsible for the invocation of the OpenSearch Operator in the context of the provider to be queried
References
SRU
Specification Description
SRU is a standard XML-focused search protocol for Internet search queries that uses CQL (Contextual Query Language), a standard syntax for representing queries. As in OpenSearch, the protocol gives a client trying to communicate with the search engine five basic pieces of information: (1) the local search engine location, (2) the query grammar expected, (3) the request encoding, (4) the response encoding, and (5) the record encoding. SRU expects the content provider to have a description record that describes the search service. The protocol defines what a description record looks like and specifies that it can be obtained from the local search engine. The location of the local search engine is provided by means outside the protocol (a priori knowledge). SRU also defines how to formulate the search URL and specifies a standard query grammar, CQL. This means that clients only have to write one translator for all SRU local search engines, but also that all SRU local search engines have to support the CQL query grammar.
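Since SRU fixes both the URL layout and the query grammar, a single helper is enough to address any SRU engine. The sketch below builds a `searchRetrieve` request using the standard SRU parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`). The base URL, the CQL query and the function name are hypothetical, and version 1.2 is only an assumed default.

```python
from urllib.parse import urlencode


def build_search_retrieve_url(
    base_url: str,
    cql_query: str,
    start_record: int = 1,
    maximum_records: int = 10,
    version: str = "1.2",
) -> str:
    """Return the GET URL of an SRU searchRetrieve request (hypothetical helper)."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,  # the query is always expressed in CQL
        "startRecord": start_record,  # SRU record positions start at 1
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Assumed example endpoint and a simple CQL query on a Dublin Core title index.
sru_url = build_search_retrieve_url(
    "http://sru.example.org/search", 'dc.title = "fish"', maximum_records=5
)
```

The description record of a server is obtained the same way, with `operation=explain` in place of `searchRetrieve`.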
gCube Use/Need/Relevance
The gCube Search engine supports CQL as its native query grammar and thus complies fully with the SRU requirements for query formulation. Providing an interface and the description mechanisms defined by the protocol would allow all SRU metasearch engines to access gCube results in a standard way and with a high level of integration.
On the consuming side of the protocol, gCube would act as a metasearch engine for external search engines. An SRU provider integrated in the gCube system would have its results delivered together with gCube results within a single search in the gCube system. The mechanism for registering the information that describes SRU search engines in the gCube system can be integrated with the one already implemented for OpenSearch providers, since the requirements for effective communication with external search engines are common to both protocols.
Functional Category
Data Consumption
Direction
- Producer: SRU interface over gCube search
- Consumer: gCube search accessing external SRU providers
gCube Adoption Status
The protocol is planned to be integrated within the system, covering both the consumer and the producer side and exploiting the mechanisms already implemented for registering OpenSearch providers with the system.
Components affected / relevant
- aslHttp Information Retrieval - SRU: the http front end for the protocol that allows the gCube search engine to act as an SRU provider
- SRU - OpenSearch Convertor: the mechanism that will be converting the response of an 'explain' request in SRU protocol to the equivalent OpenSearch Description document, to register the provider information within the gCube System
- To be defined.
References
FTP/FTPS/SFTP
Specification Description
File Transfer Protocol (FTP) is a standard network protocol used to transfer files from one host to another over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and uses separate control and data connections between the client and the server. FTP users may authenticate themselves using a clear-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that encrypts the username, the password and the content, FTP is often secured with SSL/TLS (FTPS). SSH File Transfer Protocol (SFTP) is sometimes used instead.
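The difference between plain FTP and FTPS can be sketched with Python's standard `ftplib` module, which offers `FTP` for clear-text sessions and `FTP_TLS` for sessions secured with SSL/TLS. The helper names, host and paths below are hypothetical and this is not the gCube data transfer implementation. SFTP is a separate protocol that runs over SSH and is not covered by `ftplib`.

```python
from ftplib import FTP, FTP_TLS


def make_client(secure: bool) -> FTP:
    """Return an unconnected client: FTP_TLS for FTPS, FTP for plain FTP."""
    return FTP_TLS() if secure else FTP()


def download(host, remote_path, local_path, user="anonymous", password="", secure=True):
    """Fetch one file over FTP or FTPS (hypothetical helper)."""
    client = make_client(secure)
    client.connect(host)  # opens the control connection
    # With FTP_TLS, login() secures the control connection before
    # sending the credentials; with FTP they travel in clear text.
    client.login(user, password)
    if secure:
        client.prot_p()  # also encrypt the separate data connection
    with open(local_path, "wb") as out:
        client.retrbinary(f"RETR {remote_path}", out.write)
    client.quit()
```

A call such as `download("ftp.example.org", "pub/data.csv", "data.csv")` would then retrieve a file over FTPS, assuming the server supports it.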
gCube Use/Need/Relevance
Describe the use/need/relevance of the specification in respect to the functional area of the system.
Functional Category
The functional category under which the services underlying the protocol fall.
Direction
The direction towards the system (Producer/consumer), along with any information to clarify the perspective of the interpretation as one or the other or both, if needed
gCube Adoption Status
Information about status of adoption of Specification within Our system. Whether the specification has already been integrated and supported within the system, or it is under implementation, or soon to be implemented.
Components affected / relevant
- component X: role
References
HTTP/HTTPS
Specification Description
Description and useful information about the Specification.
gCube Use/Need/Relevance
Describe the use/need/relevance of the specification in respect to the functional area of the system.
Functional Category
The functional category under which the services underlying the protocol fall.
Direction
The direction towards the system (Producer/consumer), along with any information to clarify the perspective of the interpretation as one or the other or both, if needed
gCube Adoption Status
Information about status of adoption of Specification within Our system. Whether the specification has already been integrated and supported within the system, or it is under implementation, or soon to be implemented.
Components affected / relevant
- component X: role
References
Protocol XX
Specification Description
Description and useful information about the Specification.
gCube Use/Need/Relevance
Describe the use/need/relevance of the specification in respect to the functional area of the system.
Functional Category
The functional category under which the services underlying the protocol fall.
Direction
The direction towards the system (Producer/consumer), along with any information to clarify the perspective of the interpretation as one or the other or both, if needed
gCube Adoption Status
Information about status of adoption of Specification within Our system. Whether the specification has already been integrated and supported within the system, or it is under implementation, or soon to be implemented.
Components affected / relevant
- component X: role
References
Tentative Compliance
List here specifications that are not yet supported and that the project has not yet committed to supporting, along with their need and relevance.
- LDAP: Support integration of the infrastructure structure with other systems (e.g. harvesting external infrastructure resources, or publishing D4Science infrastructure resources).
- WSDM: Provide a standards-compliant web API for infrastructure management.