Developer's Guide: Introduction

Revision as of 18:23, 29 July 2008 by Federico.defaveri


THIS SECTION OF GCUBE DOCUMENTATION IS CURRENTLY UNDER UPDATE.

Overview

Welcome to the gCube Developer's Guide. The purpose of this document is to provide instructions for developers wishing to exploit a gCube-based grid infrastructure. gCube is a versatile, feature-rich grid platform that has been developed in the context of the D4SCIENCE European IST research project [1].

The platform follows the Service Oriented paradigm and exploits and extends various existing grid middleware and collaborative tools such as the Globus Toolkit 4 [2], gLite [3] and the GridSphere Portal Framework [4]. gCube offers a full-featured platform for the distributed hosting, management and retrieval of data and information, and a framework for extending state-of-the-art indexing, selection, fusion, extraction, description, annotation, transformation, and presentation of content.

gCube Architecture

A Reference Architecture is an architectural design pattern that indicates how an abstract set of mechanisms and relationships realises a predetermined set of requirements.

The gCube system captured by the Reference Architecture in Figure 1 is the software system that results from combining a number of subsystems in a Service Oriented Architecture.

Figure 1. D4Science System Reference Architecture

Such subsystems are organised in a three-tier architecture consisting of:

  • the gCube run-time environment, named gCube Hosting Node environment or simply gHN, is the set of subsystems equipping each gCube-enabled machine and forming the platform for the hosting and operation of the remaining system constituents. It consists of the gCube Container (to run gCube Services), the gCore Framework, named gCF (to reinforce the gCube Container in supporting the operation of gCube Services), a number of local services, namely Deployer, gHNManager, and Delegation, and a number of libraries and stubs needed to manage the communication with all other gCube services.
  • the gCube Infrastructure Enabling Services is the set of subsystems constituting the backbone of the gCube system and responsible for implementing (i) the operation of an e-Infrastructure supporting resource sharing and (ii) the definition and operation of Virtual Research Environments;
  • the gCube Application Services is the set of subsystems implementing facilities for (i) storage, organisation, description and annotation of information in a VRE (Information Organisation Services), (ii) retrieval of information in the context of a VRE (Information Retrieval Services) and (iii) provision of VO and VRE users with an interface for accessing such an e-Infrastructure.

The overall architecture has been designed following the Service Oriented Architecture principles:

  • the main constituents of each subsystem are expected to be loosely-coupled Web Services (specifically, WSRF services);
  • the constituents of the gCube-based e-Infrastructure are discovered through the Information System subsystem, which is therefore fundamental to the operation of the rest of the system;
  • such loosely-coupled services can be organised in workflows so as to form compound services whose orchestration is guaranteed by the Process Management subsystem.
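The discovery principle in the list above can be illustrated with a minimal, in-memory sketch in which services publish descriptive profiles, clients discover them by querying those descriptions, and subscribers are notified of new publications. It is only an illustration under stated assumptions: `ResourceRegistry`, `ServiceProfile` and their methods are hypothetical names, not the Information System interfaces, and a real IS would expose these operations remotely as replicated WSRF services rather than as local method calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Usage demo for the hypothetical registry below (not the gCube IS API).
class RegistryDemo {
    public static void main(String[] args) {
        ResourceRegistry registry = new ResourceRegistry();
        // Monitoring: be notified whenever a new profile is published.
        registry.subscribe(profile -> System.out.println("notified: " + profile.name()));
        registry.publish(new ServiceProfile("SearchOrchestrator", "Search", "http://node1.example.org:8080"));
        registry.publish(new ServiceProfile("IndexManager", "Index", "http://node2.example.org:8080"));
        // Discovery: query on descriptive information (here, the service class).
        for (ServiceProfile found : registry.discover(profile -> profile.serviceClass().equals("Index"))) {
            System.out.println("found " + found.name() + " at " + found.endpoint());
        }
    }
}

// Descriptive information published about a service instance (hypothetical fields).
record ServiceProfile(String name, String serviceClass, String endpoint) {}

// In-memory stand-in for a registry supporting publication, discovery and notification.
class ResourceRegistry {
    private final List<ServiceProfile> profiles = new ArrayList<>();
    private final List<Consumer<ServiceProfile>> subscribers = new ArrayList<>();

    // Publication: store the profile and push it to every current subscriber.
    public synchronized void publish(ServiceProfile profile) {
        profiles.add(profile);
        for (Consumer<ServiceProfile> subscriber : subscribers) {
            subscriber.accept(profile);
        }
    }

    // Discovery: return all published profiles matching the query.
    public synchronized List<ServiceProfile> discover(Predicate<ServiceProfile> query) {
        List<ServiceProfile> result = new ArrayList<>();
        for (ServiceProfile profile : profiles) {
            if (query.test(profile)) {
                result.add(profile);
            }
        }
        return result;
    }

    // Subscription: the callback is invoked for profiles published from now on.
    public synchronized void subscribe(Consumer<ServiceProfile> subscriber) {
        subscribers.add(subscriber);
    }
}
```

Keeping publication, discovery and notification behind a single registry is what lets the other constituents stay loosely coupled: a client depends only on the description it queries for, not on where a service happens to be deployed.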

It is worth noting that in this reference architecture the runtime environment is an integral part of the overall system, because the management of the environment hosting the services and the management of the service lifetime are part of the gCube business logic. Thanks to the gHN capabilities, other gCube services can be dynamically deployed on remote gHNs to serve the needs of Virtual Research Environments. Figure 2 presents the gCube Hosting Node (gHN) Reference Architecture.

Figure 2. gHN Reference Architecture

In the remainder of this section the constituents of the Reference Architecture are introduced, starting from the lowest layer.

The gCore Framework

The gCore Framework (gCF) is a Java framework for the development of high-quality gCube services and service clients. It provides an application framework that allows gCube services to abstract over functionality lower in the web services stack (WSRF, WS Notification, WS Addressing, etc.) and to build on advanced features for the management of state, scope, events, security, configuration, faults, service lifetime, and publication and discovery.
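As an illustration of what building on framework-managed lifetime and events can look like, the sketch below models a hosted service whose lifetime is a small state machine and whose state changes are broadcast to registered listeners. The states, the transition table and the names `ManagedService`, `Lifecycle` and `onTransition` are assumptions made for this example and are not part of the gCF API.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Usage demo for the hypothetical lifecycle model below (not the gCF API).
class LifecycleDemo {
    public static void main(String[] args) {
        ManagedService svc = new ManagedService("MyService");
        svc.onTransition((from, to) -> System.out.println(svc.name() + ": " + from + " -> " + to));
        svc.moveTo(Lifecycle.STARTED);
        svc.moveTo(Lifecycle.READY);
        svc.moveTo(Lifecycle.DOWN);
    }
}

// Hypothetical lifetime phases of a hosted service.
enum Lifecycle { CREATED, STARTED, READY, FAILED, DOWN }

class ManagedService {
    // Legal transitions, enforced centrally rather than by each service author.
    private static final Map<Lifecycle, EnumSet<Lifecycle>> ALLOWED = Map.of(
        Lifecycle.CREATED, EnumSet.of(Lifecycle.STARTED, Lifecycle.FAILED),
        Lifecycle.STARTED, EnumSet.of(Lifecycle.READY, Lifecycle.FAILED),
        Lifecycle.READY, EnumSet.of(Lifecycle.DOWN, Lifecycle.FAILED),
        Lifecycle.FAILED, EnumSet.of(Lifecycle.DOWN),
        Lifecycle.DOWN, EnumSet.noneOf(Lifecycle.class));

    private final String name;
    private Lifecycle state = Lifecycle.CREATED;
    private final List<BiConsumer<Lifecycle, Lifecycle>> listeners = new ArrayList<>();

    ManagedService(String name) { this.name = name; }

    String name() { return name; }

    Lifecycle state() { return state; }

    // Register a callback fired after every successful state change.
    void onTransition(BiConsumer<Lifecycle, Lifecycle> listener) { listeners.add(listener); }

    // Attempt a transition; illegal ones are rejected and the state is left unchanged.
    void moveTo(Lifecycle next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        Lifecycle previous = state;
        state = next;
        for (BiConsumer<Lifecycle, Lifecycle> listener : listeners) {
            listener.accept(previous, next);
        }
    }
}
```

The point of the design is that the legal transitions are enforced in one place, by the framework, while service code merely reacts to the events it is interested in.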

The gCube Infrastructure Enabling Services

The gCube Infrastructure Enabling Services is the family of subsystems implementing the foundational services that guarantee the operation of the e-Infrastructure. Such functions are organised in four main areas: (i) organisation and execution of Virtual Research Environments (VRE Management) by guaranteeing an optimal consumption of the available resources (Broker and Matchmaker); (ii) registration of the infrastructure constituents (Information Service); (iii) the enforcement of authentication and authorization policies enabling a highly controlled sharing of infrastructure constituents (Virtual Organisation Management); and (iv) definition and orchestration of complex workflows (Process Management) by guaranteeing an optimal consumption of the available resources (Process Optimisation). In particular:

  • the VRE Management services are responsible for (i) the definition of VREs and (ii) the dynamic deployment of VRE resources across the infrastructure. VRE definitions are specified declaratively, in a dedicated language through an appropriate, user-friendly interface, and drive the derivation of an optimal deployment plan. The plan is based on availability, QoS requirements, resource inter-dependencies, and VRE sharing policies, but also on the monitoring of failures (resources are dynamically redeployed) and load (resources are dynamically replicated). Three distinct services (Software Repository, Deployer, gCube Hosting Node Manager) support VRE definition and dynamic deployment by, respectively, collecting service implementations, deploying service implementations and their dependencies on gHNs, and hosting such service implementations at selected nodes.
  • the Broker and Matchmaker service identifies the set of gHNs on which a set of services should be deployed. In particular, given a set of packages to be deployed and their requirements with respect to the environment and/or other services, it identifies the set of gHNs to be used as target hosts for the deployment action.
  • the Information Service allows the publication of descriptive information about VRE resources, the discovery of VRE resources based on such descriptive information, and the real-time monitoring of VRE resources based on subscription/notification mechanisms. Heavily relied upon by all the functional layers of gCube, the Information Service is a replicated service whose instances communicate in a peer-to-peer fashion to maximize availability and fault tolerance and to minimize response time.
  • the Virtual Organisation Management services are responsible for equipping gCube with a robust and flexible security framework for managing Virtual Organizations (VOs). gCube exploits the VO mechanism to enforce a trusted and controlled environment in each dynamically created VRE. The main features are user and group management, authentication support, authorization definition, delegation, and enforcement of security credentials. These services rely on and integrate VOMS and the Globus Security Infrastructure (GSI) to provide gCube with a security framework supporting various configurations. The actors of this framework are humans as well as services.
  • the Process Management and Process Optimization services support the definition and execution of processes, i.e., workflows combining gCube services, external services and gLite jobs to deliver new functionalities (also known as programming in the large). In particular, these services provide the basic functionality for (i) creating processes either via a graphical process modelling tool or via a BPEL definition, (ii) reliably executing processes in a fully distributed and decentralized, and thus highly scalable, way, and (iii) optimizing processes both at build-time and at run-time. The process execution facility has been designed and implemented to take full advantage of the Grid, i.e. process steps are outsourced to the resources forming the e-Infrastructure and the process is executed in a distributed peer-to-peer modality. In particular, the Process Management service integrates the gLite software, thus enabling gCube to run such processes on EGEE resources. A monitoring front-end provides information on individual process instances, which are not materialized on a single host because of their distributed execution. This front-end allows administrators to follow the state of execution of a process instance online and also shows where its different parts are being executed.
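The matchmaking step performed by the Broker and Matchmaker can be sketched as a filter-then-rank pass over candidate gHNs. The requirements checked here (operating system, free memory, services already hosted on the same node), the least-loaded-first ordering and all names (`NodeProfile`, `PackageRequirements`, `Matchmaker.match`) are assumptions for illustration; the actual matchmaking criteria of the service are not described in this section.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Usage demo for the hypothetical matchmaker below (not the Broker and Matchmaker API).
class MatchmakerDemo {
    public static void main(String[] args) {
        List<NodeProfile> nodes = List.of(
            new NodeProfile("ghn1.example.org", "Linux", 2048, 0.70, Set.of("IS")),
            new NodeProfile("ghn2.example.org", "Linux", 4096, 0.20, Set.of("IS", "Index")),
            new NodeProfile("ghn3.example.org", "Windows", 8192, 0.10, Set.of()));
        PackageRequirements req = new PackageRequirements("Linux", 1024, Set.of("IS"));
        for (NodeProfile n : Matchmaker.match(req, nodes)) {
            System.out.println(n.host());
        }
    }
}

// Hypothetical description of a gHN, e.g. as it might be published in the IS.
record NodeProfile(String host, String os, int freeMemoryMb, double load, Set<String> hostedServices) {}

// Hypothetical requirements of a package versus its environment and other services.
record PackageRequirements(String os, int minMemoryMb, Set<String> requiredServices) {}

class Matchmaker {
    // Keep only nodes satisfying every requirement, ordered least-loaded first.
    static List<NodeProfile> match(PackageRequirements req, List<NodeProfile> candidates) {
        return candidates.stream()
            .filter(n -> n.os().equals(req.os()))
            .filter(n -> n.freeMemoryMb() >= req.minMemoryMb())
            .filter(n -> n.hostedServices().containsAll(req.requiredServices()))
            .sorted(Comparator.comparingDouble(NodeProfile::load))
            .collect(Collectors.toList());
    }
}
```

With the sample data in `main`, the Windows node is filtered out and the two Linux nodes are returned least-loaded first, so `ghn2.example.org` is printed before `ghn1.example.org`.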

The gCube Application Services

The gCube Application Services is the family of subsystems delivering three key functions of any Virtual Research Environment: (i) storage, description and annotation of information in a VRE (gCube Information Organisation Services), (ii) retrieval of information in the context of a VRE (gCube Information Retrieval Services) and (iii) provision of VRE users with an interface for accessing such information and the other functions equipping a VRE (gCube Presentation Services).

The gCube Information Organisation Services

The gCube Information Organisation Services is the family of subsystems implementing the foundational services that guarantee the management (storage, organisation, description and annotation) of information by implementing the notion of Information Objects (cf. Section ), i.e. logical units of information that may consist of, and be linked to, other Information Objects so as to form compound objects. Such functions are organised in three main areas: (i) the storage and organisation of Information Objects and their constituents (Content and Storage Management); (ii) the management of the metadata objects equipping each Information Object (Metadata Management); and (iii) the management of the annotation objects potentially enriching each Information Object (Annotation Management). In particular:

  • the Content & Storage Management services provide transparent access to Information Objects managed through gCube. In particular, they provide basic functionality for: (i) manipulating information objects and/or collections, i.e. creating, accessing, storing, and removing them; (ii) orchestrating distributed storage nodes and providing transparent access to them; (iii) a notification mechanism to maintain derived data upon changes in content; and (iv) importing information objects from different content providers through wrappers. The Information Object model managed by the Content & Storage Management services is generic and flexible enough to support a plethora of content types. To fully exploit Grid storage facilities, the Storage Management service provides an abstract interface to the underlying distributed and heterogeneous storage interfaces and technologies (e.g. DPM via SRM and the GFAL interface). Thanks to the gCube replication management subsystem and its integration with gLite, the gCube Storage Management service is able to exploit the storage capacity of the EGEE infrastructure and to maintain multiple copies of Information Objects so as to maximise their availability.
  • the Metadata Management services provide functionality for managing metadata objects, i.e. additional data attached to Information Objects. In particular, these services support (i) the manipulation of metadata objects and metadata collections, i.e. creating, accessing, storing, and removing metadata objects compliant with one or more metadata formats, (ii) the definition of metadata formats, (iii) the transformation of metadata objects into diverse formats via user-defined transformation programs, and (iv) the search for metadata objects. These characteristics make the services capable of managing multiple metadata formats. Moreover, the support for diverse metadata formats and the related transformation programs is an important feature for dealing with heterogeneity issues. To store metadata objects, the services exploit the storage facilities provided by the Content & Storage Management services, thus improving the reliability and accessibility of the managed objects.
  • the Annotation Management services are responsible for the cross-model and cross-media back-end management of annotations, a manually authored and subjective specialisation of metadata objects. The services mediate between interactive annotation front-ends and the Metadata Management services by: (i) enforcing a consistent modelling of annotation relationships between information objects, and (ii) increasing the simplicity, granularity, and flexibility with which annotations are created, collected, deleted, updated, and interrelated as specific forms of metadata objects.
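Support for multiple metadata formats with user-defined transformation programs can be pictured as a registry of converters keyed by source and target format. In this hypothetical sketch the `MetadataObject` and `TransformationRegistry` types, the format names and the use of plain Java functions as transformation programs are all assumptions; in practice a transformation program could equally be, for example, an XSLT stylesheet.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

// Usage demo for the hypothetical registry below (not the Metadata Management API).
class MetadataDemo {
    public static void main(String[] args) {
        TransformationRegistry registry = new TransformationRegistry();
        // A toy transformation program between two hypothetical formats, "simple" and "dc".
        registry.register("simple", "dc", md ->
            new MetadataObject(md.objectId(), "dc", md.body().replace("title=", "dc:title=")));
        MetadataObject source = new MetadataObject("io-42", "simple", "title=Sea surface temperature");
        registry.transform(source, "dc")
            .ifPresent(out -> System.out.println(out.format() + ": " + out.body()));
    }
}

// A metadata object attached to an Information Object, expressed in one format.
record MetadataObject(String objectId, String format, String body) {}

class TransformationRegistry {
    // Each (source format, target format) pair maps to one transformation program.
    private final Map<List<String>, UnaryOperator<MetadataObject>> programs = new HashMap<>();

    void register(String fromFormat, String toFormat, UnaryOperator<MetadataObject> program) {
        programs.put(List.of(fromFormat, toFormat), program);
    }

    // Returns the transformed object, the input itself if it is already in the target
    // format, or an empty Optional if no program is registered for the pair.
    Optional<MetadataObject> transform(MetadataObject md, String targetFormat) {
        if (md.format().equals(targetFormat)) {
            return Optional.of(md);
        }
        UnaryOperator<MetadataObject> program = programs.get(List.of(md.format(), targetFormat));
        return program == null ? Optional.empty() : Optional.of(program.apply(md));
    }
}
```

Because each transformation is looked up by its format pair, adding support for a new format only requires registering the programs that convert to and from it; the metadata objects themselves are untouched.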

The gCube Information Retrieval Services

The gCube Information Retrieval Services is the family of components offering Information Retrieval (IR) facilities to the gCube infrastructure, i.e. allowing search over data and information via a diverse set of techniques. The IR family of services can be decomposed into three major categories, presented below, which are termed “frameworks” because they are not standalone services. Instead, they are large collaborating systems based on protocols, specifications and software, which give the gCube system they empower a high degree of extensibility:

  • Search Framework: This category includes all services focused on the search-specific aspects of the gCube platform. More specifically, it consists of the search orchestrator component, the search operators, the query processor components and the data transfer mechanism. A user query is computed as follows. The search orchestrator receives queries from the gCube portal and communicates with the gCore IS to retrieve environment information. Next, the orchestrator feeds this information, along with the user query, to the query processor components, which ultimately produce an execution plan. This plan is forwarded to a gCube execution engine (the Process Management Service is one such engine), which orchestrates the execution by invoking the search operators as dictated by the plan. Data transfer is performed by the ResultSet component of the Search Framework. The final results are then forwarded to the user (portal). The Search Operators cover most of the traditional relational algebra operations, as well as some advanced ones, such as geospatial search and similarity search, thus providing a full-fledged set of capabilities to the end user. The Index Management and DIR frameworks provide a major part of the Search Operators and are described in the following entries.
  • Index Management Framework: This category includes all services involved in the creation and management of gCube indices. Management refers to all aspects of an index lifecycle as well as support for search capabilities. gCube provides a rich set of indices (e.g. full-text, forward, feature, and geospatial indices) offering a full-fledged set of storage and search capabilities for various data types and models. The services of the Index Management Framework communicate with the Content & Storage Management services in order to acquire the data set to be indexed and to preserve their state. They also employ the gCore IS capabilities to publish themselves so that they can be used by clients.
  • Distributed Information Retrieval Support Framework: This category includes all services which enhance and support the IR system. This framework provides higher-level IR capabilities, including content ranking, source selection, and result set fusion (ranked merging of various data sets). Components of this framework communicate with the Index Management services for statistics extraction and with the IS for information publication. The Search Framework employs the advanced capabilities offered by the DIR framework to enhance its search capabilities, by refining queries and improving the produced search results, thereby offering a higher level of service.
  • An additional component which does not belong to any of the frameworks mentioned above, but acts independently and improves the search quality, is the Personalisation Service. It is indirectly invoked by the Search Framework, through an appropriate wrapper, and used for enhancing user queries, by injecting additional “personalized” information.
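Result set fusion, one of the DIR capabilities listed above, merges ranked lists returned by different sources into a single ranking. The sketch below uses a common textbook approach, min-max score normalisation followed by CombSUM (summing a document's normalised scores across sources); it is chosen here for illustration and is not necessarily the fusion algorithm implemented by the DIR framework. The `ScoredHit` type and the `ResultFusion.fuse` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Usage demo for the hypothetical fusion routine below (not the DIR framework API).
class FusionDemo {
    public static void main(String[] args) {
        List<ScoredHit> sourceA = List.of(new ScoredHit("doc1", 12.0), new ScoredHit("doc2", 8.0), new ScoredHit("doc3", 4.0));
        List<ScoredHit> sourceB = List.of(new ScoredHit("doc3", 0.9), new ScoredHit("doc4", 0.5), new ScoredHit("doc1", 0.1));
        for (ScoredHit h : ResultFusion.fuse(List.of(sourceA, sourceB))) {
            System.out.printf("%s %.2f%n", h.docId(), h.score());
        }
    }
}

// A document identifier with the score assigned to it by one source.
record ScoredHit(String docId, double score) {}

class ResultFusion {
    // Min-max normalise each list to [0, 1], then sum scores per document (CombSUM).
    static List<ScoredHit> fuse(List<List<ScoredHit>> rankedLists) {
        Map<String, Double> combined = new HashMap<>();
        for (List<ScoredHit> list : rankedLists) {
            if (list.isEmpty()) {
                continue;
            }
            double min = list.stream().mapToDouble(ScoredHit::score).min().getAsDouble();
            double max = list.stream().mapToDouble(ScoredHit::score).max().getAsDouble();
            double range = max - min;
            for (ScoredHit h : list) {
                // A list whose scores are all equal contributes 1.0 to each of its documents.
                double norm = range == 0 ? 1.0 : (h.score() - min) / range;
                combined.merge(h.docId(), norm, Double::sum);
            }
        }
        List<ScoredHit> fused = new ArrayList<>();
        combined.forEach((id, s) -> fused.add(new ScoredHit(id, s)));
        // Highest combined score first; ties broken by document id for a deterministic order.
        fused.sort(Comparator.comparingDouble(ScoredHit::score).reversed()
            .thenComparing(ScoredHit::docId));
        return fused;
    }
}
```

Normalising before summing matters because sources typically score on unrelated scales (here 4 to 12 versus 0.1 to 0.9); summing raw scores would let the source with the larger scale dominate the merged ranking.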


The gCube Presentation Services

The gCube Presentation services form the logical top layer of a gCube-powered infrastructure. Their objective is twofold:

  • To provide the means to build user interfaces for interacting with and exploiting the gCube system and infrastructure.
  • To provide a full range of user interfaces for achieving interaction with the system, out-of-the-box.

The gCube presentation layer is based on the Application Support Layer (ASL), a framework that abstracts the complexity of the underlying infrastructure so that the front-end developer can focus on the objectives of presentation rather than on the details of the protocols and rules for interacting with the underlying (WSRF) services. The ASL exposes well-known facilities to the developer, such as session and credential management, and is accessible through various interfaces (currently HTTP and native Java).
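The role of the ASL can be illustrated with a facade that keeps per-user session state (credentials and the VRE being accessed) and exposes task-level operations, so that a UI component never handles service endpoints or credentials directly. The `UserSession`, `SearchBackend` and `simpleSearch` names, as well as the VRE path used in the demo, are hypothetical and do not describe the actual ASL API; the backend interface stands in for the underlying WSRF service calls.

```java
import java.util.List;
import java.util.Objects;

// Usage demo for the hypothetical session facade below (not the ASL API).
class AslDemo {
    public static void main(String[] args) {
        // A fake backend standing in for the real services; it just echoes its inputs.
        SearchBackend fake = (credential, vre, query) ->
            List.of("result for '" + query + "' in " + vre + " as " + credential);
        UserSession session = new UserSession("alice", "proxy-credential", "/d4science/ExampleVRE", fake);
        // A UI component only deals with the session and the user's query.
        session.simpleSearch("sea surface temperature").forEach(System.out::println);
    }
}

// Hypothetical boundary between the presentation support layer and the underlying services.
interface SearchBackend {
    List<String> search(String credential, String vre, String query);
}

class UserSession {
    private final String user;
    private final String credential;  // kept inside the session, never handed to UI code
    private final String vre;         // the VRE the user is currently working in
    private final SearchBackend backend;

    UserSession(String user, String credential, String vre, SearchBackend backend) {
        this.user = Objects.requireNonNull(user);
        this.credential = Objects.requireNonNull(credential);
        this.vre = Objects.requireNonNull(vre);
        this.backend = Objects.requireNonNull(backend);
    }

    String user() { return user; }

    String vre() { return vre; }

    // Task-level operation: the session supplies credentials and VRE on the caller's behalf.
    List<String> simpleSearch(String query) {
        if (query.isBlank()) {
            throw new IllegalArgumentException("empty query");
        }
        return backend.search(credential, vre, query);
    }
}
```

Keeping the credential private to the session means a UI component, whatever environment hosts it, can only reach the services through operations the layer chooses to expose.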

On top of the ASL, developers can build the user interface components needed for a particular application, depending on the execution environment that will host them (e.g. a PHP web server, a desktop application, an application server, etc.).

The execution environment is normally provided by existing systems and can be powered by bare operating systems / virtual machines (e.g. desktop applications), plain HTML pages, dynamic web sites (PHP, ASP, JSP, etc.), portals, application servers, etc.

The gCube presentation layer offers an initial set of components that currently run as JSR 168 portlets hosted by the GridSphere portlet container; apart from the gCube core services, these components rely on Java and servlet technologies to deliver their services.

Intended Readership

The document targets two classes of programmers:

  • Those who want to reuse the code – Programmers who will use gCube’s libraries to build their own services and middleware components, without needing access to the source code.
  • Those who want to modify/extend the source code – Programmers who will use the platform’s source code to enhance it, correct it, and adapt it to different environments and application domains.

Related Documents

Apart from this Developer's Guide, D4SCIENCE has also made available two additional support documents:

  • the User's Guide, which provides usage information and guidelines for the end users of the two user communities that currently exploit the platform, namely ImpECt and ARTE.
  • the Administrator's Guide, which provides information and guidelines for the installation, configuration and daily administration of a gCube based computational grid infrastructure.

Additional material that may help prospective gCube developers includes:

  • gLite 3.0 Manuals Series User Guide [5]
  • Globus Toolkit 4.0 Developer's Guide [6]

Regarding the architecture and inner details of gCube, the interested reader can visit the official gCube platform web site [7].

Lexical Abbreviations

The following abbreviations are used extensively throughout the document:

ABE Annotation Back End
ADL Advanced Distributed Learning
AFE Annotation Front End
AIS Archive Import Service
API Application Programming Interface
ASL Application Support Layer
BMM Broker and Matchmaker
BPEL Business Process Execution Language
CE Computing Element
CMS Content Management Service
CORI Collection Retrieval Inference Network
CS Compound Service
D4Science DIstributed colLaboratories Infrastructure on Grid Enabled Technology 4 Science
DIR Distributed Information Retrieval
DPM Disk Pool Manager
DTS Data Transformation Service
EC European Commission
EPR EndPoint Reference
gCF gCore Framework
GFAL Grid File Access Library
gHN gCube Hosting Node
GPL General Public Licence
GUI Graphical User Interface
GUID Globally Unique Identifier
GWT Google Web Toolkit
IO Input / Output
IR Information Retrieval
IS Information Service
JDBM Java DataBase Manager
JSP Java Server Pages
JSR Java Specification Request
LGPL Lesser General Public Licence
LMS Learning Management System
MD5 Message-Digest algorithm 5
OASIS Organization for the Advancement of Structured Information Standards
PES Process Execution Service
POS Process Optimisation Service
SCORM Shareable Content Object Reference Model
SE Storage Element
SRM Storage Resource Manager
URI Uniform Resource Identifier
URL Uniform Resource Locator
VO Virtual Organisation
VOMS Virtual Organisation Membership Service
VRE Virtual Research Environment
WS Web Service
WSRF Web Service Resource Framework
XENA eXecution ENgine API
XSL Extensible Stylesheet Language
XSLT XSL Transformations


Problem Reporting

For problem reporting or any other enquiries regarding this document, please contact Vangelis Floros (floros@di.uoa.gr).