GCube Reference Architecture


A Reference Architecture is an architectural design pattern that indicates how an abstract set of mechanisms and relationships realises a predetermined set of requirements.

The gCube system captured by the Reference Architecture in Figure 1 is the software system that results from combining a number of subsystems[1] in a Service Oriented Architecture.

Figure 1. D4Science System Reference Architecture

Such subsystems are organised in a three-tier architecture consisting of:

  • the gCube run-time environment, named gCube Hosting Node environment or simply gHN, is the set of subsystems equipping each gCube-enabled machine and forming the platform for the hosting and operation of the rest of the system constituents. Namely, it consists of:
    • the gCube Container[2] (to run gCube Services),
    • the gCore Framework, named gCF (to reinforce the gCube Container in supporting the operation of gCube Services),
    • a number of local services, namely Deployer, gHN Manager, ResultSet, and Delegation if secure conversation is enabled,
    • a number of libraries and stubs needed to manage the communication with all other gCube services.
  • the gCube Infrastructure Enabling Services is the set of subsystems constituting the backbone of the gCube system and responsible for implementing (i) the operation of an e-Infrastructure supporting resource sharing and (ii) the definition and operation of Virtual Research Environments;
  • the gCube Application Services is the set of subsystems implementing facilities for (i) storage, organisation, description and annotation of information in a VRE (Information Organisation Services), (ii) retrieval of information in the context of a VRE (Information Retrieval Services) and (iii) provision of VO and VRE users with an interface for accessing such an e-Infrastructure.

The overall architecture has been designed following the Service Oriented Architecture principles:

  • the main constituents of each subsystem are expected to be loosely-coupled Web Services (actually WSRF services);
  • the constituents of the gCube-based e-Infrastructure will be discovered thanks to the gCore Based Information System subsystem that, as usual, becomes fundamental to guarantee the operation of the rest;
  • such loosely-coupled Services can be organised into workflows so as to form compound services whose orchestration is guaranteed by the Process Management subsystem.
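
The discovery principle above can be illustrated with a toy registry in which resources publish a descriptive profile and clients look them up by matching profile fields. This is only a minimal in-memory sketch: the class name, method names and profile keys are hypothetical and do not reflect the actual gCore Based Information System interface.

```python
from typing import Dict, List


class ToyRegistry:
    """Hypothetical in-memory stand-in for an information system."""

    def __init__(self) -> None:
        self._profiles: Dict[str, dict] = {}

    def publish(self, resource_id: str, profile: dict) -> None:
        # Store (or overwrite) the descriptive profile of a resource.
        self._profiles[resource_id] = dict(profile)

    def discover(self, **criteria) -> List[str]:
        # Return the ids of all resources whose profile matches every criterion.
        return sorted(
            rid
            for rid, profile in self._profiles.items()
            if all(profile.get(key) == value for key, value in criteria.items())
        )


registry = ToyRegistry()
registry.publish("index-1", {"kind": "service", "name": "Index", "scope": "vre-a"})
registry.publish("search-1", {"kind": "service", "name": "Search", "scope": "vre-b"})
registry.publish("ghn-7", {"kind": "node", "scope": "vre-a"})

found = registry.discover(kind="service", scope="vre-a")
print(found)
```

Because clients only depend on the published description, a service can be redeployed on another node without its consumers being changed, which is what makes the registry fundamental to the rest of the system.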

In the remainder of this section the constituents of the Reference Architecture are introduced starting from the lower layer.

The gCube Run-time Environment

It is worth noting in this reference architecture that the runtime environment is an integral part of the overall system because the management of the environment hosting the services and the management of the service lifetime is part of the gCube business logic. Thanks to the gHN capabilities, other gCube services can be dynamically deployed on remote gHNs to serve the needs of Virtual Research Environments. Figure 2 presents the gCube Hosting Node (gHN) Reference Architecture.

Figure 2. gHN Reference Architecture

The gCore Framework

The gCore Framework (gCF) is a Java framework for the development of high-quality gCube services and service clients. It provides an application framework that allows gCube services to abstract over functionality lower in the web services stack (WSRF, WS Notification, WS Addressing, etc.) and to build on top of advanced features for the management of state, scope, events, security, configuration, fault, service lifetime, and publication and discovery.


The gCube Infrastructure Enabling Services

The gCube Infrastructure Enabling Services is the family of subsystems implementing the foundational services that guarantee the operation of the e-Infrastructure. Such functions are organised in four main areas: (i) organisation and execution of Virtual Research Environments (VRE Management) by guaranteeing an optimal consumption of the available resources (Broker and Matchmaker); (ii) registration of the infrastructure constituents (gCore Based Information System); (iii) the authentication and authorization policy enforcement enabling the highly controlled sharing of infrastructure constituents (Virtual Organisation Management); and (iv) definition and orchestration of complex workflows (Process Management) by guaranteeing an optimal consumption of the available resources (Process Optimisation). In particular:

  • the VRE Management services are responsible for: (i) the definition of VREs and (ii) the dynamic deployment of VRE resources across the infrastructure. VRE definitions are specified through a user-friendly interface that allows a VRE Designer to characterise the VRE from a conceptual point of view, e.g. by specifying the expected content and the expected functions. This high-level specification is automatically transformed into an optimal deployment plan that identifies the concrete resources (e.g. services) needed to implement it. The plan is based on availability, QoS requirements, resource inter-dependencies, and VRE sharing policies, but also on monitoring of failures (resources are dynamically redeployed) and load (resources are dynamically replicated). Three distinct services (Software Repository, Deployer, gCube Hosting Node Manager) support VRE definition and dynamic deployment by, respectively, collecting service implementations, deploying service implementations and their dependencies on gHNs, and hosting such service implementations at selected nodes.
  • the Broker and Matchmaker service identifies the set of gHNs on which a set of services is to be deployed. In particular, given (a) a set of packages to be deployed and (b) their requirements with respect to the environment and/or other services, it identifies the set of gHNs to be used as target hosts for the deployment action.
  • the gCore Based Information System allows the publication of descriptive information about VRE resources, the discovery of VRE resources based on descriptive information, and the real-time monitoring of VRE resources based on subscription/notification mechanisms. Heavily relied upon by all the functional layers of gCube, the gCore Based Information System is a replicated service in which instances communicate in peer-to-peer fashion to maximize availability, response time, and fault tolerance.
  • the Virtual Organisation Management services are responsible for equipping gCube with a robust and flexible security framework for managing Virtual Organizations (VOs). gCube exploits the VO mechanism to enforce a trusted and controlled environment in each dynamically created VRE. Its main features include user and group management, authentication support, authorization definition, delegation, and enforcement of the security credential. These services rely on and integrate VOMS and Globus Security Infrastructure (GSI) to provide gCube with a security framework supporting various configurations. The actors of this framework are humans as well as services.
  • the Process Management and Process Optimisation services support the definition and execution of processes, i.e., workflows combining gCube services, external services and gLite jobs to deliver new functionalities (also known as programming in the large). In particular, these services provide the basic functionality for (i) creating processes either via a graphical process modelling tool or via a BPEL definition, (ii) reliably executing processes in a fully distributed and decentralized, thus highly scalable, way and (iii) optimizing processes both at build-time and at run-time. The process execution facility has been designed and implemented to take full advantage of the Grid, i.e. process steps are outsourced to the resources forming the e-Infrastructure and the process is executed in a distributed peer-to-peer modality. In particular, the Process Management service is able to use the gLite software, thus enabling gCube to run such processes on EGEE resources. A monitoring front-end provides information on individual process instances, which are not materialized on a single host because of their distributed execution. This monitoring front-end allows administrators to follow the state of execution of a process instance online and also shows where the different parts are being executed.
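
The matchmaking task described for the Broker and Matchmaker service (packages plus requirements in, target gHNs out) can be sketched as follows. The source does not describe the actual algorithm, so this example assumes a simple greedy first-fit strategy; the Node and Package types, the memory figures and the feature labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_memory_mb: int
    features: frozenset = frozenset()  # capabilities offered by the node


@dataclass
class Package:
    name: str
    memory_mb: int
    requires: frozenset = frozenset()  # capabilities the package needs


def match(packages: List[Package], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Greedy first-fit: map each package name to a node name, or None."""
    free = {node.name: node.free_memory_mb for node in nodes}
    plan: Dict[str, Optional[str]] = {}
    # Place the most demanding packages first so they are not starved.
    for pkg in sorted(packages, key=lambda p: p.memory_mb, reverse=True):
        plan[pkg.name] = None
        for node in nodes:
            if pkg.requires <= node.features and free[node.name] >= pkg.memory_mb:
                free[node.name] -= pkg.memory_mb
                plan[pkg.name] = node.name
                break
    return plan


nodes = [
    Node("ghn-a", 512, frozenset({"java"})),
    Node("ghn-b", 2048, frozenset({"java", "gpu"})),
]
packages = [
    Package("index", 1024, frozenset({"java"})),
    Package("search", 256, frozenset({"java"})),
    Package("render", 4096, frozenset({"gpu"})),
]
plan = match(packages, nodes)
print(plan)
```

A package that no node can host is mapped to None rather than raising an error, so a caller could decide whether to abort the deployment or retry later. A real broker would also weigh QoS requirements and inter-service dependencies, which this sketch ignores.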

The gCube Application Services

The gCube Application Services is the family of subsystems delivering three key functions of any Virtual Research Environment: (i) storage, description and annotation of information in a VRE (gCube Information Organisation Services), (ii) retrieval of information in the context of a VRE (gCube Information Retrieval Services) and (iii) provision of VRE users with an interface for accessing such information and the rest of the functions equipping a VRE (gCube Presentation Services).

The gCube Information Organisation Services

The gCube Information Organisation Services is the family of subsystems implementing the foundational services guaranteeing the management (storage, organisation, description and annotation) of information by implementing the notion of Information Objects, i.e. logical units of information potentially consisting of and linked to other Information Objects so as to form compound objects. Such functions are organised in three main areas: (i) the storage and organisation of Information Objects and their constituents (Content and Storage Management); (ii) the management of the metadata objects equipping each Information Object (Metadata Management); and (iii) the management of the annotation objects potentially enriching each Information Object (Annotation Management). In particular:

  • the Content & Storage Management services provide transparent access to Information Objects managed through gCube. In particular, they provide basic functionality for: (i) manipulating Information Objects and/or collections, i.e. creating, accessing, storing, and removing; (ii) orchestrating distributed storage nodes and providing transparent access to them; (iii) a notification mechanism to maintain derived data upon changes in content; and (iv) importing Information Objects from different content providers through wrappers. The kind of Information Object manageable by the Content & Storage Management services is generic and flexible enough to model and thus support several content types. To fully exploit Grid storage facilities, the Storage Management service provides an abstract interface to the underlying distributed and heterogeneous storage interfaces and technologies (e.g. DPM via SRM and the GFAL interface). Thanks to the gCube replication management subsystem and by integrating gLite, the gCube Storage Management service is capable of exploiting the storage capacity of the EGEE infrastructure and of maintaining multiple copies of the Information Objects so as to maximise their availability.
  • the Metadata Management services provide functionality for managing metadata objects, i.e. additional data attached to Information Objects. In particular, these services support (i) the manipulation of metadata objects and metadata collections, i.e. creating, accessing, storing, and removing metadata objects compliant with one or more metadata formats, (ii) the definition of metadata formats, (iii) the transformation of metadata objects into diverse formats via user-defined transformation programs, and (iv) the search for metadata objects. These characteristics make the services capable of managing multiple metadata formats. Moreover, the support for diverse metadata formats and the related transformation programs is an important feature for dealing with heterogeneity issues. To store metadata objects the services exploit the storage facilities provided by the Content & Storage Management services and thus improve the reliability and accessibility of the managed objects.
  • the Annotation Management services are responsible for the cross-model and cross-media back-end management of annotations, i.e. manually authored and subjective specialisations of metadata objects. The services mediate between interactive annotation front-ends and Metadata Management services by: (i) enforcing a consistent modelling of annotation relationships between Information Objects, and (ii) increasing the simplicity, granularity, and flexibility with which annotations are created, collected, deleted, updated, and interrelated as specific forms of metadata objects.
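
The relationships among Information Objects, metadata objects and annotations described above can be pictured with a small data model. This is an illustrative sketch only: the class names, field names and the "dc"/"note" format labels are assumptions made for the example and are not the gCube data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class MetadataObject:
    format: str             # name of the metadata format (illustrative)
    fields: Dict[str, str]  # the metadata values themselves


@dataclass
class Annotation(MetadataObject):
    # An annotation is modelled as a manually authored kind of metadata object.
    author: str = "unknown"


@dataclass
class InformationObject:
    object_id: str
    content: bytes = b""
    parts: List["InformationObject"] = field(default_factory=list)
    metadata: List[MetadataObject] = field(default_factory=list)

    def walk(self) -> Iterator["InformationObject"]:
        """Yield this object and every nested part, depth first."""
        yield self
        for part in self.parts:
            yield from part.walk()


def transform(meta: MetadataObject, target_format: str,
              mapping: Dict[str, str]) -> MetadataObject:
    """Rename fields according to a user-supplied mapping, dropping the rest."""
    renamed = {mapping[k]: v for k, v in meta.fields.items() if k in mapping}
    return MetadataObject(target_format, renamed)


report = InformationObject(
    "report",
    parts=[InformationObject("chapter-1"), InformationObject("figure-1")],
)
report.metadata.append(MetadataObject("dc", {"title": "Annual report"}))
report.metadata.append(Annotation("note", {"text": "check figure 1"}, author="alice"))

ids = [obj.object_id for obj in report.walk()]
converted = transform(report.metadata[0], "custom", {"title": "name"})
print(ids, converted)
```

Modelling Annotation as a subclass of MetadataObject mirrors the statement that annotations are specialisations of metadata objects, so anything that stores or searches metadata can handle annotations without special cases.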

The gCube Information Retrieval Services

The gCube Information Retrieval Services is the family of components offering Information Retrieval (IR) facilities to the gCube infrastructure, i.e. allowing data and information to be searched through a wide range of techniques. The IR family of services can be decomposed into three major categories, presented below. They are called “frameworks” because they are not standalone services but rather large collaborating systems based on protocols, specifications and software, which bring considerable extensibility to the gCube system they empower:

  • Search Framework: This category includes all services focused on the search-specific aspects of the gCube platform. More specifically, it consists of the search orchestrator component, search operators, query processor components and the data transfer mechanism. The workflow required for computing a user query is the following: the search orchestrator receives queries from the gCube portal and communicates with the gCore IS service to retrieve environment information. In the next step, the orchestrator feeds this information along with the query to the query processor components, which ultimately produce an execution plan. This plan is forwarded to the gCube execution engine (one of which is the Process Management Service), which orchestrates the execution by invoking the search operators, as dictated by the plan. The data transfer is performed by the ResultSet component of the Search Framework; because of its importance, it is described in a separate section. The final results are then forwarded to the user (portal). The Search Operators realise most of the traditional relational algebra operations as well as some advanced ones, like those required by geospatial search and similarity search, thus providing a full-fledged set of capabilities to the final user. The Index Management and DIR frameworks provide a major part of the Search Operators and are described in separate sections.
  • Index Management Framework: This category includes all services that are involved in the creation and management of gCube indices. Management refers to all aspects of an index lifecycle as well as support for search capabilities. In gCube a rich set of indices, such as full text, forward, feature, geospatial indices, is employed, offering a full-fledged set of storage and search capabilities regarding various data types and models. The services of Index Management Framework communicate with the Content and Storage Management services in order to acquire the data set to be indexed and also to preserve their state. They also employ the gCore IS capabilities so as to publish themselves and therefore be used by clients.
  • Distributed Information Retrieval Support Framework: This category includes all services which enhance and support the IR system. This framework provides higher-level IR capabilities which include content ranking, source selection and result set fusion (ranked merging of various data sets). Components of this framework communicate with the Index Management Services for statistic extraction and the IS service for information publication. Search Framework employs the advanced capabilities offered by DIR framework in order to enhance its search capabilities, by refining queries, enhancing produced search results and finally exhibiting a higher level of services.
  • An additional component which does not belong to any of the frameworks mentioned above, but acts independently and improves the search quality, is the Personalisation Service. It is indirectly invoked by the Search Framework, through an appropriate wrapper, and used for enhancing user queries, by injecting additional “personalized” information.
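
The result set fusion mentioned for the Distributed Information Retrieval Support Framework (ranked merging of several data sets) can be sketched as below. The example assumes that every source returns hits already sorted by descending score and that scores from different sources are directly comparable; both are simplifying assumptions, and the function and variable names are hypothetical rather than part of gCube.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[str, float]  # (document id, relevance score)


def fuse(result_sets: Iterable[List[Hit]], limit: int) -> List[Hit]:
    """Merge result sets sorted by descending score, dropping repeated ids."""
    # heapq.merge lazily interleaves the already sorted inputs.
    merged = heapq.merge(*result_sets, key=lambda hit: hit[1], reverse=True)
    seen = set()
    fused: List[Hit] = []
    for doc_id, score in merged:
        if len(fused) >= limit:
            break
        if doc_id in seen:
            continue  # keep only the best-scoring occurrence of a document
        seen.add(doc_id)
        fused.append((doc_id, score))
    return fused


full_text_hits = [("d1", 0.9), ("d3", 0.4)]
geospatial_hits = [("d2", 0.7), ("d1", 0.6), ("d4", 0.1)]

top = fuse([full_text_hits, geospatial_hits], limit=3)
print(top)
```

Merging lazily means that only as many hits are consumed as are needed to fill the requested limit. In practice, scores coming from different index types would need to be normalised before such a merge.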

The gCube Presentation Services

The gCube Presentation services form the logical top layer of a gCube-powered infrastructure. Their objective is twofold:

  • To provide the means to build user interfaces for interacting with and exploiting the gCube system and infrastructure.
  • To provide a full range of user interfaces for achieving interaction with the system, out-of-the-box.

The gCube presentation layer is based on the Application Support Layer (ASL), a framework that abstracts the complexity of the underlying infrastructure so that the front-end developer can focus on the objectives of presentation rather than on the details of the protocols and rules for interacting with the underlying (WSRF) services. The ASL exposes to the developer well-known tools such as session and credential management and is accessible through various interfaces (currently HTTP and Java-native).

On top of the ASL the developer can develop the user interface components needed for a particular application, depending on the execution environment that will host them (e.g. php web server, desktop application, application server etc).

The execution environment is normally provided by existing systems and can be powered by bare Operating Systems / Virtual Machines (e.g. desktop applications), plain html pages, dynamic web-sites (php, asp, jsp etc), portals, application servers etc.

The gCube presentation layer offers an initial set of components that currently run under the JSR 168 specification and are hosted by the GridSphere portlet container. Apart from the gCube core services, it relies on Java and servlet technologies to offer its services.


  1. A subsystem is intended as a logical constituent unit when considered with respect to the system as a whole. A subsystem groups Services, Libraries and any other kind of software component belonging to the area of competence of the subsystem.
  2. A customisation of the WS Core Container distributed by the Globus® Alliance.