GCore Based Information System

From Gcube Wiki

The gCube Information System (IS for short) delivers functionalities for publishing, discovering, and monitoring the set of resources that form the infrastructure. It acts as the registry of the infrastructure: all resources are registered in the IS, and every service taking part in the infrastructure must refer to it to dynamically discover the other infrastructure constituents. This approach also provides strong support for the dynamic deployment capabilities of gCube.

In this context, a resource can be:

  • a gCube resource, supporting the deployment and operation of a gCube infrastructure;
  • an instance state, characterizing the operational state of an instance of a gCube service;
  • a generic resource, i.e. any well-formed XML document (a text that follows all the syntactic rules labelled as well-formedness rules in the XML specification).
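To make the three kinds of resource concrete, here is a minimal Python sketch, assuming a resource is just an identifier, a kind and an XML payload. The names `ResourceKind`, `Resource`, `is_well_formed` and `make_generic_resource` are hypothetical and are not part of the gCube API. The only property checked is the one stated above for generic resources, namely XML well-formedness, which the standard-library parser verifies without consulting any schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum


class ResourceKind(Enum):
    """The three kinds of resource handled by the IS (names are illustrative)."""
    GCUBE_RESOURCE = "gcube-resource"      # supports deployment/operation of the infrastructure
    INSTANCE_STATE = "instance-state"      # operational state of a service instance
    GENERIC_RESOURCE = "generic-resource"  # any well-formed XML document


@dataclass(frozen=True)
class Resource:
    resource_id: str
    kind: ResourceKind
    body: str  # the XML payload


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML; purely syntactic, no schema is checked."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def make_generic_resource(resource_id: str, xml_text: str) -> Resource:
    """Wrap a payload as a generic resource, rejecting anything not well-formed."""
    if not is_well_formed(xml_text):
        raise ValueError(f"{resource_id}: payload is not well-formed XML")
    return Resource(resource_id, ResourceKind.GENERIC_RESOURCE, xml_text)
```

For example, `make_generic_resource("r1", "<note><to>IS</to></note>")` succeeds, while a payload with a mismatched closing tag raises `ValueError`.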

Because of its central role, the key quality-of-service requirements for this subsystem are performance, scalability, freshness and availability. Facilities supporting the interaction with the IS have also been included in the gCore Framework.

Reference Architecture

Architecturally, the IS is composed of a group of services, plus libraries that simplify their use by clients. The central role is played by the InformationCollector (IC) service, which is in charge of collecting and storing information about the infrastructure (or a subset of it) and answering discovery requests. There are two ways to feed the IC, depending on the nature of the information published. If the information is a gCube Resource profile, a publication request must be sent to the Registry service. This service validates and filters profiles in order to decide whether a resource is accepted as part of the infrastructure (other gCube services are in charge of regulating access to the accepted resources). If, on the other hand, the information to publish is an instance state or a generic resource, it does not need to pass through the Registry service's acceptance procedure and can be sent directly to the IC.
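The two publication paths can be sketched as a simple routing function. This is an illustration under stated assumptions, not the gCube implementation: `InformationCollector`, `Registry`, `publish` and the pluggable `accept` policy are hypothetical stand-ins, and the Registry's real acceptance procedure is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Kind(Enum):
    GCUBE_RESOURCE = 1
    INSTANCE_STATE = 2
    GENERIC_RESOURCE = 3


@dataclass
class InformationCollector:
    """Stand-in for the IC: simply stores whatever it receives."""
    stored: List[str] = field(default_factory=list)

    def store(self, profile: str) -> None:
        self.stored.append(profile)


@dataclass
class Registry:
    """Stand-in for the Registry: validates, then forwards accepted profiles to the IC."""
    ic: InformationCollector
    accept: Callable[[str], bool]  # hypothetical acceptance policy
    rejected: List[str] = field(default_factory=list)

    def register(self, profile: str) -> bool:
        if self.accept(profile):
            self.ic.store(profile)
            return True
        self.rejected.append(profile)
        return False


def publish(kind: Kind, profile: str, registry: Registry, ic: InformationCollector) -> bool:
    """Route a publication request according to the kind of information."""
    if kind is Kind.GCUBE_RESOURCE:
        # gCube Resource profiles must pass the Registry's acceptance procedure.
        return registry.register(profile)
    # Instance states and generic resources go straight to the IC.
    ic.store(profile)
    return True
```

The design point the sketch captures is that only gCube Resource profiles are gated; everything else reaches the IC unfiltered.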

The third service belonging to the IS is the Notifier, which offers a subscription/notification mechanism for events related to the lifetime of gCube Resources. Relying on WS-Notification and in cooperation with the Registry service, the Notifier sends notifications to subscribed consumers about events happening in the Registry service (such as the registration of a new resource).
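The cooperation between the Registry and the Notifier follows the classic topic-based publish/subscribe pattern. The sketch below is a plain in-process Python version of that pattern, standing in for WS-Notification; the class names and the `ResourceRegistered` topic are hypothetical and do not reflect the Notifier's actual interface or topic names.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Listener = Callable[[str, str], None]  # called with (topic, resource_id)


class Notifier:
    """Minimal topic-based publish/subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, topic: str, listener: Listener) -> None:
        self._subscribers[topic].append(listener)

    def unsubscribe(self, topic: str, listener: Listener) -> None:
        self._subscribers[topic].remove(listener)

    def notify(self, topic: str, resource_id: str) -> int:
        """Deliver an event to every current subscriber; return how many were notified."""
        listeners = list(self._subscribers.get(topic, []))  # copy: safe if a listener unsubscribes
        for listener in listeners:
            listener(topic, resource_id)
        return len(listeners)


class Registry:
    """Registry stand-in that emits an event on each registration."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier
        self._profiles: Dict[str, str] = {}

    def register(self, resource_id: str, profile: str) -> None:
        self._profiles[resource_id] = profile
        self._notifier.notify("ResourceRegistered", resource_id)
```

Because consumers talk only to the broker, the Registry never needs to know who is listening, which is the decoupling the Notifier provides.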

Each of the three services has a related client library that abstracts over the details of the service's interface:

  • IS-Client: for interacting with the IC service to discover information;
  • IS-Publisher: for interacting with the IC and Registry services to publish information;
  • IS-Notification: for becoming a consumer of the gCube notification events sent by the Notifier.

Finally, the Information System is equipped with an optional service named gLiteBridge. Its role is to foster interoperability with gLite-based infrastructures by publishing in the IS the computing elements, storage elements and sites harvested from their information systems (mainly BDII).
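As a rough picture of what the gLiteBridge does, the sketch below converts entries harvested from a BDII into XML documents that could be published in the IS. It assumes the entries have already been fetched (a real BDII is queried over LDAP) and flattened into single-valued dictionaries. The GLUE 1.3 object class and attribute names are used for illustration only; the output layout (`Resource`, `type`, `id`) and the functions `to_generic_resource` and `bridge` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed mapping from GLUE 1.3 object classes to each entry's identifier attribute.
ID_ATTRIBUTE = {
    "GlueCE": "GlueCEUniqueID",      # computing element
    "GlueSE": "GlueSEUniqueID",      # storage element
    "GlueSite": "GlueSiteUniqueID",  # site
}


def to_generic_resource(entry: Dict[str, str]) -> str:
    """Convert one harvested entry into a hypothetical XML document for the IS."""
    kind = entry["objectClass"]
    root = ET.Element("Resource", {"type": kind, "id": entry[ID_ATTRIBUTE[kind]]})
    for name, value in sorted(entry.items()):
        if name != "objectClass":
            ET.SubElement(root, name).text = value  # one child element per attribute
    return ET.tostring(root, encoding="unicode")


def bridge(entries: List[Dict[str, str]]) -> List[str]:
    """Keep only computing elements, storage elements and sites; skip everything else."""
    return [to_generic_resource(e) for e in entries if e.get("objectClass") in ID_ATTRIBUTE]
```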

Figure 1 presents the components of the Information System and their main interactions:

Figure 1. Information System Architecture and Main Interactions

Together, these components deliver the following functionalities with respect to the information handled:

  • production and publication
  • collection and storage
  • discovery and consumption

The Information System supports two deployment scenarios: the Standard Configuration and the Advanced Configuration.

Standard Configuration

This configuration supports the new Featherweight Client Stack, designed to better support clients in interacting with web services. It does not yet provide support for subscription and notification.

Server Side

  • IS-InformationCollector – gCube Web Service: collect, store, and make available information related to the actual state of a gCube infrastructure and/or of an assigned subset of it;
  • IS-Registry – gCube Web Service: support the publishing/un-publishing of profiles describing gCube resources;
  • IS-gLiteBridge – Optional gCube Web Service: support the publishing/un-publishing of resources gathered from a gLite-based infrastructure that gCube services may access;

Client Side

Advanced Configuration

This configuration supports subscription and notification. However, it imposes constraints on the client side.

Server Side

  • IS-InformationCollector – gCube Web Service: collect, store, and make available information related to the actual state of a gCube infrastructure and/or of an assigned subset of it;
  • IS-Registry – gCube Web Service: support the publishing/un-publishing of profiles describing gCube resources;
  • IS-gLiteBridge – Optional gCube Web Service: support the publishing/un-publishing of resources gathered from a gLite-based infrastructure that gCube services may access;
  • IS-Notifier – gCube Web Service: support other services in subscribing/unsubscribing to topics produced by the various services; this service decouples the actual producer of a topic from its actual consumers, allowing producers to be relocated;

Client Side

  • IS-Publisher – gCube Library: support services in publishing/un-publishing information in the Information Collector service; it is the gateway for any information going to the IS;
  • IS-Client – gCube Library: support services in discovering information published in the IS;
  • IS-Notification – gCube Library: provide a publication/subscription/notification mechanism for topics produced and consumed by services;
  • IS-Cache – gCube Library: provide caching functionality for the information published in the IS;
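The page does not describe the caching policy of IS-Cache. A common way to trade freshness against performance (two of the QoS requirements listed earlier) is a time-to-live cache of query results, and the sketch below assumes that policy. `QueryCache`, its `fetch` callback and `ttl_seconds` are hypothetical names; the clock is injectable only to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class QueryCache:
    """Time-to-live cache for discovery results, keyed by query text (illustrative)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # performs the real query against the IS
        self._ttl = ttl_seconds    # how long a cached answer counts as fresh
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.misses = 0

    def query(self, query_text: str) -> str:
        now = self._clock()
        hit = self._entries.get(query_text)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                   # still fresh: no remote call
        self.misses += 1
        result = self._fetch(query_text)    # stale or absent: ask the IS again
        self._entries[query_text] = (now, result)
        return result

    def invalidate(self, query_text: Optional[str] = None) -> None:
        """Drop one entry, or every entry when query_text is None."""
        if query_text is None:
            self._entries.clear()
        else:
            self._entries.pop(query_text, None)
```

A shorter TTL favours freshness at the cost of more calls to the IC; a longer one favours performance at the cost of possibly stale answers.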

Design Notes

The IS has been designed around standards; the most notable ones are described below.

Early versions mostly exploited the WS-ServiceGroup and WS-ResourceProperties specifications. Starting from version 2.0 (released in February 2011), the IS is designed around the WS-DAIX specification for publishing. WS-Notification is at the heart of the functionalities delivered by the IS-Notifier service. Finally, the queries accepted by the IS must be expressed in the XQuery language.
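To give a feel for query-based discovery, the sketch below filters a toy collection of profiles. Python's standard library has no XQuery engine, so the sketch swaps in the limited XPath subset supported by `xml.etree.ElementTree`. A roughly equivalent XQuery would be `for $r in collection('profiles')/Resource[Type = 'Service'] return $r/ID/text()`, where the collection name and the element names are hypothetical and do not reproduce the real gCube resource schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# Toy profiles; element names are hypothetical, not the real gCube schema.
PROFILES = [
    "<Resource><ID>a1</ID><Type>Service</Type><Name>Search</Name></Resource>",
    "<Resource><ID>b2</ID><Type>GHN</Type><Name>node-1</Name></Resource>",
    "<Resource><ID>c3</ID><Type>Service</Type><Name>Index</Name></Resource>",
]


def build_collection(profiles: List[str]) -> ET.Element:
    """Gather the published profiles under a single root, like a stored collection."""
    collection = ET.Element("collection")
    for text in profiles:
        collection.append(ET.fromstring(text))
    return collection


def discover(collection: ET.Element, path: str) -> List[str]:
    """Return the text of every element matched by an ElementTree XPath expression.

    ElementTree understands only a small XPath subset; it stands in here for the
    XQuery engine that evaluates queries in the IC.
    """
    return [element.text or "" for element in collection.findall(path)]
```

For instance, `discover(build_collection(PROFILES), "./Resource[Type='Service']/ID")` selects the identifiers of the two `Service` profiles in document order.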

It is worth mentioning that one principle was widely adopted during the design of the IS: program to an interface, not an implementation. We tried to keep IS consumers and producers as decoupled as possible from its implementation. More concretely, a gCube service only has to know the IS-Client, IS-Notifier and IS-Publisher interfaces. It does not need to care about their implementations (mechanisms to dynamically load the IS-Client, IS-Notifier and IS-Publisher implementations at runtime have been put in place), nor about the actual IS deployment scenario (which is completely abstracted by the IS client libraries).
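The "program to an interface" principle, combined with loading implementations at runtime, can be sketched in Python with an abstract base class and `importlib`. `ISClient` here is a hypothetical Python analogue of the IS-Client interface, and `InMemoryISClient` and `load_client` are illustrative; the actual loading mechanism used by gCore is not described on this page.

```python
import importlib
from abc import ABC, abstractmethod
from typing import List


class ISClient(ABC):
    """The only thing a service depends on: the discovery interface."""

    @abstractmethod
    def execute(self, query: str) -> List[str]:
        ...


class InMemoryISClient(ISClient):
    """One possible implementation; calling code never names it directly."""

    def __init__(self) -> None:
        self.data = {"services": ["Search", "Index"]}

    def execute(self, query: str) -> List[str]:
        return self.data.get(query, [])


def load_client(qualified_name: str) -> ISClient:
    """Instantiate the implementation named in configuration, e.g. 'pkg.module.Class'."""
    module_name, _, class_name = qualified_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    instance = cls()
    if not isinstance(instance, ISClient):
        raise TypeError(f"{qualified_name} does not implement ISClient")
    return instance
```

A service would read the qualified class name from configuration and call `load_client(name)`, so switching implementations or deployment scenarios requires no change to the service code.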

QoS

All the design aspects of the IS have been tackled taking into account that if the IS does not work, works slowly, or offers a poor service, the whole infrastructure suffers.

The chain of operations involved in the discovery phase is carefully designed and implemented to reduce the waiting time of callers. In this part, the IC service works in a stateless manner, simply executing the query against the underlying XML indexing system. The SOAP messages sent and received are also kept as simple as possible to reduce marshalling and unmarshalling time.

To avoid interfering with the discovery phase, publications are handled in bulk, which reduces the number of incoming calls to the IC so that they do not compete with query invocations. The IS-Publisher collects and queues publication requests and sends them to the Registry and then to the IC, cutting the number of competing calls as much as possible.

From the deployment point of view, IS services can be distributed and partially replicated in a gCube infrastructure to manage subsets of resources (usually belonging to different scopes). Different scenarios can be set up to meet performance and scalability requirements according to the size of the infrastructure (e.g. how many resources must be managed, how many nodes are available, and so on).
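The bulk publication strategy can be sketched as a queue that is sent as a single call once it reaches a size threshold. This is an illustration under assumptions: `BatchingPublisher`, `send_batch` and `max_batch` are hypothetical, and a real publisher would presumably also flush on a timer so that a partially filled queue is not held indefinitely.

```python
from typing import Callable, List


class BatchingPublisher:
    """Queue publication requests and send them in bulk (illustrative sketch)."""

    def __init__(self, send_batch: Callable[[List[str]], None], max_batch: int) -> None:
        if max_batch < 1:
            raise ValueError("max_batch must be at least 1")
        self._send_batch = send_batch  # one remote call carrying many profiles
        self._max_batch = max_batch
        self._queue: List[str] = []

    def publish(self, profile: str) -> None:
        """Enqueue a profile; send the queue once it reaches max_batch entries."""
        self._queue.append(profile)
        if len(self._queue) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send whatever is queued as a single call; no-op when the queue is empty."""
        if self._queue:
            batch, self._queue = self._queue, []
            self._send_batch(batch)
```

With `max_batch=3`, publishing seven profiles results in two remote calls of three profiles each, and the seventh waits until the next flush; this is how batching reduces the number of calls competing with discovery queries.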