Integration and Interoperability Facilities Framework: Client Libraries Design Model

From Gcube Wiki

gCube includes a large number of client libraries, components that mediate access to system services from within client Java runtimes. The design and implementation of client libraries vary along with the semantics of target services and the technology stacks that the libraries use to interact with the services. However, the system requires them to provide a common set of functional capabilities and to adopt uniform patterns for the design of their APIs.

Collectively, these capabilities and patterns define a design model for client libraries. The model focuses on functionality and high-level aspects of API design, though it defines a small number of classes and interfaces. It also raises strong requirements against library implementations. Extensive support towards meeting these requirements is provided by the client library framework.

In this document, we present the model in the context of a hypothetical foo service and its client library.

Assumptions and Terminology

We work under the following assumptions and using the following terminology:

  • services: foo is an HTTP service, in that it uses HTTP at least as its transport protocol [1]. At the time of writing, all system services are more specifically WS RPC services, i.e. use SOAP over HTTP to invoke service-specific APIs. In the future, system services may also be REST services, in the broad sense of stateless services that use HTTP as their application protocol.
  • publication & discovery: deploying and starting foo at a given network address yields a foo endpoint [2]. Deployment may be static or, more commonly within the system, dynamic. In both cases, clients are neither coded nor configured with static knowledge of foo endpoints. Rather, they obtain them at runtime by sending queries to discovery services available within the system. Symmetrically, foo is coded or configured to send information about its endpoints to publishing services available within the system;
  • state: foo may be stateless, in that its endpoints may not maintain any form of state on behalf of clients. More commonly, however, foo is stateful, i.e. its endpoints encapsulate data which they use and change upon processing client requests. At the time of writing, no service within the system is conversational, i.e. maintains the state of ongoing sessions with individual clients. Rather, foo endpoints encapsulate longer-lived datasets on behalf of open-ended classes of clients [3]. When foo is a stateful SOAP service, in particular, its endpoints can be modelled as collections of one or more service instances, where all instances have the API of foo but are bound to different datasets [4]. A foo instance is typically created by an endpoint of a companion factory service, e.g. a foo-factory service, and its logical address is the address of the corresponding foo endpoint qualified with a reference to the dataset bound to the instance. foo-factory is coded or configured to publish the addresses of foo instances, along with any properties of their bound datasets whereby the instances may be discovered by clients.
  • replication: foo may be deployed at multiple network addresses for increased availability and scalability. 
The multiple endpoints may be deployed within a single administrative domain (site), and kept hidden behind a single public endpoint which invokes them with load-balancing and fault-tolerant strategies (single-site replication). Alternatively, they may be deployed at multiple sites and be independently published; load-balancing is then performed at query-time by discovery services (multi-site replication) [5]. If foo is stateful, it may replicate its endpoints without replicating its instances, i.e. different foo endpoints collect different instances (endpoint replication). This separates workloads on different instances, but it does not help with balancing the load across instances and it does not increase their availability. Alternatively, foo may employ mechanisms to directly replicate its instances, hence their state (instance replication). Immutable state may be replicated when instances are created. Alternatively, foo may be designed to autonomically create replicas of instances at existing endpoints (e.g. using subscription services and notification services available within the system). Mutable state is synchronised over time across instances, with variable guarantees of consistency (e.g. partial, eventual). Alternatively, mutable state can be shared across instances via remote references into networked file-systems, databases, or access/storage services. The references can then be replicated as immutable state. In multi-site replication, clients discover replicas with queries for references to shared state.
  • scoped requests: foo endpoints may operate in multiple scopes, where each scope partitions the local resources that are visible to its clients, as well as the remote resources that are visible to the operations that the endpoint executes on behalf of its clients. In particular, the operations of the endpoint may result in the creation of state in a given scope, locally to the endpoint or else remotely to it, by interaction with other services that create state on behalf of the endpoint and/or its clients. Service scoping requires that requests to the endpoints are scoped, i.e. marked with the scope within which they are intended to occur. Unscoped requests or requests made outside one of the endpoint’s scopes are rejected by it.
  • secure requests: foo endpoints may perform a range of authentication and authorisation checks, including scope checks, in order to restrict access to their operations to distinguished clients. Service security requires that requests made to the endpoints be marked with adequate credentials. Unsecure requests or secure requests that fail authorisation checks are rejected by the endpoints.
    The security model is based on three pillars:
    • username/password security factor, provided by SOA3
    • SAML based impersonation (only for S2S communication inside the infrastructure)
    • Transport Layer Security (TLS), in particular HTTP on TLS (HTTPS) based on Public Key Infrastructure
Username/password and SAML work at message layer, while TLS works at transport layer and provides other advantages, such as encryption and server authentication. For this reason TLS can be combined with one of username/password or SAML in order to improve the overall security.
  • clients: a client of foo may be internal to the system (i.e. a system component in turn) or external to it. Clients often operate within a dedicated runtime, and in this case we refer to them as pure clients. In other cases, they share a common runtime and, like foo, they may be managed by some container. In particular, clients may be services in turn, and in this case we refer to them as client services. Whether pure or services, clients make a number of calls to foo over their lifetime. Some of these calls may share the same scope and contribute to a broader task. In this case we say that the calls form a session and that the client is session-based, otherwise we say that the client is session-less [6]. Whether session-based or session-less, client calls may all occur in the same scope or else span multiple scopes. Single-scope clients are normally pure, whereas multi-scope clients tend to be client services.
  • faults: clients that interact with foo endpoints may observe a wide range of failures, including: failures that occur in the client runtime, before remote calls to foo endpoints are issued; failures that occur in the attempt to communicate with foo endpoints; failures that occur in the runtime of foo endpoints. We distinguish between the following types of failures:
    • errors are violations of the contract defined by foo which can be imputed to faulty code or faulty configuration. Malformed inputs are examples of client-side errors, while bugs in the implementation of foo are examples of service-side errors;
    • contingencies are predictable violations of contract pre-conditions. There may be no bugs in either client or service code, but the foo endpoint is in a state that prevents it from carrying out the client’s request. Data that cannot be found or cannot be created are examples of contingencies;
    • outages are I/O failures of the external environment and include network failures, database failures and disk failures.
If foo is designed for multi-site replication, contingencies and outages implicitly acquire additional semantics. We say that one such failure has retry-equivalent semantics if it does not exclude successful interaction with other foo endpoints or foo instances. We say otherwise that the failure is fatal for the interaction, i.e. no other foo endpoint will be able to process the request successfully. Unavailable endpoints and other forms of connection timeouts are retry-equivalent outages, while lack of connectivity at the client-side is a fatal outage.

  1. terminology: services have been often described within the system as collections of one or more “port-types”, following the terminology endorsed by WSDL 1.x standards, and then abandoned in WSDL 2.x standards. For its wider adoption and technological independence, we prefer here to follow common terminology whereby a WSDL port-type defines a service in its own right.
  2. terminology: the term “running instance” has been used within the system to indicate the deployment and activation of a service at a given network address. We prefer here the term “service endpoint” to avoid confusion with "service instance", which is more commonly associated with stateful services.
  3. terminology: statefulness is often understood only in relation to conversational state. Under this view, all system services are currently stateless. We prefer to interpret the notion of statefulness so as to include back-end state, because all forms of state have related implications under service replication. Under this broader view, most system services are stateful.
  4. terminology: the system has traditionally used a different terminology for its stateful services. Service instances are called "WS-Resources", as WSRF is the set of standards with which they are uniformly exposed at the time of writing. We prefer here the term “service instance” for its wider usage. Note also that WS-Resources use WS-Lifetime, WS-ResourceProperties and WS-Notification protocols to expose, respectively, lifetime operations, the values of distinguished properties of their state, and subscriptions for/notifications of changes to the values of those properties. Some services capitalise on these standards and become stateful even when they expose a single instance. These stateful services are known as singleton services.
  5. Single-site replication is common in enterprise computing and normally requires static deployment and configuration. Multi-site replication reconciles with dynamic deployment and is more common within the system.
  6. The use of sessions client-side does not imply that foo endpoints maintain state for them, i.e. that foo is a conversational service. In fact, it does not even imply that calls within a session are processed by a single foo endpoint.

Goals and Principles

Within the previous assumptions, our model is motivated by a goal of consistency across different client APIs. In particular, the model will:

  • decrease the overall learning curve associated with using the system;
  • increase the quality of client libraries via sharing of best design practices;
  • decrease the documentation costs of client libraries by reference to shared patterns and design components;

These goals add to the main goal of the client library framework, namely to:

  • decrease the development costs of client libraries, both first-time costs and maintenance costs;

To achieve our goals, we base the model on a set of design principles. In no particular order, these include:

  • generality: the model will endorse design solutions that do not limit its applicability to the range of services and clients outlined above;
  • coverage: the model will address a wide range of issues that transcend the semantics of individual services, including scoping issues, security issues, replica discovery and management issues, and fault management issues;
  • transparency: the model will endorse design solutions that simplify client usage, particularly with respect to requirements that are specific to our system;
  • testability: the model will not endorse design solutions that reduce or unduly complicate the possibility of unit testing for clients;

Service Proxies

The main pattern we consider for client libraries is service-centric. foo is represented in client code with a single abstraction and clients invoke its methods to interact with remote foo endpoints or foo instances [1] [2].

In common jargon, this abstraction is understood as a service proxy [3].

The proxy for foo is defined by an interface:

interface Foo {...}

The interface lists methods used to interact with service endpoints, and the methods are implemented in a class DefaultFoo to be used in production:

class DefaultFoo implements Foo {...}

The interface encourages clients to separate the use of proxies from their instantiation, without expecting clients to write adaptors for this purpose. A client component may use injected proxies created elsewhere in client code. During testing, the component may be injected with a fake implementation of Foo which produces outputs and failures as required to drive the tests, e.g. a mock implementation or a stubbed implementation. Alternatively, the component may look up proxies from a factory, and in this case it is the factory that may be configured during test setup so as to return fakes.

There are many well-known ways to design client components based on dependency injection and lookup (constructor injection, setter injection, manual injection, container-managed injection, concrete factories, abstract factories, ...). In all cases the availability of an interface enables clients to test their code independently from the network.
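As a minimal sketch of constructor injection with a stubbed proxy, assume a single-method Foo. The ping() method and the names GreetingComponent and StubFoo are hypothetical and are not defined by the model:

```java
// Hypothetical proxy interface; the model leaves the methods of Foo open.
interface Foo {
    String ping(String message);
}

// Client component that uses an injected proxy and never instantiates one itself.
final class GreetingComponent {
    private final Foo foo;

    GreetingComponent(Foo foo) {
        this.foo = foo;
    }

    String greet(String name) {
        return "foo says: " + foo.ping(name);
    }
}

// Stubbed implementation for tests: it returns canned outputs and never touches the network.
final class StubFoo implements Foo {
    @Override
    public String ping(String message) {
        return "pong " + message;
    }
}
```

In production the component would receive a DefaultFoo instead; the component code does not change.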

  1. An alternative approach is operation-centric, in that the service is represented indirectly by local models of the operations that comprise its API. We choose a service-centric approach for the familiarity of its programming model, and because it is simpler to implement and use against large service APIs.
  2. In the following, we avoid unnecessary distinctions between endpoints and instances, and use the term endpoint to refer to both.
  3. terminology: technically, we are dealing with a service façade rather than a proxy. This is because its API may differ substantially from the API of the service, as we discuss in detail later. We choose nonetheless the term proxy because it is more widely understood.

Proxy Lifetime

Proxies may be instantiated in either one of two modes:

  • in direct mode, the proxies are bound to endpoints explicitly addressed by clients. This mode serves clients that obtain addressing information from interactions with other APIs. It may also be used to point tools towards statically known endpoints, or else during integration testing, typically to interact with endpoints deployed on local hosts.
  • in discovery mode, the proxies are configured with a query for endpoints provided by clients. They are then responsible for submitting the query to the directory services of the system on behalf of clients, and for negotiating bindings to the resulting endpoint(s). This mode serves clients that have information which characterises the target endpoints and from which addressing information can be derived.

The binding mode of proxies is part of their configuration. The design approach to configuring proxies is largely outside the scope of this model. The recommendation is to concentrate all proxy configurations in dedicated objects, so as to simplify the construction of proxies (hence their testing) and the evolution of their configuration requirements. These configuration objects may be created by clients, or else returned to clients by factories of the client libraries or even more sophisticated forms of object builders, including full-fledged embedded Domain Specific Languages (DSLs). For example, the client library framework provides fluent DSLs for proxy configuration. In what follows, we make no assumptions on the configuration mechanisms used by libraries and raise only requirements on the information that must be included in proxy configurations. We start the process now and require that:

  • proxy configurations must include the timeout to use for calls;
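One possible shape for such a configuration object is sketched below. FooConfig, its Mode enum and its factory methods are hypothetical names; the only element taken from the model is that the configuration carries the binding mode and a call timeout:

```java
import java.net.URI;
import java.time.Duration;
import java.util.Objects;

// Hypothetical configuration object: immutable, so proxies can safely retain it.
final class FooConfig {
    enum Mode { DIRECT, DISCOVERY }

    private final Mode mode;
    private final URI address;      // set in direct mode only
    private final Duration timeout; // the model requires a call timeout in every configuration

    private FooConfig(Mode mode, URI address, Duration timeout) {
        if (timeout == null || timeout.isNegative() || timeout.isZero()) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.mode = mode;
        this.address = address;
        this.timeout = timeout;
    }

    // Direct mode: the client supplies the endpoint address explicitly.
    static FooConfig direct(URI address, Duration timeout) {
        return new FooConfig(Mode.DIRECT, Objects.requireNonNull(address, "address"), timeout);
    }

    // Discovery mode: no address, the proxy will query for endpoints at call time.
    static FooConfig discovery(Duration timeout) {
        return new FooConfig(Mode.DISCOVERY, null, timeout);
    }

    Mode mode() { return mode; }
    URI address() { return address; }
    Duration timeout() { return timeout; }
}
```

A builder or a fluent DSL could replace the static factories without changing the information that the object carries.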

For what concerns the instantiation of proxies from their configuration, the following holds true:

  • instantiation is a local operation. Calls to the bound endpoints will be issued only when clients invoke the Foo methods implemented by the instances;
  • the lifetime of a proxy terminates when it becomes eligible for garbage collection. In this respect, proxies behave like standard Java objects and do not require any explicit termination signal from clients.
  • the client library makes no assumption on the lifetime of proxies and allows proxies with:
    • call lifetime: begins before a call to a foo endpoint and terminates immediately thereafter;
    • session lifetime: begins before the first session call to a foo endpoint, and terminates after the last session call to the same or another foo endpoint;
    • global lifetime: begins before the first call to a foo endpoint and terminates when the client does.
Thus clients may dedicate proxies to different calls, or else reuse the same proxy for an arbitrary number of calls to foo endpoints, in multiple scopes and/or across multiple sessions.
  • the flexibility discussed in the previous point does not come at the expense of safe and efficient bindings. There are two exceptions, however. The first is that proxies created in direct mode may only be safely used for calls in one of the scopes of their bound endpoints. Clients that operate in multiple scopes should avoid reusing such proxies to call foo endpoints in different scopes. The second exception concerns proxies created in discovery mode for a stateful foo, and we will discuss it in more detail below;
  • since clients may reuse proxies, these retain only their configuration and treat it for the most part as private and immutable state. The client library gives this guarantee by making the configuration immutable, or by cloning the configuration with which proxies are instantiated. Proxies offer no methods to change their initial configuration;
  • since proxies are mostly immutable, clients may safely use any proxy from multiple threads;

Direct Mode

Proxies that work in direct mode must find in their configuration the address of a foo endpoint or, depending on the design of the service, a reference to a foo instance available at a given endpoint. The address will be modelled in accordance with the requirements of the technology stack used by the library. The configuration can then be used to create proxies that are bound for their entire lifetime to the addressed endpoint, i.e. cannot be used as proxies for other foo endpoints.

Endpoint Addresses

If foo is a REST service, a stateless WS service, or a singleton WS service, the client library must allow clients to configure proxies from:

  • the name and port of a network host (host coordinates);
  • an address modelled as a;
  • an address modelled as a

The client library must:

  • complement any missing information with service-specific constants, so as to obtain the full address of the endpoint (e.g. context paths);
  • ensure the validity of completed addresses built from or, either by verifying the correctness of their context paths, or by rebuilding addresses from the host coordinates contained therein;
  • raise java.lang.IllegalArgumentExceptions when addresses are proved invalid;
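The completion and validation steps above could look as follows. This is only a sketch: the FooAddresses class, the CONTEXT_PATH value and the choice of the http scheme are assumptions made for illustration, and the address is modelled with java.net.URI for convenience:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class FooAddresses {
    // Assumed service-specific constant; the actual context path of foo is not given in the text.
    static final String CONTEXT_PATH = "/foo/service";

    private FooAddresses() {}

    // Builds the full endpoint address from host coordinates, adding the context path.
    static URI fromHost(String host, int port) {
        if (host == null || host.isEmpty() || port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid host coordinates: " + host + ":" + port);
        }
        try {
            return new URI("http", null, host, port, CONTEXT_PATH, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid host coordinates: " + host + ":" + port, e);
        }
    }

    // Validates a client-supplied address by rebuilding it from the host coordinates it contains.
    static URI complete(URI address) {
        if (address == null || address.getHost() == null || address.getPort() == -1) {
            throw new IllegalArgumentException("address lacks host or port: " + address);
        }
        return fromHost(address.getHost(), address.getPort());
    }
}
```

Rebuilding the address discards whatever path the client supplied, which is one of the two validation options listed above; the alternative is to compare the supplied path against the expected context path and reject mismatches.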

Instance References

If foo is a stateful WS service, the client library must allow clients to configure proxies from:

  • the coordinates of a network host and the instance identifier;
  • a whose reference parameters identify a service instance at a given address;

As above, the library must:

  • ensure validation of references;
  • translate references into the addressing model of their implementation stacks (e.g. EndpointReferenceType in Axis’ generated stubs API).

Discovery Mode

Proxies that work in discovery mode must find in their configuration enough information to synthesise a query for foo endpoints or foo instances. The query will be modelled in accordance with the requirements of the technology stack used by the library.

If foo is stateless, the library will:

  • not require any information from clients, as a query can be synthesised from service constants;

If foo is stateful, however, the query depends on state-related properties of the target instances, and different clients may need different queries. The library must then:

  • model queries for service instances with dedicated objects;

Instance Queries

Query objects must:

  • contain no explicit reference to the concrete query syntax which the library may use to discover service endpoints. The library is responsible for synthesising a concrete query from abstract properties of service instances which are specified by the client.

The following holds true about queries:

  • clients may create multiple query objects, or else reuse a single object. As for proxies, the library makes no assumption on the lifetime of individual queries;

As for proxy configurations, the modelling and creation of queries is largely outside the scope of the model. Beans, factories, and full-fledged DSLs are all viable options.
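A bean-style query could be sketched as follows. FooQuery and the use of a property map are hypothetical; the sketch only illustrates a query that holds abstract instance properties, carries no concrete query syntax, and defines value equality so that equal queries can be recognised as such:

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical query for foo instances: abstract properties only, no concrete query syntax.
final class FooQuery {
    private final Map<String, String> properties;

    FooQuery(Map<String, String> properties) {
        // Defensive, sorted copy: the query is immutable and prints its properties in a stable order.
        this.properties = Collections.unmodifiableMap(new TreeMap<>(properties));
    }

    Map<String, String> properties() {
        return properties;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FooQuery && properties.equals(((FooQuery) o).properties);
    }

    @Override
    public int hashCode() {
        return properties.hashCode();
    }

    @Override
    public String toString() {
        return "FooQuery" + properties;
    }
}
```

The library would translate the properties into the concrete syntax of the discovery services when the proxy submits the query.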

Endpoint Management

Proxies attempt to bind to the service endpoints that satisfy the query with which they have been instantiated. They do so by combining:

  • a binding strategy: if foo uses multi-site replication for its endpoints or instances, queries may return more than one result. The discovery services will return results ordered by increasing load and the proxies exploit the availability of multiple endpoints towards fault tolerance, i.e. to increase the chances of a successful call;
  • a caching strategy: issuing queries to discovery services adds costs to client interactions with foo, and it is the responsibility of proxy instances to reduce these costs whenever possible. In the absence of optimisations, the proxies would issue queries before each and every call. Under full optimisation, the proxies issue queries only when they really have to, possibly never.

The binding strategy of proxies is driven by the range of faults discussed above. In particular it is defined by the following rules:

  • BR1: submit the query with the directory services in the current scope and process the list of discovered endpoints as follows:
    • BR2: if the current endpoint raises a retry-equivalent failure (contingency or outage) then attempt to bind the next endpoint if one exists, otherwise return the failure;
    • BR3: if the current binding raises an unrecoverable contingency, an error, or a general outage then return it;
  • BR4: log all the previous actions at INFO level.

Notice that proxies may not always be able to determine fault semantics. The semantics of contingencies is usually known to proxies, but differences between errors and outages, and between retry-equivalent outages and general outages may not transpire through the underlying communication APIs. Ambiguous cases should be dealt with in proportion to their number and nature. If the API can disambiguate most faults, then an optimistic approach is appropriate and the proxies should default to applying BR2. If instead the API does not provide enough information, then a pessimistic approach is more appropriate and the proxies should default to applying BR3.

  • the default number of retry attempts required in BR2 may be overridden by clients by invoking the setMaxRetries(int) method.
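The binding rules can be sketched as a loop over the discovered endpoints. CallFailure, its retryEquivalent flag and BindingStrategy are hypothetical names; the sketch assumes that the proxy has already classified each fault, takes the ordered query results as a plain list, and uses java.util.logging for the INFO-level log:

```java
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical unchecked failure; the proxy classifies it when it observes the fault.
class CallFailure extends RuntimeException {
    final boolean retryEquivalent;

    CallFailure(String message, boolean retryEquivalent) {
        super(message);
        this.retryEquivalent = retryEquivalent;
    }
}

final class BindingStrategy {
    private static final Logger LOG = Logger.getLogger(BindingStrategy.class.getName());

    private BindingStrategy() {}

    // 'endpoints' stands for the ordered results of the query (BR1);
    // 'call' stands for one remote call against a given endpoint.
    static <T> T callFirstAvailable(List<String> endpoints, Function<String, T> call) {
        CallFailure last = new CallFailure("no endpoint discovered", false);
        for (String endpoint : endpoints) {
            try {
                LOG.info("binding to " + endpoint);                       // BR4
                return call.apply(endpoint);
            } catch (CallFailure failure) {
                LOG.info(endpoint + " failed: " + failure.getMessage());  // BR4
                if (!failure.retryEquivalent) {
                    throw failure;                                        // BR3: give up
                }
                last = failure;                                           // BR2: try the next endpoint
            }
        }
        throw last;                                                       // BR2: no endpoint left
    }
}
```

A pessimistic proxy would construct ambiguous faults with retryEquivalent set to false, so that they fall under BR3.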

The caching strategy of proxies is defined by the following rules:

  • CR1: record the address of a bound endpoint, the so-called Last Good Endpoint (LGE), in a scope-indexed and query-indexed cache shared by all proxies;
  • CR2: if the LGE is defined, attempt to bind;
  • CR3: if the LGE raises a failure, then remove it from the cache and:
    • CR4: if the failure is an unrecoverable contingency or an outage then return the failure;
    • CR5: if the failure is a retry-equivalent failure (contingency or outage) then apply BR1 but exclude the LGE from the list of results;
  • CR6: log all previous actions at DEBUG level;

Since the LGE cache is shared across proxy instances, a new instance may find an LGE in it and apply CR2 before BR1, i.e. avoid query submission altogether. This optimisation is safe only if the LGE is a plausible result of the query defined by the new instance, hence the requirement of a cache indexed by scopes and queries.
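A cache indexed by scope and query could be sketched as below. LgeCache and its method names are hypothetical, scopes and endpoint addresses are simplified to strings, and queries are taken as arbitrary objects with value equality:

```java
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache of Last Good Endpoints, shared by all proxies of the library.
final class LgeCache {
    // Composite key: an LGE is reused only for the same scope and an equal query.
    private static final class Key {
        final String scope;
        final Object query;

        Key(String scope, Object query) {
            this.scope = Objects.requireNonNull(scope, "scope");
            this.query = Objects.requireNonNull(query, "query");
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && scope.equals(((Key) o).scope) && query.equals(((Key) o).query);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scope, query);
        }
    }

    private final ConcurrentMap<Key, String> endpoints = new ConcurrentHashMap<>();

    // CR1: remember the endpoint that served the last successful call.
    void record(String scope, Object query, String endpoint) {
        endpoints.put(new Key(scope, query), endpoint);
    }

    // CR2: look for an LGE before submitting the query.
    Optional<String> lookup(String scope, Object query) {
        return Optional.ofNullable(endpoints.get(new Key(scope, query)));
    }

    // CR3: forget an LGE that has raised a failure.
    void evict(String scope, Object query) {
        endpoints.remove(new Key(scope, query));
    }
}
```

The concurrent map lets proxies used from multiple threads share the cache without further locking.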

Notice that:

  • since queries are keys into caches, they may need to be value objects, i.e. implement hashCode() and equals() towards a notion of equivalence. This is not necessary when foo is stateless, provided that the constant query shared by all proxies is implemented as a singleton object. It may be necessary when foo is stateful, however, as clients may initialise different proxies with different instances of the same query. In this case, failing to implement queries as value objects may bypass the caching strategy, hence reduce the efficiency of proxies.
  • if foo is stateful, the combination of binding and querying strategies may have undesired effects for session-based clients. This occurs when foo instances are replicated across sites but the query used by the proxy returns instances that are not exact replicas. The calls issued by clients may then yield inconsistencies if, through caching, the binding strategy transparently rebinds instances in the middle of a session. Thus the problem may emerge only for session-based clients and under particular combinations of queries and calls. When the library may not exclude these combinations through design, then:
  • it must allow clients to disable fault-tolerance on given proxies through proxy configuration;

In this case, the proxies would treat LGE failures as unrecoverable, i.e. apply CR3 and CR4. This trades off a degree of fault-tolerance for safety. We return to this point later, in relation to instance-specific operations that are particularly prone to this problem.

It should also be noted that binding and caching strategies remain largely opaque to clients. Clients limit their involvement to:

  • if foo is stateful, providing queries for service endpoints;
  • observing and reacting to discovery faults, such as the lack of suitable endpoints or the occurrence of faults in the interaction with the directory services;

We discuss failure handling in detail later on in the document.

Proxy API

After creating proxies, clients invoke their methods to call the foo endpoints that are bound to the proxies. Calls may take zero or more inputs, produce zero or one output, and raise one or more faults. Foo models inputs, outputs, and faults with the types that seem most convenient for its clients. The local types may differ substantially from those defined in the remote API of the service. Proxies are responsible for converting between local types and remote types. Even when the remote types seem adequate for Foo clients, adapting them to equivalent local forms helps Foo to insulate its clients from future changes to the remote API.

Local types are virtually unconstrained from a design perspective. For example, they may:

  • be constructed in a variety of patterns, including standard constructors, copy constructors, factories, builders, and more sophisticated forms of fluent APIs. When useful, they may be deserialised from various representations, from language serialisation formats to, say, XML formats;
  • exhibit arbitrary behaviour, including validation behaviour at creation time or at any other point in their lifetime;
  • implement arbitrary interfaces and participate in arbitrary hierarchies;
  • use type parameters for type-safe reuse;
  • be arbitrarily annotated;
  • have non-trivial notions of equivalence, cloning behaviour, and useful String serialisations;

Similar freedom extends to the design of Foo. Foo may implement any interface, participate in any hierarchy, be arbitrarily annotated and parameterised. Furthermore, Foo may use method name overloading for calls that have related semantics but require a different number of inputs, or inputs of different types.

The API uses this freedom towards the goals of:

  • clarity and fluency, by choosing types that simplify client programming;
  • correctness, by choosing types that detect locally, and often even statically, constraint violations which would be only enforced remotely and dynamically by foo;
  • standardisation, by choosing types that are formal or de-facto standards for the semantics of the data, either in the context of the language (common Java interfaces, appropriate Exceptions, naming conventions, etc.) or in a broader context.

We discuss below how the methods of Foo are designed to model calls to foo endpoints. In particular, we look at choices of local types for inputs, outputs, and faults for prototypical calls, including calls that require or produce data collections, asynchronous calls, and calls that access the state of stateful foo instances.


The possibilities for the design of Foo are open ended. We illustrate some of the options here using a fictional example. The example is intentionally convoluted to illustrate a wider range of options.

Assume foo exposes an operation bar which:

  • expects a rather complex and potentially recursive XML data structure Baz in input;
  • returns a simpler data structure Qux whenever Baz satisfies a set of constraints, from simple constraints (some attributes must not be null, others must be null) to complex constraints (some simple elements must have correlated values);
  • raises an InvalidBazFault when the input structure is null, is syntactically or structurally malformed, or does not satisfy the expected set of constraints;

Foo mediates calls to bar with the following method:

/**
 * .....
 * @throws IllegalArgumentException if ...
 * @throws ServiceException if the call fails for any other error
 */
Qux bar(Baz baz);


  • Baz is a class that uses the annotations of JSR 222 (JAXB 2.0) to bind its instances to XML, and the annotations of JSR 303 to declare validity constraints upon them which cannot be detected by the type-checker. The API offers a BazBuilder to fluently construct Baz instances across its plethora of mandatory and optional parameters, and Baz instances expose a set of sophisticated methods that allow clients to flexibly navigate their potentially very deep and recursive structure, including a query method based on XPath expressions. Baz instances override equals(), hashCode() and toString() to facilitate assertions in tests as well as debugging;
  • the documentation clarifies that proxies may throw:
    • an IllegalArgumentException if the input is null or invalid, enforcing JSR 303 annotations for the purpose. Proxies short-circuit a remote call that would certainly fail and throw a local exception instead. They make sure that a null attribute violation is detected before the call (direct mode) or the query (discovery mode) is issued;
    • a generic ServiceException in correspondence with any other form of remote failure. We discuss below the semantics of this exception and more generically the rationale for Foo’s approach to failure reporting.
  • Qux is a fairly simple bean class, also decorated with JAXB annotations so that where the XML representation included a collection of uniquely named values, Qux exposes instead a Map of String keys. Furthermore, the Qux instances returned by bar() have been proxied, so that the invocations of some of its key methods can be intercepted, to some particular end. It also exposes methods that accept subscriptions and produce notifications in response to some key events of its lifetime.
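The local short-circuit described for bar() could be sketched as follows. The Baz and Qux classes are reduced to a single field, a manual null check stands in for JSR 303 validation, and RemoteBar is a hypothetical stand-in for whatever stub performs the remote call; none of these simplifications come from the model:

```java
// Hypothetical, heavily simplified local types.
final class Baz {
    final String name; // assumed mandatory attribute

    Baz(String name) { this.name = name; }
}

final class Qux {
    final String value;

    Qux(String value) { this.value = value; }
}

// Unchecked exception for any remote failure other than invalid input.
class ServiceException extends RuntimeException {
    ServiceException(String message, Throwable cause) { super(message, cause); }
}

interface Foo {
    Qux bar(Baz baz);
}

// Hypothetical stand-in for the stub that performs the remote call.
interface RemoteBar {
    String invoke(String name) throws Exception;
}

final class DefaultFoo implements Foo {
    private final RemoteBar remote;

    DefaultFoo(RemoteBar remote) { this.remote = remote; }

    @Override
    public Qux bar(Baz baz) {
        // Local validation short-circuits a call that would certainly fail remotely.
        if (baz == null || baz.name == null) {
            throw new IllegalArgumentException("baz and its name must not be null");
        }
        try {
            // Convert between local types and remote types around the call.
            return new Qux(remote.invoke(baz.name));
        } catch (Exception e) {
            throw new ServiceException("call to bar failed", e);
        }
    }
}
```

Because the check precedes any use of the stub, an invalid Baz never causes a query or a remote call.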


Foo proxies may need to report to clients the range of failures introduced above. The proxies cannot, and indeed should not, predict the strategies that clients will adopt to handle such failures. However, they may assume that:

  • in production, clients will at least contain all forms of failure, i.e. fully log them and conveniently report them to users or clients further upstream. Silencing failures or thread terminations are typically undesirable outcomes. Failure containment is normally dealt with in error handlers that act as ‘barriers’ or ‘points-of-last-defence’ high up in the call stack.
  • clients may have coping strategies for contingencies that go beyond simple failure containment. They may be able to actually recover from the failures, e.g. by retrying with different inputs or by selecting an alternative execution path, including calling another service or falling back to defaults. Typically, clients will recover as close as possible to the observation of the failure, though not necessarily in the immediate caller.
  • clients are more likely to recover from contingencies than from outages. This is because contingencies are specific expectations set forth by foo that clients should be prepared to handle somehow.

Based on these assumptions, Foo aligns with modern practices in:

  • using unchecked exceptions to report errors and outages. Clients that may only contain such failures in generic error handlers will be dispensed from the noisy, error-prone, brittle, and ultimately pointless task of explicitly catching and/or re-throwing exceptions along the call stack.
  • using checked exceptions to report contingencies. Clients may then avail themselves of the services of the typechecker to be alerted of failures that they should have prepared for.

Foo documents all the exceptions that its methods may throw, regardless of their type.

More specifically, Foo’s methods:

  • document all the errors that may be detected in the client runtime prior to calling a foo endpoint. In its bar() method above, for example, Foo documents an IllegalArgumentException in lieu of the InvalidBazFault that the service would raise if proxies actually called its bar operation;
  • document and declare all the contingencies that foo may raise. If the service may raise an UnknownBazFault for its bar operation, for example, then Foo declares a corresponding checked exception for its method bar(), and proxies throw the exception upon receiving the fault from a foo endpoint. If foo declares a base class for a number of related contingencies, and if its operation bar may throw all the subclasses of the base class, then Foo declares only the base class for its method bar();
  • document a single ServiceException for any outage, or for any error that cannot be detected in the client runtime prior to calling a foo endpoint.
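
The split between declared contingencies and documented outages can be sketched as follows. The names are assumptions made for illustration: UnknownBazException is a hypothetical checked counterpart of the UnknownBazFault mentioned above, FooSketch reduces Foo to a single method over Strings, and ServiceException is a reduced local copy.

```java
// Reduced local copy of the unchecked exception used for errors and outages.
class ServiceException extends RuntimeException {
    ServiceException(String msg) { super(msg); }
}

// Hypothetical checked exception that mirrors the UnknownBazFault contingency.
class UnknownBazException extends Exception {
    UnknownBazException(String msg) { super(msg); }
}

// Sketch of the API: the contingency is declared, the outage is only documented.
interface FooSketch {
    /**
     * @throws IllegalArgumentException if the identifier is null
     * @throws UnknownBazException if the identifier is unknown to the service
     * @throws ServiceException if the call fails for any other error
     */
    String bar(String bazId) throws UnknownBazException;
}

class ContingencyDemo {
    // The type-checker forces this caller to handle the contingency;
    // here it recovers by falling back to a default value.
    static String barOrDefault(FooSketch foo, String bazId, String fallback) {
        try {
            return foo.bar(bazId);
        } catch (UnknownBazException e) {
            return fallback;
        }
        // ServiceException is deliberately not caught here: it propagates
        // to a generic error handler further up the call stack.
    }
}
```

The caller recovers from the contingency next to the call, while outages travel up the stack without any catch or throws clause along the way.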

ServiceException marks the non-local semantics of Foo’s methods and serves as a base class or else as a wrapper for any other exception that proxies may observe. In particular, ServiceException is defined as follows:

package org.gcube.common.clients.api;

class ServiceException extends RuntimeException {
    private static final long serialVersionUID = 1L;
    public ServiceException() { super(); }
    public ServiceException(String msg) { super(msg); }
    public ServiceException(Throwable cause) { super(cause); }
    public ServiceException(String msg, Throwable cause) { super(msg, cause); }
}

Proxies wrap in ServiceExceptions any exception thrown by the underlying communication API. For example, if foo is a JAX-WS Web Service, proxies wrap in a ServiceException any WebServiceException thrown by their JAX-WS-compliant API of choice. If foo is a JAX-RPC Web Service, then proxies wrap in a ServiceException any RemoteException or SOAPFaultException thrown by their JAX-RPC-compliant API of choice. In all cases, DefaultFoo documents what exceptions may cause the ServiceExceptions that its proxies may throw.

Clients that may only contain errors and outages may conveniently catch ServiceExceptions in their error handlers. Clients that wish to customise their containment strategies for particular outages, or that can even recover from them, may inspect the cause of ServiceExceptions and/or directly catch the general subclasses of ServiceException that we discuss next.
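
The wrapping and the cause inspection described above can be sketched as follows. The sketch makes several assumptions: Transport is a hypothetical stand-in for the underlying communication API, java.io.IOException plays the role of its low-level exceptions, and ServiceException is a reduced local copy.

```java
import java.io.IOException;
import java.net.ConnectException;

// Reduced local copy of ServiceException, used here as a wrapper.
class ServiceException extends RuntimeException {
    ServiceException(Throwable cause) { super(cause); }
}

// Hypothetical stand-in for the underlying communication API.
interface Transport {
    String send(String request) throws IOException;
}

// Sketch of a proxy that wraps any low-level exception in a ServiceException.
class WrappingFooSketch {
    private final Transport transport;
    WrappingFooSketch(Transport transport) { this.transport = transport; }

    String bar(String input) {
        try {
            return transport.send(input);
        } catch (IOException e) {
            throw new ServiceException(e);
        }
    }
}

class BarrierDemo {
    // A generic error handler: it contains the failure and inspects
    // the cause to customise the report.
    static String callAndReport(WrappingFooSketch foo, String input) {
        try {
            return foo.bar(input);
        } catch (ServiceException e) {
            return (e.getCause() instanceof ConnectException)
                    ? "endpoint unreachable"
                    : "call failed";
        }
    }
}
```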

Common Faults

There are a number of ServiceExceptions which do not specifically relate to foo’s remote API but may arise in the interaction with any system service, including:

  • the inability to bind to a given foo endpoint. This may be the endpoint configured on a proxy created in direct mode. It may also be an endpoint that the discovery services return to a proxy created in discovery mode, as the discovery services are not immediately notified of endpoints that become unavailable after their publication;
  • the possibility of calls that are invalid for the target endpoints, including calls that are unscoped, or else issued in a scope which is not legal for a given foo endpoint, calls that contain illegal arguments, calls to operations that are not implemented, or only partially implemented by given endpoints (services that use plugins may raise this kind of exception);
  • the possibility of arbitrary failures in queries for foo endpoints, such as those issued by proxies created in discovery mode;

Proxies that observe the unavailability of an endpoint throw NoSuchEndpointExceptions, which are defined as follows:

package org.gcube.common.clients.api;

class NoSuchEndpointException extends ServiceException {
    private static final long serialVersionUID = 1L;
    public NoSuchEndpointException() { super(); }
    public NoSuchEndpointException(String msg) { super(msg); }
    public NoSuchEndpointException(Throwable cause) { super(cause); }
    public NoSuchEndpointException(String msg, Throwable cause) { super(msg, cause); }
}

Proxies that observe failures related to invalid requests throw InvalidRequestExceptions, or more specific subclasses thereof:

package org.gcube.common.clients.api;

class InvalidRequestException extends ServiceException {
    private static final long serialVersionUID = 1L;
    public InvalidRequestException() { super(); }
    public InvalidRequestException(String msg) { super(msg); }
    public InvalidRequestException(Throwable cause) { super(cause); }
    public InvalidRequestException(String msg, Throwable cause) { super(msg, cause); }
}

package org.gcube.common.clients.api;

class IllegalScopeException extends InvalidRequestException {
    private static final long serialVersionUID = 1L;
    public IllegalScopeException() { super(); }
    public IllegalScopeException(String msg) { super(msg); }
    public IllegalScopeException(Throwable cause) { super(cause); }
    public IllegalScopeException(String msg, Throwable cause) { super(msg, cause); }
}

package org.gcube.common.clients.api;

class UnsupportedOperationException extends InvalidRequestException {
    private static final long serialVersionUID = 1L;
    public UnsupportedOperationException() { super(); }
    public UnsupportedOperationException(String msg) { super(msg); }
    public UnsupportedOperationException(Throwable cause) { super(cause); }
    public UnsupportedOperationException(String msg, Throwable cause) { super(msg, cause); }
}

package org.gcube.common.clients.api;

class UnsupportedRequestException extends InvalidRequestException {
    private static final long serialVersionUID = 1L;
    public UnsupportedRequestException() { super(); }
    public UnsupportedRequestException(String msg) { super(msg); }
    public UnsupportedRequestException(Throwable cause) { super(cause); }
    public UnsupportedRequestException(String msg, Throwable cause) { super(msg, cause); }
}

Proxies that cannot interact with the discovery services throw DiscoveryExceptions, which are defined as follows:

package org.gcube.common.clients.api;

class DiscoveryException extends ServiceException {
    private static final long serialVersionUID = 1L;
    public DiscoveryException(String msg) { super(msg); }
    public DiscoveryException(Exception cause) { super(cause); }
}
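
A client that customises its handling of one of these common faults might look like the following sketch. The retry policy is an assumption made for illustration and is not prescribed by the model; the exception classes are reduced local copies, and RetryDemo is a hypothetical helper.

```java
import java.util.function.Supplier;

// Reduced local copies of the exceptions defined above.
class ServiceException extends RuntimeException {
    ServiceException(String msg) { super(msg); }
}

class NoSuchEndpointException extends ServiceException {
    NoSuchEndpointException(String msg) { super(msg); }
}

class InvalidRequestException extends ServiceException {
    InvalidRequestException(String msg) { super(msg); }
}

class RetryDemo {
    // Retries a call only when the endpoint is unavailable; any other
    // ServiceException propagates immediately to the caller.
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("at least one attempt is required");
        NoSuchEndpointException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (NoSuchEndpointException e) {
                last = e; // endpoint gone: worth another attempt
            }
        }
        throw last;
    }
}
```

Catching the specific subclass keeps the recovery logic narrow: invalid requests and discovery failures are not retried, since repeating them is unlikely to change the outcome.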

Bulk Inputs and Outputs

Proxies may need to call foo operations that take or return collections of values. Foo may then rely in its API on custom interfaces or classes that encapsulate the collection values required or provided by foo, e.g.:

Nodes nodes(Paths paths) throws ... ;

where Nodes and Paths are ad-hoc models of nodes and paths to nodes of some tree-like data structure.

More commonly, however, Foo defines methods that rely on the standard Java Collections API. When methods return collections of values, Foo chooses Lists:

List<Node> nodes(...) throws ... ;

In returning Lists, Foo is not necessarily conveying to clients that the order of Nodes is meaningful, or that the same Node may occur twice within the List. Rather, Foo is following two principles: a) the type that best models a collection of values may only be defined by its consumers, on the basis of their own processing requirements; b) some types are more versatile than others in adapting to a wider range of processing requirements. In its ignorance of how clients will consume the collection, Foo returns it as a List for the versatility of the List API, and in the assumption that when its clients are better served by other, more constrained Collection types they can easily and cheaply derive them from Lists.

For methods that take collections however, Foo acts as a consumer and chooses the Collection type that most closely captures the required constraints at compile-time, e.g. a Set if Foo expects no duplicates:

List<Node> nodes(Set<Path> paths) throws ... ;

On the other hand, Foo does not restrict the semantics of inputs more than it should. For example, if there are no particular requirements on input collections, Iterator or Iterable are the most flexible choices, as they make the API immediately usable with a broader set of abstractions than Collections:

List<Node> nodes(Iterable<Path> paths) throws ... ;

The choice between Iterable and Iterator is not clearcut. Iterable can improve the fluency of both client and implementation code, but requires materialised collections. This may be desirable in itself as an indication that the collections will be materialised in memory and that very large streams coming from secondary storage or network are not expected. When streams are not large, however, Iterable forces clients to accumulate their elements before they can use the API.
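
The following sketch illustrates both sides of the discussion: a method that accepts any Iterable and returns a List, and a client that derives a more constrained collection from that List. CollectionsDemo and its String-based signatures are assumptions made for illustration, with Strings standing in for Path and Node.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CollectionsDemo {
    // Stand-in for a proxy method with no particular requirements on its input:
    // it accepts any Iterable and returns its results as a List.
    static List<String> nodes(Iterable<String> paths) {
        List<String> found = new ArrayList<>();
        for (String path : paths)
            found.add("node:" + path);
        return found;
    }

    // A client that wants no duplicates derives a Set from the returned List.
    static Set<String> distinctNodes(Iterable<String> paths) {
        return new LinkedHashSet<>(nodes(paths));
    }
}
```

The same nodes() method can be called with a List, a Set, or any other Iterable, for example one defined by a lambda that returns an Iterator.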

Asynchronous Methods

Calls to foo may be synchronous or asynchronous:

  • synchronous calls block clients until they have been fully processed by foo endpoints and their output, or just an acknowledgement of completion, is returned to clients. This temporal coupling between clients and endpoints forces both to relinquish some control over their computational resources. Clients must suspend execution in the calling thread and endpoints cannot schedule their availability to answer. It also requires calls to be fully processed within communication timeouts. Synchronous calls are thus preferred when endpoints can process them quickly, i.e. when the time in which clients and endpoints synchronise is short. This is the case when calls generate short-lived processes and require the exchange of limited amounts of data;
  • asynchronous calls do not block clients, either because they return no output (i.e. the operations are one-way) or because their output can be produced and returned to clients at a later time. This leaves clients and endpoints in control of their computational resources, but it complicates the programming model at both sides. Asynchronous calls are preferred when endpoints can fully answer only after long-lived processes, including those required to exchange large datasets;

foo may pursue the benefits of asynchrony by designing and implementing its operations for it. One-way operations return immediately with an acknowledgement of reception. Operations that produce output may return the endpoint of another service that clients can poll to obtain the output, when this becomes available. Alternatively, foo may require that clients indicate an endpoint that foo endpoints can call back to deliver the output. In all cases, foo executes the operations in background threads.

Foo may pursue some of the benefits of asynchrony even if foo does not. In other words, Foo may offer asynchronous calls over synchronous remote operations. In practice, this amounts to calling endpoints in background threads. Polling and callbacks remain available as patterns for the delivery of output between threads, though their implementation is now local to clients. The approach does not cater for communication timeouts, hence for calls that generate long-lived processes at foo endpoints. However, it allows clients to make further progress while the endpoints are busy processing their calls.
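
Asynchrony over a synchronous operation can be sketched with the standard java.util.concurrent API. SyncFoo and AsyncFooSketch are hypothetical names, and the choice of a cached thread pool is an assumption rather than a requirement of the model.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a proxy interface with a synchronous method.
interface SyncFoo {
    String bar(String input);
}

// Sketch of an asynchronous facade that issues synchronous calls in background threads.
class AsyncFooSketch implements AutoCloseable {
    private final SyncFoo delegate;
    private final ExecutorService executor = Executors.newCachedThreadPool();

    AsyncFooSketch(SyncFoo delegate) { this.delegate = delegate; }

    // Returns immediately; the synchronous call runs in a pool thread.
    // The executor throws RejectedExecutionException if the call cannot be submitted.
    Future<String> barAsync(String input) {
        return executor.submit(() -> delegate.bar(input));
    }

    // Stops accepting new calls; calls already submitted still complete.
    @Override
    public void close() { executor.shutdown(); }
}
```

The calling thread is free to continue as soon as barAsync() returns, but the background thread remains blocked for the duration of the remote call, so communication timeouts still apply to it.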

Polling And Callbacks

An asynchronous call that induces a long-lived process at the service endpoint may return immediately with a reference to the ongoing process. Clients may then use the reference to wait for the process to complete only when they need its outcome to make further progress. They may also poll the status of the process and perform other work while it is still ongoing. In some cases, they may even be able to abort the process.

In Java, the standard model for such references is provided by Futures. For example, Foo defines the following method:

/**
 * ...
 * @throws ...
 * @throws RejectedExecutionException if the call cannot be submitted
 */
Future<String> barAsync(...) throws ...;

which promises to return a String when this becomes available. Clients use Future.get() methods to block for the output, indefinitely or for a given amount of time. They can use Future.isDone() to poll the availability of any output. They can also use Future.cancel() to revoke the submission of a call (in case this has been scheduled but not issued yet) or, if the service allows it, to cancel the remote process.

Notice that barAsync() declares failures following the strategy discussed previously, with the understanding that these are local failures raised by proxies before calls are actually issued, i.e. typically unchecked exceptions for argument validation errors or call submission errors. In particular, the library will report call submission errors as java.util.concurrent.RejectedExecutionExceptions.

Failures raised by foo in the context of processing calls will instead be delivered in Future.get() invocations, in accordance with the Future API. In particular, unchecked ServiceExceptions and checked contingencies will be found as the cause of java.util.concurrent.ExecutionExceptions thrown by Future.get() invocations, and timeout exceptions will be delivered as java.util.concurrent.TimeoutExceptions.
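
A client might unpack the outcome of Future.get() along the following lines. FutureFailureDemo is a hypothetical helper, ServiceException is a reduced local copy, and the timeout is assumed to be expressed in milliseconds.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Reduced local copy of ServiceException.
class ServiceException extends RuntimeException {
    ServiceException(String msg) { super(msg); }
}

class FutureFailureDemo {
    // Waits for the outcome of an asynchronous call and classifies it.
    static String outcome(Future<String> future, long timeoutMillis) throws InterruptedException {
        try {
            return "result: " + future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // The failure raised while processing the call is found as the cause.
            Throwable cause = e.getCause();
            return (cause instanceof ServiceException)
                    ? "service failure: " + cause.getMessage()
                    : "contingency: " + cause;
        } catch (TimeoutException e) {
            future.cancel(true); // give up on the call
            return "timed out";
        }
    }
}
```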

If the underlying remote operation is one-way, Foo defines barAsync() as follows:

Future<?> barAsync(...);

which returns a wildcard Future that clients may use to cancel submissions/processes, as above, or that they may ignore altogether in case barAsync() is conceptually fire-and-forget.

In addition to polling, Foo may also rely on callbacks to deliver call outputs to its clients. In this case, Foo requires clients to provide a Callback instance at call time, i.e. an instance of the following interface:

package org.gcube.common.clients.api;

interface Callback<T> {
    void done(T result);
    void onFailure(Throwable failure);
    long timeout();
}

Specifically, Foo may overload barAsync as follows:

Future<?> barAsync(..., Callback<String> callback);

The method promises to:

  • return immediately with a wildcard Future, which clients can use as above for cancellation purposes;
  • deliver the outcome to the Callback instance as soon as this becomes available and no later than clients indicate with the timeout() callback.

The delivery occurs through two different callbacks, depending on whether the outcome is a success (done()) or a failure (onFailure()). Timeout errors are delivered as java.util.concurrent.TimeoutExceptions, regardless of the underlying implementation.

Clients may entirely consume the output in the Callback instance. Alternatively, they are responsible for exposing it directly or indirectly to other components.
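
One possible way to honour this contract over a synchronous call is sketched below. It is not the library's implementation: CallbackFooSketch is a hypothetical class, a Callable stands in for the elided call parameters, timeout() is assumed to return milliseconds, and the Callback interface is copied locally to keep the sketch self-contained.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Local copy of the Callback interface shown above.
interface Callback<T> {
    void done(T result);
    void onFailure(Throwable failure);
    long timeout(); // assumed here to be in milliseconds
}

class CallbackFooSketch {
    private final ExecutorService executor = Executors.newCachedThreadPool();

    // remoteCall is a stand-in for the synchronous remote invocation.
    Future<?> barAsync(Callable<String> remoteCall, Callback<String> callback) {
        Future<String> call = executor.submit(remoteCall);
        // A second task waits for the outcome and delivers it to the callback.
        return executor.submit(() -> {
            try {
                callback.done(call.get(callback.timeout(), TimeUnit.MILLISECONDS));
            } catch (ExecutionException e) {
                callback.onFailure(e.getCause());
            } catch (TimeoutException e) {
                call.cancel(true);
                callback.onFailure(e);
            } catch (InterruptedException e) {
                // The client cancelled the returned Future: abandon the call.
                call.cancel(true);
                Thread.currentThread().interrupt();
            }
        });
    }

    void shutdown() { executor.shutdown(); }
}
```

Cancelling the returned Future interrupts the delivery task, which in turn cancels the pending call, so the wildcard Future remains usable for cancellation as described above.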

note: Libraries that implement asynchrony over synchronous calls must configure their proxies with an infinite (or very high) call timeout. This is because clients specify independent waiting timeouts for individual asynchronous calls (either through Callbacks or in Future.get()), and these may interact badly with call timeouts set globally on the proxy. These libraries must then define synchronous and asynchronous methods in separate interfaces, e.g. Foo with synchronous methods and FooAsync with asynchronous methods. Foo proxies work under standard timeout defaults, which clients may override at instantiation time. FooAsync proxies work instead under infinite timeouts, which clients will override at call time.


With polling and callbacks, Foo lets its clients perform useful work as they wait for the output of long-lived processes that execute at foo endpoints. The approach however does not directly address the case in which the output itself is a large dataset. In this case, clients must still block waiting for the whole dataset to be transferred before they can start processing it. They also need to allocate enough local resources to contain the dataset in its entirety. Similar demands are faced by foo, which needs to produce and hold the entire dataset before it can pass it to its clients. Thus large datasets may reduce the responsiveness of clients and the capacity of service endpoints.

foo and its clients may avoid these issues if they produce and consume data as streams. A stream is a lazily-evaluated sequence of data elements. Clients consume the elements as these become available, and discard them as soon as they are no longer required. Similarly, endpoints produce the elements as clients consume them, i.e. on demand.

Streaming is used heavily throughout the system as the preferred method of asynchronous data transfers between clients and services. The gRS2 library provides the required API and the underlying implementation mechanisms, including paged transfers and memory buffers which avoid the cumulative latencies of many fine-grained interactions. The API allows services to “publish” streams, i.e. make them available at a network endpoint through a given protocol. Clients obtain references to such endpoints, i.e. stream locators, and resolve the locators to iterate over the elements of the streams. Services produce elements as clients require them, i.e. on demand.

Data streaming is used in a number of use cases, including:

  • foo streams the elements of a persistent dataset;
  • foo streams the results of a query over a persistent dataset;
  • foo derives a stream from a stream provided by the client;

The last is a case of circular streaming. The client consumes a stream which is produced by the service by iterating over another stream, which is produced by the client. Examples of circular streaming include:

  • bulk lookups, e.g. foo streams the elements of a dataset which have the identifiers streamed by the client;
  • bulk updates, e.g. foo adds a stream of elements to a dataset and streams the outcomes back to the client;

More complex use cases involve multiple streams, producers, and consumers.

The advantages of data streaming are offset by an increased complexity in the programming model. Consuming a stream can be relatively simple, but:

  • the assumption of remote data puts more emphasis on correct failure handling at foo and its clients;
  • since streams may be partially consumed, resources allocated by foo for streaming need to be explicitly released;
  • consumers that also act as producers need to remain within the stream paradigm, i.e. avoid the accumulation of data in main memory as they transform elements of input streams into elements of output streams;
  • implementing streams is typically more challenging than consuming streams. Filtering out some elements or absorbing some failures requires look-ahead implementations. Look-ahead implementations are notoriously error-prone;
  • stream implementations are typically hard to reuse (particularly look-ahead implementations);
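
To make the look-ahead problem concrete, the following sketch filters the elements of a plain java.util.Iterator. It is independent of gRS2 and of the streams library discussed in this document, and FilteredIterator is a hypothetical name.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Sketch of a look-ahead stream: it must fetch the next matching element
// before it can answer hasNext().
class FilteredIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> filter;
    private T lookAhead;          // next element that passed the filter, if any
    private boolean hasLookAhead; // distinguishes "no element" from a null element

    FilteredIterator(Iterator<T> source, Predicate<T> filter) {
        this.source = source;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        // Advance the source until an element passes the filter or the source ends.
        while (!hasLookAhead && source.hasNext()) {
            T candidate = source.next();
            if (filter.test(candidate)) {
                lookAhead = candidate;
                hasLookAhead = true;
            }
        }
        return hasLookAhead;
    }

    @Override
    public T next() {
        if (!hasNext())
            throw new NoSuchElementException();
        T result = lookAhead;
        lookAhead = null;
        hasLookAhead = false;
        return result;
    }
}
```

Even in this small case the implementation has to keep hasNext() idempotent and track the buffered element explicitly, which hints at why such code becomes error-prone once remote failures and resource release enter the picture.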

Thus streaming raises significant opportunities as well as non-trivial programming challenges. The gRS2 API provides sophisticated primitives for data transfer, but it remains fairly low-level when it comes to producing and consuming streams.

The streams library provides the abstractions required to further simplify stream-based programming in simple and complex scenarios. It implements a DSL for stream manipulation which is built around the Stream interface, an extension of the familiar Iterator interface. The DSL simplifies a range of stream transformations, making it easy to change, filter, group, and expand the elements of input streams into elements of output streams. The DSL also allows clients to configure failure handling policies and event notifications for stream consumption, and it simplifies the publication of streams as gCube ResultSets.

Foo relies on the DSL of the Streams API whenever its methods need to take and/or return streams. For example, if foo can stream the results of a given query, Foo may provide its clients with the following method:

Stream<Item> query(Query query) throws ... ;

where Item and Query model, respectively, the elements of a remote dataset and a query issued against that dataset, and where the output Stream gives access to a remote gCube ResultSet produced by foo. Clients are free to access the locator of the stream with Stream.locator() and consume it with the lower-level gRS2 API, if required.

Similarly, if foo can stream the elements with given identifiers, Foo may define the following method:

Stream<Item> lookup(Stream<Key> ids) throws ... ;

where Key models Item identifiers. By taking a Stream as input, Foo promises to publish the stream on behalf of clients and to send the corresponding locator to foo.

Since clients may want to remain in charge of publication, Foo overloads lookup() as follows:

Stream<Item> lookup(URI idRs) throws ... ;

i.e. directly accepts the locator of a gCube ResultSet of keys which has already been published by the client, or by some other party further upstream.

Both query() and lookup() model failures according to the strategy outlined above, with the understanding that these are failures that may occur only before foo starts producing streams (including failures thrown by DefaultFoo before calls are actually issued). Failures raised by foo in the context of producing streams are instead delivered during Stream iteration, in accordance with the specification of the Stream API. In particular, unchecked ServiceExceptions and checked contingencies will be found as the cause of StreamExceptions.

Finally note that Foo may return streams through polling and callbacks if foo can start producing them only at the end of long-lived processes, e.g.:

Future<Stream> pollStream(...) throws ... ;


Future<?> callbackStream(...,Callback<Stream> callback) throws ... ;

Service Instances

As discussed above, stateful services share a number of design elements:

  • there is a companion factory service, typically stateless, which creates service instances;
  • service instances have a lifetime which may be independent from their endpoint’s;
  • service instances have properties whereby they can be discovered;

If foo is stateful, its client library includes a proxy interface to its companion foo-factory service, e.g. FooFactory. FooFactory and its implementation are designed with the same patterns and principles discussed so far for Foo, including direct and discovery modes for its proxies. The relationship between FooFactory and Foo is explicitly captured by one or more factory methods in FooFactory, e.g.:

public Foo create(...) throws ...;

create() triggers the creation of a foo instance and returns a Foo proxy bound to that instance. Its design is otherwise governed by the principles already discussed for other proxy methods. In particular, create() may be overloaded, may be asynchronous, and may return a collection if it results in the creation of multiple foo instances. If and when appropriate, it may also be named to reflect more adequately the semantics of the operation (e.g. newSource() or startJob()).

As it is explicitly created, a foo instance may be explicitly destroyed. This can be accomplished through a method destroy() in the API of Foo:

void destroy() throws ServiceException;

The method takes no input and has no outcome other than a potential failure. Its semantics is ultimately service-specific, though its side-effects typically include the release of computational resources at the target foo instance. In all cases, it is likely that foo will place tight security requirements on its invocations.

Besides a lifetime, foo instances have properties and, while the main role of such properties is to characterise instances for discovery purposes, there may be a requirement for exposing them to clients directly through proxies. In this case, one obvious option is to extend Foo with accessor methods and, where applicable, mutator methods with appropriate bindings for property values. Often, a better option is to factor out accessors and mutators in a separate FooProperties class and extend Foo with the following methods:

FooProperties properties() throws ServiceException;
void setProperties(FooProperties properties) throws ServiceException;

This capitalises on the standards adopted within the system to retrieve and update instance properties in bulk. Clients would obtain all the instance properties with a single remote call, inspect them or change them locally as required, and then commit any change with another single remote call. FooProperties will not define mutators for read-only properties, and setProperties() can be excluded altogether if all the properties are read-only.

Notice that lifetime-related and property-related methods operate directly on the state of foo instances. As such, they may generate undesired side-effects when clients invoke them on proxies created in discovery mode in the middle of a session. We have discussed the issue above in general terms, and indicated a minimal solution in terms of a ‘sticky session’ configuration on the proxies. Other, more structured and explicit solutions may be preferred if a large class of use cases assumes session-based clients and access to instance properties. For example, destroy(), as well as property accessor and mutator methods, may be collected in a dedicated FooInstance interface and Foo may be extended with a method that returns a (private) implementation of the interface:

FooInstance toInstance() throws ServiceException;

Having to invoke toInstance() on a proxy clarifies to clients that the operations they may invoke on the returned value are conceptually different from the other operations declared by Foo, in that they explicitly operate on the state of the foo instance currently bound to the proxy. Thus FooInstance makes clients more aware of the binding and caching strategies of such proxies, i.e. that the proxy may be bound to another instance during the session. This reduces the risk that clients overlook possible inconsistencies, and improves the readability of their code. Of course, FooInstance also makes their code more verbose, and introduces noise in the use of proxies in direct mode.

Context Management

Proxies invoke the remote operations of foo in a context which encompasses more information than the target service endpoint and the input parameters of the calls. In particular, calls always occur in a given scope and conditionally to the provision of credentials about the caller. An attempt to call foo in no particular scope, or in a scope in which the target endpoint does not exist, as well as calls that are issued anonymously, will be rejected, either by proxies or by their bound endpoints.

We discuss below how this contextual information is made available to proxies.

Scope Management

One way of providing DefaultFoo instances with scope information is to require their immediate callers to specify one when the instances are created. Making scope explicit, however, induces clients to propagate scope information across their call stack, and this may easily prove intrusive for their design.

A less intrusive approach is to bind scope information to the threads in which DefaultFoo instances issue remote calls. Clients remain responsible for making the binding, but they can do so further up the call stack, as early as scope information becomes available to them. Client components that execute on the stack thereafter need have no design dependencies on scope.

To implement this scheme, DefaultFoo relies on the common-scope library, which provides the tools required to bind and propagate scope as thread-local information. In particular, common-scope models scope as plain Strings and includes a ScopeProvider interface with methods to bind a scope with the current thread (ScopeProvider.set(String)), obtain the scope bound to the current thread (ScopeProvider.get()), and remove the scope bound to the current thread (ScopeProvider.remove()). ScopeProvider gives also access to a single instance of its default implementation, which can be shared between clients and DefaultFoo (the constant ScopeProvider.instance).

Thus a client component high up the call stack binds a scope to the current thread as follows:

String scope = ...
ScopeProvider.instance.set(scope);

and, lower down the call stack, DefaultFoo obtains the same scope as follows:

String scope = ScopeProvider.instance.get();

Note that:

  • since the shared ScopeProvider is based on an InheritableThreadLocal, DefaultFoo may execute in any child thread of the bound thread;
  • if the current thread and its ancestors are unbound, the shared ScopeProvider attempts to resolve scope from the system property gcube.scope. When clients operate in a single scope, this property can be set when the JVM is launched and clients can avoid compile-time dependencies on ScopeProvider altogether;
  • clients that reuse threads to call foo in different scopes will need to explicitly unbind threads, and typically will do so in the same component that binds them;
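
The thread-bound behaviour summarised in these notes can be illustrated with a small stand-alone sketch. ScopeHolderSketch is a hypothetical class that imitates the behaviour described for the shared ScopeProvider; it is not the common-scope implementation, and ScopeDemo is an illustrative client.

```java
// Sketch of a thread-bound scope holder in the style described above.
class ScopeHolderSketch {
    static final ScopeHolderSketch instance = new ScopeHolderSketch();

    // Child threads inherit the value bound in the thread that creates them.
    private final InheritableThreadLocal<String> scopes = new InheritableThreadLocal<>();

    void set(String scope) { scopes.set(scope); }

    // Falls back to the gcube.scope system property when no scope is bound.
    String get() {
        String scope = scopes.get();
        return scope != null ? scope : System.getProperty("gcube.scope");
    }

    void remove() { scopes.remove(); }
}

class ScopeDemo {
    // Binds a scope, reads it from a child thread, then unbinds it.
    static String scopeSeenByChild(String scope) throws InterruptedException {
        ScopeHolderSketch.instance.set(scope);
        try {
            String[] seen = new String[1];
            Thread child = new Thread(() -> seen[0] = ScopeHolderSketch.instance.get());
            child.start();
            child.join();
            return seen[0];
        } finally {
            // Threads that are reused across scopes must be unbound explicitly.
            ScopeHolderSketch.instance.remove();
        }
    }
}
```

The try/finally block reflects the last note: binding and unbinding happen in the same component, so that a pooled thread never carries a stale scope into its next task.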

Security Management

The security characteristics of foo are defined on the basis of the features of the service and the requirements of the data. These characteristics are set at service level, and clients need only conform their calls to them to avoid rejection. To this end, the common-security library provides an API that simplifies the management of security aspects. The Proxy Model (DefaultFoo) spares clients from writing security-specific code, which makes it easier to adapt legacy clients to the gCube/SOA3 security model: in other words, the client is not obliged to refer to common-security, as all the related functions can be performed by DefaultFoo.

The main abstract concept behind the model is that of Credentials. In our context, this general term can refer to:

  • Username/password
  • SAML assertion
  • X509 certificate

In terms of pseudo-code, this means that the interface Credentials is implemented by

  • UsernamePasswordCredentials
  • SAMLAssertionCredentials
  • X509Credentials

Credentials management is similar to scope management, with the difference that a scope is a single string, whereas credentials may comprise several pieces of information whose meaning depends on the actual implementation of the Credentials interface. The client-side security API is responsible for using these pieces of information properly and transparently to the user. In particular:

  • UsernamePasswordCredentials contains getter and setter methods of username and password (String getUsername() - setUsername(String), char[] getPassword() - setPassword(char[]))
  • SAMLAssertionCredentials contains the setter of a SAMLAssertion, in object form or as a String defining the path to a file containing the signed assertion, and the getter of the object (setSamlAssertion(String path), setSamlAssertion(SAMLAssertion assertion), SAMLAssertion getSamlAssertion())
  • X509Credentials contains the getters and the setters of the information needed to use a java keystore and a truststore[1] (keystore and truststore, passwords and types)
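
To make the shape of these types concrete, the following is a minimal sketch of the three implementations. It is not the common-security code: the empty Credentials interface, the SAMLAssertion placeholder class, the accessor names of X509Credentials and the defensive copying of passwords are all assumptions made for illustration. Only the accessors listed above for UsernamePasswordCredentials and SAMLAssertionCredentials come from the model.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// The model names this interface without listing members, so it is empty here.
interface Credentials {}

final class UsernamePasswordCredentials implements Credentials {
    private String username;
    private char[] password;

    String getUsername() { return username; }
    void setUsername(String username) { this.username = username; }

    // Copying is an assumption of this sketch: it keeps callers from
    // changing the stored password through a shared array.
    char[] getPassword() { return password == null ? null : password.clone(); }
    void setPassword(char[] password) {
        this.password = password == null ? null : password.clone();
    }
}

// Hypothetical placeholder for the assertion object, reduced to its XML text.
final class SAMLAssertion {
    final String xml;
    SAMLAssertion(String xml) { this.xml = xml; }
}

final class SAMLAssertionCredentials implements Credentials {
    private SAMLAssertion assertion;

    void setSamlAssertion(SAMLAssertion assertion) { this.assertion = assertion; }

    // Loads the signed assertion from the file at the given path.
    void setSamlAssertion(String path) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            this.assertion = new SAMLAssertion(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    SAMLAssertion getSamlAssertion() { return assertion; }
}

// Accessor names below are assumed; the model only lists the kinds of information.
final class X509Credentials implements Credentials {
    private String keyStore, keyStoreType, trustStore, trustStoreType;
    private char[] keyStorePassword, trustStorePassword;

    String getKeyStore() { return keyStore; }
    void setKeyStore(String keyStore) { this.keyStore = keyStore; }
    char[] getKeyStorePassword() { return keyStorePassword; }
    void setKeyStorePassword(char[] password) { this.keyStorePassword = password; }
    String getKeyStoreType() { return keyStoreType; }
    void setKeyStoreType(String type) { this.keyStoreType = type; }

    String getTrustStore() { return trustStore; }
    void setTrustStore(String trustStore) { this.trustStore = trustStore; }
    char[] getTrustStorePassword() { return trustStorePassword; }
    void setTrustStorePassword(char[] password) { this.trustStorePassword = password; }
    String getTrustStoreType() { return trustStoreType; }
    void setTrustStoreType(String type) { this.trustStoreType = type; }
}
```

Keeping passwords as char[] rather than String, as the model does for UsernamePasswordCredentials, lets a client overwrite them once they are no longer needed.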

A singleton CredentialsManager manages Credentials with thread-local logic (InheritableThreadLocal). A Credentials object added to the manager takes effect for the current thread and its descendant threads, until another call to the adder overrides it:


CredentialsManager also exposes methods to get the credentials associated with the current thread:

Credentials credentialsList = CredentialsManager.instance.get()

or to remove credentials:
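
The manager operations described in this section (add, get, remove) could be sketched as follows. This is an illustration under assumptions, not the actual common-security class: only the instance field and the get() method appear in this document, while the names set and remove, the empty Credentials interface and the InheritanceDemo helper are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the model's Credentials interface.
interface Credentials {}

final class CredentialsManager {
    // Singleton, accessed as CredentialsManager.instance as in the text.
    static final CredentialsManager instance = new CredentialsManager();

    // A thread inherits the value its parent holds at the time it is created.
    private final InheritableThreadLocal<Credentials> current =
            new InheritableThreadLocal<>();

    private CredentialsManager() {}

    // 'set' is an assumed name for the adder mentioned in the text.
    void set(Credentials credentials) { current.set(credentials); }

    Credentials get() { return current.get(); }

    // 'remove' is an assumed name; it clears the binding of the calling thread.
    void remove() { current.remove(); }
}

final class InheritanceDemo {
    // Returns the credentials visible from a thread created by the caller.
    static Credentials seenByNewThread() {
        AtomicReference<Credentials> seen = new AtomicReference<>();
        Thread child = new Thread(() -> seen.set(CredentialsManager.instance.get()));
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }
}
```

With InheritableThreadLocal, inheritance happens when a thread is created. Pooled threads created before the credentials were set do not see them, and a later override in the parent does not reach children that already exist.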


The thread-local logic makes it possible to look up the credentials set in the current thread or in its ancestors. If none are found, the Credentials are obtained as a last resort from JVM properties, in particular:

  • UsernamePasswordCredentials are defined by the gcube.username and gcube.password properties
  • SAMLAssertionCredentials are defined by gcube.samlassertion, which points to a file containing the signed SAML assertion to be used
  • X509Credentials are defined by gcube.KeyStore, gcube.KeyStorePassword, gcube.KeyStoreType, gcube.TrustStore, gcube.TrustStorePassword, gcube.TrustStoreType, whose meanings match those of the corresponding JSSE properties (see the references)
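
The property fallback can be illustrated with a small sketch. The property names are those listed above; everything else is an assumption made for brevity: the PropertyFallback class, the simplified immutable credential types, and the rule that username and password must both be present.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Simplified, hypothetical stand-ins for the model's types.
interface Credentials {}

final class UsernamePasswordCredentials implements Credentials {
    final String username;
    final char[] password;
    UsernamePasswordCredentials(String username, char[] password) {
        this.username = username;
        this.password = password;
    }
}

final class SAMLAssertionCredentials implements Credentials {
    final String assertionPath; // file holding the signed assertion
    SAMLAssertionCredentials(String assertionPath) { this.assertionPath = assertionPath; }
}

final class X509Credentials implements Credentials {
    final Map<String, String> settings; // keyed by property name
    X509Credentials(Map<String, String> settings) { this.settings = settings; }
}

final class PropertyFallback {
    private static final String[] X509_KEYS = {
        "gcube.KeyStore", "gcube.KeyStorePassword", "gcube.KeyStoreType",
        "gcube.TrustStore", "gcube.TrustStorePassword", "gcube.TrustStoreType"
    };

    // Builds one Credentials object per kind that the properties describe.
    static List<Credentials> fromProperties(Properties props) {
        List<Credentials> found = new ArrayList<>();

        String user = props.getProperty("gcube.username");
        String password = props.getProperty("gcube.password");
        if (user != null && password != null) { // assumption: both are required
            found.add(new UsernamePasswordCredentials(user, password.toCharArray()));
        }

        String samlPath = props.getProperty("gcube.samlassertion");
        if (samlPath != null) {
            found.add(new SAMLAssertionCredentials(samlPath));
        }

        Map<String, String> x509 = new LinkedHashMap<>();
        for (String key : X509_KEYS) {
            String value = props.getProperty(key);
            if (value != null) {
                x509.put(key, value);
            }
        }
        if (!x509.isEmpty()) {
            found.add(new X509Credentials(x509));
        }
        return found;
    }
}
```

A client would typically pass System.getProperties() to such a method, after starting the JVM with options such as -Dgcube.username=... and -Dgcube.password=....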

Note that nothing prevents the use of more than one Credentials object: evaluating the security information of a call is the responsibility of foo, so whoever manages the client should know the security level of the service being called. All combinations of Credentials are accepted at client level. However, note that:

  • an HTTPS connection requires, at least, a truststore at client level that contains the server CA certificate
  • defining a client keystore means that the certificate it holds will be sent to the server to identify the client
  • username/password and SAML impersonation are alternative authentication factors
  • in general, foo checks username/password first, then the SAML assertion and finally the DN of the certificate: when multiple authentication factors are set, this order defines their priority
  • in general, SAML assertions are produced by the SOA3 Identity Provider after a successful authentication, to be used when a call propagates across different services: external clients may therefore not use SAML assertions for their calls, and may not have visibility of SAML assertions at all
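
The priority rule stated above (username/password, then SAML assertion, then certificate DN) amounts to picking the first supplied factor in a fixed order. The sketch below illustrates that selection only; the AuthenticationFactor enum and the FactorPriority class are hypothetical names, and the actual check is performed by foo on the service side.

```java
import java.util.Optional;
import java.util.Set;

// Declared in the priority order described in the text.
enum AuthenticationFactor { USERNAME_PASSWORD, SAML_ASSERTION, CERTIFICATE_DN }

final class FactorPriority {
    // Returns the factor that takes precedence among those supplied with a call.
    static Optional<AuthenticationFactor> effective(Set<AuthenticationFactor> supplied) {
        for (AuthenticationFactor factor : AuthenticationFactor.values()) {
            if (supplied.contains(factor)) {
                return Optional.of(factor);
            }
        }
        return Optional.empty(); // no authentication factor was supplied
    }
}
```

For example, a call that carries both a client certificate and a SAML assertion would be evaluated on the SAML assertion under this rule.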

The thread-local model makes it possible to use different credentials for different calls.
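
One way a client might switch credentials for a single call, and restore the previous ones afterwards, is sketched below. It mirrors the bind/unbind discipline discussed for scopes. The callWith helper is hypothetical, as are the set and remove method names on the manager; only instance and get() appear in this document.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the model's Credentials interface.
interface Credentials {}

// Minimal manager for this sketch; 'set' and 'remove' are assumed names.
final class CredentialsManager {
    static final CredentialsManager instance = new CredentialsManager();
    private final InheritableThreadLocal<Credentials> current =
            new InheritableThreadLocal<>();
    private CredentialsManager() {}
    void set(Credentials credentials) { current.set(credentials); }
    Credentials get() { return current.get(); }
    void remove() { current.remove(); }
}

final class CallScopedCredentials {
    // Runs 'call' with the given credentials bound to the current thread,
    // then restores whatever was bound before, even if the call fails.
    static <T> T callWith(Credentials credentials, Supplier<T> call) {
        Credentials previous = CredentialsManager.instance.get();
        CredentialsManager.instance.set(credentials);
        try {
            return call.get();
        } finally {
            if (previous == null) {
                CredentialsManager.instance.remove();
            } else {
                CredentialsManager.instance.set(previous);
            }
        }
    }
}
```

Restoring in a finally block matters for clients that reuse threads, since a failed call would otherwise leave the temporary credentials bound to the thread.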


Appendix A: Specifications

@TODO: briefly summarise the model in terms of “may”, “should”, “must” specifications.

Appendix B: API

@TODO: list interfaces and classes defined by the model.