Execution Engine Specification

Overview

The Execution Engine executes arbitrarily complex Execution Plans. An Execution Plan describes the invocation of code components (aka invocables, i.e. services, binary executables, scripts, …) and, by defining the flow of data and/or control, ensures that prerequisite data are prepared and delivered to their consumers. The Execution Plans initially submitted to the Execution Engine originate from a WorkflowEngine instance. In addition, since the Execution Engine supports distributed execution, it can forward subplans of its initial Execution Plan to other Execution Engine instances. In this way, workflows of any kind can be executed on top of a distributed computational infrastructure.
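
To make the idea concrete, the following minimal sketch models a plan as a set of subplans, some of which run locally while others are handed to another engine instance. All names (Subplan, RemoteEngine, etc.) are illustrative placeholders and not part of the actual Execution Engine API.

  import java.util.List;

  // Minimal conceptual model, not the actual Execution Engine API: a plan is a list
  // of subplans, and the engine either runs each subplan locally or forwards it to
  // another engine instance, mirroring the distributed execution described above.
  public class DistributedExecutionSketch {

      interface Subplan {
          boolean preferRemote();   // placement hint carried by the (hypothetical) plan
          void run();               // the actual work of the subplan
      }

      interface RemoteEngine {      // stands in for another Execution Engine instance
          void submit(Subplan subplan);
      }

      static void execute(List<Subplan> plan, RemoteEngine remote) {
          for (Subplan s : plan) {
              if (s.preferRemote()) {
                  remote.submit(s); // forward the subplan to the remote instance
              } else {
                  s.run();          // execute locally on this node
              }
          }
      }

      public static void main(String[] args) {
          RemoteEngine remote = Subplan::run; // toy "remote" engine: just runs the subplan
          execute(List.of(
                  subplan("prepare data", false),
                  subplan("heavy processing", true)), remote);
      }

      static Subplan subplan(String name, boolean remote) {
          return new Subplan() {
              public boolean preferRemote() { return remote; }
              public void run() { System.out.println("executing: " + name); }
          };
      }
  }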

When the Workflow Engine acts in the context of the gCube platform, it provides a gCube-compliant Web Service interface. This Web Service acts not only as the front end to the execution facilities but also as the "face" of the component with respect to the gCube platform. The Running Instance profile of the service is the placeholder to which the Execution Environment Providers of the underlying Execution Engine instance push information that needs to be made available to other engine instances. Configuration properties used throughout the Workflow Engine instance are retrieved through the appropriate technology-specific constructs and are used to initialise services and providers once the service starts.

Key features

Control and monitoring of a processing flow execution.
The Execution Engine provides progress reporting and control constructs, based on an event mechanism, for tasks operating both on the D4Science infrastructure and on external infrastructures.
Handling of data streaming among computational elements.
PE2ng exploits the high-throughput, point-to-point, on-demand communication facilities offered by gRS2.
Expressive and powerful execution plan language
The execution plan elements comprising the language can drive virtually any computation. In addition, the Execution Engine is technology-agnostic with respect to the components it invokes, handling in the same uniform manner SOAP Web Services and WSRF services, HTTP APIs (RESTful Web Services), various executables (including shell scripts), Java objects, etc.
Multiple ways of invoking executables
In-process: ultra-high performance, no security boundary crossing, low need for data exchanges
Intra-node: high throughput and performance, local security boundaries crossed
Inter-node: low throughput (depending on network), organisational security boundaries crossed
Advanced error handling support through contingency reaction
Each Execution Plan element that invokes executables can be annotated with contingency reaction triggers (see the sketch after this list).
Unbounded extensibility via providers for integration with different environments.
The system is designed in an extensible manner, allowing the transparent integration with a variety of providers for storage, resource registries, reporting systems, etc.
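
As an illustration of the contingency idea only (the names and classes below are hypothetical, not the actual plan language), a plan element that invokes an executable could carry a retry trigger that reacts to failures:

  import java.util.concurrent.Callable;

  // Hypothetical sketch of a contingency reaction trigger attached to an invocation:
  // when the invocable fails, the trigger decides how to react (here: retry N times).
  // The names below are placeholders, not the actual Execution Engine constructs.
  public class ContingencySketch {

      // A "retry" contingency: re-run the invocable up to maxAttempts times.
      static <T> T invokeWithRetry(Callable<T> invocable, int maxAttempts) throws Exception {
          Exception last = null;
          for (int attempt = 1; attempt <= maxAttempts; attempt++) {
              try {
                  return invocable.call();
              } catch (Exception e) {
                  last = e; // contingency triggered: remember the failure and try again
                  System.out.println("attempt " + attempt + " failed: " + e.getMessage());
              }
          }
          throw last; // all attempts exhausted, propagate the failure to the plan level
      }

      public static void main(String[] args) throws Exception {
          // A flaky "invocable" that succeeds on the second attempt.
          int[] calls = {0};
          String result = invokeWithRetry(() -> {
              if (++calls[0] < 2) throw new IllegalStateException("transient error");
              return "ok";
          }, 3);
          System.out.println("result: " + result);
      }
  }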

Design

Philosophy

The Execution Engine is designed to support an expressive, feature-rich workflow language. It aims to enable the execution of arbitrarily complex workflows of all kinds by offering a wide array of constructs, namely Execution Plan Elements, which can be used to invoke any kind of executable or to group collections of elements into execution flow structures. Because the Execution Engine handles all of these constructs uniformly, they can be composed into arbitrarily complex workflows.
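
The following self-contained sketch illustrates this uniform handling under one assumed model: every construct, whether it invokes something or groups other elements into a flow structure, is treated through the same interface. The names are illustrative, not the actual plan language.

  import java.util.List;

  // Conceptual model of the "uniform handling" idea, not the actual plan language:
  // an invocation element and a flow-structure element implement the same interface,
  // so the engine can execute arbitrarily nested compositions in one way.
  public class PlanElementSketch {

      interface ExecutionElement {          // every construct looks the same to the engine
          void execute();
      }

      record Invocation(String name, Runnable body) implements ExecutionElement {
          public void execute() {
              System.out.println("invoking " + name);
              body.run();
          }
      }

      record Sequence(List<ExecutionElement> children) implements ExecutionElement {
          public void execute() {
              children.forEach(ExecutionElement::execute); // run children in order
          }
      }

      public static void main(String[] args) {
          // A sequence that nests another sequence: composition stays uniform.
          ExecutionElement plan = new Sequence(List.of(
                  new Invocation("prepare.sh", () -> {}),
                  new Sequence(List.of(
                          new Invocation("transform service", () -> {}),
                          new Invocation("publish service", () -> {})))));
          plan.execute();
      }
  }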

As a constituent part of PE2ng, the Execution Engine is designed with a layered architecture that decouples the business domain, the infrastructure-specific logic and the core execution functionality. This allows the core to be reused across a multitude of use cases and avoids the sub-optimal compromises of strictly agnostic solutions.
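
A minimal sketch of the layering idea, under assumed names (StorageProvider and its implementation are illustrative, not the actual provider interfaces): the core only sees an abstract provider, while environment-specific implementations are plugged in from the outer layers.

  import java.util.HashMap;
  import java.util.Map;

  // Sketch of decoupling through providers (names are illustrative only): the core
  // execution logic depends on an abstract StorageProvider, and each environment
  // supplies its own implementation without the core having to change.
  public class ProviderSketch {

      interface StorageProvider {               // abstraction the core programs against
          void store(String key, byte[] payload);
          byte[] retrieve(String key);
      }

      // Environment-specific implementation; a different deployment environment would
      // plug in another one behind the same interface.
      static class InMemoryStorageProvider implements StorageProvider {
          private final Map<String, byte[]> data = new HashMap<>();
          public void store(String key, byte[] payload) { data.put(key, payload); }
          public byte[] retrieve(String key) { return data.get(key); }
      }

      // "Core" code: it only knows the provider interface, never the environment.
      static void stageIntermediateResult(StorageProvider storage, byte[] result) {
          storage.store("intermediate-result", result);
      }

      public static void main(String[] args) {
          StorageProvider storage = new InMemoryStorageProvider();
          stageIntermediateResult(storage, "payload".getBytes());
          System.out.println(new String(storage.retrieve("intermediate-result")));
      }
  }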

Architecture

The Execution Engine comprises a single component whose internal architecture mirrors the constructs it provides. These constructs can be grouped as follows:

  • Execution Elements
  • Data Types
  • Events (see the sketch after this list)
  • Contingencies
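
As a rough illustration of the Events group (listener and event names below are assumptions, not the actual event mechanism), progress could be reported to registered listeners while an element executes:

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical sketch of event-based progress reporting, not the actual event
  // mechanism: listeners register with the execution and receive progress events
  // emitted while the work advances.
  public class ProgressEventSketch {

      interface ExecutionListener {
          void onProgress(String element, int percent);
      }

      static class MonitoredExecution {
          private final List<ExecutionListener> listeners = new ArrayList<>();

          void addListener(ExecutionListener listener) { listeners.add(listener); }

          void run(String element, int steps) {
              for (int i = 1; i <= steps; i++) {
                  // ... do one unit of work, then notify every registered listener
                  int percent = i * 100 / steps;
                  listeners.forEach(l -> l.onProgress(element, percent));
              }
          }
      }

      public static void main(String[] args) {
          MonitoredExecution execution = new MonitoredExecution();
          execution.addListener((element, percent) ->
                  System.out.println(element + ": " + percent + "%"));
          execution.run("shell script invocation", 4);
      }
  }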

Deployment

The Execution Engine, in its service-wrapped version, should be deployed on:

  • Each node that should participate in the execution of Execution Plans of local or remote origin.
  • Each node that is intended to act as a gateway to external infrastructures.

Large deployment

When the demand for computational power is high, the Execution Engine should be deployed on as many nodes as possible, so that the Workflow Engine instances driving the execution can reach a large number of nodes and distribute the computational load evenly across the infrastructure.

Execution Engine large deployment

Small deployment

If the processing requirements of the infrastructure are low and/or there is no need to contact external infrastructures, the Execution Engine can be deployed only on the node which also hosts the Workflow Engine and acts as the entry point for incoming workflow processing requests. This means that execution will take place only on that node, locally. In this minimal deployment scenario, it suffices to deploy the Execution Engine as a library (see the sketch below).

Execution Engine small deployment
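
Purely as an illustration of the library-style embedding (LocalEngine and Plan are placeholder names, not the actual classes), the node hosting the Workflow Engine would instantiate the engine in its own JVM and execute plans locally:

  // Placeholder sketch of the minimal, library-only deployment: the engine object
  // lives in the same JVM as the Workflow Engine and runs the plan on this node.
  // LocalEngine and Plan are illustrative names, not the actual API.
  public class SmallDeploymentSketch {

      static final class Plan {
          final Runnable work;
          Plan(Runnable work) { this.work = work; }
      }

      static final class LocalEngine {
          void execute(Plan plan) {
              plan.work.run(); // everything happens in-process on this single node
          }
      }

      public static void main(String[] args) {
          LocalEngine engine = new LocalEngine();               // no service wrapping
          engine.execute(new Plan(() -> System.out.println("executed locally")));
      }
  }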

Use Cases

Well suited Use Cases

The Execution Engine has been successfully used for the execution of all workflows involved in the use cases of the Workflow Engine, being the enabling element of the latter.

Less well suited Use Cases

As the Execution Engine aims to provide a generic facility for executing workflows, it has no knowledge of the semantics of its input and output data. Applications that require such data comprehension should instead opt for implementing special adaptors for the Workflow Engine.