The Execution Engine is designed to operate in a variety of environments. To do so, it must be able to identify hooks through which to attach itself to the available Information Space, as well as to gain access to permanent storage. These hooks are provided through the Execution Environment Providers.
gCube Web Service Interface
When the Execution Engine acts in the context of the gCube platform, it provides a gCube-compliant Web Service interface. This Web Service interface acts as the front end to the Execution facilities and is also the "face" of the component with respect to the gCube platform. The Running Instance profile of the service is the placeholder where the Execution Environment Providers of the underlying Execution Engine instance push information that needs to be made available to other engine instances. Additionally, the service "etc" folder contains the logging properties that control the logging granularity of the service, as well as of the rest of the underlying components. Configuration properties, used throughout the Execution Engine instance, are retrieved from the service JNDI and are used to initialize services and providers once the service receives the init event.
At service installation time, one should take note of the deploy-jndi-config.xml file of the service. Since the service is the front end of the underlying engine and also acts as the placeholder of published information about the host the engine runs on, different information needs to be published depending on the platform this instance gives access to. Currently, four platforms are identified and need to be duly noted in the service JNDI. These four categories are:
- GCube - The node hosting the instance is a gCube platform node
- Grid - The node hosting the instance is an EGEE gLite UI node
- Condor - The node hosting the instance is a Condor gateway
- Hadoop - The node hosting the instance is a Hadoop gateway
The respective entries in the service JNDI are already available, but the administrator needs to modify them, commenting out the non-relevant entries so that exactly one remains enabled. For example, the respective portion of the service JNDI for a gCube node should look as follows:
...
<environment name="nodeType" value="GCube" type="java.lang.String" override="false" />
<!-- <environment name="nodeType" value="Grid" type="java.lang.String" override="false" /> -->
<!-- <environment name="nodeType" value="Condor" type="java.lang.String" override="false" /> -->
<!-- <environment name="nodeType" value="Hadoop" type="java.lang.String" override="false" /> -->
...
Overcoming possible obstacles with EGEE gLite UI node instances
When the nodeType is set to Grid, the following problem needs to be resolved: the environment variable $GLOBUS_LOCATION, which is needed by the gCore container, might also be needed by the gLite installation. The node administrator therefore needs to perform the following steps so that a client, through the Execution Engine, is able to submit jobs through the gLite UI command line interface. Since the administrator will need to override the initial value of the $GLOBUS_LOCATION variable used by gLite so that it points to the gCore container, the original value must be made available to the Execution Engine instance, which uses it when contacting the gLite CLI. This is done through the labeling functionality of the gCore container. The administrator must update the file $GLOBUS_LOCATION/config/GHNLabels.xml, adding a new entry as follows:
...
<Variable>
  <Key>ORIGINAL_GLOBUS_LOCATION</Key>
  <Value>(Original GLOBUS_LOCATION value, e.g.) /opt/globus</Value>
</Variable>
...
This value must be added to GHNLabels.xml before the container is started for the first time. Otherwise, one must clear the gCore container state and then restart the container.
For a client of the Execution Engine to be able to formally describe the plan that it wants to execute, the constructs offered by the Execution Plan are used. These constructs rest on the following main pillars:
- Every plan is potentially one execution unit for the execution engine. Each plan, while executing, can request a different parametrization.
- The variables defined in a plan are the common data placeholders through which the elements of the plan exchange data.
- Data Types
  - Each variable has a type which describes the characteristics of the data exchanged.
  - Each variable is accessed through defined parameters. Parameters are distinguished by their direction and processing.
- Execution Tree
  - The plan hierarchy is composed of a number of elements that control the flow and the actions of the plan.
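The pillars above can be sketched as a minimal data model. This is an illustrative sketch only; all class, enum and field names are hypothetical and do not belong to the actual Execution Plan API:

```java
// Hypothetical sketch of the Execution Plan constructs described above:
// typed variables as shared data placeholders, accessed through parameters
// distinguished by their direction.
import java.util.HashMap;
import java.util.Map;

class PlanSketch {
    // A variable is a typed data placeholder shared by the plan's elements.
    static class Variable {
        final String name;
        final String dataType; // describes the characteristics of the exchanged data
        Object value;
        Variable(String name, String dataType) { this.name = name; this.dataType = dataType; }
    }

    // Parameters are distinguished by their direction (how the element uses the variable).
    enum Direction { IN, OUT, INOUT }

    static class Parameter {
        final Direction direction;
        final Variable variable;
        Parameter(Direction direction, Variable variable) { this.direction = direction; this.variable = variable; }
    }

    // The plan: a set of shared variables plus a tree of elements (elided here).
    final Map<String, Variable> variables = new HashMap<>();

    Variable define(String name, String dataType) {
        Variable v = new Variable(name, dataType);
        variables.put(name, v);
        return v;
    }
}
```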
Every plan that is created and executed follows a life cycle which is at every point updated and reported to the client through the use of events. These events follow the Observer / Observable pattern, and the defined events emitted during the execution life cycle are:
- Execution Started
  - Event emitted when the execution is initiated.
- Execution Completed
  - Event emitted when the execution is completed, either successfully or not.
- Execution Paused
  - Event emitted when the execution is paused by the client.
- Execution Resumed
  - Event emitted when the execution is resumed after being paused by the client.
- Execution Canceled
  - Event emitted when the execution is canceled by the client.
- Progress Report
  - Event emitted from internal Plan Elements reporting on the progress of their execution.
- External Progress Report
- Performance Report
  - Event emitted from internal Plan Elements reporting timing and performance statistics on their operation.
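The Observer / Observable reporting described above can be sketched as follows. The interface, enum and method names are hypothetical illustrations, not the engine's actual event API:

```java
// Hypothetical sketch of life-cycle event reporting via the Observer pattern:
// clients register observers and the engine emits typed events as the plan
// moves through its life cycle.
import java.util.ArrayList;
import java.util.List;

class ExecutionEvents {
    enum EventType {
        STARTED, COMPLETED, PAUSED, RESUMED, CANCELED,
        PROGRESS, EXTERNAL_PROGRESS, PERFORMANCE
    }

    interface ExecutionObserver {
        void onEvent(EventType type, String detail);
    }

    private final List<ExecutionObserver> observers = new ArrayList<>();

    void register(ExecutionObserver o) { observers.add(o); }

    // Called by the engine; every registered observer is notified.
    void emit(EventType type, String detail) {
        for (ExecutionObserver o : observers) o.onEvent(type, detail);
    }
}
```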
The Progress Report and Performance Report events can be omitted if the client requests it through the Plan Configuration.
From the time the Execution Engine receives an Execution Plan and starts its execution, it creates a context within which the whole execution takes place. This context enables the monitoring of the execution tree, event propagation, and management. Since the execution may have to be moved to multiple execution containers through Boundary Elements, this context remains synchronized across multiple hosts through control Channels. Every partial execution instance, initialized in each execution container, acts as the original context for that specific container, and through the control Channel synchronizes its state with the contexts it is paired with.
This scheme works well for the Plan Elements that operate in the context of the engine. But since the context is a structure internal to the engine, it cannot be used by external components such as Java Objects and Web Services invoked through the Java Object and Web Service elements respectively. To cover this gap, and to allow external components to offer a service more integrated with the engine, Java Objects and Web Services may also receive an execution context construct, with the trade-off of being coupled with the Execution Engine at compile time. Through this construct they can emit progress events, be notified of execution life cycle changes, receive parametrized values that they may need during their operation, and be better initialized externally.
For Java objects, which are instantiated in the same address space as the engine that invokes them, the context kept by the engine is wrapped and passed to them. For Web Services, a new control Channel is initialized on the caller side, an identifier to it is created, and this identifier is passed in the SOAP envelope header of the call. On the Web Service side, the Execution Engine provides utilities to retrieve the information piggybacked on the SOAP envelope and instantiate an execution context construct which, from then on, takes care of synchronizing itself with the caller side.
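The two hand-off paths can be sketched as follows. All names here are hypothetical simplifications; in particular, the real engine carries the Channel identifier in a SOAP header rather than a plain string:

```java
// Hypothetical sketch of execution context hand-off: in-process Java objects
// receive the engine's context directly; Web Services receive only a control
// Channel identifier (carried in the SOAP envelope header) from which the
// service side rebuilds a context.
import java.util.UUID;

class ContextHandoff {
    static class ExecutionContext {
        final String channelId;
        ExecutionContext(String channelId) { this.channelId = channelId; }
    }

    static ExecutionContext newContext() {
        return new ExecutionContext(UUID.randomUUID().toString());
    }

    // Same address space: the live context is wrapped and passed as-is.
    static ExecutionContext forJavaObject(ExecutionContext engineContext) {
        return engineContext;
    }

    // Remote call: only the Channel identifier travels with the request ...
    static String headerValueFor(ExecutionContext engineContext) {
        return engineContext.channelId;
    }

    // ... and the service side instantiates a context from that identifier,
    // which then synchronizes itself with the caller over the Channel.
    static ExecutionContext fromHeader(String channelId) {
        return new ExecutionContext(channelId);
    }
}
```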
Given the distributed nature of the environment the execution engine operates on, as well as the level of expressiveness the execution plan was designed to provide, another construct offered by the engine is the ability to define reactions to specific errors. Different Plan Elements support different levels of contingency reactions, while others do not support them at all. To define a contingency reaction one must first define the trigger that will enable the reaction. This trigger is an exception that may be raised by the invoked component. Once the exception is caught, if a reaction is defined that can be triggered by the caught event, it takes over and handles the error. A number of reactions can be defined, each set off by a different trigger. The currently available reactions to such a trigger are listed below:
- No reaction (type "None")
- Retry (type "Retry"), configured with the number of retries and the interval between them
- Pick (type "Pick"), configured with a set of pick elements and a retrieve element
<contingency error="error name" isFullName="true/false">
  <reaction type="None"/>
</contingency>
...
<contingency error="error name" isFullName="true/false">
  <reaction type="Retry" retries="..." interval="..." />
</contingency>
...
<contingency error="error name" isFullName="true/false">
  <reaction type="Pick" exchaust="true/false">
    <pick>...</pick>
    <pick>...</pick>
    ...
    <retrieve>...</retrieve>
  </reaction>
</contingency>
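The trigger / reaction matching can be sketched in code. This is an illustrative simplification, mirroring the Retry reaction above; the method names and the Callable-based invocation are hypothetical, not the engine's actual classes:

```java
// Hypothetical sketch of a Retry contingency reaction: the exception name is
// the trigger; a matching exception causes the operation to be re-invoked up
// to `retries` times, waiting `intervalMs` between attempts. A non-matching
// exception is rethrown (no reaction is enabled for it).
import java.util.concurrent.Callable;

class Contingency {
    static <T> T retry(Callable<T> op, String triggerErrorName, int retries, long intervalMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Only a matching trigger enables the reaction.
                if (!e.getClass().getSimpleName().equals(triggerErrorName)) throw e;
                last = e;
                Thread.sleep(intervalMs);
            }
        }
        throw last; // retries exhausted: the error propagates
    }
}
```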
Every callable object external to the Execution Engine that can take part in an Execution Plan and be invoked during execution needs to be described, so that the Execution Engine's clients can find all the information needed to construct the respective Plan Elements describing these callables, namely Shell scripts, Java objects and Web Services. This information is detailed in an XML document that each of these callables exposes.
More information on the schema and examples of these profiles can be found in the respective section on the Invocable Profiles.
PE2ng avoids being overwhelmed by execution requests through a queuing mechanism. On job submission, a throttling front end is placed before PE2ng's execution gateway. Plans that accept being queued additionally declare their utilization requirements, along with a "nice" integer corresponding to the number of times the plan may be postponed.
As an example, when a new plan is submitted it is first examined for whether it accepts queuing. It is then analyzed for the minimum resources it requires per node, so that every node can be evaluated against those requirements; if a suitable node is found the job is actually submitted, otherwise it is queued. Each time a job completes, the queuing mechanism is updated with the actually running jobs through the monitoring system, and evaluates each of the waiting jobs against its requirements, starting from the top of the queue and skipping over-demanding jobs. Every waiting job can be skipped only a limited number of times, to avoid job starvation.
In addition to job throttling, the queuing mechanism is transparent for response-sensitive jobs that do not accept queuing, while jobs with high demands are postponed at most a bounded number of times before eventually executing.
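The postponement policy described above can be sketched as follows. The class names and the single scalar "capacity" model are hypothetical simplifications of the per-node evaluation PE2ng actually performs:

```java
// Hypothetical sketch of the queuing policy: each waiting job declares a
// utilization requirement and a "nice" limit on how many times it may be
// skipped. Over-demanding jobs are skipped until they exhaust their allowed
// postpones, after which they block the queue until they fit (no starvation).
import java.util.ArrayDeque;
import java.util.Deque;

class QueueSketch {
    static class Job {
        final String id;
        final double utilization; // minimum resources required
        final int nice;           // maximum number of postpones
        int skipped = 0;
        Job(String id, double utilization, int nice) {
            this.id = id; this.utilization = utilization; this.nice = nice;
        }
    }

    final Deque<Job> queue = new ArrayDeque<>();

    // Re-evaluated on each job completion: return the first waiting job that
    // fits the free capacity, skipping over-demanding jobs unless one has
    // already been postponed its maximum number of times.
    Job next(double freeCapacity) {
        for (Job j : queue) {
            if (j.utilization <= freeCapacity) {
                queue.remove(j);
                return j;
            }
            if (j.skipped >= j.nice) return null; // may not be skipped again: wait
            j.skipped++;
        }
        return null;
    }
}
```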
In order to use the queuing mechanism, the execution engine with queue support must be included. The library is available in our Maven repositories with the following coordinates:
Then, usage is as simple as setting the corresponding parameters of the Execution Plan Configuration, or the corresponding resources of the JDLAdaptor:
scope # /gcube/devsec
jdl # test/jdlExamples/jdlExample.jdl
chokeProgressEvents # false
chokePerformanceEvents # false
queueSupport # true
utilization # 0.1
passedBy # 5
storePlans # true