Workflow Engine
Overview
The Workflow Engine operates on top of the ExecutionEngine. Its purpose is to abstract over the low-level details needed by the ExecutionEngine and the Execution Plan it is provided with.
Execution Environment
The Workflow Engine is constructed to operate in a variety of environments. All that is required is that it can identify hooks with which to attach itself to the available Information Space, as well as gain access to permanent storage. These hooks are provided through the Execution Environment Providers.
gCube Web Service Interface
When the Workflow Engine is acting in the context of the gCube platform, it provides a gCube compliant Web Service interface. This Web Service acts as the front end not only for the Workflow definition facilities, but is also the "face" of the component with respect to the gCube platform. The Running Instance profile of the service is the placeholder where the underlying Workflow Engine instance's [[Execution Environment Providers|environment provider]] pushes information that needs to be made available to other engine instances. Additionally, the service etc folder contains the logging properties that control the logging granularity of the service as well as of the rest of the underlying components. Configuration properties used throughout the Workflow Engine instance are retrieved from the service JNDI and are used to initialize services and providers once the service receives the init event.
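For illustration only, such a logging properties file typically follows log4j conventions; a minimal sketch that raises the granularity for the engine's own components might look like the following (the category name is hypothetical and deployment-specific):

log4j.rootLogger=WARN, ROOT
#Hypothetical category covering the Workflow Engine components; raise to DEBUG for verbose output
log4j.category.org.gcube.execution=DEBUG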
Adaptors
One of the functionalities offered by the WorkflowEngine is the ability to bridge between existing, well-known job description and workflow definition languages and the internally used Workflow Language, which is subsequently transformed into an Execution Plan. This bridging is performed by means of Adaptors.
Adaptors are implemented to operate on a specific third party language which they can understand, parse, and translate into the internally used constructs. This widens the usability of the WorkflowEngine, since existing workflows already defined in third party languages can be easily incorporated. It also greatly reduces the learning curve for anyone wishing to use the execution and workflow capabilities of the system: depending on one's needs, one can simply focus on whichever of the supported languages best matches the job at hand. Furthermore, for the same language, more than one adaptor can be implemented, each offering a different type of functionality, either by modifying the semantics of the produced Execution Plan or by incorporating external components into the evaluation.
The following list of adaptors is currently provided:
- WorkflowJDLAdaptor
- This adaptor parses a Job Description Language (JDL) definition block and translates the described job or DAG of jobs into an Execution Plan which can be submitted to the ExecutionEngine for execution (see the illustrative JDL snippet after this list).
- WorkflowGridAdaptor
- This adaptor constructs an Execution Plan that can contact a gLite UI node, submit, monitor and retrieve the output of a grid job.
- WorkflowHadoopAdaptor
- This adaptor constructs an Execution Plan that can contact a Hadoop UI node, submit, monitor and retrieve the output of a Map Reduce job.
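To give a flavor of the input consumed by the WorkflowJDLAdaptor, the following is a minimal, illustrative JDL description of a single job; the executable and sandbox entries are hypothetical, and the attributes actually supported by the adaptor are documented at [[WorkflowJDLAdaptor]].

[
  Type = "Job";
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
]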
Adaptor CLI
A Command Line Interface is provided to define jobs to be submitted to one of the adaptors, monitor their execution, and retrieve the output of the processing. These utilities are provided as part of the WorkflowEngineServiceClient package, available from the central distribution site.
This package includes the following utilities:
- StartJDLAdaptorTest.sh
- StartGridAdaptorTest.sh
- StartHadoopAdaptorTest.sh
- StartJobMonitor.sh
- RetrieveFile.sh
- Example files
The first three utilities, StartJDLAdaptorTest.sh, StartGridAdaptorTest.sh and StartHadoopAdaptorTest.sh, are responsible for submitting work to the respective adaptor, as documented in the Adaptors section. All of them expose the same interface: the first parameter is a locally available file describing the job to be submitted, and the second is a local file used as an output location, where a reference identifier of the submitted job is stored for later interactions. Depending on the adaptor used, a different syntax is needed for the resource file. The syntax for each adaptor is described in the following sections.
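As a sketch of the expected invocation, assuming a resource file job.resource and an output file job.ref (both names hypothetical), a job could be submitted to the JDL adaptor as follows:

#submit the job described in job.resource; a reference identifier for the submitted job is written to job.ref
./StartJDLAdaptorTest.sh job.resource job.ref

The identifier stored in job.ref can then be used in later interactions, such as monitoring the job or retrieving its output through the remaining utilities.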
JDL Adaptor resource file syntax
The syntax of the resource file expected by the JDL adaptor test utility is described below.
#The scope of the job submitted. Scope is an internal gCube construct described elsewhere.
#For the purpose of these clients, one should just keep in mind that the scope used as a
#value here must be one of the supported scopes defined in the gHN container installation
#available on the same machine as the one that is running the clients and to which the defined
#$GLOBUS_LOCATION environment variable points
scope#<the scope to use>
#The jdl based description of the job that should be available on the local machine running
#these clients. A definition of the JDL syntax is out of scope here but the supported attributes
#of the adaptor can be found at [[WorkflowJDLAdaptor]]
jdl#<path to the jdl file>
#While the job is running, the [[ExecutionEngine]] emits events about its progress
chokeProgressEvents : <true | false> (depending on whether you want to omit progress reporting)
chokePerformanceEvents : <true | false> (depending on whether you want to omit performance reporting)
storePlans : <true | false> (depending on whether you want the created plan and the final one to be stored for inspection)
<name of resource as mentioned in jdl> : <local | ss | url depending on where to access the payload from> : <the path / id / url to retrieve the payload from>
<name of resource as mentioned in jdl> : <local | ss | url depending on where to access the payload from> : <the path / id / url to retrieve the payload from>
[...]
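Putting the above together, a filled-in resource file might look like the following; the scope, file paths and resource name are purely illustrative.

scope#/gcube/devsec
jdl#/home/user/jobs/hello.jdl
chokeProgressEvents : false
chokePerformanceEvents : true
storePlans : true
input.dat : local : /home/user/jobs/input.dat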