Workflow Engine

From Gcube Wiki
Revision as of 11:08, 3 June 2010 by Giorgos.papanikos (Talk | contribs) (JDL Adaptor resource file syntax)

Overview

The Workflow Engine operates on top of the ExecutionEngine. Its purpose is to abstract away the low-level details required by the ExecutionEngine and by the Execution Plan it is provided with.

Execution Environment

The Workflow Engine is constructed to operate in a variety of environments. For this to happen, it must be able to identify hooks with which to attach itself to the available Information Space, and it must have access to permanent storage. These hooks are provided through the Execution Environment Providers.

gCube Web Service Interface

When the Workflow Engine is acting in the context of the gCube platform, it provides a gCube-compliant Web Service interface. This Web Service acts not only as the front end to the Workflow definition facilities, but also as the "face" of the component with respect to the gCube platform. The Running Instance profile of the service is the placeholder where the environment provider of the underlying Workflow Engine instance (see Execution Environment Providers) pushes information that needs to be made available to other engine instances. Additionally, the service etc folder contains the logging properties that control the logging granularity of the service as well as of the rest of the underlying components. Configuration properties that are used throughout the Workflow Engine instance are retrieved from the service JNDI and are used to initiate services and providers once the service receives the init event.

Adaptors

One of the functionalities offered by the WorkflowEngine is the ability to bridge between existing, well-known job description and workflow definition languages and the internally used Workflow Language, which is subsequently transformed into an Execution Plan. This bridging is performed by means of Adaptors.

Adaptors are implemented to operate on a specific third-party language, which they can understand, parse, and translate into internally used constructs. This way the WorkflowEngine broadens its usability, since existing workflows already defined in third-party languages can be easily incorporated. The learning curve for anyone wishing to use the execution and workflow capabilities of the system is also greatly reduced: depending on one's needs, one can simply focus on whichever of the supported languages best matches the job at hand. Additionally, more than one adaptor can be implemented for the same language, each offering a different type of functionality, either by modifying the semantics of the produced Execution Plan or even by incorporating external components into the evaluation.

The following list of adaptors is currently provided:

Adaptor CLI

A Command Line Interface is provided to define a job to be submitted to one of the adaptors, monitor its execution, and retrieve the output of the processing. These utilities are provided as part of the WorkflowEngineServiceClient package, available from the central distribution site.

This package includes the following utilities:

  • StartJDLAdaptorTest.sh
  • StartGridAdaptorTest.sh
  • StartHadoopAdaptorTest.sh
  • StartJobMonitor.sh
  • RetrieveFile.sh
  • Example files

The first three utilities, StartJDLAdaptorTest.sh, StartGridAdaptorTest.sh and StartHadoopAdaptorTest.sh, are responsible for submitting the work to the respective adaptor, as documented in the Adaptors section. All of them define the same interface: the first argument is a locally available file (the resource file) that describes the job to be submitted, and the second argument is a local file used as an output location, in which a reference identifier to the submitted job is stored for later interactions. Each adaptor requires a different syntax for the resource file. The syntax for each of them is described in the following sections.
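The common interface described above can be sketched as a small Python wrapper. This is only an illustration, not part of the WorkflowEngineServiceClient package: the function names, the short adaptor keys ("jdl", "grid", "hadoop"), the `package_dir` parameter and the file names used in the usage note are all hypothetical. Only the script names and the order of the two arguments come from the description above.

```python
import os
import subprocess

# Script names as listed in the WorkflowEngineServiceClient package.
# The short keys are an assumption made for this sketch.
SUBMIT_SCRIPTS = {
    "jdl": "StartJDLAdaptorTest.sh",
    "grid": "StartGridAdaptorTest.sh",
    "hadoop": "StartHadoopAdaptorTest.sh",
}


def build_submit_command(adaptor, resource_file, output_file, package_dir="."):
    """Return the argument list for submitting a job to the given adaptor.

    package_dir is an assumption: the directory where the package was unpacked.
    """
    script = os.path.join(package_dir, SUBMIT_SCRIPTS[adaptor])
    # First argument: local resource file describing the job.
    # Second argument: local file that receives the job reference identifier.
    return [script, str(resource_file), str(output_file)]


def submit(adaptor, resource_file, output_file, package_dir="."):
    """Run the submission script and fail loudly if it exits with an error."""
    command = build_submit_command(adaptor, resource_file, output_file, package_dir)
    subprocess.run(command, check=True)
```

For instance, `submit("jdl", "job.resources", "job.id")` would run the JDL utility from the current directory with a hypothetical resource file `job.resources`, leaving the job reference in `job.id`. Keeping the command construction separate from the call to `subprocess.run` makes the argument order easy to inspect without contacting a service.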

JDL Adaptor resource file syntax

The syntax of the resource file expected by the JDL adaptor test utility is described below.

# The scope of the submitted job. Scope is an internal gCube construct described elsewhere.
# For the purposes of these clients, keep in mind that the scope used as a
# value here must be one of the supported scopes defined in the gHN container installation
# that is available on the same machine as the one running the clients and to which the
# $GLOBUS_LOCATION environment variable points
 
scope#<the scope to use>
 
# The JDL-based description of the job, which should be available on the local machine running
# these clients. A definition of the JDL syntax is out of scope, but the attributes supported
# by the adaptor can be found at [[WorkflowJDLAdaptor]]
 
jdl#<path to the jdl file>
 
# While the job is running, the [[ExecutionEngine]] emits events about the progress of the execution
# as described in [[ExecutionEngine#Execution_Events | Execution Events]]. Using this flag one
# can choose to choke these events so that they are not emitted at all.
 
chokeProgressEvents#<true | false>
 
# While the job is running, the [[ExecutionEngine]] emits events about the performance of the execution
# as described in [[ExecutionEngine#Execution_Events | Execution Events]]. Using this flag one
# can choose to choke these events so that they are not emitted at all.
 
chokePerformanceEvents#<true | false>
 
# The testing utilities do not create an [[ExecutionEngine#Execution_Plan | Execution Plan]] but
# simply contact the [[WorkflowEngine#gCube_Web_Service_Interface | Workflow Engine Service]]
# and the respective adaptor, where the plan is created. By setting this flag, one can request
# that the created plan be retrieved and stored locally. Note that this feature may be unavailable
# depending on the version used, as the respective functionality is moved to another utility
 
storePlans#<true | false>
 
# Resources that are mentioned in the provided JDL and are either inputs or executables not already
# available on the host machines must be made available to the adaptors so that they can be moved
# to the execution location. They are declared in the resource file using the name by which they
# appear in the JDL. In order to be transferred to the execution location, they must first be made
# available and accessible to the rest of the platform. This can be done by providing the full
# payload of the resource as an in-message blob using the "local" key, by referring to a resource already
# available in the gCube content management system using the "ss" key, or by pointing to an ftp
# or http URL using the "url" key. Depending on the way the resource is declared available, the respective
# identifier must be provided. As many of these entries as needed can be declared
 
<name of resource as mentioned in jdl>#<local | ss | url depending on where to access the payload from>#<the path / id / url to retrieve the payload from>
<name of resource as mentioned in jdl>#<local | ss | url depending on where to access the payload from>#<the path / id / url to retrieve the payload from>

Examples of the above syntax can be found in the testing utility package.
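To make the layout of the resource file concrete, the following is a minimal Python sketch that reads a file in the syntax shown above. It is not the parser used by the test utility. It assumes that lines starting with `#` are comments, that surrounding whitespace is insignificant, and that only the first two `#` characters of a resource line act as separators. The names `JobResources` and `parse_resource_file`, as well as every value in `SAMPLE`, are hypothetical.

```python
from dataclasses import dataclass, field

SETTING_KEYS = {"scope", "jdl", "chokeProgressEvents",
                "chokePerformanceEvents", "storePlans"}
BOOLEAN_KEYS = {"chokeProgressEvents", "chokePerformanceEvents", "storePlans"}
ACCESS_KINDS = {"local", "ss", "url"}


@dataclass
class JobResources:
    settings: dict = field(default_factory=dict)
    resources: list = field(default_factory=list)  # (name, kind, location) tuples


def parse_resource_file(text):
    """Parse resource file text into settings and resource declarations."""
    parsed = JobResources()
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (assumed convention)
        key, sep, rest = line.partition("#")
        if not sep:
            raise ValueError(f"line {number}: missing '#' separator")
        if key in SETTING_KEYS:
            if key in BOOLEAN_KEYS:
                if rest not in ("true", "false"):
                    raise ValueError(f"line {number}: expected true or false")
                parsed.settings[key] = rest == "true"
            else:
                parsed.settings[key] = rest
            continue
        # Anything else is treated as a resource entry: name#kind#location.
        kind, sep, location = rest.partition("#")
        if kind not in ACCESS_KINDS or not sep or not location:
            raise ValueError(f"line {number}: expected name#local|ss|url#location")
        parsed.resources.append((key, kind, location))
    return parsed


# Hypothetical resource file; all values are placeholders.
SAMPLE = """\
# illustrative values only
scope#/exampleVO/exampleScope
jdl#/home/user/job.jdl
chokeProgressEvents#false
chokePerformanceEvents#true
storePlans#false
input.txt#local#/home/user/input.txt
tool.sh#url#http://example.org/tool.sh
"""

job = parse_resource_file(SAMPLE)
```

After parsing, `job.settings` holds the five single-valued entries, with the three flags converted to booleans, and `job.resources` holds one tuple per declared resource. Rejecting unknown access kinds early is a design choice of this sketch; the actual utility may behave differently.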

Grid Adaptor resource file syntax

Hadoop Adaptor resource file syntax

Workflow Language