IR Bootstrapper

From Gcube Wiki

The IR Bootstrapper portlet provides a graphical user interface for executing sets of tasks on various resources of the infrastructure after the data import phase is completed. These tasks lead to the creation of other resources such as:

  • Indexes
  • OpenSearch resources for OpenSearch collections
  • SRU resources for SRU collections

This portlet is driven by a configuration file that is saved as a generic resource on the IS. The file is in XML format and defines:

  • The available tasks that can be executed
  • The available jobTypes that can be used. Each jobType defines the sequential and/or parallel execution of tasks that turn a given type of input into a given output.
  • The available jobs. Each job is of one of the available jobTypes and provides all the specific inputs for that type.
    • The jobs are the ones that are available for execution on the resources.
    • A job can be extended by another job to define a more restricted job (e.g. one that applies only to a collection with a given name).
    • The user can define a new job by using the portlet's graphical user interface


- An easy way to learn how to use the IRBootstrapper portlet is by watching this video: Execute tasks using the IRBootstrapper


Job Execution

The first tab of the portlet is divided into two main panels. The left panel contains a tree with all the available collections. Clicking on a collection shows all the jobs that can be executed on it. Selecting a job displays its execution tree in the right panel. A lock icon on a task of the selected job means that the task has already been completed for the selected collection and will not be executed again. To execute a job, either click the IRPlayBtn.png button at the top of the tree, or check the job's checkbox and click the IRBtn.png button. The latter button is enabled when at least one job is checked.

IRmain.png


Jobs Batch Submission

When you check jobs of the same job type for more than one collection, you can submit them for batch execution. If these jobs require any extra user input at runtime, a window appears asking for it. Either the same input is used for all the jobs submitted in the batch or, for JobTypes that define chained execution, an output of the previous job is used as input to the next one.

IR-BatchSelection.png IR-inputWindow.png

When a job is submitted it is added to the Submitted Jobs tree. Open the Submitted Jobs panel to check the state of each job.

  • An icon on each task indicates its current state: Running, Completed, Completed with warnings, Failed or Fulfilled Task.
  • You can see the execution log of each task by clicking on the '+' button.
  • For each job you can abort the execution or remove the job from the list.

IRSubJobs2.png

Job Designer

The second tab shows a tree with all the job types and the jobs defined for each type. You can delete an existing job, display the execution tree of a job and/or clone an existing job into a new one.
These changes update the portlet's configuration generic resource.


IRdesigner.png
You can also create a new job using the graphical interface:

  • The job should have a name and be of a specific type.
  • A value should be provided for every assignment of the specified type.

IRcreateNewJob.png
For more information about jobs and JobTypes, please refer to the section below.

Bootstrapper Static Configuration

The portlet requires a static configuration in XML format in order to be initialized. This configuration is saved as a generic resource in the Information System (IS). It is created by the administrator when the portlet is released and can be extended later on using the portlet's Job Editor.

The configuration consists of two main parts:

  • Types
  • Jobs

Types

There are 3 different types that should be declared:

  • Type: Added by the administrator when the resource is created. It declares the data type classes defined in the portlet's source code.

In the current implementation of the portlet the following Data types are defined:

<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.TreeManagerCollectionDataType" name="TreeManagerCollection" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.GCUBECollectionDataType" name="GCUBECollection" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.OpenSearchDataType" name="OpenSearch" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.FullTextIndexNodeDataType" name="FullTextIndexNode" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.SRUCollectionDataType" name="SRUCollection" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.SRUDataType" name="SRUResource" />
<type class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.data.RelationalDBDataType" name="RelationalDBDataSource" />
  • TaskType: Added by the administrator when the resource is created. It declares the tasks that can be executed using the portlet. For each task type the input and the output should be defined, and each of them must be one of the Data types declared above.

In the current implementation of the portlet the tasks that can be executed are the following:

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.OpenSearchGenerationTaskType" name="OpenSearchGenerationTaskType">
    <input type="GCUBECollection" />
    <output type="OpenSearch" />
    <run>true</run>
</tasktype>

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.FullTextIndexNodeGenerationTaskType" name="FullTextIndexNodeGenerationTask">
    <input type="TreeManagerCollection" />
    <output type="FullTextIndexNode" />
    <run>true</run>
</tasktype>

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.FullTextIndexNodeUpdateTaskType" name="FullTextIndexNodeUpdateTask">
    <input type="TreeManagerCollection" />
    <output type="FullTextIndexNode" />
    <run>true</run>
</tasktype>

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.SRUGenerationTaskType" name="SRUGenerationTaskType">
    <input type="SRUCollection" />
    <output type="SRUResource" />
    <run>true</run>
</tasktype>

<tasktype class="org.gcube.portlets.admin.irbootstrapperportlet.gwt.server.types.task.RelationalDBFullTextIndexNodeGenerationTaskType" name="RelationalDBFullTextIndexNodeGenerationTask">
    <input type="RelationalDBDataSource" />
    <output type="FullTextIndexNode" />
    <run>true</run>
</tasktype>
  • JobType: Added by the administrator. A JobType defines a set of tasks to execute together with their initial assignments. It also defines whether the tasks, and the assignments belonging to each task, are executed in parallel or in sequential order.

In the following example the JobType creates FullText indexes. It takes a TreeManagerCollection as input and executes the FullTextIndexNodeGenerationTask task type, which should already be defined in the configuration. It performs the required assignments for this task by providing the desired input and output. Note that these assignments run sequentially, as declared by the sequential element in the XML.

<jobtype description="Creates the required fulltext indices for a collection." name="FTIndexNodeCollection">
    <input type="TreeManagerCollection" />
    <jobDefinition>
        <parallel>
            <sequential>
                <assign to="%Create_ft_node_index.input" value="%FTIndexNodeCollection.input" />
                <assign to="%Create_ft_node_index.output.IndexedCollectionID" value="%Create_ft_node_index.input.ColID" />
                <task name="Create_ft_node_index" tasktype="FullTextIndexNodeGenerationTask" />
            </sequential>
        </parallel>
    </jobDefinition>
</jobtype>
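
By analogy, a JobType for one of the other declared task types might be sketched as follows. This is an illustration only and not part of the released configuration: the JobType name SRUResourceGeneration, the task name Generate_sru and the description text are hypothetical, and any further assignments that SRUGenerationTaskType may require are not documented here and are therefore omitted.

```xml
<!-- Hypothetical JobType, modelled on the FTIndexNodeCollection example -->
<jobtype description="Creates the SRU resource for an SRU collection." name="SRUResourceGeneration">
    <!-- Must be one of the declared Data types -->
    <input type="SRUCollection" />
    <jobDefinition>
        <parallel>
            <sequential>
                <!-- Pass the JobType input on to the task (task name is an assumption) -->
                <assign to="%Generate_sru.input" value="%SRUResourceGeneration.input" />
                <!-- tasktype refers to the name attribute of a declared tasktype -->
                <task name="Generate_sru" tasktype="SRUGenerationTaskType" />
            </sequential>
        </parallel>
    </jobDefinition>
</jobtype>
```

The references follow the same pattern as the example above: %TaskName.input addresses the input of a task, and %JobTypeName.input addresses the input of the JobType itself.
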
  • You can declare that instances of a specific JobType can be submitted in batch mode. This means that multiple instances are executed in sequential order and that every instance (except the first one) uses a specific output of the previously completed instance as its input.

In order to enable this functionality you should add the following element in the JobType definition (This element should be added as a child of the JobType element):

<ChainExecution>
    <ChainConnectionAssignments>
        <assign to="%Create_ft_node_index.FullTextIndexGenerationTask.IdOfIndexManagerToAppend" value="%Create_ft_node_index.output.IndexID" />
    </ChainConnectionAssignments>
</ChainExecution>

The assignment above declares that, in a batch mode execution, the target takes as its value the given output of the previous job. Here, the IndexID of a Full Text Index is used as the Index Manager ID of the next job, which forces both indexes to be created under the same WS-resource Index Manager.
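
Put together with the FTIndexNodeCollection JobType shown earlier, the result might look like the sketch below. The only thing the text above guarantees is that ChainExecution is a child of the jobtype element; placing it after jobDefinition is an assumption made here for readability.

```xml
<jobtype description="Creates the required fulltext indices for a collection." name="FTIndexNodeCollection">
    <input type="TreeManagerCollection" />
    <jobDefinition>
        <parallel>
            <sequential>
                <assign to="%Create_ft_node_index.input" value="%FTIndexNodeCollection.input" />
                <assign to="%Create_ft_node_index.output.IndexedCollectionID" value="%Create_ft_node_index.input.ColID" />
                <task name="Create_ft_node_index" tasktype="FullTextIndexNodeGenerationTask" />
            </sequential>
        </parallel>
    </jobDefinition>
    <!-- Assumed position: the source only states that this is a child of jobtype -->
    <ChainExecution>
        <ChainConnectionAssignments>
            <!-- Copied unchanged from the fragment above -->
            <assign to="%Create_ft_node_index.FullTextIndexGenerationTask.IdOfIndexManagerToAppend" value="%Create_ft_node_index.output.IndexID" />
        </ChainConnectionAssignments>
    </ChainExecution>
</jobtype>
```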

Jobs

You can define as many jobs as you want by using the portlet's visual Job Editor/Creator. The portlet helps with the creation by suggesting the JobTypes and the assignments.
The type of each job should be one of the declared JobTypes and, based on that type, every job should declare the needed assignments. The XML of a job for the FTIndexNodeCollection JobType is shown below:

<job jobtype="FTIndexNodeCollection" name="FullText Index OAI Tree Collections">
    <initialization>
        <assign to="%FTIndexNodeCollection.input.Type" value="ns5:OAI" />
        <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IndexTypeID" value="ft_oai_dc_1.0" />
        <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.TransformationXSLTID" value="$BrokerXSLT_wrapperFT" />
        <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.XsltsIDs" value="[ $BrokerXSLT_FARM_dc_anylanguage_to_ftRowset_anylanguage ]" />
        <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IdOfIndexManagerToAppend" userInputLabel="ID of index node to append" value="%userInput" />
    </initialization>
</job>

In the above example the job is named FullText Index OAI Tree Collections. Through its assignments it declares that it can only be matched to a TreeManagerCollection with Type "ns5:OAI", and it defines all the other assignments needed for the Full Text Index creation. This job is applicable to all OAI collections. To declare the same job for other types, update the Type and the XsltsIDs.

  • If you want a job to ask for user input at runtime, set the value of the respective assignment to %userInput. In addition, the userInputLabel attribute should describe the value you expect, which helps the user provide it.


Jobs' Inheritance

You can create new Jobs by extending existing ones. In order to extend an existing Job you should add the extends attribute in the Job element:

<job extends="FullText Index OAI Tree Collections" jobtype="FTIndexNodeCollection" name="FullText Index OAI Tree Collections-Extended">

The above example declares a new Job named "FullText Index OAI Tree Collections-Extended" that extends the existing Job "FullText Index OAI Tree Collections".
Assignments that are already defined in the parent Job do not need to be defined again, unless you would like to override them.
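
A complete extended job that overrides a single assignment might look like the following sketch. It assumes that a child job uses the same initialization wrapper as its parent and that an assignment declared as %userInput in the parent can be overridden with a fixed value; the value HYPOTHETICAL-INDEX-MANAGER-ID is a placeholder and not a real resource ID.

```xml
<!-- Child job: inherits every assignment of the parent that it does not redefine -->
<job extends="FullText Index OAI Tree Collections" jobtype="FTIndexNodeCollection" name="FullText Index OAI Tree Collections-Extended">
    <initialization>
        <!-- Overrides the parent's %userInput assignment with a fixed placeholder value -->
        <assign to="%Create_ft_node_index.FullTextIndexNodeGenerationTask.IdOfIndexManagerToAppend" value="HYPOTHETICAL-INDEX-MANAGER-ID" />
    </initialization>
</job>
```

Under these assumptions the Type, IndexTypeID, TransformationXSLTID and XsltsIDs assignments are taken from the parent job, and the extended job no longer prompts the user for the index manager ID.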