How to use the DataMiner Pool Manager

From Gcube Wiki
Revision as of 15:05, 5 September 2017 by Nunzioandreagalante (Talk | contribs)

DataMiner Pool Manager

The DataMiner Pool Manager service (DMPM) is a REST service that rationalizes and automates the process of publishing SAI algorithms on DataMiner nodes and keeps the DataMiner cluster up to date.

Maven coordinates

The second version of the service was released in gCube 4.6.1. The Maven artifact coordinates are:

<dependency>
   <groupId>org.gcube.dataanalysis</groupId>
   <artifactId>dataminer-pool-manager</artifactId>
   <version>2.0.0-SNAPSHOT</version>
   <type>war</type>
</dependency>

Overview

The service accepts an algorithm descriptor, including its dependencies; generates (via templating) the Ansible playbook, inventory and roles for the relevant components (algorithm installer, algorithms, dependencies); executes the Ansible playbook on a Staging DataMiner; and finally updates the lists of dependencies and algorithms that a Cron job will use for the installation.

In practice, the service takes as input the URL of an algorithm package (including the jar and metadata), extracts the information needed for the installation, installs the script, updates the list of dependencies, publishes the new algorithm in the Information System and asynchronously returns the execution outcome to the caller.

Architecture

The following main entities are involved in integrating SAI with the production environment:

  • SAI: allows the user to upload the package of the algorithm to deploy and to choose the VRE on which to deploy it
  • DataMiner Pool Manager: a SmartGears REST service in charge of managing the installation of algorithms on the infrastructure DataMiners
  • The Staging DataMiner: a dedicated DataMiner machine, usable only by the DataMiner Pool Manager, used to test the installation of an algorithm and its dependencies. Two DataMiners in the d4science infrastructure are staging-oriented (the user can set this in the configuration file):
    • dataminer1-devnext.d4science.org for the development environment
    • dataminer-proto-ghost.d4science.org for the production environment
  • SVN Dependencies Lists: lists (in files on SVN) of dependencies that must be installed on DataMiner machines. There is one list per type of dependency for each of Dev, RProto and Production.
  • SVN Algorithms Lists: lists (in files on SVN) of algorithms that must be installed on DataMiner machines. The service uses three different lists: one for the Dev environment, one for RProto and one for Production.
  • The Cron job: runs on every DataMiner and periodically (every minute) aligns the packages and the algorithms installed on the machine with the SVN Dependencies Lists and the SVN Algorithms Lists. For the algorithms, the Cron job should be configured to run the command line stored in each record of the SVN list; for the dependencies, it should be configured to read and install from the full set of dependencies lists. The lists to consider are the following:
    • Production Algorithms:
      http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/prod/algorithms
    • RProto Algorithms:
      http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/proto/algorithms
    • Dev Algorithms:
      http://svn.research-infrastructures.eu/public/d4science/gcube/trunk/data-analysis/DataMinerConfiguration/algorithms/dev/algorithms
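
The alignment step performed by the Cron job can be pictured with a minimal sketch. It assumes, purely for illustration, that a list file holds one record per line and that the job remembers which records it has already applied; the real record format and bookkeeping are not described on this page, and the function name `pending_records` is hypothetical.

```python
def pending_records(svn_list_text, already_applied):
    """Return the records of an SVN list that still have to be applied.

    Assumption (not from the DMPM documentation): one record per line,
    blank lines ignored, order of first appearance preserved.
    """
    seen = set(already_applied)
    pending = []
    for line in svn_list_text.splitlines():
        record = line.strip()
        # Skip blank lines, records applied earlier, and duplicates.
        if not record or record in seen:
            continue
        seen.add(record)
        pending.append(record)
    return pending


if __name__ == "__main__":
    listing = "install A\ninstall B\n\ninstall A\ninstall C\n"
    print(pending_records(listing, ["install B"]))
```

Each run would then execute only the pending records, which keeps a job that fires every minute cheap when nothing has changed.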

Process (From SAI to Production VRE)

Currently SAI is deployed in several scopes, and the user can deploy an algorithm only in the current VRE. The idea is to have a single instance of SAI in the RPrototypingLab VRE and to let the user specify the target VRE by providing the token for that VRE. Installing a new algorithm through SAI therefore requires the following input:

  • Package containing Metadata and dependencies
  • The target VRE and the token to access it

The process is composed of two main phases:

  • TEST Phase: the installation of an algorithm and its dependencies on the staging DataMiner; it ends with the publishing of the algorithm in the pool of DataMiners of the RPrototypingLab VRE
    • The DMPM contacts the Staging DataMiner and installs the algorithm and the dependencies
    • The output is retrieved. If the installation fails (e.g. a dependency does not exist), the process stops and the log is returned to the user
    • The DMPM updates the SVN RPrototypingLab Dependencies lists
    • The DMPM updates the SVN RPrototypingLab Algorithms list
    • The Cron job reads the SVN lists (both Dependencies and Algorithms) and installs the algorithm and the dependencies on the RPrototypingLab DataMiners
    • The script publishes the new algorithm in the RPrototypingLab VRE (if the algorithm is already available on the IS in that scope, the script updates the .jar files, but the resource on the IS, the .properties and the WPS config do not change)

[Figure: 5.png]

  • RELEASE Phase
    • SAI invokes the service in RELEASE phase to install the algorithm in a specific production VRE (chosen by the user); SAI passes the DMPM the target VRE name and the token to access that VRE
    • The DMPM updates the SVN Production Dependencies lists
    • The DMPM updates the SVN Production Algorithms list
    • The Cron job installs the algorithm and the dependencies on the production DataMiners
    • The script publishes the algorithm in the VRE

[Figure: 4.png]

Configuration and Testing

The configuration files to set up are:

  • service.properties
  • web.xml

DMPM is a SmartGears-compliant service, deployed under the Tomcat webapps directory:

/home/gcube/tomcat/webapps/dataminer-pool-manager-2.0.0-SNAPSHOT

An instance has already been deployed and configured at Development level:

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/rest/

This environment contains the configuration for the Ansible playbook, inventory and roles for the algorithm installer, scripts, algorithms and dependencies, as well as the logs of the executions:

/home/gcube/tomcat/webapps/dataminer-pool-manager/WEB-INF/classes/static     // static resource inside the WAR containing static roles
/home/gcube/tomcat/webapps/dataminer-pool-manager/WEB-INF/classes/templates  // static resource inside the WAR containing the templates
/home/gcube/tomcat/webapps/dataminer-pool-manager/WEB-INF/classes/custom     // static resource inside the WAR containing the custom roles
/home/gcube/dataminer-pool-manager/dpmConfig/service.properties              // static resource on the filesystem containing configuration data
/home/gcube/dataminer-pool-manager/jobs                                      // dynamically generated resource concerning the logs of the different job executions
/home/gcube/dataminer-pool-manager/work                                      // dynamically generated resource concerning the Ansible worker for each job

For Ansible to work on the pool of DataMiners, the SSH key of the VM on which the service runs (e.g., node2-d-d4s.d4science.org) must be deployed on the pool of Staging DataMiners with root and gcube permissions.

Usage and APIs

The DMPM REST service exposes three main functionalities (one for the test phase, one for the release phase, and a third common to both):

1. TEST PHASE: a method that immediately returns the log ID used to monitor the execution, and that is able to:

    • test the installation of the algorithm and its dependencies on a staging DataMiner
    • update the SVN lists (both for dependencies and algorithms) dedicated to RPrototypingLab

The parameters to consider are the following:

  • the algorithm (URL to package containing the dependencies and the script to install)
  • the category to which the algorithm belongs
  • the VRE token from which SAI is used (ideally RPrototypingLab)

An example REST call is the following:

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/algorithm/stage?
gcube-token=708e7eb8-11a7-4e9a-816b-c9ed7e7e99fe-98187548
&algorithmPackageURL=http://data.d4science.org/dENQTTMxdjNZcGRpK0NHd2pvU0owMFFzN0VWemw3Zy9HbWJQNStIS0N6Yz0
&category=ICHTHYOP_MODEL
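
As a rough illustration, the TEST phase URL can be assembled programmatically so that the package URL is percent-encoded correctly. This is a minimal sketch, not an official client: the helper name `build_stage_url` and the placeholder token are hypothetical, and the page does not state which HTTP method the service expects (a plain GET on the resulting URL is an assumption).

```python
from urllib.parse import urlencode

# Development instance mentioned on this page.
BASE_URL = (
    "http://node2-d-d4s.d4science.org:8080/"
    "dataminer-pool-manager-2.0.0-SNAPSHOT/api"
)


def build_stage_url(token, package_url, category, base_url=BASE_URL):
    """Build the URL of the TEST phase (stage) call.

    Parameter names (gcube-token, algorithmPackageURL, category) are the
    ones shown in the example call; everything else is illustrative.
    """
    query = urlencode(
        {
            "gcube-token": token,
            "algorithmPackageURL": package_url,
            "category": category,
        }
    )
    return f"{base_url}/algorithm/stage?{query}"


if __name__ == "__main__":
    # "my-token" is a placeholder, not a valid gCube token.
    print(build_stage_url("my-token", "http://data.d4science.org/abc", "ICHTHYOP_MODEL"))
```

According to the description above, the body of the response carries the log ID needed to monitor the execution.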


2. RELEASE PHASE: a method invoked from SAI after the Test phase has finished successfully, able to:

    • update the production SVN list with the dependencies extracted from the package (if new ones are present)
    • update the production SVN list with the algorithm (if it is new)

Some of the parameters to consider are the following:

  • the algorithm (URL to package containing the dependencies and the script to install)
  • the category to which the algorithm belongs
  • the VRE token from which SAI is used (ideally RPrototypingLab)
  • the target VRE on which to install the algorithm
  • the token for the target VRE (before publishing the algorithm in the SVN repository, the service checks whether the user is registered in the target VRE)

An example REST call is the following:

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/algorithm/add?
gcube-token=708e7eb8-11a7-4e9a-816b-c9ed7e7e99fe-98187548
&algorithmPackageURL=http://data.d4science.org/dENQTTMxdjNZcGRpK0NHd2pvU0owMFFzN0VWemw3Zy9HbWJQNStIS0N6Yz0
&category=ICHTHYOP_MODEL
&targetVREToken=3a23bfa4-4dfe-44fc-988f-194b91071dd2-843339462
&targetVRE=/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab
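
The RELEASE phase call takes two extra parameters, and the target VRE scope contains slashes that need encoding when passed in a query string. The sketch below shows one way to build the URL and to reject a call with a missing parameter before it reaches the service; the helper name `build_release_url` and the client-side check are assumptions added for illustration, not part of the DMPM API.

```python
from urllib.parse import urlencode


def build_release_url(base_url, token, package_url, category,
                      target_vre, target_vre_token):
    """Build the URL of the RELEASE phase (add) call.

    Parameter names come from the example call on this page. The
    emptiness check is a client-side convenience, not service behaviour.
    """
    params = [
        ("gcube-token", token),
        ("algorithmPackageURL", package_url),
        ("category", category),
        ("targetVREToken", target_vre_token),
        ("targetVRE", target_vre),
    ]
    missing = [name for name, value in params if not value]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    # urlencode percent-encodes the slashes of the VRE scope.
    return f"{base_url}/algorithm/add?{urlencode(params)}"


if __name__ == "__main__":
    print(
        build_release_url(
            "http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api",
            "my-token",  # placeholder token
            "http://data.d4science.org/abc",
            "ICHTHYOP_MODEL",
            "/d4science.research-infrastructures.eu/gCubeApps/RPrototypingLab",
            "my-target-token",  # placeholder token
        )
    )
```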


3. The result of the execution can be monitored asynchronously through a REST call to the log, passing the ID of the operation as parameter. This can be done in both the TEST and RELEASE phases.

An example REST call is the following:

http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api/log?
gcube-token=708e7eb8-11a7-4e9a-816b-c9ed7e7e99fe-98187548
&logUrl=426c8e35-a624-4710-b612-c90929c32c27
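
Retrieving the log from a client could look like the sketch below. Note that the parameter is named `logUrl` in the example even though it carries the operation ID. The helper names are hypothetical, the use of GET is an assumption, and the response is treated as plain text because its format is not specified on this page.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_log_url(base_url, token, log_id):
    """Build the URL of the log call for a given operation ID."""
    query = urlencode({"gcube-token": token, "logUrl": log_id})
    return f"{base_url}/log?{query}"


def fetch_log(base_url, token, log_id, timeout=30):
    """Fetch the current log text for an operation (assumes HTTP GET)."""
    with urlopen(build_log_url(base_url, token, log_id), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_log() against a reachable
    # DMPM instance with a valid token to obtain the log itself.
    print(
        build_log_url(
            "http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-2.0.0-SNAPSHOT/api",
            "my-token",  # placeholder token
            "426c8e35-a624-4710-b612-c90929c32c27",
        )
    )
```

Since the page does not say how a finished job is marked in the log, a caller would have to call `fetch_log` repeatedly and decide for itself when to stop.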


Notification

Requirements toward the SAI integration

The user allows SAI to generate the package. Each package generated by SAI must contain an Info.txt metadata file with the following information specified by the user:

Algorithm Name, Author, Category, Class Name, Packages (list of dependencies)

  • The dependencies in the metadata file inside the algorithm package must respect the following guidelines:
    • R Dependencies must have prefix cran:
    • OS Dependencies must have prefix os:
    • Custom Dependencies must have prefix github:

These dependencies are stored in the corresponding SVN file without the prefix.
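
The prefix convention can be illustrated with a short sketch that groups dependency entries by type and strips the prefix, as happens before they are written to the SVN files. This is not the service's actual code: the function name `split_dependencies`, the sample package names and the choice to reject unknown prefixes are assumptions.

```python
# Prefixes listed in the guidelines above.
KNOWN_PREFIXES = ("cran", "os", "github")


def split_dependencies(packages):
    """Group dependency entries by prefix and drop the prefix.

    Rejecting entries without a known prefix is a choice made for this
    sketch; the page does not say how the service treats them.
    """
    grouped = {prefix: [] for prefix in KNOWN_PREFIXES}
    for entry in packages:
        # Split on the first colon only, so names may contain "/" or ":".
        prefix, sep, name = entry.strip().partition(":")
        name = name.strip()
        if not sep or prefix not in grouped or not name:
            raise ValueError(f"unsupported dependency entry: {entry!r}")
        grouped[prefix].append(name)
    return grouped


if __name__ == "__main__":
    # Sample names are illustrative only.
    print(split_dependencies(["cran:ggplot2", "os:libxml2-dev", "github:user/repo"]))
```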

Three buttons will be available in the new SAI interface to allow interaction between SAI and the three methods exposed by the service.

  • On the host where the service is deployed, it must be possible to execute the ansible-playbook command, in order to install the dependencies on the staging DataMiner and the algorithm on the target VRE
  • At least for the staging DataMiners used in the test phase, the application must have SSH root access