How to use the Data Miner Pool Manager

[[Category:User's Guide]]
Latest revision as of 19:39, 22 April 2017

=DataMiner Pool Manager=

The DataMiner Pool Manager service (DMPM) is a REST service that rationalizes and automates the process of publishing SAI algorithms on DataMiner nodes and keeping the DataMiner cluster updated.

==Overview==

The service accepts an algorithm descriptor, including its dependencies (OS, R, and custom packages), queries the IS for the DataMiners in the current scope, generates (via templating) the Ansible playbook, inventory, and roles for the relevant components (algorithm installer, algorithms, dependencies), and executes the playbook on a DataMiner.

In particular, the service accepts as input, among other things, the URL of an algorithm package (including the jar and metadata), extracts the information needed for the installation, installs the script, updates the list of dependencies, optionally publishes the new algorithm in the Information System, and asynchronously returns the execution outcome to the caller.

==Maven coordinates==

The Maven artifact coordinates are:

<source lang="xml">
<dependency>
    <groupId>org.gcube.dataanalysis</groupId>
    <artifactId>dataminer-pool-manager</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <type>war</type>
</dependency>
</source>

==Configuration and Testing==

DMPM is a SmartGears-compliant service. Once deployed in the SmartGears Bundle, the web application is located at:

<source lang="text">
/home/gcube/SmartGears-Bundle/tomcat/webapps/dataminer-pool-manager-1.0.0-SNAPSHOT
</source>

An instance has already been deployed and configured at development level, reachable at:

<source lang="text">
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest/
</source>

This environment contains the configuration for the Ansible playbook, inventory, and roles for the algorithm installer, scripts, algorithms, and dependencies, as well as the logs of the executions:

<source lang="text">
/home/gcube/gcube/dataminer-pool-manager/custom      // static directory
/home/gcube/gcube/dataminer-pool-manager/static      // static directory
/home/gcube/gcube/dataminer-pool-manager/templates   // static directory
/home/gcube/gcube/dataminer-pool-manager/work        // dynamically generated directory
</source>

To allow Ansible to access the DataMiners, the SSH key of the host where the service is deployed must be correctly configured at DataMiner host level.
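For example, the service host's public key can be authorized on a target DataMiner as follows (the <code>gcube</code> user, key path, and hostname are illustrative assumptions, not prescribed by the service):

<source lang="text">
# on the host running the DataMiner Pool Manager
ssh-copy-id gcube@TARGET_DATAMINER

# equivalently, append the public key by hand (assuming an RSA key):
cat ~/.ssh/id_rsa.pub | ssh gcube@TARGET_DATAMINER 'cat >> ~/.ssh/authorized_keys'
</source>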

==Requirements==

The dependencies declared in the metadata file inside the algorithm package must respect the following guidelines:

* R dependencies must have prefix '''cran:'''
* OS dependencies must have prefix '''os:'''
* Custom dependencies must have prefix '''github:'''

If no prefix is specified, the service treats the dependency as an OS one.
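As a purely illustrative example (the package names below are hypothetical, not taken from an actual algorithm package), a dependency list following these guidelines might look like:

<source lang="text">
cran:stringr
os:libcurl4-openssl-dev
github:some-org/some-package
sshpass
</source>

The last entry has no prefix, so the service would treat it as an OS dependency.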

==Usage and APIs==

Currently the service exposes the following REST methods:

* '''Adding an Algorithm to a single DataMiner host'''

This functionality installs the algorithm on the specified DataMiner, updates the SVN lists of dependencies, optionally stores the algorithm in the Information System, and immediately returns a log ID that can be used to monitor the execution.

<source lang="java">
addAlgorithmToHost(algorithm, hostname, name, description, category, algorithmType, skipJava, publish, updateSVN);
</source>


<source lang="java">
@GET
@Path("/hosts/add")
@Produces("text/plain")
public String addAlgorithmToHost(
        @QueryParam("algorithm") String algorithm,
        @QueryParam("hostname") String hostname,
        @QueryParam("name") String name,
        @QueryParam("description") String description,
        @QueryParam("category") String category,
        @DefaultValue("transducerers") @QueryParam("algorithmType") String algorithmType,
        @DefaultValue("N") @QueryParam("skipJava") String skipJava,
        @DefaultValue("false") @QueryParam("publish") boolean publish,
        @DefaultValue("false") @QueryParam("updateSVN") boolean updateSVN)
        throws IOException, InterruptedException, SVNException {
    Algorithm algo = this.getAlgorithm(algorithm, null, hostname, name, description, category, algorithmType,
            skipJava);

    // optionally publish the algorithm in the Information System
    if (publish) {
        service.addAlgToIs(algo);
    }

    return service.addAlgorithmToHost(algo, hostname, updateSVN);
}
</source>


It is possible to distinguish between mandatory and optional parameters:

* Mandatory:
** '''algorithm''': the URL of the algorithm package.
** '''hostname''': the hostname of the DataMiner on which to deploy the script.

* Optional (these parameters can be extracted from the metadata file, where available, or overwritten by the caller):
** '''name''': the name of the algorithm (e.g., ICHTHYOP_MODEL_ONE_BY_ONE)
** '''description''': the description of the algorithm
** '''category''': the category to which the algorithm belongs (e.g., ICHTHYOP_MODEL)
** '''algorithmType''': by default set to "transducerers"
** '''skipJava''': by default set to "N"
** '''publish''': by default set to "false"; the registration of the algorithm in the VRE is currently done both by the service and by the install script (addAlgorithm.sh); however, if set to true, this parameter forces the registration in the IS of the deployed algorithm at caller scope level. The algorithm will be registered as a Generic Resource.
** '''updateSVN''': by default set to "false"; if the package contains dependencies not present in the SVN lists of R/OS/GitHub packages to be installed, the caller can set this parameter to true and they will be added to those lists by using the default SVN credentials of the caller (~/.subversion folder). The update happens only at the end of the process, once the algorithm has been successfully deployed.

An example of the usage is the following:

<source lang="text">
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest/hosts/add?gcube-token=TOKEN_ID&algorithm=URL_TO_ALGORITHM&hostname=TARGET_DATAMINER
</source>
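Since the '''algorithm''' parameter is itself a URL, its value should be URL-encoded when composing the request. A minimal client-side sketch (the <code>AddAlgorithmUrlBuilder</code> helper and all values are illustrative, not part of the service API):

<source lang="java">
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client helper: builds the /hosts/add request URL,
// URL-encoding each query parameter value.
public class AddAlgorithmUrlBuilder {

    public static String build(String serviceBase, String token,
                               String algorithmUrl, String hostname) {
        return serviceBase + "/hosts/add"
                + "?gcube-token=" + URLEncoder.encode(token, StandardCharsets.UTF_8)
                + "&algorithm=" + URLEncoder.encode(algorithmUrl, StandardCharsets.UTF_8)
                + "&hostname=" + URLEncoder.encode(hostname, StandardCharsets.UTF_8);
    }
}
</source>

The resulting string can then be requested with any HTTP client; the service replies with the log ID of the deployment.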


* '''Adding an Algorithm to the set of DataMiners available in the VRE'''

This functionality installs the algorithm on the set of DataMiners of a particular VRE, updates the SVN lists of dependencies, optionally stores the algorithm in the Information System, and immediately returns a log ID that can be used to monitor the execution. The set of DataMiners in a cluster can be retrieved automatically from the HAProxy, which exposes its status in CSV format, reporting the association between clusters and DataMiners. For this to be effective, the name of each cluster in the HAProxy has been normalized with the name of the DataMiner/HAProxy in the VRE. Relying on this, given a VRE (e.g. RProtoLab), the service is able to deploy an algorithm to all the DataMiners belonging to it.
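The cluster-to-DataMiner association can be read from the HAProxy CSV status export, whose first two columns are the proxy (cluster) name and the server name. A simplified sketch of that extraction step (the <code>HAProxyStatusParser</code> class and the sample names are invented for illustration; the real export carries many more columns):

<source lang="java">
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts the DataMiner hostnames belonging to a given
// cluster from an HAProxy CSV status export. FRONTEND and BACKEND rows are
// aggregate entries, not hosts, and are therefore skipped.
public class HAProxyStatusParser {

    public static List<String> dataMinersOf(String csv, String cluster) {
        List<String> hosts = new ArrayList<>();
        for (String line : csv.split("\n")) {
            if (line.startsWith("#") || line.trim().isEmpty())
                continue; // skip the header and blank lines
            String[] fields = line.split(",");
            if (fields.length < 2)
                continue;
            String pxname = fields[0]; // proxy (cluster) name
            String svname = fields[1]; // server name
            if (pxname.equals(cluster)
                    && !svname.equals("FRONTEND") && !svname.equals("BACKEND"))
                hosts.add(svname);
        }
        return hosts;
    }
}
</source>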

<source lang="java">
addAlgorithmToVRE(algorithm, name, description, category, algorithmType, skipJava, publish, updateSVN);
</source>


<source lang="java">
@GET
@Path("/scopes/add")
@Produces("text/plain")
public String addAlgorithmToVRE(
        @QueryParam("algorithm") String algorithm,
        @QueryParam("name") String name,
        @QueryParam("description") String description,
        @QueryParam("category") String category,
        @DefaultValue("transducerers") @QueryParam("algorithmType") String algorithmType,
        @DefaultValue("N") @QueryParam("skipJava") String skipJava,
        @DefaultValue("false") @QueryParam("publish") boolean publish,
        @DefaultValue("false") @QueryParam("updateSVN") boolean updateSVN)
        throws IOException, InterruptedException, SVNException {
    Algorithm algo = this.getAlgorithm(algorithm, /*vre*/null, null, name, description, category, algorithmType, skipJava);

    // optionally publish the algorithm in the Information System
    if (publish) {
        service.addAlgToIs(algo);
    }

    return service.addAlgorithmToVRE(algo, ScopeProvider.instance.get(), updateSVN);
}
</source>


It is possible to distinguish between mandatory and optional parameters:

* Mandatory:
** '''algorithm''': the URL of the algorithm package.

* Optional (these parameters can be extracted from the metadata file, where available, or overwritten by the caller):
** '''name''': the name of the algorithm (e.g., ICHTHYOP_MODEL_ONE_BY_ONE)
** '''description''': the description of the algorithm
** '''category''': the category to which the algorithm belongs (e.g., ICHTHYOP_MODEL)
** '''algorithmType''': by default set to "transducerers"
** '''skipJava''': by default set to "N"
** '''publish''': by default set to "false"; the registration of the algorithm in the VRE is currently done both by the service and by the install script (addAlgorithm.sh); however, if set to true, this parameter forces the registration in the IS of the deployed algorithm at caller scope level. The algorithm will be registered as a Generic Resource.
** '''updateSVN''': by default set to "false"; if the package contains dependencies not present in the SVN lists of R/OS/GitHub packages to be installed, the caller can set this parameter to true and they will be added to those lists by using the default SVN credentials of the caller (~/.subversion folder). The update happens only at the end of the process, once the algorithm has been successfully deployed.

An example of the usage is the following:

<source lang="text">
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest/scopes/add?gcube-token=TOKEN_ID&algorithm=URL_TO_ALGORITHM
</source>


* '''Monitoring the execution'''

This functionality allows the caller to monitor the execution asynchronously, by using the log ID obtained when an algorithm is deployed.

<source lang="java">
getLogById(logID);
</source>


<source lang="java">
@GET
@Path("/log")
@Produces("text/plain")
public String getLogById(@QueryParam("logUrl") String logUrl) throws IOException {
    LOGGER.debug("Returning Log = " + logUrl);
    return service.getScriptFromURL(service.getURLfromWorkerLog(logUrl));
}
</source>


An example of the usage is the following:

<source lang="text">
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest/log?gcube-token=TOKEN_ID&logUrl=LOG_ID
</source>