How to use the Data Miner Pool Manager
DataMiner Pool Manager
The DataMiner Pool Manager service (DMPM) is a REST service that streamlines and automates the process of publishing SAI algorithms on DataMiner nodes and keeps the DataMiner cluster up to date.
Overview
The service accepts an algorithm descriptor, including its dependencies (OS, R, and custom packages), queries the IS for the DataMiners in the current scope, generates (via templating) the Ansible playbook, inventory, and roles for the relevant components (algorithm installer, algorithms, dependencies), and executes the Ansible playbook on a DataMiner.
In practice, the service takes as input, among other parameters, the URL of an algorithm package (containing the jar and its metadata), extracts the information needed for the installation, installs the script, and asynchronously returns the execution outcome to the caller.
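The templating step can be pictured with a small sketch. The class below is hypothetical and is not taken from the DMPM code base; it only shows what rendering a minimal INI-style Ansible inventory for a list of DataMiner hosts could look like. The group name `dataminer` and the host name are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of the DMPM code base. */
final class InventoryTemplate {

    /** Renders an INI-style Ansible inventory: a [group] header followed by one host per line. */
    static String render(String group, List<String> hosts) {
        StringBuilder inventory = new StringBuilder();
        inventory.append('[').append(group).append("]\n");
        for (String host : hosts) {
            inventory.append(host).append('\n');
        }
        return inventory.toString();
    }

    public static void main(String[] args) {
        // The host name is a placeholder, not a real DataMiner.
        System.out.print(render("dataminer", Arrays.asList("dataminer1.example.org")));
    }
}
```

The real service also generates a playbook and roles; those are left out here because their layout is not described in this page.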
Testing
DMPM is a SmartGear-compliant service. An instance has already been deployed and configured at Development level and is reachable at the following address:
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest/
For Ansible to access the DataMiner, the SSH key of the host where the service is deployed must be correctly configured on the DataMiner host.
Requirements
The dependencies listed in the metadata file inside the algorithm package must follow these guidelines:
- R dependencies must have the prefix cran:
- OS dependencies must have the prefix os:
- Custom dependencies must have the prefix github:
If no prefix is specified, the service treats the dependency as an OS one.
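The prefix rules above can be sketched as a small classifier. This is an illustrative helper, not the service's actual parsing code: the class and method names are hypothetical, prefixes are matched case-sensitively, and an entry with an unrecognized prefix is treated as an OS dependency as a whole. The last two choices are assumptions, since the page only defines the no-prefix case.

```java
/** Hypothetical helper for illustration; not part of the DMPM code base. */
final class DependencyClassifier {

    enum Kind { CRAN, OS, GITHUB }

    /** A dependency split into its kind and its bare name. */
    static final class Dependency {
        final Kind kind;
        final String name;

        Dependency(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    /** Maps a raw metadata entry such as "cran:ggplot2" to its kind and name. */
    static Dependency classify(String raw) {
        String entry = raw.trim();
        int colon = entry.indexOf(':');
        if (colon > 0) {
            String prefix = entry.substring(0, colon);
            String name = entry.substring(colon + 1).trim();
            switch (prefix) {
                case "cran":
                    return new Dependency(Kind.CRAN, name);
                case "os":
                    return new Dependency(Kind.OS, name);
                case "github":
                    return new Dependency(Kind.GITHUB, name);
                default:
                    break; // Assumption: unknown prefixes fall back to OS below.
            }
        }
        // No (recognized) prefix: the dependency is considered an OS one.
        return new Dependency(Kind.OS, entry);
    }

    public static void main(String[] args) {
        Dependency d = classify("cran:ggplot2");
        System.out.println(d.kind + " " + d.name);
    }
}
```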
Usage and APIs
Currently the service exposes the following REST methods:
- Adding an algorithm to a DataMiner
This functionality installs the algorithm on the specified DataMiner and immediately returns the log ID, which can be used to monitor the execution.
addAlgorithmToHost(algorithm, hostname, name, description, category, algorithmType, skipJava);
@GET
@Path("/hosts/add")
@Produces("text/plain")
public String addAlgorithmToHost(
        @QueryParam("algorithm") String algorithm,
        @QueryParam("hostname") String hostname,
        @QueryParam("name") String name,
        @QueryParam("description") String description,
        @QueryParam("category") String category,
        @QueryParam("algorithmType") String algorithmType,
        @QueryParam("skipJava") String skipJava) throws IOException, InterruptedException {
    Algorithm algo = this.getAlgorithm(algorithm, null, hostname, name, description, category, algorithmType, skipJava);
    //service.addAlgToIs(algo);
    return service.addAlgorithmToHost(algo, hostname);
}
The parameters are divided into mandatory and optional ones:
- Mandatory:
- algorithm: URL of the algorithm package.
- hostname: hostname of the DataMiner on which the script is deployed.
- Optional (any parameter other than the mandatory ones can be extracted from the metadata file, where available, or overridden by the caller):
- name: name of the algorithm (e.g., ICHTHYOP_MODEL_ONE_BY_ONE)
- description: description of the algorithm
- category: category to which the algorithm belongs (e.g., ICHTHYOP_MODEL)
- algorithmType: set to "transducerers" by default
- skipJava: set to "N" by default
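The way optional parameters are resolved can be sketched as a simple precedence rule: a value supplied by the caller wins, then the value found in the metadata file, then the default. The helper below is hypothetical and is not the service's implementation; the two defaults come from the list above, while applying the default only when both other sources are empty is an assumption.

```java
/** Hypothetical helper for illustration; not part of the DMPM code base. */
final class AlgorithmParameters {

    // Defaults as documented for algorithmType and skipJava.
    static final String DEFAULT_ALGORITHM_TYPE = "transducerers";
    static final String DEFAULT_SKIP_JAVA = "N";

    /** Returns the caller value if present, else the metadata value, else the fallback (may be null). */
    static String resolve(String fromCaller, String fromMetadata, String fallback) {
        if (fromCaller != null && !fromCaller.trim().isEmpty()) {
            return fromCaller;
        }
        if (fromMetadata != null && !fromMetadata.trim().isEmpty()) {
            return fromMetadata;
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The caller did not set skipJava and the metadata has no value, so the default applies.
        System.out.println(resolve(null, null, DEFAULT_SKIP_JAVA));
    }
}
```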
An example of the usage is the following:
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest/hosts/add?gcube-token=TOKEN_ID&algorithm=URL_TO_ALGORITHM&hostname=TARGET_DATAMINER
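When the request is built programmatically, the algorithm URL has to be percent-encoded because it is itself passed as a query parameter. The sketch below is a hypothetical client-side helper, not part of DMPM, that assembles the request URL from the development endpoint and the parameter names shown above. The class name, the method name, and the validation messages are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side helper for illustration; not part of the DMPM code base. */
final class AddAlgorithmRequest {

    static final String BASE_URL =
            "http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest";

    /** Builds the /hosts/add URL; optional may carry name, description, category, algorithmType, skipJava. */
    static String buildUrl(String token, String algorithmUrl, String hostname, Map<String, String> optional) {
        if (algorithmUrl == null || algorithmUrl.isEmpty()) {
            throw new IllegalArgumentException("algorithm is mandatory");
        }
        if (hostname == null || hostname.isEmpty()) {
            throw new IllegalArgumentException("hostname is mandatory");
        }
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("gcube-token", token);
        params.put("algorithm", algorithmUrl);
        params.put("hostname", hostname);
        params.putAll(optional);

        StringBuilder url = new StringBuilder(BASE_URL).append("/hosts/add");
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(param.getKey())
               .append('=')
               .append(URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with a real token, package URL, and DataMiner host.
        System.out.println(buildUrl("TOKEN_ID", "http://example.org/algorithm.tar.gz",
                "dataminer.example.org", new LinkedHashMap<>()));
    }
}
```

The resulting URL can be requested with any HTTP client using GET; according to the method description, the response body is the log ID as plain text.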
- Monitoring the execution
This functionality allows the caller to asynchronously monitor the execution using the log ID returned when an algorithm is deployed. The log ID is passed in the logUrl query parameter.
getLogById(logID);
@GET
@Path("/log")
@Produces("text/plain")
public String getLogById(@QueryParam("logUrl") String logUrl) throws IOException {
    // TODO Auto-generated method stub
    LOGGER.debug("Returning Log =" + logUrl);
    return service.getScriptFromURL(service.getURLfromWorkerLog(logUrl));
}
An example of the usage is the following:
http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest/log?gcube-token=TOKEN_ID&logUrl=LOG_ID
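A caller can retrieve the log with a plain HTTP GET. The following sketch uses the standard `java.net.http.HttpClient` (Java 11 or later) against the development endpoint; the class and method names are hypothetical, and treating any status other than 200 as an error is an assumption. The page does not describe how a finished execution is signalled in the log, so the sketch fetches the log once and leaves any polling policy to the caller.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side helper for illustration; not part of the DMPM code base. */
final class LogClient {

    static final String BASE_URL =
            "http://node2-d-d4s.d4science.org:8080/dataminer-pool-manager-1.0.0-SNAPSHOT/rest";

    /** Builds the /log URI; the log ID travels in the logUrl query parameter. */
    static URI logUri(String token, String logId) {
        return URI.create(BASE_URL + "/log"
                + "?gcube-token=" + URLEncoder.encode(token, StandardCharsets.UTF_8)
                + "&logUrl=" + URLEncoder.encode(logId, StandardCharsets.UTF_8));
    }

    /** Fetches the current log content as plain text. */
    static String fetchLog(String token, String logId) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(logUri(token, logId)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // Assumption: anything other than 200 is reported as a failure.
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: LogClient TOKEN_ID LOG_ID");
            return;
        }
        System.out.println(fetchLog(args[0], args[1]));
    }
}
```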
Next Steps
- Add a functionality that automatically retrieves the set of DataMiners in a cluster from the HA proxy, so that an algorithm can be deployed to all the DataMiners available in a particular VRE (e.g., RProtoLab)
- Add a functionality that updates the SVN lists of dependencies (if a dependency is present in a package but not in the SVN list, the SVN list is updated with the missing entries)
- Add an optional functionality able to register an algorithm in the IS (VRE scope)