Statistical Algorithms Importer: FAQ
F.A.Q. of the [[Statistical_Algorithms_Importer|Statistical Algorithms Importer (SAI)]]: here are common mistakes we have found.
  
== Project Type FAQ ==

* [[Statistical Algorithms Importer: R Project FAQ|R Project FAQ]]
* [[Statistical Algorithms Importer: R-blackbox Project FAQ|R-blackbox Project FAQ]]
* [[Statistical Algorithms Importer: Java Project FAQ|Java Project FAQ]]
* [[Statistical Algorithms Importer: Linux-compiled Project FAQ|Linux-compiled Project FAQ]]
* [[Statistical Algorithms Importer: Python Project FAQ|Python Project FAQ]]
* [[Statistical Algorithms Importer: Pre-Installed Project FAQ|Pre-Installed Project FAQ]]
  
== Installed Software ==

A list of pre-installed software on the infrastructure machines is available at this page:

* [[Pre Installed Packages|Pre Installed Packages]]

In general it is better to specify the packages, with their versions, as they are shown at the previous link.

However, if you do not specify the packages, the system tries to integrate and run the code using the packages already present on the DataMiner; this is done to facilitate the work of the developers.

Obviously, in this case, if the process uses packages that are not installed it will fail during execution, and it will be the developer's responsibility to request the installation of the missing packages.

The Interpreter version also serves to better identify the type of code being executed and to support the debugging phase in the event of problems.

So, in general, the more information you provide, the better the support for your algorithm will be; in any case the system tries to integrate and execute the code.
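To check which version of a package is actually present at run time, a few lines like the following can be added temporarily to an R script (the package name ggplot2 is only an example):

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
# log the version of a package available on the machine (example package name)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  cat("ggplot2 version:", as.character(packageVersion("ggplot2")), "\n")
} else {
  cat("ggplot2 is not installed on this machine\n")
}
</pre>

The logged version can then be compared with the one reported in the pre-installed packages list.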
  
== Project Folder ==

It is important that each algorithm has its own project folder, different for each algorithm, because the project folder keeps the code created by the developer. Once an algorithm is published, the Project Folder will contain the executable that will be requested by the DataMiner for execution, so it is important to avoid deleting published projects: deleting a project makes it unavailable for use in the infrastructure.
== Project Name ==

The project name cannot contain special characters: only letters and numbers are allowed, and any spaces can be replaced by the underscore character. Each project must have its own name, different from the names used by other projects.

Project names already used:

* [[DataMiner_Project_Names_Already_Used|DataMiner Project Names Already Used]]
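For instance, a candidate name containing spaces or special characters can be normalized with a couple of R lines (the name below is purely illustrative):

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
name <- "My Niche Model v2!"
name <- gsub(" ", "_", name)            # replace spaces with underscores
name <- gsub("[^A-Za-z0-9_]", "", name) # drop any remaining special character
cat(name, "\n")                         # My_Niche_Model_v2
</pre>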
  
== Project Configuration ==

The SAI uses two project configuration files:

* stat_algo.project
* Main.R

It is advisable never to delete or modify these files directly.
  
== Project ID ==

Starting from the project name, a unique identifier is associated with each project when it is published.

The identifier allows the project to be recognized within the infrastructure.

This is why it is important to give a different name to each project and not to reuse the same name in different projects.

Where can you find the Project ID? Just check the link associated with the algorithm name in DataMiner.

[[Image:StatisticalAlgorithmsImporter_ProjectID.png|thumb|center|750px|Project ID, SAI]]
  
== Parameters ==

It is important that an algorithm always has at least one input and one output parameter.

All parameters are mandatory: this is a design choice to support the repeatability and reproducibility of the experiments, as well as the reuse of algorithms.

In the case you want to include an optional file, it is better to create two distinct algorithms: one that expects the file parameter and one that does not.

Of course you can use default values in the case of Strings, Integers, etc.
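As a minimal sketch (variable and file names are only examples), in an R project an input parameter is simply a variable with a declared default value, while an output parameter is a variable holding the name of a file produced by the script:

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
# input parameter: a variable with a declared default value
max_records <- 100

# output parameter: the name of a file produced by the script
output_file <- "result.csv"
write.csv(data.frame(id = seq_len(max_records)), output_file, row.names = FALSE)
</pre>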
  
== I don't see my algorithm in DataMiner ==

The DataMiner portlet stores algorithms in the user session, so if an algorithm is deployed but is not visible, try to refresh the list of algorithms with the refresh button in DataMiner.

Remember that after the deploy a few minutes are needed for the system to update.
== Publish an algorithm the first time ==

The first time an algorithm is created, it must be published using the Publish button in the current VRE. After the first publication, both Repackage and Publish can be used.

If the Input or Output parameters are changed, it is necessary to use Publish again.
  
== Publish in another VRE ==

Sometimes we want to publish an algorithm in a VRE different from the one in which it has already been published.

If the SAI is present in the new VRE, just open the algorithm in the new VRE and publish it; otherwise you can open a ticket, reporting the VRE and the name of the algorithm that you want to publish.
  
== Delete an algorithm ==

To delete an algorithm published through the SAI, it is necessary to open a ticket. The name of the algorithm and the list of VREs in which it was published must be written in the ticket.

== Advanced Input ==

It is possible to indicate spatial inputs or time/date inputs. The details for the definition of these inputs are reported in the [[Advanced Input|Advanced Input]] page.
  
== Update the status of a computation ==

It is possible to update the inner status of a computation by writing a status.txt file locally to the process: [[Statistical Algorithms Importer: StatusUpdate|Updating the status of a computation]].
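As a sketch (the exact conventions expected by DataMiner are described on the linked status-update page), the status file can be written from R, for example reporting the percentage of work completed:

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
# report the inner status of the computation (assumed here to be a percentage)
progress <- 50
writeLines(as.character(progress), "status.txt")
</pre>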
== Docker Support ==

SAI and DataMiner support the execution of Docker images on D4Science; for more information see the wiki available at this page:

* [[Statistical Algorithms Importer: Docker Support|Statistical Algorithms Importer: Docker Support]]
 
== Best Practices to debug the code ==

In order to debug code that is not working as it is supposed to, the following approach can be useful.

Avoid switching the working directory in the code, because this makes the code prone to errors, especially for the services that need to work on the output of the process.

In order to understand what is happening in the process:

# add cat() instructions all over the code, e.g. to log the full path of the produced file;
# log a check of existence for the file in the initial working directory;
# add an erroneous command at the end of the code to force the generation of an error at the point you want to investigate;
# repackage the code, then download and read the logs after the execution of the algorithm.

As a general rule, you should generate an error if the file was not produced by the algorithm due to some error in the execution.
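The checks above can be sketched in R as follows (the output file name is illustrative):

<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
output_file <- "output.zip"
# log the full path where the file is expected
cat("expected output path:", file.path(getwd(), output_file), "\n")
# check the existence of the file in the working directory
cat("file exists:", file.exists(output_file), "\n")
# force an error if the output was not produced
if (!file.exists(output_file)) stop("output.zip was not produced")
</pre>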
  
 
[[Category:Statistical Algorithms Importer]]

Latest revision as of 18:26, 24 March 2021
