Difference between revisions of "Statistical Algorithms Importer: FAQ"

From Gcube Wiki
(An algorithm does not receive input from the interface)
(36 intermediate revisions by 3 users not shown)
Line 3: Line 3:
 
|}
 
|}
  
F.A.Q. for the Statistical Algorithms Importer (SAI): common mistakes we have found.
+
F.A.Q. for the [[Statistical_Algorithms_Importer|Statistical Algorithms Importer (SAI)]]: common mistakes we have found.
  
== In some cases, an algorithm worked in R Studio but did not work via SAI ==
+
== Project Type FAQ ==
  
This kind of issue is usually related to the production of the output files:
+
* [[Statistical Algorithms Importer: R Project FAQ|R Project FAQ]]
 +
* [[Statistical Algorithms Importer: R-blackbox Project FAQ|R-blackbox Project FAQ]]
 +
* [[Statistical Algorithms Importer: Java Project FAQ|Java Project FAQ]]
 +
* [[Statistical Algorithms Importer: Linux-compiled Project FAQ|Linux-compiled Project FAQ]]
 +
* [[Statistical Algorithms Importer: Python Project FAQ|Python Project FAQ]]
 +
* [[Statistical Algorithms Importer: Pre-Installed Project FAQ|Pre-Installed Project FAQ]]
  
*The file was produced in a subfolder, but it was declared to be in the root folder. E.g. the file output.zip was produced in the ./data folder by the process, but in SAI the variable referring to the output was declared as
+
== Installed Software ==
 +
A list of pre-installed software on the infrastructure machines is available at this page:
 +
* [[Pre Installed Packages|Pre Installed Packages]]
  
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
+
In general, it is better to specify the packages with their corresponding versions, as shown on the page linked above.
output<-"output.zip"
+
However, if you do not specify the packages, the system tries to integrate and run the code using the packages already present on the DataMiner. This is done to make integration easier for developers.
</pre>
+
In this case, if the process uses packages that are not installed, it will fail during execution, and it is the developer's responsibility to request the installation of the missing packages.
:Thus with no ./data indicated in the file name
+
The Interpreter version also serves to better identify the type of code being executed and to support the entire debugging phase in the event of problems.
 +
In general, the more information is provided, the better the algorithm can be supported, but in any case the system tries to integrate and execute the code.
  
*A forced switch of the working folder was done inside the code, which misled the service about the produced file. E.g.:
+
== Project Folder ==
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
+
Each algorithm should have its own project folder, distinct from that of every other algorithm, because the project folder keeps the code created by the developer. Once an algorithm is published, the Project Folder will contain the executable that will be requested by the DataMiner for execution, so it is important to avoid deleting published projects. Deleting a project makes it unavailable for use in the infrastructure.
output<-"output.zip"
+
setwd("./data")
+
save(output)
+
</pre>
+
:Switching the working folder inside the script should generally be avoided.
+
  
*A process tried to overwrite another file that had already been produced on the processing machine, but which was corrupted due to an update of the machine. This conflicted with the newly generated files.
+
== Project Name ==
 +
The project name cannot contain special characters: only letters and numbers are allowed. Spaces can be replaced by the underscore character.
  
:Generally, files with new names should be generated by a script that is being transformed into a web service. Generating output files with new names prevents errors due to several concurrent requests creating the same files, when the requests are managed by the same machine.
+
== Project Configuration ==
:For example, instead of declaring
+
The SAI uses two project configuration files:  
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
+
* stat_algo.project
zip_namefile <- "data_frame_result_query.zip"
+
* Main.R
</pre>
+
It is advisable that these files are never deleted or modified directly.
:a timestamp should be added to the name of the generated file:
+
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
+
zip_namefile_random <- paste("data_frame_result_query_",Sys.time(),".zip",sep="")
+
zip_namefile <- zip_namefile_random
+
</pre>
+
  
== An algorithm does not receive input from the interface ==
+
== Project ID ==
DataMiner searches the code for the declared default value and then substitutes it with the value the user provides through the interface.
+
Starting from the project name, a unique identifier is associated with each project when it is published.
 +
The identifier allows the project to be recognized within the infrastructure.
 +
This is why it is important to give each project a different name and not to reuse the same name in different projects.
  
:This means that the default value in the code should correspond to the one declared in the annotations (and thus displayed in the Input/Output window) and vice versa. For example, if starting_point_latitude has -7.931 as declared default value, then DataMiner searches in the code for one of the following lines:
+
== Parameters ==
 +
It is important that an algorithm always has at least one input parameter and one output parameter.
  
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
+
== I don't see my algorithm in DataMiner ==
starting_point_latitude <- "-7.931"
+
DataMiner portlets store algorithms in the user session, so if an algorithm is deployed but is not visible, try refreshing the list of algorithms with the refresh button in the DataMiner.
starting_point_latitude = "-7.931"
+
Remember that after deployment the system needs a few minutes to update.
</pre>
+
  
:whereas, in the case of numeric variables, it searches for
+
== Publish in another VRE ==
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
+
Sometimes we want to publish an algorithm in a VRE different from the one in which it has already been published.
starting_point_latitude <- -7.931
+
If the SAI is present in the new VRE, just open the algorithm in the new VRE and publish it. Otherwise, open a ticket reporting the VRE and the name of the algorithm that you want to publish.
starting_point_latitude = -7.931
+
</pre>
+
  
:Thus, if the initialization in the code is
+
== Advanced Input ==
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
+
It is possible to indicate spatial inputs or time/date inputs. The details for defining these inputs are reported in [[Advanced Input| Advanced Input ]].
starting_point_latitude <- "-13.548"
+
</pre>
+
:then StatMan cannot find the default value to change. In other words, the default values in the code should correspond to the ones declared.
+
  
:We use this approach since SAI could theoretically also be applied to programming languages other than R; thus we do not rely on the R interpreter behind the scenes but on string substitution using regular expressions.
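The matching rule described above can be illustrated with a minimal sketch in R. This is not the DataMiner implementation: the function name substitute_default and its arguments are hypothetical, and the sketch assumes that each input is initialized on a line of its own. It only shows why a default value in the code that differs from the declared one is never replaced.

```r
# Hypothetical helper (not part of SAI or DataMiner): replace the declared
# default of an input variable with the value supplied by the user, using
# text matching only, without evaluating the R code.
substitute_default <- function(code_lines, name, default, value, quoted = TRUE) {
  # String inputs are searched with surrounding double quotes, numeric ones without.
  q <- if (quoted) "\"" else ""
  # \Q...\E makes the name and the default match literally ("." is not a wildcard).
  pattern <- paste0("^\\s*\\Q", name, "\\E\\s*(<-|=)\\s*",
                    q, "\\Q", default, "\\E", q, "\\s*$")
  hits <- grepl(pattern, code_lines, perl = TRUE)
  # Only lines whose default matches the declared one are rewritten.
  code_lines[hits] <- paste0(name, " <- ", q, value, q)
  code_lines
}

script <- c('starting_point_latitude <- "-7.931"',
            'print(starting_point_latitude)')

# The declared default matches the code, so the first line is rewritten.
replaced <- substitute_default(script, "starting_point_latitude", "-7.931", "10.5")

# The code initializes the variable with another value, so nothing changes.
untouched <- substitute_default('starting_point_latitude <- "-13.548"',
                                "starting_point_latitude", "-7.931", "10.5")
```

Because the sketch rebuilds the whole line rather than editing part of it, the user's value never has to be escaped for use in a regular expression replacement.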
+
== Update the status of a computation ==
 +
It is possible to update the inner status of a computation by writing a status.txt file locally to the process; see [[Statistical Algorithms Importer: StatusUpdate| Updating the status of a computation]].
  
== Managing Enumerated Types - Creating drop-down menus ==
 
To create drop-down menus containing enumerated choices from SAI, follow the process shown in the screenshot:
 
  
[[Image:StatisticalAlgorithmsImporter_FAQ_Enumerated.png|thumb|center|900px|Enumerated, SAI]]
 
 
#Declare a variable with a default value, e.g. enumerated<-"a"
 
#Indicate this variable as an input of Enumerated type and add the other possible values, separated by the | symbol: a|b|c
 
#The first choice should be the default value indicated in the code
 
 
== Managing Boolean values ==
 
A Boolean variable can be managed by SAI, but this requires a trick to make R properly communicate with Java and vice-versa. In fact, R has many ways to declare boolean variables.
 
The screenshot shows how to use and declare a Boolean variable when integrating an algorithm, i.e.:
 
 
[[Image:StatisticalAlgorithmsImporter_FAQ_Boolean.png|thumb|center|900px|Boolean, SAI]]
 
:if removeZero is the boolean variable, then these lines help Java modify the R code:
 
<pre style="display:block;font-family:monospace;white-space:pre;margin:1em 2em;">
 
false<-F
 
true<-T
 
removeZero<-false
 
</pre>
 
 
:Thus the default value of the variable will be false.
 
:Further, the Boolean variable should be declared as a Boolean input with default value false (or true), written in lower case.
 
:This declaration will generate a Boolean choice on the user interface.
 
 
[[Category:Statistical Algorithms Importer]]
 
[[Category:Statistical Algorithms Importer]]

Revision as of 09:14, 16 July 2019

F.A.Q. for the Statistical Algorithms Importer (SAI): common mistakes we have found.

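Project Type FAQ

  • R Project FAQ
  • R-blackbox Project FAQ
  • Java Project FAQ
  • Linux-compiled Project FAQ
  • Python Project FAQ
  • Pre-Installed Project FAQ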

Installed Software

A list of pre-installed software on the infrastructure machines is available at this page:

  • Pre Installed Packages

In general, it is better to specify the packages with their corresponding versions, as shown on the page linked above. However, if you do not specify the packages, the system tries to integrate and run the code using the packages already present on the DataMiner. This is done to make integration easier for developers. In this case, if the process uses packages that are not installed, it will fail during execution, and it is the developer's responsibility to request the installation of the missing packages. The Interpreter version also serves to better identify the type of code being executed and to support the entire debugging phase in the event of problems. In general, the more information is provided, the better the algorithm can be supported, but in any case the system tries to integrate and execute the code.

Project Folder

Each algorithm should have its own project folder, distinct from that of every other algorithm, because the project folder keeps the code created by the developer. Once an algorithm is published, the Project Folder will contain the executable that will be requested by the DataMiner for execution, so it is important to avoid deleting published projects. Deleting a project makes it unavailable for use in the infrastructure.

Project Name

The project name cannot contain special characters: only letters and numbers are allowed. Spaces can be replaced by the underscore character.
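The naming rule can be checked before creating a project with a small sketch like the one below. The helper names sanitize_project_name and is_valid_project_name are hypothetical and not part of SAI. The sketch assumes that "letters" means ASCII letters and that the underscore is accepted as the replacement for spaces.

```r
# Hypothetical helpers (not part of SAI) illustrating the naming rule.
# Assumption: only ASCII letters, digits and the underscore are accepted.

# Replace runs of whitespace with an underscore, then drop any other character.
sanitize_project_name <- function(name) {
  name <- gsub("\\s+", "_", trimws(name))
  gsub("[^A-Za-z0-9_]", "", name)
}

# TRUE when the name already satisfies the rule.
is_valid_project_name <- function(name) {
  grepl("^[A-Za-z0-9_]+$", name)
}

clean <- sanitize_project_name("My Algo v2!")
```

Checking a name with is_valid_project_name before publishing avoids discovering the problem only after the project has been created.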

Project Configuration

The SAI uses two project configuration files:

  • stat_algo.project
  • Main.R

It is advisable that these files are never deleted or modified directly.

Project ID

Starting from the project name, a unique identifier is associated with each project when it is published. The identifier allows the project to be recognized within the infrastructure. This is why it is important to give each project a different name and not to reuse the same name in different projects.

Parameters

It is important that an algorithm always has at least one input parameter and one output parameter.

I don't see my algorithm in DataMiner

DataMiner portlets store algorithms in the user session, so if an algorithm is deployed but is not visible, try refreshing the list of algorithms with the refresh button in the DataMiner. Remember that after deployment the system needs a few minutes to update.

Publish in another VRE

Sometimes we want to publish an algorithm in a VRE different from the one in which it has already been published. If the SAI is present in the new VRE, just open the algorithm in the new VRE and publish it. Otherwise, open a ticket reporting the VRE and the name of the algorithm that you want to publish.

Advanced Input

It is possible to indicate spatial inputs or time/date inputs. The details for defining these inputs are reported on the Advanced Input page.

Update the status of a computation

It is possible to update the inner status of a computation by writing a status.txt file locally to the process; see Updating the status of a computation.
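A minimal sketch of writing such a file from an R script follows. The content of status.txt is not specified on this page: the sketch assumes it holds a single completion percentage between 0 and 100, which should be checked against the linked page. The helper name update_status is hypothetical.

```r
# Hypothetical helper (not part of SAI). Assumption: status.txt holds a
# single number between 0 and 100 giving the completion percentage.
update_status <- function(percent, file = "status.txt") {
  # Clamp the value to the assumed 0-100 range.
  percent <- max(0, min(100, percent))
  # The file is written relative to the working folder of the process,
  # so the script does not need to switch folders.
  writeLines(as.character(percent), con = file)
}

update_status(0)    # computation started
# ... long-running work ...
update_status(100)  # computation finished
```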