Statistical Manager Tutorial

From Gcube Wiki

Statistical Manager

The Statistical Manager (SM) is a cross-usage service that provides users and services with tools for performing data mining operations. Specifically, it offers a single access point for performing data mining and statistical operations on heterogeneous data, which may reside on the client side, in the form of comma-separated values (CSV) files, or be hosted remotely, possibly in a database. The SM service takes the inputs and executes the operation requested by a client by invoking the most suitable computational facility from a set of available computational resources. Executions can run either on multi-core machines or on different computational platforms, such as D4Science and other private and commercial Cloud providers.

The SM Service is a container of algorithms that are implemented as plug-ins based on the Dependency Injection programming pattern. These reside on SM and can be invoked by infrastructural or external clients according to a public Web Services Description Language interface. The requests are managed asynchronously, and the client can monitor the status of the computation at any time.
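
The asynchronous pattern described above can be sketched in a few lines of Python. This is only an illustration of submitting a job and polling its status: it does not call the SM service, a local thread pool stands in for the remote computational resource, and the names run_algorithm and submit_and_monitor as well as the status strings are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_algorithm(values):
    """Hypothetical stand-in for a long-running data mining algorithm."""
    time.sleep(0.2)  # simulate computation time
    return sum(values) / len(values)


def submit_and_monitor(values, poll_interval=0.05):
    """Submit a job, then poll its status until it has finished."""
    statuses = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns immediately; the work runs in the background
        future = pool.submit(run_algorithm, values)
        while not future.done():
            statuses.append("RUNNING")
            time.sleep(poll_interval)
        statuses.append("COMPLETED")
        return future.result(), statuses


if __name__ == "__main__":
    result, statuses = submit_and_monitor([1.0, 2.0, 3.0])
    print(statuses[-1], result)
```

A real client would replace the thread pool with calls to the service interface, but the structure stays the same: one call to start the computation, then repeated status checks until it completes.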

Upload a custom file

In the Access to the Data Space section, click on the Importer tab. In the Data Set Importer form, the user can type the name to assign to the file that will be imported.

Select the correct template for the file and click on Open CSV Importer Wizard.


The CSV import wizard will be shown. The user can import a file in two ways:

  • import a file from a local source;
  • import a file from the workspace.

Select the option that matches the source of the file and click on Next.

Saving files to import directly as UTF-8

Most text editors can handle UTF-8, although you might have to set them to do this when loading and saving files. For example, you can save a file as UTF-8 using MS Windows Notepad:

  1. Open Notepad
  2. File -> Save As: of the three fields shown in the dialog, set the last one, called "Encoding", to UTF-8

Alternatively, you can save a file as UTF-8 using Excel or Open Office (see the next figures).

UtfExcel.png UTFOpenOf.png
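
If many files need converting, re-encoding them with a script may be quicker than opening each one in an editor. The following is a minimal sketch using only the Python standard library. The function name convert_to_utf8 is hypothetical, and the default source encoding (cp1252, common for files exported on Windows) is an assumption that must be replaced with the encoding the file was actually saved in.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text file as UTF-8.

    source_encoding is an assumption: set it to the encoding the
    file was really saved in, otherwise characters may be garbled.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return dst
```

For example, convert_to_utf8("data.csv", "data_utf8.csv") would write a UTF-8 copy of a hypothetical data.csv, which can then be imported.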

Import a file from local source

  1. Click on Browse and select a file from a local source;
  2. Click on Upload and click Next;
  3. If the first row of the file is the header of the columns, check the Has header flag;
  4. Select the column delimiter, e.g. comma, or define a custom one;
  5. Click on Check configuration;
  6. Click on Next;
  7. Click on Finish when the import is completed.
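
Before starting the wizard, it can help to check which delimiter a file uses and whether its first row looks like a header, since these are the values requested in steps 3 and 4. The sketch below uses csv.Sniffer from the Python standard library. It is independent of the importer, its result is a heuristic guess, and the function name inspect_csv and the sample data are made up for illustration.

```python
import csv

# Illustrative sample: a semicolon-delimited file with a header row
SAMPLE = (
    "species;depth;temperature\n"
    "cod;120;4.5\n"
    "hake;200;6.1\n"
    "sole;50;9.3\n"
)


def inspect_csv(sample_text, candidates=",;\t|"):
    """Guess the column delimiter and whether the first row is a header."""
    sniffer = csv.Sniffer()
    # Restrict the guess to delimiters that are plausible for CSV files
    dialect = sniffer.sniff(sample_text, delimiters=candidates)
    # Heuristic: compares the first row with the types of the following rows
    has_header = sniffer.has_header(sample_text)
    return dialect.delimiter, has_header


if __name__ == "__main__":
    delimiter, has_header = inspect_csv(SAMPLE)
    print(repr(delimiter), has_header)
```

The guess is only a starting point: the Check configuration step in the wizard remains the place to confirm that the columns are split as expected.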

Import a file from workspace

  1. Click on Browse and select a file in the pop-up window that shows your workspace files;
  2. Follow steps 2-7 of the previous section.

CsvImport1.png CsvImport2.png CsvImport3.png

Execute an Experiment with the Statistical Manager and Collect the Results

In the Execute an Experiment section, you can find a list of algorithms grouped by category.

Click on the category that contains the algorithm you want to execute. The list of algorithms belonging to that category appears.


By clicking on the arrow next to the algorithm description, you can retrieve the parameters of the chosen algorithm. Fill in the parameters and click on the Start computation button.

Check the Status of the Computation

Once you have started an algorithm, you can disconnect from the portal and check for its completion later.

In order to check the status of a computation, click on the "Check the computations" button. The list of computations along with their status appears.


Clicking on a computation shows the summary of its input parameters. If the computation has finished, the output information and an inspection facility are also provided.

Retrieve the Results

When the computation has finished, you can access the results in different ways. When a file is produced (e.g. a trained model or an image), it can be saved on the workspace or downloaded to your local machine.

If the computation has just finished, the file can be retrieved as follows:

  • click on the Computation Execution tab or on the Check the Computations button;
  • download the result files by clicking on "Download file";
  • save the results in the workspace by clicking on "Save in the workspace".


If you want to retrieve the outputs and details of past computations, click on the Check the Computations button (in the upper right corner of the page).


By clicking on one of the past executions, you can retrieve the details of the computations, the parameters you used and the outputs. You can visualize or save the outputs too.


Check the Produced and Imported Datasets

Once you have imported or produced a dataset, you can save it on the Workspace in CSV format or simply check its content. This can be achieved by clicking on the Access to the Data Space button.


The list of tables and files that have been imported or produced appears, together with a summary organized by template or by provenance (i.e. imported or produced dataset).


By clicking on the buttons on the right side, you can look at the dataset content (or download it, in the case of a file), delete the dataset or save the dataset on the Workspace in CSV format (in the case of tables). Once the file is on the Workspace, it can be shared with the other VRE participants or saved to your local machine.
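
Once a dataset has been saved to your local machine as CSV, its content can also be checked with a short script. The sketch below is a minimal example using the Python standard library. It assumes a UTF-8 file with a header row and a comma delimiter, and the function name summarise_csv is hypothetical.

```python
import csv


def summarise_csv(path, preview_rows=3, delimiter=","):
    """Return the column names, row count and first rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        columns = reader.fieldnames or []  # empty list for an empty file
        preview, total = [], 0
        for row in reader:
            if total < preview_rows:
                preview.append(row)
            total += 1
    return {"columns": columns, "rows": total, "preview": preview}
```

Calling summarise_csv on the downloaded file returns a dictionary with the keys "columns", "rows" and "preview", which is enough to confirm that the export contains the expected fields before sharing it.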


A video about invoking Statistical Manager algorithms:

A video about using the Statistical Manager Data Space: