How-to Implement Algorithms for the Statistical Manager

From Gcube Wiki
Revision as of 16:08, 6 November 2013 by Gianpaolo.coro (Talk | contribs) (Related Links)

Prerequisites

IDE: Eclipse Java EE IDE for Web Developers. Version: 3.7+

Step by Step

Start by creating a project in the Eclipse IDE and mavenize it according to our indications. Once the project has been mavenized in Eclipse, add the required dependency.

Maven coordinates

The maven artifact coordinates are:

<dependency>
	<groupId>org.gcube.dataanalysis</groupId>
	<artifactId>ecological-engine</artifactId>
	<version>1.6.1-SNAPSHOT</version>
</dependency>

Let's create a new class that implements a basic algorithm; it will be executed by the Statistical Manager. The next step is to extend the base class StandardLocalExternalAlgorithm. The following snippet shows the unimplemented methods that we are going to fill in.

public class SimpleAlgorithm extends StandardLocalExternalAlgorithm{
 
	@Override
	public void init() throws Exception {
		// TODO Auto-generated method stub		
	}
	@Override
	public String getDescription() {
		// TODO Auto-generated method stub
		return null;
	}
	@Override
	protected void process() throws Exception {
		// TODO Auto-generated method stub
 
	}
	@Override
	protected void setInputParameters() {
		// TODO Auto-generated method stub
 
	}
	@Override
	public void shutdown() {
		// TODO Auto-generated method stub		
	}
	@Override
	public StatisticalType getOutput() {
		return null;
	}
}

The init() method performs the initialization. In this simple example we only need to initialize the logging facility, and we use the logger from the ecological engine library. If the algorithm uses a database, its connection has to be opened in this method; shutdown() then closes the database connection. In the getDescription() method we add a short description of the algorithm.

Customize input visualization

String input parameters

A user input is declared by calling the method addStringInput from setInputParameters() with the following parameters:

  • name of the variable;
  • description of the variable;
  • default value.

The value entered by the user is retrieved with getInputParameter(), passing the same name that was used in setInputParameters().

protected void setInputParameters() {
		addStringInput(NameOfVariable, "Description", "DefaultInput");
 
}

The Statistical Manager automatically passes the input parameter to the procedure. In the process() method we can retrieve the parameter by the name that we set in the addStringInput call.

@Override
protected void process() throws Exception {
....
String userInputValue = getInputParameter(NameOfVariable);
}
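Since getInputParameter() returns a String, numeric parameters have to be converted inside process(). The following is a minimal sketch of such a conversion, assuming that falling back to a default is acceptable when the text is not a valid number. The class and method names (InputParsing, parseIntOrDefault) are illustrative and are not part of the ecological engine library.

```java
// Hypothetical helper, not part of the ecological-engine library.
final class InputParsing {

    // Converts a raw string parameter to an int, falling back to the given
    // default when the value is missing or is not a valid integer.
    static int parseIntOrDefault(String raw, int defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        // In process() the raw value would come from getInputParameter(...).
        System.out.println(parseIntOrDefault(" 10 ", 5));
        System.out.println(parseIntOrDefault("ten", 5));
    }
}
```

Whether a silent fallback or an exception is preferable depends on the algorithm; throwing makes a wrong input visible to the user instead of hiding it.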

Combo box input parameter

To obtain a combo box, define an enumeration containing the choices that can be selected in the combo box and pass its values to the method addEnumerateInput as follows:

public enum Enum {
FIRST_ENUM,
SECOND_ENUM
}
 
protected void setInputParameters() {
addEnumerateInput(Enum.values(), variableName, "Description",
					Enum.FIRST_ENUM.name());
}

The parameters of addEnumerateInput are, respectively:

  • the values of the declared enumeration;
  • the name of the variable used to retrieve the value selected by the user;
  • the description of the value;
  • the default value shown in the combo box.
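Inside process() the selection made in the combo box has to be turned back into an enumeration constant. The sketch below assumes that getInputParameter() returns the name of the selected constant as a String (the default value above is passed as Enum.FIRST_ENUM.name(), which suggests this, but it is not confirmed here). The Mode enumeration and the ComboChoice class are hypothetical stand-ins.

```java
// Hypothetical helper, not part of the ecological-engine library.
final class ComboChoice {

    // Stand-in for the enumeration passed to addEnumerateInput.
    enum Mode { FIRST_ENUM, SECOND_ENUM }

    // Maps the raw string back to a constant, using the fallback when the
    // value is missing or does not match any constant name.
    static Mode parseMode(String raw, Mode fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            return Mode.valueOf(raw.trim());
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // In process() the raw value would come from getInputParameter(variableName).
        System.out.println(parseMode("SECOND_ENUM", Mode.FIRST_ENUM));
        System.out.println(parseMode("unknown", Mode.FIRST_ENUM));
    }
}
```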

Import input from Statistical Manager database

Users can upload their data in the "Access to the Data Space" section of the Statistical Manager. After a file (for example a csv file) has been uploaded, its data can be used as input for an algorithm. To select column values from the table extracted from the csv file, an algorithm developer fills in the methods in the following way:

@Override
protected void setInputParameters() {
List<TableTemplates> templates = new ArrayList<TableTemplates>();
templates.add(TableTemplates.GENERIC);
InputTable tinput = new InputTable(templates, "Table","Table Description");
ColumnTypesList columns = new ColumnTypesList("Table","Columns", "Selected Columns Description", false);
inputs.add(tinput);
inputs.add(columns);
DatabaseType.addDefaultDBPars(inputs);
 
}
 
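@Override
protected void process() throws Exception {
config.setParam("DatabaseDriver", "org.postgresql.Driver");
SessionFactory dbconnection = DatabaseUtils.initDBSession(config);
// assumption: the table and the column list are read back by the names declared in setInputParameters()
String tablename = getInputParameter("Table");
String columnnames = getInputParameter("Columns");
// the selected columns arrive as one string joined by the list separator
String[] columnlist = columnnames.split(AlgorithmConfiguration.getListSeparator());
List<Object> speciesList = DatabaseFactory.executeSQLQuery("select " + columnlist[0]+ " from " + tablename, dbconnection);
}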
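The column handling above can be tried out without a database. The following sketch splits a column list and builds the same kind of query. The actual value returned by AlgorithmConfiguration.getListSeparator() is not stated here, so the separator is passed in as a parameter, and the ColumnSelection class is a hypothetical stand-in. Because String.split interprets its argument as a regular expression, the sketch quotes the separator so that it is matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the ecological-engine library.
final class ColumnSelection {

    // Splits the raw column list on a literal separator.
    static String[] splitColumns(String columnNames, String separator) {
        return columnNames.split(Pattern.quote(separator));
    }

    // Builds "select <first column> from <table>" from the raw column list.
    static String selectFirstColumn(String columnNames, String separator, String tableName) {
        String[] columns = splitColumns(columnNames, separator);
        return "select " + columns[0].trim() + " from " + tableName;
    }

    public static void main(String[] args) {
        // "|" is only an example separator chosen for this sketch.
        System.out.println(selectFirstColumn("latitude|longitude", "|", "mytable"));
    }
}
```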

Case of algorithms using databases

To use a database, call the method addRemoteDatabaseInput() from setInputParameters(). The first parameter must be the name of the Runtime Resource addressing the database. The Statistical Manager automatically retrieves the following parameters from the Runtime Resource: url, user and password. In the process() method, before connecting to the database, the url, user and password are retrieved with getInputParameter(), each by the name that was passed to addRemoteDatabaseInput().

@Override
protected void setInputParameters() {	
...	
addRemoteDatabaseInput("Obis2Repository", urlParameterName,userParameterName, passwordParameterName, "driver", "dialect");
}
 
@Override
protected void process() throws Exception {
...
 
String databaseJdbc = getInputParameter(urlParameterName);
String databaseUser = getInputParameter(userParameterName);
String databasePwd = getInputParameter(passwordParameterName);
 
connection = DriverManager.getConnection(databaseJdbc, databaseUser,databasePwd);
...
 
}

Customize output

The last step is to specify the output of the procedure. For this purpose we override the method getOutput(), which returns a StatisticalType. The first output we instantiate is a PrimitiveType object that wraps a string, so we set its type to string and associate a name and a description with the output value. A second output can be instantiated as another PrimitiveType. Both output objects are then added to a map that preserves the order in which they were inserted.

The Statistical Manager invokes getOutput() to determine the type of the output object. In the ecological engine library, the algorithm is indexed under the name set in the properties file.
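A map that keeps the insertion order can be obtained with java.util.LinkedHashMap, which is also what the complete example at the end of this page uses. The sketch below shows the ordering behaviour in isolation. The values are plain strings here for brevity; in an algorithm they would be StatisticalType objects. The OrderedOutputs class is a hypothetical stand-in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the ecological-engine library.
final class OrderedOutputs {

    // Collects two outputs; LinkedHashMap iterates in insertion order.
    static Map<String, String> collect() {
        Map<String, String> outputs = new LinkedHashMap<>();
        outputs.put("Result", "first output");
        outputs.put("Images", "second output");
        return outputs;
    }

    // Returns the output names in the order in which they were added.
    static List<String> names(Map<String, String> outputs) {
        return new ArrayList<>(outputs.keySet());
    }

    public static void main(String[] args) {
        System.out.println(names(collect()));
    }
}
```

A plain HashMap gives no ordering guarantee, so outputs stored in one could appear in any order.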

String Output

To return a string as output, create a PrimitiveType as follows:

@Override
public StatisticalType getOutput() {
….
PrimitiveType val = new PrimitiveType(String.class.getName(), myString , PrimitiveTypes.STRING, stringName, defaultValue);
return val;
 
}

Bar Chart Output

To create a histogram chart, fill a DefaultCategoryDataset object and use it to create the chart:

DefaultCategoryDataset dataset;
…
dataset.addValue(...);	
….
 
 
@Override
public StatisticalType getOutput() {
….
HashMap<String, Image> producedImages = new HashMap<String, Image>();
JFreeChart chart = HistogramGraph.createStaticChart(dataset);
Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));
producedImages.put("Species Observations", image);
}

Timeseries Chart Output

To create a time series chart, fill a DefaultCategoryDataset object and use it to create the chart. The second parameter of the createStaticChart method is the time format.

DefaultCategoryDataset dataset;
…
dataset.addValue(...);	
….
@Override
public StatisticalType getOutput() {
...
HashMap<String, Image> producedImages = new HashMap<String, Image>();
JFreeChart chart = TimeSeriesGraph.createStaticChart(dataset, "yyyy");
Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));
producedImages.put("TimeSeries chart", image);
... 
}
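The format string "yyyy" used above looks like a java.text.SimpleDateFormat pattern. Assuming that this is how the time format is interpreted (this is not confirmed here), the following sketch checks whether a time label matches a given pattern before it is added to the dataset. The TimeFormatCheck class is a hypothetical stand-in.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of the ecological-engine library.
final class TimeFormatCheck {

    // Returns true when the whole label can be parsed with the pattern.
    static boolean matches(String label, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);
        ParsePosition position = new ParsePosition(0);
        Date parsed = format.parse(label, position);
        // parse returns null on failure; also reject trailing characters.
        return parsed != null && position.getIndex() == label.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("1998", "yyyy"));
        System.out.println(matches("abcd", "yyyy"));
    }
}
```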

File Output

To create a results file that the user can download, algorithm developers have to add the following code:

protected String fileName;
protected BufferedWriter out;
 
@Override
protected void process() throws Exception {
fileName = super.config.getPersistencePath() + "results.csv";
out = new BufferedWriter(new FileWriter(fileName));
out.write(results);
out.newLine();
// close the writer so that buffered content is flushed to the file
out.close();
}
 
@Override
public StatisticalType getOutput() {
...
PrimitiveType file = new PrimitiveType(File.class.getName(), new File(fileName), PrimitiveTypes.FILE, "Description ", "Default value");
map.put("Output",file);
...
}
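The file-writing step can also be written with try-with-resources, which closes the writer even when a write fails. The sketch below is one possible variant and not the only way to do it. It takes the persistence path as a plain parameter instead of reading it from config.getPersistencePath(), and it joins directory and file name through java.io.File, so it does not depend on the path ending with a separator. The ResultsFile class is a hypothetical stand-in.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the ecological-engine library.
final class ResultsFile {

    // Writes one line per row into <directory>/<fileName> and returns the file.
    static File write(String directory, String fileName, List<String> rows) {
        File target = new File(directory, fileName);
        try (BufferedWriter out = new BufferedWriter(new FileWriter(target))) {
            for (String row : rows) {
                out.write(row);
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // The temporary directory stands in for the persistence path.
        String dir = System.getProperty("java.io.tmpdir");
        File file = write(dir, "results.csv", Arrays.asList("species,count", "shark,3"));
        System.out.println(file.getAbsolutePath());
    }
}
```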

Test the algorithm

This is a template example for testing an algorithm from Eclipse. The same kind of factory exists for Clusterers, Evaluators, Modellers and Generators. Download the following folder http://goo.gl/yO5jui and place it locally, next to the code. For a new algorithm, add your class to one of the Transducers, Clusterers, Evaluators, Modellers or Generators files. Edit only the file that matches the category of your algorithm.

package org.gcube.dataanalysis.ecoengine.test.regression;
 
import java.util.List;
 
import org.gcube.dataanalysis.ecoengine.configuration.AlgorithmConfiguration;
import org.gcube.dataanalysis.ecoengine.evaluation.bioclimate.InterpolateTables.INTERPOLATIONFUNCTIONS;
import org.gcube.dataanalysis.ecoengine.interfaces.ComputationalAgent;
import org.gcube.dataanalysis.ecoengine.interfaces.Transducerer;
import org.gcube.dataanalysis.ecoengine.processing.factories.TransducerersFactory;
 
public class TestTransducers {
 
public static void main(String[] args) throws Exception {
System.out.println("TEST 1");
List<ComputationalAgent> trans = null;
trans = TransducerersFactory.getTransducerers(testConfigLocal());
trans.get(0).init();
Regressor.process(trans.get(0));
trans = null;
}
 
private static AlgorithmConfiguration testConfigLocal() {
 
 AlgorithmConfiguration config = Regressor.getConfig();
 config.setAgent("OCCURRENCES_DUPLICATES_DELETER");
 
 config.setParam("longitudeColumn", "decimallongitude");
 config.setParam("latitudeColumn", "decimallatitude");
 config.setParam("recordedByColumn", "recordedby");
 config.setParam("scientificNameColumn", "scientificname");
 config.setParam("eventDateColumn", "eventdate");
 config.setParam("lastModificationColumn", "modified");
 config.setParam("OccurrencePointsTableName", "whitesharkoccurrences2");
 config.setParam("finalTableName", "whitesharkoccurrencesnoduplicates");
 config.setParam("spatialTolerance", "0.5");
 config.setParam("confidence", "80");
 
return config;
}
 
}

Properties File and Deploy

In order to deploy an algorithm we must create:

  • the jar corresponding to the Eclipse Java project containing the algorithm;
  • a properties file containing the name under which the algorithm should be displayed in the GUI and the fully qualified name of the algorithm class, e.g. MY_ALGORITHM=org.gcube.cnr.Myalgorithm

You must provide these two files to the i-Marine team. They will move the algorithm onto a Statistical Manager instance and the interface will be automatically generated.

In the following example, the src/main/java folder contains the package org.gcube.dataanalysis.myAlgorithms, which holds the class SimpleAlgorithm implementing an algorithm: SIMPLE_ALGORITHM=org.gcube.dataanalysis.myAlgorithms.SimpleAlgorithm
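A typo in the class name of the properties file is easy to miss, so it can be useful to check the entry locally before handing the files over. The following sketch reads a properties entry and verifies that the class it names can be loaded. It is only a local sanity check and makes no claim about how the Statistical Manager itself loads algorithms. The PropertiesCheck class is hypothetical, and java.util.ArrayList is used in the usage example only because it is always on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the ecological-engine library.
final class PropertiesCheck {

    // Returns the class name registered for the algorithm name, or null.
    static String classNameFor(String propertiesText, String algorithmName) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty(algorithmName);
    }

    // True when the named class can be found on the current classpath.
    static boolean isLoadable(String className) {
        if (className == null) {
            return false;
        }
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "MY_LIST=java.util.ArrayList\n";
        System.out.println(isLoadable(classNameFor(text, "MY_LIST")));
    }
}
```

For a real algorithm, the check has to run with the algorithm jar on the classpath.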

Complete Example with multiple outputs

public class AbsoluteSpeciesBarChartsAlgorithm  extends
StandardLocalExternalAlgorithm  {
	LinkedHashMap<String, StatisticalType> map = new LinkedHashMap<String, StatisticalType>();
	static String databaseName = "DatabaseName";
	static String userParameterName = "DatabaseUserName";
	static String passwordParameterName = "DatabasePassword";
	static String urlParameterName = "DatabaseURL";
	private String firstSpeciesNumber="Species# :";
	private String yearStart="Starting year :";
	private String yearEnd="Ending year :";
	private int speciesNumber;
	private DefaultCategoryDataset defaultcategorydataset;
	@Override
	public void init() throws Exception {
		AnalysisLogger.getLogger().debug("Initialization");		
	}
 
	@Override
	public String getDescription() {
		return "Algorithm returning bar chart of most observed species in a specific years range (with respect to the OBIS database)";
	}
 
	@Override
	protected void process() throws Exception {
		defaultcategorydataset = new DefaultCategoryDataset();
		String driverName = "org.postgresql.Driver";
		String tmp=getInputParameter(firstSpeciesNumber);
 
		speciesNumber = Integer.parseInt(tmp);
		Class driverClass = Class.forName(driverName);
		Driver driver = (Driver) driverClass.newInstance();
		String databaseJdbc = getInputParameter(urlParameterName);
		String year_start = getInputParameter(yearStart);
		String year_end = getInputParameter(yearEnd);
 
		String databaseUser = getInputParameter(userParameterName);
		String databasePwd = getInputParameter(passwordParameterName);
		Connection connection = null;
		connection = DriverManager.getConnection(databaseJdbc, databaseUser,
				databasePwd);
		Statement stmt = connection.createStatement();
		// spaces around the concatenated values keep the keywords separated from the numbers
		String query = "SELECT tname, sum(count) AS count FROM public.count_species_per_year WHERE year::integer >= "
				+ year_start
				+ " AND year::integer <= "
				+ year_end
				+ " GROUP BY tname ORDER BY count desc;";
		ResultSet rs = stmt.executeQuery(query);
		int i = 0;
		String s = "Species";
		while (rs.next() && i < speciesNumber) {
			String tname = rs.getString("tname");
			String count = rs.getString("count");
			int countOcc = Integer.parseInt(count);

			// First output (list of string)
			PrimitiveType val = new PrimitiveType(String.class.getName(), count, PrimitiveTypes.STRING, tname, tname);
			map.put(tname, val);
			// at most 16 species are added to the chart
			if (i < 16) {
				defaultcategorydataset.addValue(countOcc, s, tname);
			} else {
				break;
			}
			i++;
		}
		connection.close();
 
 
 
	}
 
	@Override
	protected void setInputParameters() {
		addStringInput(firstSpeciesNumber,
				"Number of shown species", "10");
		addStringInput(yearStart, "Starting year of observations",
				"1800");
		addStringInput(yearEnd, "Ending year of observations", "2020");
		addRemoteDatabaseInput("Obis2Repository", urlParameterName,
				userParameterName, passwordParameterName, "driver", "dialect");
 
 
	}
 
	@Override
	public void shutdown() {
		AnalysisLogger.getLogger().debug("Shutdown");		
	}
 
 
	@Override
	public StatisticalType getOutput() {
		PrimitiveType p = new PrimitiveType(Map.class.getName(), PrimitiveType.stringMap2StatisticalMap(outputParameters), PrimitiveTypes.MAP, "Discrepancy Analysis","");
		AnalysisLogger.getLogger().debug("MapsComparator: Producing Gaussian Distribution for the errors");	
		//build image:
		HashMap<String, Image> producedImages = new HashMap<String, Image>();
 
		JFreeChart chart = HistogramGraph.createStaticChart(defaultcategorydataset);
	     Image image = ImageTools.toImage(chart.createBufferedImage(680, 420));
	     producedImages.put("Species Observations", image);
 
		PrimitiveType images = new PrimitiveType(HashMap.class.getName(), producedImages, PrimitiveTypes.IMAGES, "ErrorRepresentation", "Graphical representation of the error spread");
 
		//end build image
		AnalysisLogger.getLogger().debug("Bar Charts Species Occurrences Produced");
		//collect all the outputs
 
		map.put("Result", p);
		map.put("Images", images);
 
		//generate a primitive type for the collection
		PrimitiveType output = new PrimitiveType(HashMap.class.getName(), map, PrimitiveTypes.MAP, "ResultsMap", "Results Map");
 
 
		return output;
	}
 
}
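The query in process() above concatenates the year inputs directly into the SQL text. Since these inputs are free-form strings typed by the user, it is worth validating them first. The following minimal sketch parses both years as integers before building the WHERE clause, so that any non-numeric input is rejected with an exception. The YearRange class is a hypothetical stand-in; a JDBC PreparedStatement with parameters would be another common way to achieve the same goal.

```java
// Hypothetical helper, not part of the ecological-engine library.
final class YearRange {

    // Builds the WHERE clause from two year strings, rejecting anything
    // that is not an integer or that describes an empty range.
    static String whereClause(String yearStart, String yearEnd) {
        int start = Integer.parseInt(yearStart.trim());
        int end = Integer.parseInt(yearEnd.trim());
        if (start > end) {
            throw new IllegalArgumentException("Starting year is after ending year");
        }
        return "WHERE year::integer >= " + start + " AND year::integer <= " + end;
    }

    public static void main(String[] args) {
        // In process() the raw values would come from getInputParameter(...).
        System.out.println(whereClause("1800", "2020"));
    }
}
```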

Related Links

Statistical Manager Tutorial

How to Interact with the Statistical Manager by means of a thin client