Difference between revisions of "Codelist Manager"

From Gcube Wiki
Line 3: Line 3:
 
|}
 
|}
  
A Library for performing import, harmonization and curation on codelists. The goal of this library is to simplify the management of codelists.  
+
A Library for performing import, harmonization and curation on codelists.  
 
This document outlines the design rationale, key features, and high-level architecture, as well as the deployment options.
 
This document outlines the design rationale, key features, and high-level architecture, as well as the deployment options.
  
 
== Overview ==
 
== Overview ==
  
The goal of this service is to offer a single entry point for processing, assessing and harmonizing occurrence points belonging to species observations. Data can come from the Species Discovery Service or can be uploaded by a user by means of a web interface.
+
The goal of this library is to simplify the management of codelists. Data can come from CSV (comma-separated values) files exported from a database or be retrieved using the SDMX protocol (used by FAO, Eurostat etc.).
 
+
The service is able to interface to other infrastructural services in order to expand the number of functionalities and applications to the data under analysis.
+
  
 
== Design ==
 
== Design ==
Line 16: Line 14:
 
=== Philosophy ===
 
=== Philosophy ===
  
This represents an endpoint for users who want to process species observations in order to explore their coherence and to extract hidden properties from data collected from different sources. It is meant as a complement to other services for species and occurrence point analysis.
+
This library provides a way to integrate codelists into the system and make them available for multiple uses (TimeSeries, Aquamaps or other applications).
  
 
=== Architecture ===
 
=== Architecture ===
The subsystem comprises the following components:
 
 
* '''Inputs Managers''': a set of internal processors which manage the variety of inputs that could come from users or from other services;
 
 
* '''Occurrence Point Processors''': a set of internal objects which can invoke external systems in order to process data or extract hidden properties from them. These include Clustering, Anomaly Points Detection etc.;
 
 
* '''Occurrence Points Enrichment''': a connector to another d4Science service dealing with the enrichment of occurrence points with associated information about the chemical and physical characteristics of the sea or the earth;
 
 
* '''Occurrence Points Operations''': a connector to another d4Science interface which is able to operate on tabular data, by performing visualization, aggregation and transformations.
 
 
* '''Processing Orchestrator''': an internal process which manages the interaction and the usage of the other components. It accepts and dispatches requests coming from outside the service.
 
 
A diagram of the relationships between these components is reported in the following figure:
 
 
[[Image:occpointsreco.png|frame|center|Occurrence Points Reconciliation Service, internal Architecture]]
 
  
 
== Deployment ==
 
== Deployment ==
All the components of the service must be deployed together in a single node. This subsystem can be replicated on multiple hosts and in multiple scopes, but this does not guarantee a performance improvement, because it is a management system for a single input dataset.
 
  
 
=== Small deployment ===
 
=== Small deployment ===
 
The deployment follows the schema below, since it requires other complementary services to be present.
 
 
[[Image:occpointsarchitecture.png|frame|center|Occurrence Points Reconciliation Service, Deployment schema]]
 
  
 
== Use Cases ==
 
== Use Cases ==
  
 
=== Well suited Use Cases ===
 
=== Well suited Use Cases ===
 
The subsystem is particularly suited to experiments on occurrence points referring to a certain species or family. The set of operations that can be applied, although relying on state-of-the-art algorithms, has been studied and developed for managing this kind of information.
 
  
 
== Subsystems ==
 
== Subsystems ==

Revision as of 15:49, 10 May 2012

A library for performing import, harmonization and curation of codelists. This document outlines the design rationale, key features, and high-level architecture, as well as the deployment options.

Overview

The goal of this library is to simplify the management of codelists. Data can come from CSV (comma-separated values) files exported from a database or be retrieved using the SDMX protocol (used by FAO, Eurostat etc.).
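
As an illustration of the CSV import path only, the following is a minimal sketch and not the library's actual API. The function name load_codelist, the column names "code" and "label", and the sample data are all assumptions made for this example; a real database export may use different headers.

```python
import csv
import io


def load_codelist(csv_text, code_column="code", label_column="label"):
    """Parse CSV text into a dict mapping each code to its label.

    The column names are hypothetical; adjust them to the actual export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    codelist = {}
    for row in reader:
        # Missing cells come back as None, so normalise them to "".
        code = (row.get(code_column) or "").strip()
        label = (row.get(label_column) or "").strip()
        if not code:
            continue  # skip rows that carry no code
        if code in codelist:
            # A codelist needs unique codes, so report duplicates early.
            raise ValueError(f"duplicate code: {code}")
        codelist[code] = label
    return codelist


# Hypothetical sample data standing in for a database export.
SAMPLE = "code,label\nA,Alpha\nB,Beta\n"
codes = load_codelist(SAMPLE)
```

Rejecting duplicate codes at import time is one possible curation rule; other policies (such as keeping the first or last occurrence) would fit the same structure.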

Design

Philosophy

This library provides a way to integrate codelists into the system and make them available for multiple uses (TimeSeries, Aquamaps or other applications).

Architecture

Deployment

Small deployment

Use Cases

Well suited Use Cases

Subsystems