Codelist Manager

From Gcube Wiki

Latest revision as of 18:47, 5 November 2012

A component supporting the entire life cycle of code list management, including creation (via ingestion where the code lists already exist), curation and publishing. Code lists are also known as controlled vocabularies or coded value enumerations. They are particularly important (a) at the data dissemination and exchange layer among different organisations and (b) in data reporting and production, since they ease mutual understanding between the originator and the consumer. They represent a key concept in statistical data management.

This document outlines the design rationale, key features, and high-level architecture, as well as the deployment options.

Overview

The goal of this component is to simplify the management of code lists. In particular, it is conceived to support the entire life cycle of code lists, from their creation to their exploitation.

The component creates a code list by ingesting its data either from a CSV (comma-separated values) file or via the SDMX protocol (a reference standard for statistical data).
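To make the CSV route concrete, the following is a minimal Python sketch of ingesting a code list from a CSV stream. It is not the component's actual API: the `Code` class, the `read_code_list` function and the `code`/`label` column names are assumptions made only for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    """One entry of a code list: a code value and its human-readable label."""
    value: str
    label: str


def read_code_list(source, code_column="code", label_column="label"):
    """Build a list of Code entries from a CSV stream that has a header row.

    The column names are assumptions of this sketch, not the component's schema.
    """
    reader = csv.DictReader(source)
    seen = set()
    codes = []
    for row in reader:
        value = row[code_column].strip()
        # A code list must not contain the same code twice.
        if value in seen:
            raise ValueError(f"duplicate code: {value!r}")
        seen.add(value)
        codes.append(Code(value, row[label_column].strip()))
    return codes


# Usage: an in-memory stream stands in for an uploaded CSV file.
sample = io.StringIO("code,label\nIT,Italy\nFR,France\n")
countries = read_code_list(sample)
```

Rejecting duplicate codes at ingestion time is one possible curation rule; a real deployment may prefer to collect such problems and report them to the user.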


Design

Philosophy

The goal of this component is to offer code list management facilities while abstracting over various aspects, including the storage model and the distribution of data across diverse storage units.
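One way to picture this abstraction is a small storage interface that management code depends on, with interchangeable backends behind it. The Python sketch below is purely illustrative: `CodeListStore`, `InMemoryStore` and `rename_label` are hypothetical names, not part of the component.

```python
from abc import ABC, abstractmethod


class CodeListStore(ABC):
    """Hypothetical storage interface: callers never see how data is kept."""

    @abstractmethod
    def save(self, name, codes):
        """Persist a mapping of code value to label under the given name."""

    @abstractmethod
    def load(self, name):
        """Return the mapping of code value to label stored under the name."""


class InMemoryStore(CodeListStore):
    """Simplest possible backend; a database or remote store could replace it."""

    def __init__(self):
        self._lists = {}

    def save(self, name, codes):
        self._lists[name] = dict(codes)

    def load(self, name):
        return dict(self._lists[name])


def rename_label(store, name, value, new_label):
    """Management logic written only against the interface."""
    codes = store.load(name)
    codes[value] = new_label
    store.save(name, codes)


# Usage: the same call would work unchanged with any other backend.
store = InMemoryStore()
store.save("currencies", {"EUR": "Euro"})
rename_label(store, "currencies", "EUR", "Euro (EUR)")
```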

Another distinguishing aspect is the automatic recording and injection of additional information characterising the evolution of a code list, for provenance and versioning purposes.
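As a rough illustration of automatic provenance recording, the Python sketch below appends a history record on every change and derives the version number from that history. The classes `ProvenanceRecord` and `VersionedCodeList`, and the fields they record, are assumptions of this sketch rather than the component's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """What changed, in which version, and when (hypothetical fields)."""
    version: int
    operation: str
    timestamp: datetime


@dataclass
class VersionedCodeList:
    name: str
    codes: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    @property
    def version(self):
        # Every recorded change produces exactly one new version.
        return len(self.history)

    def _record(self, operation):
        self.history.append(
            ProvenanceRecord(self.version + 1, operation,
                             datetime.now(timezone.utc))
        )

    def set_code(self, value, label):
        operation = "update" if value in self.codes else "add"
        self.codes[value] = label
        self._record(f"{operation} {value}")

    def remove_code(self, value):
        del self.codes[value]  # raises KeyError before anything is recorded
        self._record(f"remove {value}")


# Usage: callers only edit codes; the history is filled in automatically.
units = VersionedCodeList("units")
units.set_code("KG", "Kilogram")
units.set_code("KG", "Kilogramme")
units.remove_code("KG")
```

Keeping the recording inside the mutating methods means callers cannot forget to log a change, which is the point of making provenance automatic.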


Deployment

This library must be co-deployed with the hosting service.

Small deployment

[Figure: Small Deployment]

Use Cases

Well suited Use Cases

This library is particularly suited for use in services that need code lists.