Occurrence Data Enrichment Service

From Gcube Wiki
Revision as of 14:14, 11 May 2012 by Gianpaolo.coro (Talk | contribs) (Small deployment)

This service enriches the information associated with species occurrence points. It provides users with an interface for searching the available environmental information that can be attached to the occurrence points under analysis. This document outlines the design rationale, key features, and high-level architecture, as well as the deployment options.

Overview

The goal of this service is to offer a single entry point for enriching the information associated with the coordinates of a set of occurrence points. Data can come from the Species Discovery Service or from the Occurrence Data Reconciliation Service, or can be uploaded by a user through a web interface.

The service can interface with other infrastructural services in order to extend the range of functionalities and applications available for the data under analysis.

The environmental information will be supplied by the Environmental Service of d4Science, along with the list of the information available in the infrastructure.

Design

Philosophy

This service is an endpoint for users who want to add environmental information to the coordinates of occurrence points. It is meant as a complement to other services for the analysis of species and occurrence points.

Architecture

The subsystem comprises the following components:

  • Inputs Managers: a set of internal processors which manage the variety of inputs that can come from users or from other services, for example from the Occurrence Data Reconciliation Service;
  • Occurrence Points Sets Operations: a set of internal objects which can invoke external systems in order to process data sets. Merge, Subtraction and Intersection operations can be invoked by interfacing to the Statistical Manager;
  • Occurrence Points Enrichment: a connector to the Environmental Service for (i) retrieving discoverable information, (ii) retrieving environmental data already present in d4Science, and (iii) producing data by interpolation or kriging if necessary;
  • Processing Orchestrator: an internal process which manages the interaction and the usage of the other components. It accepts and dispatches requests coming from outside the service.
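
The Merge, Subtraction and Intersection operations named above can be illustrated with a minimal local sketch. In the service these operations are delegated to the Statistical Manager; the code below does not reproduce that interface. The `OccurrencePoint` type, the function names, and the rule that two points are equal when species and rounded coordinates match are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class OccurrencePoint(NamedTuple):
    """Hypothetical record for one occurrence point (not the service's schema)."""
    species: str
    longitude: float
    latitude: float


def _key(point: OccurrencePoint, precision: int = 4) -> Tuple[str, float, float]:
    # Assumption: points are "the same" when the species matches and the
    # coordinates agree after rounding to `precision` decimal places.
    return (point.species,
            round(point.longitude, precision),
            round(point.latitude, precision))


def merge(a: Iterable[OccurrencePoint], b: Iterable[OccurrencePoint]) -> List[OccurrencePoint]:
    """Union of both sets, keeping the first copy of each duplicate."""
    seen = {}
    for point in list(a) + list(b):
        seen.setdefault(_key(point), point)
    return list(seen.values())


def intersection(a: Iterable[OccurrencePoint], b: Iterable[OccurrencePoint]) -> List[OccurrencePoint]:
    """Points of `a` that also occur in `b`."""
    keys_b = {_key(point) for point in b}
    return [point for point in merge(a, []) if _key(point) in keys_b]


def subtraction(a: Iterable[OccurrencePoint], b: Iterable[OccurrencePoint]) -> List[OccurrencePoint]:
    """Points of `a` that do not occur in `b`."""
    keys_b = {_key(point) for point in b}
    return [point for point in merge(a, []) if _key(point) not in keys_b]
```

Rounding the coordinates before comparison is one possible design choice: it avoids treating two records as distinct because of negligible floating-point differences, at the cost of a tolerance that has to be chosen for the data at hand.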

A diagram of the relationships between these components is shown in the following figure:

Occurrence Points Enrichment Service, internal architecture
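
The third task of the Occurrence Points Enrichment component, producing a value by interpolation when no environmental data exists at a coordinate, can be sketched as follows. This is an illustration under stated assumptions: it uses inverse distance weighting, a simple interpolation scheme chosen here for brevity, not kriging and not necessarily the method used by the Environmental Service. The function names, the sample layout `(longitude, latitude, value)` and the attribute name `sea_surface_temperature` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, float, float]  # (longitude, latitude, measured value)


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def idw(lon: float, lat: float, samples: Iterable[Sample], power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate of the value at (lon, lat)."""
    numerator = denominator = 0.0
    for s_lon, s_lat, value in samples:
        distance = haversine_km(lon, lat, s_lon, s_lat)
        if distance == 0.0:
            return value  # exact hit: use the measured value directly
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one sample is required")
    return numerator / denominator


def enrich(points: Iterable[Tuple[float, float]], samples: List[Sample],
           name: str = "sea_surface_temperature") -> List[Dict[str, float]]:
    """Attach an interpolated environmental value to each (lon, lat) point."""
    return [{"longitude": lon, "latitude": lat, name: idw(lon, lat, samples)}
            for lon, lat in points]
```

A point that coincides with a sample receives the measured value unchanged, and a point equidistant from two samples receives their average. Kriging would additionally model the spatial correlation of the data, which this sketch does not attempt.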

Deployment

All the components of the service must be deployed together on a single node. This subsystem can be replicated on multiple hosts and in multiple scopes, but replication does not guarantee a performance improvement, because the service is a management system for a single input dataset.

Small deployment

The deployment follows the schema below, since the service requires the presence of other complementary services.

Occurrence Points Enrichment Service, deployment schema

Use Cases

Well-suited Use Cases

The subsystem is particularly suited to users who want to investigate the marine properties of the areas where species live. This helps in understanding the characteristics of the sea they prefer. Having environmental information discovered by an external service (the Environmental Service) can speed up the investigation of species habitats, which normally takes scholars a large amount of time.

Subsystems

Occurrence Data Enrichment Service depends on the following subsystems, where each specializes along the structure or the semantics of the data: