Data Consumption Software Consolidated Specifications

From Gcube Wiki
Revision as of 17:48, 17 February 2014 by John.gerbesiotis (Talk | contribs) (Specifications)

Overview

This page gives an overview of the components and facilities provided by the gCube Data Consumption Software, with links to the software specifications and to the Developers' guides. Its main aim is to summarise the supported software at different levels of granularity. The facilities cover the gCube components that deal with Data Consumption, in particular: Retrieval, Manipulation, Mining, Visualisation and Semantic Data Analysis.

Key Features

The gCube Data Consumption facilities provide the following key features:

Data Retrieval

Declarative Query Language over a heterogeneous environment
The gCube Data Retrieval framework unifies Data Sources that use different data representations and semantics through the CQL standard.
On-the-fly Integration of Data Sources
A Data Source that publishes its Information Retrieval (IR) capabilities can be brought into the IR process on the fly.
Scalability in the number of Data Sources
Planning and Optimization mechanisms detect the minimum number of Sources that need to be involved in answering a query, along with an optimal execution plan.
Direct Integration of External Information Providers
Through the OpenSearch standard, external Information Providers can be queried dynamically. The results they provide can be aggregated with the other results during query answering.
Indexing Capabilities for Replication and High Availability
Multidimensional and Full-text indexing capabilities using an architecture that efficiently supports replication and high availability.
Distributed Execution Environment offering High Performance and Flexibility
Efficient execution of search plans over a large heterogeneous environment.
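
The idea of involving as few Data Sources as possible can be illustrated with a minimal sketch. It assumes, purely for illustration, that each Data Source advertises the set of fields it can answer and that a query needs a known set of fields. The function name `select_sources`, the source names and the field names are hypothetical and are not part of gCube. The sketch uses a greedy set-cover heuristic, which usually yields a small selection but does not guarantee the true minimum; the actual planner is not described on this page.

```python
def select_sources(required, capabilities):
    """Greedily pick sources until every required field is covered.

    required: set of field names the query needs (hypothetical model).
    capabilities: dict mapping source name -> set of fields it can answer.
    Returns the list of chosen source names in the order they were picked.
    Raises ValueError if some field cannot be answered by any source.
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        # Pick the source that answers the most still-uncovered fields.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(capabilities),
                   key=lambda name: len(capabilities[name] & uncovered))
        gained = capabilities[best] & uncovered
        if not gained:
            raise ValueError(f"no source covers: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical capability descriptions published by three sources.
example_sources = {
    "catalogue": {"title", "author", "bbox"},
    "fulltext_index": {"title", "abstract"},
    "geo_index": {"bbox"},
}
print(select_sources({"title", "bbox"}, example_sources))
```

In this example a single source can answer both fields, so only that source is selected and the other two are left out of the plan.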

Data Manipulation

Automatic transformation path identification
Given the content type of a source object and the target content type, the framework identifies the appropriate transformation to use. It can also dynamically compose a path of several transformation steps to produce the final format, favouring the shortest path.
Fine-grained subtyping of formats
Extensive freedom in the supported types and in their parameters (e.g. resolution, fps).
Pluggable algorithms for content transformation
A generic transformation framework based on pluggable components termed transformation programs. Transformation programs reveal the transformation capabilities of the framework. With this approach, domain-specific and application-specific data transformations can be supplied.
Exploitation of PE2ng Infrastructure
Integration with the PE2ng engine gives access to vast amounts of processing power and makes it possible to handle virtually any transformation task, which makes this the standard Data Manipulation facility for gCube applications.
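
Automatic transformation path identification can be pictured as a shortest-path search over a graph whose nodes are content types and whose edges are transformation programs. The sketch below is an illustration under that assumption, using breadth-first search; the function name `find_transformation_path`, the `(name, input_type, output_type)` tuple layout and the example programs are hypothetical, and the real service may also weigh type parameters that are ignored here.

```python
from collections import deque


def find_transformation_path(programs, source_type, target_type):
    """Return the shortest list of program names leading from source_type
    to target_type, or None if no chain of programs connects them.

    programs: iterable of (name, input_type, output_type) tuples
    (a hypothetical description of the available transformation programs).
    """
    # Build an adjacency list: content type -> [(program name, output type)].
    edges = {}
    for name, src, dst in programs:
        edges.setdefault(src, []).append((name, dst))

    # Breadth-first search visits types in order of path length, so the
    # first time the target is reached the path is a shortest one.
    queue = deque([(source_type, [])])
    seen = {source_type}
    while queue:
        current, path = queue.popleft()
        if current == target_type:
            return path
        for name, nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None


# Hypothetical transformation programs.
example_programs = [
    ("tiff_to_png", "image/tiff", "image/png"),
    ("png_to_gif", "image/png", "image/gif"),
    ("gif_to_jpeg", "image/gif", "image/jpeg"),
    ("png_to_jpeg", "image/png", "image/jpeg"),
]
print(find_transformation_path(example_programs, "image/tiff", "image/jpeg"))
```

Two chains reach JPEG from TIFF in this example; the search returns the two-step chain rather than the three-step one, matching the preference for the shortest path.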

Data Mining

Parallel processing
Parallelization of statistical algorithms using a map-reduce approach
Cloud computing approach that is seamless to the users
Ready-made state-of-the-art data mining algorithms
Algorithms oriented to biology-related problems supplied as-a-service
General purpose algorithms (e.g. Clustering, Principal Component Analysis, Artificial Neural Networks) supplied as-a-service
Data trends generation and analysis
Extraction of trends for biodiversity data
Inspection of time series of observations on biological species
Basic signal processing techniques to explore periodicities in trends
Ecological niche modelling
Algorithms to perform ecological niche modelling using either mechanistic or correlative approaches
Species distribution maps generation
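
The map-reduce parallelization of statistical algorithms mentioned under Parallel processing can be sketched on a very simple statistic. The example below is an assumption-laden illustration, not the gCube implementation: each chunk of data is mapped to partial sums, and the partial sums are reduced into a mean and a population variance. A local thread pool stands in for distributed workers, and all function names are hypothetical. The sum-of-squares formula is used for brevity and can lose precision on large or badly scaled data.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map step: summarise one chunk as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def reduce_pair(a, b):
    """Reduce step: combine two partial summaries component-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def parallel_mean_variance(values, chunk_size=1000, workers=4):
    """Compute the mean and population variance of values chunk by chunk."""
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    # The thread pool is only a stand-in for remote workers.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        partials = list(executor.map(map_chunk, chunks))
    count, total, total_sq = reduce(reduce_pair, partials, (0, 0.0, 0.0))
    if count == 0:
        raise ValueError("cannot summarise an empty data set")
    mean = total / count
    variance = total_sq / count - mean * mean
    return mean, variance


example_mean, example_variance = parallel_mean_variance(
    [1, 2, 3, 4, 5, 6, 7, 8], chunk_size=3)
print(example_mean, example_variance)
```

Because the reduce step is associative and commutative, the partial summaries can be combined in any order, which is what makes the computation easy to distribute.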

Data Visualisation

Uniform access over geospatial GIS layers
Investigation over layers indexed by GeoNetwork;
Visualization of distributed layers;
Addition of remote layers published in standard OGC formats (WMS or WFS);
Filtering and analysis capabilities
Possibility to apply CQL filters to layers;
Possibility to trace transect charts;
Possibility to select areas for investigating environmental features;
Search and indexing capabilities
Possibility to sort a very large number of layers by title;
Possibility to search a very large number of layers by title and name;
Possibility to index layers by invoking GeoNetwork functionalities;
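
As an illustration of applying a CQL filter to a layer, the sketch below builds a WMS 1.1.1 GetMap request URL that carries the filter in the `cql_filter` parameter. It assumes the layer is served by a GeoServer instance, where `CQL_FILTER` is a vendor-specific parameter; the host name, layer name, attribute name and the helper `build_getmap_url` are hypothetical, and the sketch does not show how the gCube viewers issue their requests.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, cql_filter, width=512, height=256):
    """Build a WMS 1.1.1 GetMap URL restricted by a CQL filter.

    bbox is (min_x, min_y, max_x, max_y) in EPSG:4326.
    """
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",
        "bbox": ",".join(str(v) for v in bbox),
        "srs": "EPSG:4326",
        "width": width,
        "height": height,
        "format": "image/png",
        # GeoServer vendor parameter: only matching features are drawn.
        "cql_filter": cql_filter,
    }
    # urlencode percent-encodes the values so the filter survives transport.
    return base_url + "?" + urlencode(params)


example_url = build_getmap_url(
    "http://geoserver.example.org/wms",   # hypothetical endpoint
    "demo:bathymetry",                    # hypothetical layer
    (-180, -90, 180, 90),
    "depth > 200",                        # hypothetical attribute
)
print(example_url)
```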

Semantic Data Analysis

Provision of results clustering over any search system
Applies to any search system that returns textual snippets and for which an OpenSearch description is available.
Provision of snippet or contents-based entity recognition
Generic as well as vertical, based on predetermined entity categories and lists that can be obtained by querying SPARQL endpoints.
Provision of gradual faceted (session-based) search
Allows the answer to be gradually restricted based on the selected entities and/or clusters.
Ability to fetch and display semantic information of an identified entity
Achieved by querying appropriate SPARQL endpoints.
Ability to apply these services on any web page through a web browser
Using the functionality of bookmarklets.
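
Gradual, session-based faceted search can be pictured as a session object that remembers the entities selected so far and narrows the answer with each new selection. The sketch below is a simplified illustration; the class `FacetedSession`, the `(doc_id, entities)` result shape and the example data are hypothetical, and clustering and entity recognition are assumed to have already produced the entity sets.

```python
from collections import Counter


class FacetedSession:
    """Session that narrows a result list as entities are selected."""

    def __init__(self, results):
        # results: list of (doc_id, entities) pairs (hypothetical shape).
        self._results = [(doc, frozenset(ents)) for doc, ents in results]
        self.selected = []

    def current(self):
        """Return the documents that mention every selected entity."""
        chosen = set(self.selected)
        return [doc for doc, ents in self._results if chosen <= ents]

    def facets(self):
        """Count the not-yet-selected entities within the current answer."""
        chosen = set(self.selected)
        counts = Counter()
        for _, ents in self._results:
            if chosen <= ents:
                counts.update(ents - chosen)
        return dict(counts)

    def select(self, entity):
        """Add an entity to the session and return the restricted answer."""
        self.selected.append(entity)
        return self.current()


# Hypothetical search results annotated with recognised entities.
example_results = [
    ("d1", {"tuna", "Atlantic"}),
    ("d2", {"tuna", "Pacific"}),
    ("d3", {"shark", "Atlantic"}),
]
session = FacetedSession(example_results)
print(session.select("tuna"))
print(session.facets())
```

Each call to `select` restricts the answer further, and `facets` reports which entities remain available for the next refinement step.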

Components

Data Retrieval

Search Planning and Execution Specification
which enables the integration of CQL-compliant Data Sources and is responsible for answering queries by combining Data Source capabilities and Search Operators
Data Sources Specification
which aims to provide integration of the data from different data providers into our infrastructure

Data Manipulation

Data Transformation Service Specification
which transforms content and metadata among different formats and specifications

Data Mining

Statistical Manager
a Service allowing the management of statistical data and multi-user requests for computation
Ecological Modeling
a set of methods for performing Data Mining operations. These include the categorization of experiments and techniques
Signal Processing
a set of methods to perform digital signal processing

Data Visualisation

Gis Viewer
a tool for visual analysis of geospatial layers stored on a GeoServer or published remotely via the WFS or WMS protocols
Geo Explorer
a tool for searching and browsing geospatial data sets spread across a number of data providers linked to the infrastructure
Geospatial Data Processing (TIFFUploader Algorithm)
a tool for transforming geospatial data from one format into another that is accepted by common GIS visualisers

Semantic Data Analysis

X-Search
a meta-search engine that reads the description of an underlying search source, can query that source and analyze the returned results in various ways, and can also exploit the availability of semantic repositories
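
Querying a source from its description can be sketched as follows, assuming the description is an OpenSearch description document as mentioned under Key Features. Such a document contains a URL template with placeholders such as `{searchTerms}`, where a trailing `?` marks a placeholder as optional. The helper `fill_template` and the example template are hypothetical and do not reflect the X-Search code; parsing the description document itself is left out.

```python
import re
from urllib.parse import quote


def fill_template(template, **values):
    """Fill an OpenSearch URL template with percent-encoded values.

    Optional placeholders such as {startIndex?} are replaced with an
    empty string when no value is given; a missing required placeholder
    raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(name)

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)


# Hypothetical template taken from a source's description document.
example_template = (
    "http://search.example.org/find?q={searchTerms}&start={startIndex?}")
print(fill_template(example_template, searchTerms="tuna catch"))
```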

Specifications

The specifications require preparatory information to be properly understood. In particular:

How to Develop a gCube Component
A basic guide to building a gCube Component
Building Components using the gCube Featherweight Stack
A guide to developing libraries or clients for the gCube Services
Developer's Guide
the overall gCube Developer's Guide

Task-oriented specifications can be found in the following:

Data Retrieval Specifications
the specifications for the Data Retrieval components
Data Manipulation Facilities
the specifications for the Data Manipulation components
Data Mining Specifications
the specifications for the Data Mining components
Data Visualisation Specifications
the specifications for the Data Visualisation components
Semantic Data Analysis
the specifications for the Semantic Data Analysis components