Data Access and Storage Facilities
Revision as of 14:48, 22 February 2012
Accessing data sources for retrieval or storage purposes is a fundamental requirement for a wide range of system processes, including indexing, transfer, transformation, and presentation. Equally, it is a main driver for clients that interface with the resources managed by the system.
A large number of system components are dedicated to meeting data access requirements, including services, service plugins, client-side libraries, server-side libraries, and front-end interfaces.
This document outlines the rationale and high-level architecture of such components.
Overview
Collectively, data access components provide three key facilities:
- the ability to store data in resources managed by the system;
- the ability to access data that is stored in resources managed by the system;
- the ability to access data that is stored in resources managed externally to the system;
The facilities are provided over data with heterogeneous structure, size, and semantics:
- from unstructured data to semi-structured and structured data;
- from small data sets to large and very large data sets;
- from document data to statistical, biodiversity, and semantic data;
and in compliance with the following non-functional requirements:
- the requirement of secure access;
- the requirement of scalable and efficient access;
- the requirement of standards-based access;
In summary, the data access components provide secure, scalable, efficient, standards-based storage and retrieval of data, where the data may be maintained by the system or outside the system and may vary in structure, size, and semantics.
Key Features
- uniform model and access API over structured data
  - dynamically pluggable architecture of transformations to and from internal and external data sources;
  - standards-based plugins for document, biodiversity, statistical, and semantic data sources;
- fine-grained access to structured data
  - horizontal and vertical filtering based on pattern matching;
  - URI-based resolution;
  - in-place remote updates;
- scalable access to structured data
  - autonomic service replication with infrastructure-wide load balancing;
- efficient and scalable storage of structured data
  - based on graph database technology;
- rich tooling for client and plugin development
  - high-level Java APIs for service access;
  - DSLs for pattern construction and stream manipulations;
- TODO: features for biodiversity data
  - Lucio, please add with respect to the Species Service
- TODO: features for file access and storage
  - Roberto, please add with respect to the File Storage API
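The pattern-based filtering listed above can be illustrated with a minimal Java sketch. It is not the system's actual API: the class names `LabelledNode` and `LabelPattern` are hypothetical. The sketch also assumes one particular reading of the terms, namely that horizontal filtering decides which trees are returned at all and vertical filtering decides which branches of each returned tree are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical labelled tree node: a label, an optional value, and child nodes.
final class LabelledNode {
    final String label;
    final String value; // null for inner nodes
    final List<LabelledNode> children = new ArrayList<>();

    LabelledNode(String label, String value, LabelledNode... kids) {
        this.label = label;
        this.value = value;
        this.children.addAll(Arrays.asList(kids));
    }
}

// Hypothetical pattern over the direct children of a tree root.
final class LabelPattern {
    private final Set<String> required; // labels a tree must contain to match
    private final Set<String> kept;     // labels retained in the result

    LabelPattern(Set<String> required, Set<String> kept) {
        this.required = required;
        this.kept = kept;
    }

    // Horizontal filtering (assumed meaning): does this tree qualify at all?
    boolean matches(LabelledNode root) {
        Set<String> present = new HashSet<>();
        for (LabelledNode child : root.children) {
            present.add(child.label);
        }
        return present.containsAll(required);
    }

    // Vertical filtering (assumed meaning): copy the root, keeping only selected branches.
    LabelledNode prune(LabelledNode root) {
        LabelledNode copy = new LabelledNode(root.label, root.value);
        for (LabelledNode child : root.children) {
            if (kept.contains(child.label)) {
                copy.children.add(child);
            }
        }
        return copy;
    }

    // Apply both steps to a collection of trees.
    List<LabelledNode> filter(List<LabelledNode> trees) {
        List<LabelledNode> result = new ArrayList<>();
        for (LabelledNode tree : trees) {
            if (matches(tree)) {
                result.add(prune(tree));
            }
        }
        return result;
    }
}
```

Under these assumptions, a client that only needs titles and authors would pass a pattern requiring `title` and keeping `title` and `author`, so that trees without a title are skipped and unneeded branches are never transferred.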
Subsystems
Data access components cluster within the following subsystems:
- the Tree-Based Data Access subsystem
  - groups components that implement access and storage facilities for structured data of arbitrary semantics, origin, and size, based on a uniform API of CRUD operations and a uniform data model of labelled trees;
- the Biodiversity Data Access subsystem
  - groups components that implement access and storage facilities for structured data with biodiversity semantics and arbitrary origin and size;
- the File-Based Access subsystem
  - groups components that implement access and storage facilities for unstructured data with arbitrary semantics and size;
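To make the idea of "a uniform API of CRUD operations and a uniform data model of labelled trees" more concrete, the following Java sketch shows one possible shape for such a contract. All names here (`Node`, `TreeStore`, `InMemoryTreeStore`) are hypothetical and chosen for illustration only. The in-memory map stands in for whatever storage the subsystem actually uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical labelled tree: every node carries a label, an optional value, and children.
final class Node {
    final String label;
    final String value; // null for inner nodes
    final List<Node> children = new ArrayList<>();

    Node(String label, String value) {
        this.label = label;
        this.value = value;
    }

    // Fluent helper for building small trees.
    Node add(Node child) {
        children.add(child);
        return this;
    }
}

// Hypothetical uniform CRUD contract, independent of the semantics of the data.
interface TreeStore {
    String create(Node root);             // returns the identifier of the stored tree
    Node read(String id);                 // returns null if the identifier is unknown
    boolean update(String id, Node root); // false if the identifier is unknown
    boolean delete(String id);            // false if the identifier is unknown
}

// Illustrative in-memory implementation; a real store would sit on a database.
final class InMemoryTreeStore implements TreeStore {
    private final Map<String, Node> trees = new LinkedHashMap<>();
    private int nextId = 1;

    @Override
    public String create(Node root) {
        String id = "tree-" + nextId++;
        trees.put(id, root);
        return id;
    }

    @Override
    public Node read(String id) {
        return trees.get(id);
    }

    @Override
    public boolean update(String id, Node root) {
        if (!trees.containsKey(id)) {
            return false;
        }
        trees.put(id, root);
        return true;
    }

    @Override
    public boolean delete(String id) {
        return trees.remove(id) != null;
    }
}
```

The point of the sketch is that document, statistical, or biodiversity records can all be expressed as `Node` trees, so clients would program against a single interface regardless of where the data originates.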