Data Transfer Facilities


Overview

Implementing reliable data transfer mechanisms between the nodes of a gCube-based Hybrid Data Infrastructure is one of the main objectives when dealing with large sets of multi-type datasets distributed across different repositories.

To promote an efficient and optimized consumption of these data resources, a number of components have been designed to meet the data transfer requirements.

This document outlines the design rationale and high-level architecture of such components.

Key Features

The components of the subsystem provide the following key features:

Point to Point transfer
one writer-one reader as core functionality
Produce only what is requested
a producer-consumer model that blocks when needed and reduces unnecessary data transfers
Intuitive stream- and iterator-based interface
simplified usage with reasonable default behavior for common use cases and a variety of features for increased usability and flexibility
Multiple protocol support
data transfer currently supports the following protocols: TCP and HTTP
HTTP Broker Servlet
transfer results are exposed as an HTTP endpoint
Reliable data transfer between Infrastructure Data Sources and Data Storages
by exploiting the uniform access interfaces provided by gCube
Structured and unstructured Data Transfer
both tree-based and file-based transfer, to cover all possible use cases
Transfers to local nodes for data staging
data staging for particular use cases can be enabled on each node of the infrastructure
Advanced transfer scheduling and transfer optimization
a dedicated gCube service responsible for data transfer scheduling and transfer optimization
Transfer statistics availability
transfers are logged by the system and made available to interested consumers.
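
The "produce only what is requested" and iterator-based features listed above can be illustrated with a minimal sketch. It is not the gCube Result Set API: the names BoundedStream, put, close and transfer are hypothetical, and a bounded in-memory queue stands in for the real TCP or HTTP transport. It only shows how a one writer-one reader channel can block the writer until the reader asks for more data.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream (assumption of this sketch)


class BoundedStream:
    """Hypothetical one-writer/one-reader stream backed by a bounded buffer."""

    def __init__(self, capacity=2):
        self._buffer = queue.Queue(maxsize=capacity)

    def put(self, record):
        # Blocks while the buffer is full, so the writer cannot
        # run far ahead of what the reader has consumed.
        self._buffer.put(record)

    def close(self):
        # Signals the reader that no more records will arrive.
        self._buffer.put(_END)

    def __iter__(self):
        # Iterator interface for the reader: each step pulls one record,
        # which in turn frees a slot for the writer.
        while True:
            record = self._buffer.get()
            if record is _END:
                return
            yield record


def transfer(records, capacity=2):
    """Run a writer thread and consume its output through the iterator."""
    stream = BoundedStream(capacity)

    def write():
        for record in records:
            stream.put(record)
        stream.close()

    writer = threading.Thread(target=write, daemon=True)
    writer.start()
    consumed = list(stream)  # the reader drives the transfer
    writer.join()
    return consumed


if __name__ == "__main__":
    print(transfer(range(5)))
```

With a small capacity, the writer holds at most that many unread records at any time, which is the behavior the producer-consumer feature describes; a real implementation would apply the same idea across the network rather than within one process.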

Main Components

the Result Set components
this family of components provides a common data transfer mechanism that aims to establish high-throughput, point-to-point, on-demand communication. It has been designed as a core functionality of the overall system and can also be considered the building block for the Data Transfer Scheduler & Agent components.

the Data Transfer Scheduler & Agent components
this family of components allows VO/VRE Administrators to transfer data among Data Sources and Data Storages. It can also be used by any client or web service to implement data movement between infrastructure nodes.
the Data Transfer 2
this family of components ...