How to use Data Transfer 2

Data Transfer 2 is one of the subsystems forming the gCube Data_Transfer_Facilities. It aims to provide gCube applications with a common layer for efficient and transparent data transfer towards gCube SmartGears nodes. It is designed as a client-service architecture that exploits the plugin design pattern. A generic overview and its design are described here.

The following sections describe how to use and interact with the involved components.

Data Transfer Service

The Data Transfer Service is a SmartGears-aware web application developed on top of the Jersey framework. Its main functionalities are :

  • receive and serve data transfer requests;
  • expose capabilities;

At startup it gathers information on :

  • current network configuration (i.e. exposed hostname, available ports), in order to negotiate the transfer channel with clients;
  • available data-transfer plugins.


The data transfer service is released as a WAR with the following Maven coordinates


It needs to be hosted in a SmartGears installation in order to run. Please refer to SmartGears for further information.


This section describes the HTTP interfaces exposed by the service.


Capabilities

The Capabilities interface exposes information regarding :

  • Instance details (i.e. hostname, port, nodeId)
  • Available plugins
  • Available persistence Ids

Capabilities are mapped in a Java Object of class

Transfer requests

The Requests interface receives transfer requests from clients, returning the associated ticket ID if the request has been successfully registered. Each request is expected to specify :

  • The transfer settings decided by the caller client (including the data source);
  • The transfer destination (see #Transfer destination);
  • An optional set of plugin invocations (see #Plugin invocation)

Transfer requests are mapped in Java Objects of class

Transfer status

The Status interface provides information on the progress of the transfer identified by its related ticket ID. The transfer status provides information about :

  • The related transfer request;
  • Transfer statistics (i.e. transferredBytes, elapsed Time);
  • Destination file absolute location;
  • Overall status;
  • Error Message if any;
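
For clients that talk to the service directly rather than through the library, the Status interface lends itself to a simple polling loop. The sketch below is only an illustration: the Status values, the statusLookup supplier (standing in for an HTTP call keyed by ticket ID) and the class name are all assumptions, not part of the service API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Illustrative polling loop; none of these names belong to the gCube API. */
final class StatusPollingSketch {

    /** Hypothetical overall-status values. */
    enum Status { PENDING, TRANSFERRING, SUCCESS, ERROR }

    /**
     * Queries statusLookup until it reports a terminal status or until
     * maxAttempts lookups have been made, sleeping pollMillis in between.
     * Returns the last status seen (null if maxAttempts is not positive).
     */
    static Status waitForCompletion(Supplier<Status> statusLookup, long pollMillis, int maxAttempts) {
        Status last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            last = statusLookup.get();
            if (last == Status.SUCCESS || last == Status.ERROR) {
                return last;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return last;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated sequence of answers from the Status interface.
        Iterator<Status> answers =
                Arrays.asList(Status.PENDING, Status.TRANSFERRING, Status.SUCCESS).iterator();
        System.out.println(waitForCompletion(answers::next, 10L, 5)); // prints SUCCESS
    }
}
```

The attempt limit keeps a stalled transfer from blocking the caller forever; a real client would replace the simulated supplier with a lookup by ticket ID.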

Data Transfer library

The data transfer library is a Java library which serves applications as a client to data transfer facilities. In order to use the library, applications must declare the following dependency in their Maven POM files :


The library is designed to offer a simple API to submit transfers to the selected services without dealing with :

  • http calls;
  • status monitoring;
  • transfer channel selection negotiation according to server's capabilities;

Submit a transfer

In order to submit a transfer to a chosen server, the application needs to get an instance of the class DataTransferClient. Instances of the client are obtained by calling one of the following static methods :

public static DataTransferClient getInstanceByEndpoint(String endpoint) throws UnreachableNodeException, ServiceNotFoundException;
public static DataTransferClient getInstanceByNodeId(String id) throws HostingNodeNotFoundException, UnreachableNodeException, ServiceNotFoundException;

To perform a transfer operation, the application just needs to invoke one of the exposed methods providing :

  • a transfer source (i.e. an object or its absolute path);
  • a transfer destination, a.k.a. the destination file name in the basic scenario (see #Transfer destination for more in-depth details);
  • an optional set of plugin invocations (see #Plugin invocation for more in-depth details).

Please note that the library exposes different signatures of the same logic in order to hide unneeded functionalities from clients, i.e. the following three calls perform the same operation :

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";

Using a Destination object (see #Transfer destination for more in-depth details).

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Destination dest=new Destination(transferredFileName);

Using a PluginInvocation object (see #Plugin invocation for more in-depth details).

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,Collections.<PluginInvocation> emptySet());

Transfer destination

For each transfer operation, clients are required to declare a destination definition using objects of the class Destination. Destination definitions include the following parameters :

  • destination file name (String)
the name that will be used for the transferred file in the remote service file system;
  • onExistingFileName [default value = ADD_SUFFIX]
declares the policy to follow in case the specified destination file name already exists in the declared location (see #Destination Clash Policies for further information);
  • persistence id (String) [default value = Destination.DEFAULT_PERSISTENCE_ID]
the persistence folder on the service runtime environment, identified by the target application's context name (see SmartGears for further information). Clients can use the service capabilities to gather information on available context ids (see #Capabilities for further information). To use the default value (which identifies the data-transfer-service itself), clients can use the static member Destination.DEFAULT_PERSISTENCE_ID;
  • subFolder (String) [default value = null]
declares a destination sub-path starting from the selected persistence folder;
  • createSubFolders (Boolean) [default value = false]
tells the service whether or not it must consider the subFolder option;
  • onExistingSubFolder [default value = APPEND]
declares the policy to follow in case the specified destination subFolder already exists in the declared persistence folder (see #Destination Clash Policies for further information);
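
As a rough illustration of how these parameters could combine on the server side, the sketch below resolves the final location from a persistence folder, an optional subFolder and the destination file name. The class, the method and the example paths are hypothetical; how the service actually maps a persistence id to a folder is not specified here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: not the service's actual resolution logic. */
final class DestinationPathSketch {

    /**
     * Joins persistenceFolder, the optional subFolder and fileName.
     * A null or empty subFolder places the file directly in persistenceFolder.
     */
    static Path resolve(Path persistenceFolder, String subFolder, String fileName) {
        Path folder = (subFolder == null || subFolder.isEmpty())
                ? persistenceFolder
                : persistenceFolder.resolve(subFolder);
        return folder.resolve(fileName);
    }

    public static void main(String[] args) {
        // Hypothetical persistence folder, sub-path and file name.
        System.out.println(resolve(Paths.get("/data/persistence"), "public/netcdf", "myFile.nc"));
    }
}
```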

Destination Clash Policies

The enum class represents the available policies in case of file system clashes on the server side. Following is the set of supported clash policies and a brief description :

  • abort the transfer;
  • overwrite the destination by first deleting the existing one;
  • add a bracket-isolated counter at the end of the clashing name (e.g. myFileName becomes myFileName(1));
  • add the transferred content to the existing one.
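
The counter-suffix policy (presumably ADD_SUFFIX, given the default value of onExistingFileName) can be sketched as follows. This is a minimal illustration, not the service's implementation: the class and method names are hypothetical, and it assumes the counter is appended to the whole name, as in the myFileName(1) example above.

```java
import java.util.Collections;
import java.util.Set;

/** Illustrative only: mimics the counter-suffix naming rule described above. */
final class AddSuffixSketch {

    /**
     * Returns requestedName if it is free, otherwise the first
     * "requestedName(n)" with n = 1, 2, ... that is not in existingNames.
     */
    static String resolve(String requestedName, Set<String> existingNames) {
        if (!existingNames.contains(requestedName)) {
            return requestedName;
        }
        int counter = 1;
        String candidate = requestedName + "(" + counter + ")";
        while (existingNames.contains(candidate)) {
            counter++;
            candidate = requestedName + "(" + counter + ")";
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(resolve("myFileName", Collections.singleton("myFileName")));
        // prints myFileName(1)
    }
}
```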

Plugin invocation

Plugin invocations are declared by using instances of the class PluginInvocation.

These objects are formed by the following members :

  • pluginId (String)
the id of the installed plugin. Available plugins are listed in the server capabilities (see #Capabilities for more information);
  • parameters (Map<String,String>)
map of parameter-name -> parameter-value to be used in plugin invocations. Please use the static member PluginInvocation.DESTINATION_FILE_PATH as the parameter value for those parameters that need the actual destination's absolute path.
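
The role of the DESTINATION_FILE_PATH marker can be pictured as a placeholder that gets swapped for the real path once the transfer destination is known. The sketch below is an assumption about that mechanism, not the service's code: the marker value, the class and the method are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: placeholder substitution in plugin parameters. */
final class PlaceholderSketch {

    /** Hypothetical marker; stands in for PluginInvocation.DESTINATION_FILE_PATH. */
    static final String DESTINATION_FILE_PATH = "<DESTINATION_FILE_PATH>";

    /** Returns a copy of parameters where every marker value is replaced by actualPath. */
    static Map<String, String> resolve(Map<String, String> parameters, String actualPath) {
        Map<String, String> resolved = new HashMap<>();
        for (Map.Entry<String, String> entry : parameters.entrySet()) {
            boolean isPlaceholder = DESTINATION_FILE_PATH.equals(entry.getValue());
            resolved.put(entry.getKey(), isPlaceholder ? actualPath : entry.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("SOURCE_ARCHIVE", DESTINATION_FILE_PATH);
        params.put("DESTINATION", "myFolder");
        // Hypothetical absolute path of the transferred file.
        System.out.println(resolve(params, "/data/persistence/archive.zip"));
    }
}
```

Using a marker lets the client describe the invocation before it knows where the service will actually store the file, for instance when a clash policy renames it.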

REST Invocations

As of gCube 4.9.0, the <TransferMethod> option has been removed from the path and is handled as the query parameter "method" (default value "FileUpload").

The service offers a REST interface for simple transfer requests / handling in the following format :


The following query parameters can be specified :

  • destination-file-name
  • create-dirs [Default : false]
  • on-existing-file [Default : ADD_SUFFIX]
  • on-existing-dir [Default : APPEND]

The following FORM DATA parameters can also be used :

  • uploadedFile : the file uploaded by the client
  • plugin-invocations : JSON representation of plugin invocation set
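
A client assembling such a request by hand needs to URL-encode the query parameters. The sketch below builds only the query string from the parameter names and defaults listed above (plus the "method" parameter); the class and method names are hypothetical, and the service URL is deliberately left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: builds the query string for a REST transfer request. */
final class RestQuerySketch {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Joins the parameters as name=value pairs separated by '&', in map order. */
    static String buildQuery(Map<String, String> parameters) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : parameters.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
        }
        return query.toString();
    }

    /** Query parameters with the documented default values spelled out. */
    static Map<String, String> withDefaults(String destinationFileName) {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("method", "FileUpload");
        parameters.put("destination-file-name", destinationFileName);
        parameters.put("create-dirs", "false");
        parameters.put("on-existing-file", "ADD_SUFFIX");
        parameters.put("on-existing-dir", "APPEND");
        return parameters;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(withDefaults("my file.nc")));
        // prints method=FileUpload&destination-file-name=my+file.nc&create-dirs=false&on-existing-file=ADD_SUFFIX&on-existing-dir=APPEND
    }
}
```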

THREDDS upload and metadata publication via cURL

The following cURL command has the following behaviour :

  1. Uploads the file to the "thredds" destination, subfolder "public/netcdf/myCatalog";
  2. Invokes the plugin "SIS/GEOTK".

curl -F "uploadedFile=@/home/fabio/" --header "gcube-token:<GCUBE-TOKEN>" --form "plugin-invocations="SIS/GEOTK""

Data Transfer Plugins

This section describes the implemented plugins in order to help developers exploit their functionalities. Plugins are modules that are optionally invoked after the transfer is complete. Plugin invocations are declared within the transfer request by specifying a set of PluginInvocation instances. The following sections list, respectively :

  • #General Purpose Plugins, which are available on every SmartGears node;
  • #Specific Plugins, meant to address a particular installation.

General Purpose Plugins

This section describes general-purpose plugins, which are included in default distributions. This means that these plugins are always available on a SmartGears node.

Decompress Archive Plugin

The 'Decompress Archive' plugin extracts the content of an archive to a specified path. The implementing module (needed at service side) is

Invocation details
  • ID : "DECOMPRESS"

Parameters List :

  • "DESTINATION" : [String value] The destination folder of the uncompressed content, expressed as a path relative to SOURCE_ARCHIVE. Default is the same directory as SOURCE_ARCHIVE;
  • "OVERWITE_DESTINATION" : [Boolean value] Set to true in order to overwrite the DESTINATION content. Default is false;
  • "DELETE_ARCHIVE" : [Boolean value] Set to true in order to delete SOURCE_ARCHIVE after extracting its content. Default is false.

Invocation example

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Map<String,String> params=new HashMap<>();
params.put("DESTINATION", "myFolder");
params.put("SOURCE_ARCHIVE", PluginInvocation.DESTINATION_FILE_PATH);
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,Collections.<PluginInvocation> singleton(new PluginInvocation("DECOMPRESS",params)));
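
How DESTINATION relates to SOURCE_ARCHIVE can be pictured with a small sketch. It assumes that "relative to SOURCE_ARCHIVE" means relative to the folder containing the archive, which matches the stated default but remains an interpretation; the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: where the extracted content could end up. */
final class DecompressDestinationSketch {

    /**
     * Resolves destination against the folder that contains sourceArchive.
     * A null or empty destination yields the archive's own folder.
     */
    static Path extractionFolder(Path sourceArchive, String destination) {
        Path parent = sourceArchive.getParent();
        // A bare file name has no parent: fall back to the empty (current) path.
        Path archiveFolder = (parent != null) ? parent : Paths.get("");
        return (destination == null || destination.isEmpty())
                ? archiveFolder
                : archiveFolder.resolve(destination).normalize();
    }

    public static void main(String[] args) {
        // Hypothetical location of the transferred archive.
        System.out.println(extractionFolder(Paths.get("/data/persistence/archive.zip"), "myFolder"));
    }
}
```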

Specific Plugins

This section lists plugin modules designed to address a particular installation (typically the management of third-party applications). They will be available only on certain installation nodes, depending on needs.

Thredds Plugin Suite

The Thredds plugin suite contains a set of plugins aimed at managing a Thredds installation in a gCube infrastructure. The implementing module (needed at service side) is


The following sections describe the plugins exposed by this module.


Each of the following plugins exposes an info object. Following is a serialized example of this object :

{
  "hostname": "",
  "localBasePath": "/data/content/thredds",
  "instanceBaseUrl": "",
  "catalog": {
    "ID": null,
    "catalogFile": "catalog.xml",
    "title": null,
    "name": null,
    "declaredDataSetRoot": null,
    "declaredDataSetScan": [
      {
        "name": "Thredds Root Catalog",
        "path": "public/netcdf",
        "location": "/data/content/thredds/public/netcdf/",
        "ID": "Root-DatasetScan"
      }
    ],
    "subCatalogs": {
      "name": "Catalogs of Virtual Research Environments VRE",
      "ID": "VRE_Catalogs",
      "linkedCatalogs": [
        {
          "ID": "preprodVRECatalog",
          "catalogFile": "preprodVRECatalog.xml",
          "title": "preprodVRECatalog",
          "name": "preprodVRECatalog",
          "declaredDataSetRoot": {
            "path": "preVRE_static",
            "location": "/data/content/thredds/preVRE",
            "count": 0
          },
          "declaredDataSetScan": [
            {
              "name": "preprodVRECatalog Catalog",
              "path": "preVRE_dynamic",
              "location": "/data/content/thredds/preVRE",
              "ID": "preprodVRECatalog_in_preVRE"
            }
          ],
          "subCatalogs": null
        }
      ]
    }
  },
  "adminUser": ...,
  "adminPassword": ...,
  "version": 4,
  "minor": 6,
  "build": 0,
  "revision": 9,
  "ghnId": "42d89e32-f253-4a20-8110-82eaad7cfeda"
}

The 'SIS/GEOTK' plugin extracts metadata information from NetCDF files by exploiting Apache SIS library features and publishes ISO metadata entries in GeoNetwork.

Invocation details
  • ID : "SIS/GEOTK"

Parameters List :

  • "GEONETWORK_CATEGORY" : [String value] GeoNetwork category for published metadata. Default is 'Dataset';
  • "GEONETWORK_STYLESHEET" : [String value] GeoNetwork stylesheet for published metadata. Default is '_none_'.

Invocation example

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,new PluginInvocation("SIS/GEOTK"));

The 'REGISTER CATALOG' plugin modifies Thredds' main catalog.xml file in order to add/update a reference to the transferred catalog file.

Invocation details
  • ID : "REGISTER_CATALOG"

Parameters List :

  • "CATALOG_REFERENCE" : [String value] The reference title to be set under catalog.xml, which will link to the transferred catalog file.

Invocation example

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String catalogFile="..";  // local catalog file to transfer (assumed to be passed as a path)
String reference="..";    // reference title to register under catalog.xml
Destination dest=new Destination();
dest.setDestinationFileName(reference.replace(" ", "_")+".xml");
PluginInvocation invocation=new PluginInvocation("REGISTER_CATALOG");
invocation.setParameters(Collections.singletonMap("CATALOG_REFERENCE", reference));
client.localFile(catalogFile, dest,invocation);