How to use Data Transfer 2

From Gcube Wiki
Latest revision as of 12:47, 19 December 2017

Data Transfer 2 is one of the subsystems forming the gCube Data_Transfer_Facilities. It aims to provide gCube applications with a common layer for efficient and transparent data transfer towards gCube SmartGears nodes. It is designed as a client-service architecture exploiting the plugin design pattern. A generic overview of its design is described here.

The following sections describe how to use and interact with the involved components.

Data Transfer Service

The Data Transfer Service is a SmartGears-aware web application developed on top of the Jersey framework. Its main functionalities are:

  • receive and serve data transfer requests;
  • expose capabilities;

At startup it gathers information on :

  • current network configuration (e.g. exposed hostname, available ports), in order to negotiate the transfer channel with clients;
  • available data-transfer plugins.

Installation

The Data Transfer Service is released as a WAR with the following Maven coordinates:

  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>data-transfer-service</artifactId>

It needs to be hosted in a SmartGears installation in order to run. Please refer to SmartGears for further information.

Interface

This section describes the HTTP interfaces exposed by the service.

Capabilities

The Capabilities interface exposes information regarding :

  • Instance details (i.e. hostname, port, nodeId)
  • Available plugins
  • Available persistence Ids

Capabilities are mapped in a Java Object of class org.gcube.data.transfer.model.TransferCapabilities.

Transfer requests

The Requests interface receives transfer requests from clients and returns the associated ticket ID if the request has been successfully registered. Each request is expected to specify:

  • The transfer settings decided by the caller client (including the data source);
  • The transfer destination (see #Transfer destination);
  • An optional set of plugin invocations (see #Plugin invocation)


Transfer requests are mapped in Java Objects of class org.gcube.data.transfer.model.TransferRequest.

Transfer status

The Status interface provides information on the progress of the transfer identified by its related ticket ID. The transfer status provides information about :

  • The related transfer request;
  • Transfer statistics (e.g. transferredBytes, elapsed time);
  • Destination file absolute location;
  • Overall status;
  • Error message, if any.
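The library performs this monitoring on the client's behalf. As a rough sketch of the idea only, here is a client-side polling loop that keeps asking for the overall status of a ticket until a terminal value is reported. It assumes the status can be read as a plain string and that COMPLETED and ERROR are terminal values; both the status names and the StatusPoller helper are hypothetical, and the Supplier stands in for an actual call to the Status interface.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical helper: polls a transfer's overall status until it reaches a terminal value. */
final class StatusPoller {

    // Hypothetical terminal states; the real service may report different values.
    static final Set<String> TERMINAL = Set.of("COMPLETED", "ERROR");

    /**
     * Calls fetchStatus every intervalMillis ms (at most maxAttempts times)
     * and returns the first terminal status observed.
     */
    static String awaitCompletion(Supplier<String> fetchStatus, long intervalMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String status = fetchStatus.get(); // stands in for a Status query by ticket ID
            if (TERMINAL.contains(status)) {
                return status;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
                throw new IllegalStateException("Interrupted while waiting for the transfer", e);
            }
        }
        throw new IllegalStateException("Transfer not finished after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulated sequence of service responses for one ticket.
        Iterator<String> responses = List.of("PENDING", "TRANSFERRING", "COMPLETED").iterator();
        System.out.println(awaitCompletion(responses::next, 10, 5)); // prints COMPLETED
    }
}
```

Bounding the number of attempts keeps a stuck transfer from blocking the caller forever; a real client would also use the ticket ID to build each request.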

Data Transfer library

The data transfer library is a Java library that serves applications as a client to the data transfer facilities. In order to use the library, applications must declare the following dependency in their Maven POM files:

<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>data-transfer-library</artifactId>
</dependency>


The library is designed to offer a simple API for submitting transfers to the selected services without dealing with:

  • http calls;
  • status monitoring;
  • transfer channel selection and negotiation according to the server's capabilities.

Submit a transfer

In order to submit a transfer to a chosen server, the application needs to get an instance of the class org.gcube.data.transfer.library.DataTransferClient. Instances of the client are obtained by calling one of the following static methods:

public static DataTransferClient getInstanceByEndpoint(String endpoint) throws UnreachableNodeException, ServiceNotFoundException;
 
public static DataTransferClient getInstanceByNodeId(String id) throws HostingNodeNotFoundException, UnreachableNodeException, ServiceNotFoundException;

To perform a transfer operation, applications just need to invoke one of the exposed methods, providing:

  • a transfer source (i.e. a java.io.File object or its absolute path);
  • a transfer destination, i.e. the destination file name in the basic scenario (see #Transfer destination for more in-depth details);
  • an optional set of plugin invocations (see #Plugin invocation for more in-depth details).

Please note that the library exposes different signatures of the same logic in order to hide unwanted functionalities from clients. For example, the following three calls perform the same operation:

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
client.localFile(localFile,transferredFileName);

Using an org.gcube.data.transfer.model.Destination object (see #Transfer destination for more in-depth details).

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest);

Using org.gcube.data.transfer.model.PluginInvocation objects (see #Plugin invocation for more in-depth details).

DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,Collections.<PluginInvocation> emptySet());

Transfer destination

For each transfer operation, clients are required to declare a destination definition using objects of the class org.gcube.data.transfer.model.Destination. Destination definitions include the following parameters :

  • destination file name (String)
the name that will be used for the transferred file in the remote service file system;
  • onExistingFileName (org.gcube.data.transfer.model.DestinationClashPolicy) [default value = ADD_SUFFIX]
declares the policy to follow in case the specified destination file name already exists in the declared location (see #Destination Clash Policies for further information);
  • persistence id (String) [default value = Destination.DEFAULT_PERSISTENCE_ID]
the persistence folder on the service runtime environment, identified by the target's application context name (see SmartGears for further information). Clients can use the service capabilities in order to gather information on available context IDs (see #Capabilities for further information). To use the default value (which identifies the data-transfer-service itself), clients can use the static member Destination.DEFAULT_PERSISTENCE_ID;
  • subFolder (String) [default value = null]
declares a destination sub-path starting from the selected persistence folder;
  • createSubFolders (Boolean) [default value=false]
tells the service whether or not it must consider the subFolder option;
  • onExistingSubFolder org.gcube.data.transfer.model.DestinationClashPolicy [default value = APPEND]
declares the policy to follow in case the specified destination subFolder already exists in the declared persistence folder (see #Destination Clash Policies for further information);


Destination Clash Policies

The enum class org.gcube.data.transfer.model.DestinationClashPolicy represents the available policies in case of file system clashes on the server side. The following is the set of supported clash policies, with a brief description:

FAIL
aborts the transfer;
REWRITE
overwrites the destination, first deleting the existing one;
ADD_SUFFIX
adds a bracket-isolated counter at the end of the clashing name (e.g. myFileName becomes myFileName(1));
APPEND
adds the transferred content to the existing one.
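The policies are applied on the server side. As an illustration only, the following is a minimal sketch of how a clashing file name could be resolved, assuming the names already present in the target folder are known up front and that ADD_SUFFIX appends the counter to the whole name, exactly as in myFileName(1). The ClashPolicy enum and ClashResolver class are local stand-ins, not the real org.gcube.data.transfer.model.DestinationClashPolicy API.

```java
import java.util.Set;

/** Local stand-in mirroring the documented DestinationClashPolicy values. */
enum ClashPolicy { FAIL, REWRITE, ADD_SUFFIX, APPEND }

final class ClashResolver {

    /**
     * Returns the name under which the transferred content should be written,
     * given the names already present in the target folder.
     */
    static String resolve(String name, Set<String> existing, ClashPolicy policy) {
        if (!existing.contains(name)) {
            return name; // no clash: every policy keeps the requested name
        }
        switch (policy) {
            case FAIL:
                throw new IllegalStateException("Destination already exists: " + name);
            case REWRITE: // existing file is deleted first, then written under the same name
            case APPEND:  // new content is added to the existing file
                return name;
            case ADD_SUFFIX:
                // Try myFileName(1), myFileName(2), ... until a free name is found.
                for (int counter = 1; ; counter++) {
                    String candidate = name + "(" + counter + ")";
                    if (!existing.contains(candidate)) {
                        return candidate;
                    }
                }
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("myFileName", "myFileName(1)");
        System.out.println(resolve("myFileName", existing, ClashPolicy.ADD_SUFFIX)); // prints myFileName(2)
    }
}
```

REWRITE and APPEND both keep the requested name; they differ only in what happens to the existing content, which this name-level sketch does not model.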

Plugin invocation

Plugin invocations are declared by using instances of the class org.gcube.data.transfer.model.PluginInvocation.

These objects are formed by the following members :

  • pluginId (String)
the id of the installed plugin. Available plugins are listed in the server capabilities (see #Capabilities for more information);
  • parameters (Map<String,String>)
map of parameter-name -> parameter-value to be used in plugin invocations. Please use the static member PluginInvocation.DESTINATION_FILE_PATH as parameter value, for those parameters that need the actual destination's absolute path;
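The placeholder is meant to be replaced with the real destination path once it is known. A minimal sketch of such a substitution follows, assuming the placeholder is simply a reserved string value compared for equality. The value used here ("${DESTINATION_FILE_PATH}") and the PlaceholderResolver class are hypothetical and do not reflect the actual value of PluginInvocation.DESTINATION_FILE_PATH or the service's code.

```java
import java.util.HashMap;
import java.util.Map;

final class PlaceholderResolver {

    // Hypothetical placeholder value; the real PluginInvocation.DESTINATION_FILE_PATH may differ.
    static final String DESTINATION_FILE_PATH = "${DESTINATION_FILE_PATH}";

    /** Returns a copy of the parameters where every placeholder value is replaced by the destination's absolute path. */
    static Map<String, String> resolve(Map<String, String> parameters, String destinationAbsolutePath) {
        Map<String, String> resolved = new HashMap<>();
        for (Map.Entry<String, String> entry : parameters.entrySet()) {
            String value = DESTINATION_FILE_PATH.equals(entry.getValue())
                    ? destinationAbsolutePath
                    : entry.getValue();
            resolved.put(entry.getKey(), value);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("SOURCE_ARCHIVE", DESTINATION_FILE_PATH); // needs the actual destination path
        params.put("DESTINATION", "myFolder");                // left untouched
        // Hypothetical destination path chosen by the service.
        System.out.println(resolve(params, "/data/transfers/archive.zip"));
    }
}
```

Returning a copy keeps the client's original invocation untouched, so the same PluginInvocation can be reused for several transfers.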

REST Invocations

Since gCube 4.9.0, the <TransferMethod> option has been removed from the path and is handled as the query parameter "method" (default value "FileUpload").

The service offers a REST interface for simple transfer requests, in the following format:

<DATA-TRANSFER-BASE-URL>/REST/<TransferMethod>/<DESTINATION_ID>/<SUB_PATH>

The following query parameters can be specified :

  • destination-file-name
  • create-dirs [Default : false]
  • on-existing-file [Default : ADD_SUFFIX]
  • on-existing-dir [Default : APPEND]

The following FORM DATA parameters can also be used :

  • uploadedFile : the file uploaded by the client
  • plugin-invocations : JSON representation of plugin invocation set
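For clients not using the library, the request URL can be assembled by hand. The sketch below encodes the documented query parameters with java.net.URLEncoder. Based on the gCube 4.9.0 note above, it assumes that newer services simply drop the <TransferMethod> path segment and read it from the "method" query parameter instead; the legacyPath switch and the RestUrlBuilder helper are illustrative and not part of any gCube API. The sub-path is assumed to be URL-safe already, since it contains "/" separators.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class RestUrlBuilder {

    /**
     * legacyPath=true  : <BASE>/REST/<TransferMethod>/<DESTINATION_ID>/<SUB_PATH>?...
     * legacyPath=false : <BASE>/REST/<DESTINATION_ID>/<SUB_PATH>?method=<TransferMethod>&... (assumed 4.9.0+ layout)
     */
    static String build(String baseUrl, String transferMethod, String destinationId, String subPath,
                        Map<String, String> queryParams, boolean legacyPath) {
        StringBuilder url = new StringBuilder(baseUrl).append("/REST/");
        if (legacyPath) {
            url.append(transferMethod).append('/');
        }
        url.append(destinationId).append('/').append(subPath); // subPath assumed already URL-safe

        // Keep insertion order so the resulting URL is predictable.
        Map<String, String> params = new LinkedHashMap<>();
        if (!legacyPath) {
            params.put("method", transferMethod);
        }
        params.putAll(queryParams);

        StringJoiner query = new StringJoiner("&", "?", "");
        query.setEmptyValue(""); // no "?" when there are no parameters
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return url.append(query).toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // form encoding: spaces become '+'
    }

    public static void main(String[] args) {
        Map<String, String> q = new LinkedHashMap<>();
        q.put("destination-file-name", "my file.nc");
        q.put("create-dirs", "true");
        q.put("on-existing-file", "REWRITE");
        System.out.println(build("http://thredds-d-d4s.d4science.org/data-transfer-service/gcube/service",
                "FileUpload", "thredds", "public/netcdf/myCatalog", q, false));
    }
}
```

The uploaded file and the plugin-invocations value are sent as FORM DATA, not in the URL, so they are outside the scope of this helper.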

THREDDS upload and metadata publication via cURL

The following cURL command:

  1. uploads the file to the "thredds" destination, subfolder "public/netcdf/myCatalog";
  2. invokes the "SIS/GEOTK" plugin.

curl -F "uploadedFile=@/home/fabio/raster-1465493223336242.nc" --header "gcube-token:<GCUBE-TOKEN>" http://thredds-d-d4s.d4science.org/data-transfer-service/gcube/service/REST/FileUpload/thredds/public/netcdf/myCatalog --form "plugin-invocations=SIS/GEOTK"

Data Transfer Plugins

This section describes the implemented plugins in order to help developers exploit their functionalities. Plugins are modules that are optionally invoked after the transfer is complete. Plugin invocations are declared within the transfer request by specifying a set of PluginInvocation instances (see #Plugin invocation). The following sections list, respectively:

  • #General Purpose Plugins, which are available on every SmartGears node;
  • #Specific Plugins, meant to address a particular installation.

General Purpose Plugins

This section describes general purpose plugins, which are included in the default distributions. This means that these plugins are always available on a SmartGears node.

Decompress Archive Plugin

The 'Decompress Archive' plugin extracts the content of an archive to a specified path. The implementing module (needed on the service side) is:

<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>decompress-archive-plugin</artifactId>
</dependency>
Invocation details
  • ID : "DECOMPRESS"

Parameters List :

  • "SOURCE_ARCHIVE" : [String value] Absolute path of the source archive file;
  • "DESTINATION" : [String value] The destination folder of the uncompressed content, expressed as a path relative to SOURCE_ARCHIVE. Default is the same directory as SOURCE_ARCHIVE;
  • "OVERWITE_DESTINATION" : [Boolean value] Set to true in order to overwrite the DESTINATION content. Default is false;
  • "DELETE_ARCHIVE" : [Boolean value] Set to true in order to delete SOURCE_ARCHIVE after extracting its content. Default is false;
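To make the parameter semantics concrete, here is a minimal sketch of how the DECOMPRESS parameters could be interpreted, assuming DESTINATION is resolved against the folder that contains SOURCE_ARCHIVE and that missing boolean flags default to false, as listed above. The DecompressOptions class is hypothetical: it only mirrors the documented parameter names (including the OVERWITE_DESTINATION spelling) and is not the plugin's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reading of the DECOMPRESS plugin parameters (names as documented). */
final class DecompressOptions {
    final Path sourceArchive;
    final Path destination;
    final boolean overwriteDestination;
    final boolean deleteArchive;

    private DecompressOptions(Path sourceArchive, Path destination,
                              boolean overwriteDestination, boolean deleteArchive) {
        this.sourceArchive = sourceArchive;
        this.destination = destination;
        this.overwriteDestination = overwriteDestination;
        this.deleteArchive = deleteArchive;
    }

    static DecompressOptions fromParameters(Map<String, String> params) {
        String source = params.get("SOURCE_ARCHIVE");
        Path archive = (source == null) ? null : Paths.get(source);
        if (archive == null || !archive.isAbsolute() || archive.getParent() == null) {
            throw new IllegalArgumentException("SOURCE_ARCHIVE must be an absolute file path");
        }
        Path archiveDir = archive.getParent();
        // DESTINATION is relative to the archive's folder; when missing, extract next to the archive.
        String dest = params.get("DESTINATION");
        Path destination = (dest == null) ? archiveDir : archiveDir.resolve(dest).normalize();
        // Boolean flags default to false when not provided.
        boolean overwrite = Boolean.parseBoolean(params.getOrDefault("OVERWITE_DESTINATION", "false"));
        boolean delete = Boolean.parseBoolean(params.getOrDefault("DELETE_ARCHIVE", "false"));
        return new DecompressOptions(archive, destination, overwrite, delete);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("SOURCE_ARCHIVE", "/data/transfers/archive.zip"); // hypothetical transferred file
        params.put("DESTINATION", "myFolder");
        DecompressOptions options = fromParameters(params);
        System.out.println(options.destination + " overwrite=" + options.overwriteDestination);
    }
}
```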
Invocation example
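import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.gcube.data.transfer.library.DataTransferClient;
import org.gcube.data.transfer.model.Destination;
import org.gcube.data.transfer.model.PluginInvocation;
 
DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";            // local path of the archive to upload
String transferredFileName="..";  // name of the archive on the remote node
 
// Extract into "myFolder", relative to the folder of the transferred archive
Map<String,String> params=new HashMap<>();
params.put("DESTINATION", "myFolder");
// DESTINATION_FILE_PATH stands for the absolute path of the transferred file
params.put("SOURCE_ARCHIVE", PluginInvocation.DESTINATION_FILE_PATH);
 
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,Collections.<PluginInvocation> singleton(new PluginInvocation("DECOMPRESS",params)));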

Specific Plugins

This section lists plugin modules designed to address a particular installation (typically the management of third-party applications). They are available only on certain installation nodes, depending on needs.


Thredds Plugin Suite

The Thredds plugin suite contains a set of plugins aimed at managing a Thredds installation in a gCube infrastructure. The implementing module (needed on the service side) is:

<dependency>
  <groupId>org.gcube.data.transfer</groupId>
  <artifactId>sis-geotk-plugin</artifactId>
</dependency>

The following sections describe the plugins exposed by this module.

THREDDS PLUGIN INFO OUTPUT

Each of the following plugins exposes an info object of class 'org.gcube.data.transfer.model.plugins.thredds.ThreddsInfo'. The following is a serialized example of this object:

{
  "hostname": "thredds-pre-d4s.d4science.org",
  "localBasePath": "/data/content/thredds",
  "instanceBaseUrl": "http://thredds-pre-d4s.d4science.org/thredds",
  "catalog": {
    "ID": null,
    "catalogFile": "catalog.xml",
    "title": null,
    "name": null,
    "declaredDataSetRoot": null,
    "declaredDataSetScan": [
      {
        "name": "Thredds Root Catalog",
        "path": "public/netcdf",
        "location": "/data/content/thredds/public/netcdf/",
        "ID": "Root-DatasetScan"
      }
    ],
    "subCatalogs": {
      "name": "Catalogs of Virtual Research Environments VRE",
      "ID": "VRE_Catalogs",
      "linkedCatalogs": [
        {
          "ID": "preprodVRECatalog",
          "catalogFile": "preprodVRECatalog.xml",
          "title": "preprodVRECatalog",
          "name": "preprodVRECatalog",
          "declaredDataSetRoot": {
            "path": "preVRE_static",
            "location": "/data/content/thredds/preVRE",
            "count": 0
          },
          "declaredDataSetScan": [
            {
              "name": "preprodVRECatalog Catalog",
              "path": "preVRE_dynamic",
              "location": "/data/content/thredds/preVRE",
              "ID": "preprodVRECatalog_in_preVRE"
            }
          ],
          "subCatalogs": null
        }
      ]
    }
  },
  "adminUser": ...,
  "adminPassword": ...,
  "version": 4,
  "minor": 6,
  "build": 0,
  "revision": 9,
  "ghnId": "42d89e32-f253-4a20-8110-82eaad7cfeda"
}

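One possible use of this info object is mapping a file stored in the Thredds content folder to the path under which Thredds publishes it. The sketch below assumes that a dataset scan exposes every file under its "location" as "<path>/<relative file path>", which is the usual THREDDS datasetScan behaviour; the DatasetScan class is a hypothetical mirror of the "declaredDataSetScan" entries, not the real model class.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

final class ThreddsPathMapper {

    /** Hypothetical mirror of a "declaredDataSetScan" entry (only the fields used here). */
    static final class DatasetScan {
        final String path;     // e.g. "public/netcdf"
        final String location; // e.g. "/data/content/thredds/public/netcdf/"

        DatasetScan(String path, String location) {
            this.path = path;
            this.location = location;
        }
    }

    /** Returns the Thredds-relative path of a local file, if it lies under one of the scanned locations. */
    static Optional<String> toThreddsPath(String localFile, List<DatasetScan> scans) {
        Path file = Paths.get(localFile).normalize();
        for (DatasetScan scan : scans) {
            Path root = Paths.get(scan.location).normalize();
            if (file.startsWith(root)) { // component-wise prefix check, not a string prefix
                Path relative = root.relativize(file);
                // Thredds paths are URL-like, so always join with '/'.
                return Optional.of(scan.path + "/" + relative.toString().replace('\\', '/'));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Values taken from the serialized example above.
        List<DatasetScan> scans = List.of(
                new DatasetScan("public/netcdf", "/data/content/thredds/public/netcdf/"),
                new DatasetScan("preVRE_dynamic", "/data/content/thredds/preVRE"));
        System.out.println(toThreddsPath("/data/content/thredds/public/netcdf/myCatalog/raster.nc", scans)
                .orElse("not served by any dataset scan"));
    }
}
```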
SIS/GEOTK Plugin

The 'SIS/GEOTK' plugin extracts metadata from NetCDF files by exploiting Apache SIS library features and publishes ISO metadata entries in GeoNetwork.

Invocation details
  • ID : "SIS/GEOTK"

Parameters List :

  • "GEONETWORK_CATEGORY" : [String value] GeoNetwork category for published metadata. Default is 'Dataset';
  • "GEONETWORK_STYLESHEET" : [String value] GeoNetwork stylesheet for published metadata. Default is '_none_';
Invocation example
DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String localFile="..";
String transferredFileName="..";
 
Destination dest=new Destination(transferredFileName);
client.localFile(localFile,dest,new PluginInvocation("SIS/GEOTK"));
REGISTER CATALOG Plugin

The 'REGISTER CATALOG' plugin modifies Thredds' main catalog.xml file in order to add/update a reference to the transferred catalog file.

Invocation details
  • ID : "REGISTER_CATALOG"

Parameters List :

  • "CATALOG_REFERENCE" : [String value] The reference title to be set in catalog.xml, which will link to the transferred catalog file.
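To give a rough idea of what registering a catalog involves, the sketch below adds or updates a catalogRef element in a catalog document using the JDK's DOM API. It assumes the standard THREDDS catalog layout, where catalogRef elements carry xlink:title and xlink:href attributes, and that an existing reference is matched by its title; the CatalogRegistrar class is hypothetical and the actual plugin may behave differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class CatalogRegistrar {

    static final String THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    /** Adds a catalogRef titled {@code title} pointing to {@code href}, or updates the href of an existing one. */
    static String registerCatalog(String catalogXml, String title, String href) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match xlink attributes by namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(catalogXml.getBytes(StandardCharsets.UTF_8)));

            // Look for an existing reference with the same title.
            Element target = null;
            NodeList refs = doc.getElementsByTagNameNS(THREDDS_NS, "catalogRef");
            for (int i = 0; i < refs.getLength(); i++) {
                Element ref = (Element) refs.item(i);
                if (title.equals(ref.getAttributeNS(XLINK_NS, "title"))) {
                    target = ref;
                    break;
                }
            }
            if (target == null) { // not registered yet: append a new reference to the root catalog
                target = doc.createElementNS(THREDDS_NS, "catalogRef");
                doc.getDocumentElement().appendChild(target);
            }
            target.setAttributeNS(XLINK_NS, "xlink:title", title);
            target.setAttributeNS(XLINK_NS, "xlink:href", href);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Unable to update catalog", e);
        }
    }

    public static void main(String[] args) {
        String catalog = "<catalog xmlns=\"" + THREDDS_NS + "\" xmlns:xlink=\"" + XLINK_NS + "\"/>";
        System.out.println(registerCatalog(catalog, "My Catalog", "My_Catalog.xml"));
    }
}
```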
Invocation example
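DataTransferClient client=DataTransferClient.getInstanceByEndpoint(...);
String reference="..";    // title of the reference to be set in catalog.xml
String catalogFile="..";  // local path of the catalog file to transfer
 
// Store the catalog file in the "thredds" persistence folder, replacing any previous version
Destination dest=new Destination();
dest.setPersistenceId("thredds");
dest.setDestinationFileName(reference.replace(" ", "_")+".xml");
dest.setOnExistingFileName(DestinationClashPolicy.REWRITE);
 
// Link the transferred file from the main catalog.xml under the given title
PluginInvocation invocation=new PluginInvocation("REGISTER_CATALOG");
invocation.setParameters(Collections.singletonMap("CATALOG_REFERENCE", reference));
 
client.localFile(catalogFile, dest,invocation);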