Data Transfer Scheduler

From Gcube Wiki
Revision as of 11:08, 10 August 2012 by Nikolaos.drakopoulos (Talk | contribs) (Main Operations)

Data Transfer Scheduler Service

The Data Transfer Scheduler Service is responsible for scheduling transfers. It delegates the transfer logic to the gDT Agents deployed on the infrastructure and relies on Messaging to consume the transfer results that those Agents produce.

Design

In detail, the main service consists of one gCore stateful service (a singleton) and one gCore stateless service.

  • The stateful service exposes the factory porttype, through which the submitting API of the Data Transfer Scheduler Library accesses it. Every WS Resource that the factory creates for a transfer operation exposes one stateful porttype.

factory porttype: This porttype checks whether the client has already been registered. If so, it simply returns the EndpointReferenceType of the client's resource; otherwise it generates a new key and creates a unique resource for the client. The operation is called checkIn. It takes a unique id (possibly a name) as input and returns an EndpointReferenceType.

stateful porttype: This is the main porttype of the service, because it is where the client passes the information for the scheduling. Its storeInfoScheduler operation takes a StoreInfoSchedulerMessage (which carries all the information needed for a scheduled transfer, such as source, destination and type of transfer) as input and returns nothing. This operation stores that information in the local DB.

Figure: Stateful Service
  • The stateless service exposes the stateless porttype, through which the management API of the Data Transfer Scheduler Library can get and set information about the users, the scopes, the files and anything else the client wants to manage.
Figure: Stateless Service

Both the stateful and the stateless service use the Data Transfer Scheduler Database Library to access the local DB, each for a different purpose. The former stores the details of the transfer scheduling, and the latter stores or retrieves information about the local gHN.
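The checkIn behavior of the factory porttype can be illustrated with a small sketch. It is written in Python for brevity and is not the gCore implementation: the class SchedulerFactory and the method check_in are hypothetical names, and a plain string key stands in for the EndpointReferenceType that the real operation returns.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SchedulerFactory:
    """Toy stand-in for the factory porttype (hypothetical, not the gCore API)."""

    # Maps a client id to the key of the resource created for that client.
    _resources: dict = field(default_factory=dict)

    def check_in(self, client_id: str) -> str:
        """Return the resource key for client_id, creating it on first use.

        Assumption: a string key replaces the EndpointReferenceType here.
        """
        if client_id not in self._resources:
            # First check-in: generate a new key for a unique resource.
            self._resources[client_id] = uuid.uuid4().hex
        return self._resources[client_id]


factory = SchedulerFactory()
first = factory.check_in("alice")
second = factory.check_in("alice")
```

Under these assumptions, first and second refer to the same resource, while a different client id would receive a new key.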

The following figure shows all the parts of the Data Transfer Scheduler Service:

Figure: Data Transfer Scheduler Service

Types of Transfer

The types of transfer are the same as those of the Agent Service. The client can set up a transfer of one of these types:

  • LocalFileBasedTransfer
  • FileBasedTransfer
  • TreeBasedTransfer (not yet supported)

The first type applies when the client wants to transfer one or more files from its own node to the node on which the Agent Service is running.

When the client needs to transfer files from a number of given URLs to the Agent's node, it can use the second type of transfer. In addition, there is the option of transferring these files not to the Agent's node but to a specific data storage, identified by information from the IS (not yet supported).

Types of Schedule

There are three types of schedule:

  • Direct Transfer
  • Manually Scheduled
  • Periodically Scheduled

In a Direct Transfer the client simply submits a transfer without any schedule. The service performs the transfer directly through the agent library.

In a Manually Scheduled transfer the client submits a transfer and also provides the date on which the transfer should start. The given date must be a single, specific instant, and the transfer takes place only once.

Recurring scheduling is provided by the Periodically Scheduled transfer, where the client sets the period at which the transfer should take place. The client can choose one of six options: every minute, hour, day, week, month or year. For this type of schedule the client must also give the start date of the schedule. As in the manually scheduled transfer, this is a specific instant, and it marks the beginning of the scheduled transfer.

Figure: The Different Types of Schedule
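The period arithmetic behind a periodic schedule can be sketched as follows. This is a minimal illustration in Python, not the service's code: the names Period and occurrence are hypothetical, and clamping the day at the end of shorter months (for example 31 January to 29 February) is an assumption that this page does not confirm.

```python
import calendar
from datetime import datetime, timedelta
from enum import Enum


class Period(Enum):
    """The six periods a periodically scheduled transfer can use."""
    MINUTE = "minute"
    HOUR = "hour"
    DAY = "day"
    WEEK = "week"
    MONTH = "month"
    YEAR = "year"


# Periods with a fixed length can be handled with plain timedelta arithmetic.
_FIXED = {
    Period.MINUTE: timedelta(minutes=1),
    Period.HOUR: timedelta(hours=1),
    Period.DAY: timedelta(days=1),
    Period.WEEK: timedelta(weeks=1),
}


def _add_months(start: datetime, months: int) -> datetime:
    """Shift start by whole months, clamping the day to the month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)


def occurrence(start: datetime, period: Period, n: int) -> datetime:
    """Return the n-th run time of the schedule (n = 0 is the start date)."""
    if period in _FIXED:
        return start + n * _FIXED[period]
    months = n if period is Period.MONTH else 12 * n
    return _add_months(start, months)
```

In this model a Direct Transfer has no schedule at all, and a Manually Scheduled transfer corresponds to the single run time occurrence(start, period, 0), which is the start date itself.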

Main Operations

The main (stateful) service of the scheduler currently offers these main operations:

  • storeInfoScheduler
  • cancelScheduledTransfer
  • monitorScheduledTransfer
  • getScheduledTransferOutcomes

The first is the major operation. It receives the string message, converts it to the appropriate objects and stores the transfer, including its schedule, in the DB. The return value is the transfer identifier in the Scheduler DB.

The second operation cancels a scheduled transfer. It first checks the type of schedule and the status of the transfer, and its behavior differs from case to case. For example, if the status is STANDBY, it does not need to connect to the Agent because the transfer has not started yet. In this case it just changes the transfer's status from STANDBY to CANCELED.
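The STANDBY case described above can be sketched as follows. The sketch is in Python and makes several assumptions: the dictionary statuses stands in for the scheduler DB, the function name is hypothetical, and cancel_on_agent is a hypothetical callback that covers every other status and reports the resulting status. The real service distinguishes more cases than this.

```python
from typing import Callable

STANDBY = "STANDBY"
CANCELED = "CANCELED"


def cancel_scheduled_transfer(
    statuses: dict,
    transfer_id: str,
    cancel_on_agent: Callable[[str], str],
) -> str:
    """Cancel a transfer and return the status stored afterwards.

    statuses maps transfer ids to their stored status (stand-in for the DB).
    cancel_on_agent is a hypothetical callback that contacts the Agent and
    returns the new status of a transfer that is no longer in STANDBY.
    """
    if statuses[transfer_id] == STANDBY:
        # Not started yet: no need to connect to the Agent.
        statuses[transfer_id] = CANCELED
    else:
        # Assumption: every other status is delegated to the Agent.
        statuses[transfer_id] = cancel_on_agent(transfer_id)
    return statuses[transfer_id]
```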

While a transfer is taking place, there is a point at which the thread responsible for the transfer waits for its result. More precisely, at that point the thread calls the monitor operation of the Agent Service and waits for its result. Monitoring is therefore integrated into the core of the service. If monitoring from outside is wanted, however, the client can pass the transfer identifier (received from the storeInfoScheduler operation) to monitorScheduledTransfer and receive the monitoring result of the scheduler. This method always behaves the same way: it returns the status of the transfer as stored in the scheduler DB.

The last operation retrieves the outcomes of a specific transfer. If it is called before the transfer has started, a message saying so is returned as its result.

NOTE: None of the last three operations can be called if the transfer type is LocalFileBasedTransfer, because this is the only case in which the Scheduler and the Agent operate synchronously.
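How storing, monitoring and outcome retrieval fit together can be sketched with an in-memory stand-in for the scheduler DB. The sketch is in Python and everything in it is an assumption made for illustration: the class SchedulerDb, the record layout, the identifier format, the initial STANDBY status, the message text and the use of ValueError for the LocalFileBasedTransfer restriction. The real storeInfoScheduler also takes a full message rather than only a transfer type.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SchedulerDb:
    """In-memory stand-in for the scheduler DB (hypothetical layout)."""

    _ids: count = field(default_factory=lambda: count(1))
    transfers: dict = field(default_factory=dict)

    def store_info_scheduler(self, transfer_type: str) -> str:
        """Store a transfer and return its identifier."""
        transfer_id = f"transfer-{next(self._ids)}"
        self.transfers[transfer_id] = {
            "type": transfer_type,
            "status": "STANDBY",  # assumption: stored transfers start here
            "outcomes": [],
        }
        return transfer_id

    def _lookup(self, transfer_id: str) -> dict:
        """Fetch a record, rejecting the synchronous transfer type."""
        record = self.transfers[transfer_id]
        if record["type"] == "LocalFileBasedTransfer":
            raise ValueError("not available for LocalFileBasedTransfer")
        return record

    def monitor_scheduled_transfer(self, transfer_id: str) -> str:
        """Return the status exactly as it is stored."""
        return self._lookup(transfer_id)["status"]

    def get_scheduled_transfer_outcomes(self, transfer_id: str):
        """Return the outcomes, or a message if nothing has started yet."""
        record = self._lookup(transfer_id)
        if record["status"] == "STANDBY":
            return "transfer has not started yet"
        return record["outcomes"]
```

The identifier returned by store_info_scheduler is the only handle the client needs for the other two calls, which mirrors the way the transfer identifier is used above.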

Data Transfer Scheduler Library

Design

The Data Transfer Scheduler Library is the client library implementing the API for data transfer scheduling. It consists of two separate APIs, one for management and one for submitting.

  • The management API is responsible for:
    • Retrieving information about the users of the node on which the Scheduler service is running.
    • Setting permissions for these users with regard to the transfers.
    • Storing and retrieving information about the scopes.
    • …(TODO)
  • The submitting API is responsible for submitting one or several transfer operations, taking the different parameters (source, destination, file, type of transfer, date of transfer etc.) from XML (or JSON) files.

The Library implements asynchronous operations for data transfer scheduling.
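As an illustration of submitting transfers from a JSON file, the following Python sketch parses a list of transfer descriptions into objects. The JSON layout, the field names (source, destination, type, start), the URLs and paths, and the names TransferRequest and parse_requests are all hypothetical. The page does not specify the actual file format used by the submitting API.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TransferRequest:
    """One transfer to submit (hypothetical shape)."""
    source: str
    destination: str
    transfer_type: str
    start: Optional[datetime]  # None is read as a direct transfer here


def parse_requests(text: str) -> list:
    """Parse a JSON array of transfer descriptions."""
    requests = []
    for item in json.loads(text):
        start = item.get("start")
        requests.append(
            TransferRequest(
                source=item["source"],
                destination=item["destination"],
                transfer_type=item["type"],
                start=datetime.fromisoformat(start) if start else None,
            )
        )
    return requests


# Made-up example input: one scheduled and one direct transfer.
EXAMPLE = """
[
  {"source": "http://example.org/a.csv", "destination": "/tmp/a.csv",
   "type": "FileBasedTransfer", "start": "2012-08-10T11:08:00"},
  {"source": "http://example.org/b.csv", "destination": "/tmp/b.csv",
   "type": "FileBasedTransfer"}
]
"""

requests = parse_requests(EXAMPLE)
```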

Figure: Data Transfer Scheduler Lib

Data Transfer Scheduler Database Library

The Data Transfer Scheduler Database Library implements the API through which the Data Transfer Scheduler Service (either the stateful or the stateless one) accesses the database.

Figure: Data Transfer Scheduler DB Lib

Data Transfer Scheduler IS Library

The Data Transfer Scheduler IS Library implements the API through which the Data Transfer Scheduler Service retrieves the information it needs from the Information System and stores it in the Database. The existing methods provide information about the agents.