Data Transfer Scheduler

From Gcube Wiki
Revision as of 14:08, 10 August 2012 by Nikolaos.drakopoulos (Talk | contribs) (Internal Operations)

Data Transfer Scheduler Service

The Data Transfer Scheduler Service is responsible for scheduling transfers: it delegates the transfer logic to the gDT Agents deployed on the infrastructure and spawns the transfers on them. It relies on Messaging to consume transfer results from the gDT Agents.

Architecture

In detail, the main service consists of one gCore stateful service (a singleton) and one gCore stateless service.

  • The stateful service exposes the factory porttype, through which the submitting API of the Data Transfer Scheduler Library accesses it. Every WS-Resource that the factory creates for a different transfer operation exposes one stateful porttype.

factory porttype: This porttype checks whether the client has already been registered. If it has, the porttype simply returns the EndpointReferenceType of the client's resource; otherwise it generates a new key in order to create a unique resource for the client. The operation is called checkIn; it takes a unique id (possibly a name) as input and returns an EndpointReferenceType.

stateful porttype: This is the main porttype of the service, because the client uses it to pass the scheduling information. Its storeInfoScheduler operation takes a StoreInfoSchedulerMessage (holding all the information for a scheduled transfer, such as source, destination and type of transfer) as input and returns nothing. The operation stores this information in the local DB.
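
The check-in behaviour of the factory porttype can be illustrated with a minimal sketch. It is not the gCore implementation: the class `SchedulerFactory`, the method name `check_in` and the use of a plain string key in place of an EndpointReferenceType are all assumptions made for illustration.

```python
import uuid


class SchedulerFactory:
    """Toy stand-in for the factory porttype (hypothetical, not gCore code)."""

    def __init__(self):
        # Maps a client's unique id to the key of its resource.
        self._resource_keys = {}

    def check_in(self, client_id: str) -> str:
        """Return the resource key for client_id, creating it on first contact."""
        if client_id not in self._resource_keys:
            # First check-in: generate a new key for a unique resource.
            self._resource_keys[client_id] = uuid.uuid4().hex
        # A real service would return an EndpointReferenceType; a plain
        # string key stands in for it here.
        return self._resource_keys[client_id]


factory = SchedulerFactory()
first = factory.check_in("alice")
again = factory.check_in("alice")   # same client: same resource
other = factory.check_in("bob")     # new client: new resource
```

The point of the sketch is that checking in is idempotent per client: repeated calls with the same id lead to the same resource.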

Figure: Stateful Service

  • The stateless service exposes the stateless porttype, through which the management API of the Data Transfer Scheduler Library can get and set information about the users, the scopes, the files and whatever else the client wants to manage.

Figure: Stateless Service

Both the stateful and the stateless service use the Data Transfer Scheduler Database Library to access the local DB, each for a different purpose: the first stores the details of the transfer scheduling, while the second stores or retrieves information about the local gHN.

The following figure shows all the parts of the Data Transfer Scheduler Service:

Figure: Data Transfer Scheduler Service

Types of Transfer

The types of transfer are the same as those in the Agent Service. The client can choose among the following types:

  • LocalFileBasedTransfer
  • FileBasedTransfer
  • TreeBasedTransfer (not yet available)

The first type applies when the client wants to transfer one or more files from its own node to the node on which the Agent Service is running.

When the client needs to transfer files from a number of given URLs to the Agent's node, it can use the second type of transfer. In addition, there is an option for these files to be transferred not to the Agent's node but to a specific DataStorage identified through the IS (not yet available).

Types of Schedule

There are three types of schedule:

  • Direct Transfer
  • Manually Scheduled
  • Periodically Scheduled

In the Direct Transfer the client simply submits a transfer without any schedule. The service performs the transfer directly through the agent library.

In the Manually Scheduled Transfer the client submits a transfer and also provides the date on which the transfer should start. The given date must be a single specific instant, and the transfer takes place only once.

The actual scheduling happens in the Periodically Scheduled Transfer, where the client sets the period at which the transfer should take place. One of six options can be chosen: every minute, hour, day, week, month or year. For this type of schedule the client must also give the start date of the schedule. As in the manually scheduled transfer, this is a specific instant, and it marks the beginning of the scheduled transfer.
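
As an illustration of the six periods, the following sketch computes the next occurrence of a periodically scheduled transfer from the current one. The function names and the period strings are hypothetical, and clamping the day to the length of the target month (for example 31 January to the last day of February) is an assumption; the service may handle such dates differently.

```python
import calendar
from datetime import datetime, timedelta

# Periods with a fixed length can be expressed as a timedelta.
FIXED_PERIODS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def add_months(moment: datetime, months: int) -> datetime:
    """Shift by whole calendar months, clamping the day to the month's length."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def next_occurrence(current: datetime, period: str) -> datetime:
    """Return the occurrence that follows `current` for the given period."""
    if period in FIXED_PERIODS:
        return current + FIXED_PERIODS[period]
    if period == "month":
        return add_months(current, 1)
    if period == "year":
        return add_months(current, 12)
    raise ValueError(f"unknown period: {period}")
```

Months and years are handled separately because they have no fixed length in days.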

Figure: The Different Types Of Schedule

Transfer Characteristics

  • Each transfer can have more than one transfer object.
  • The status of every transfer is updated at each stage of the transfer.
  • The possible statuses are: STANDBY (the transfer has not started yet), ONGOING, CANCELED, FAILED, COMPLETED.
  • Depending on its type of schedule, a transfer may or may not change its status again when it finishes. In particular, a periodically scheduled transfer needs to take place at every occurrence defined by its schedule. Consequently, each time such a transfer has completed or failed and has taken the corresponding status, its status is changed back to STANDBY because it needs to take place again.
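
The status rule for periodically scheduled transfers can be sketched as follows. The status names come from the list above; the `Status` enum and the function `status_after_run` are hypothetical names used only for this illustration.

```python
from enum import Enum


class Status(Enum):
    STANDBY = "STANDBY"      # the transfer has not started yet
    ONGOING = "ONGOING"
    CANCELED = "CANCELED"
    FAILED = "FAILED"
    COMPLETED = "COMPLETED"


def status_after_run(outcome: Status, periodic: bool) -> Status:
    """Status kept for a transfer once a run has ended with `outcome`."""
    if periodic and outcome in (Status.COMPLETED, Status.FAILED):
        # A periodic transfer has to run again, so it goes back to waiting.
        return Status.STANDBY
    # Any other case keeps the outcome as its status.
    return outcome
```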

Main Service Operations

The main (stateful) service of the scheduler currently offers these main operations:

  • storeInfoScheduler
  • cancelScheduledTransfer
  • monitorScheduledTransfer
  • getScheduledTransferOutcomes

The first one is the major method: it receives the string message, converts it to the appropriate objects and stores the specific transfer, including its schedule, in the DB. The return value is the transfer identifier in the Scheduler DB.

The second operation cancels a scheduled transfer, after first checking the type of schedule and the status of the transfer. Its behavior differs from case to case. For example, if the status is STANDBY it does not need to connect to the Agent, because the transfer has not started yet; in this case it just changes the transfer's status from STANDBY to CANCELED.
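
A minimal sketch of this case analysis is given below. Only the STANDBY branch follows directly from the description; the handling of ONGOING transfers (stopping them on the Agent first) and of already finished transfers (leaving them untouched) are assumptions, as are the function name `cancel_transfer`, the dictionary representation of a transfer and the `cancel_on_agent` callback.

```python
from typing import Callable


def cancel_transfer(transfer: dict, cancel_on_agent: Callable[[str], None]) -> bool:
    """Cancel `transfer` in place; return True if its status changed."""
    status = transfer["status"]
    if status == "STANDBY":
        # Not started yet: no call to the Agent is needed.
        transfer["status"] = "CANCELED"
        return True
    if status == "ONGOING":
        # Assumption: a running transfer must be stopped on the Agent first.
        cancel_on_agent(transfer["transferIdOfAgent"])
        transfer["status"] = "CANCELED"
        return True
    # Assumption: finished or already canceled transfers are left untouched.
    return False


agent_calls = []
waiting = {"status": "STANDBY", "transferIdOfAgent": None}
running = {"status": "ONGOING", "transferIdOfAgent": "agent-42"}
cancel_transfer(waiting, agent_calls.append)   # no Agent call recorded
cancel_transfer(running, agent_calls.append)   # records "agent-42"
```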

While a transfer takes place, the thread responsible for it waits for the result of the transfer. More precisely, at that point the thread calls the monitor operation of the Agent Service and waits for its result, so monitoring is integrated into the core of the service. If monitoring from outside is wanted, however, the client can pass the transfer identifier (received from the storeInfoScheduler operation) to monitorScheduledTransfer and receive the scheduler's monitoring result. This method always behaves the same way: it returns the status of the specific transfer as stored in the scheduler DB.

The last operation retrieves the outcomes of a specific transfer. If it is called before the transfer has started, a corresponding message is returned as its result.

NOTE: None of the last three operations can be called if the transfer type is LocalFileBasedTransfer, because this is the only case in which the operation between Scheduler and Agent is synchronous.

Internal Operations

Besides the above operations, some other operations are needed for the scheduling itself. These operations are internal and are not meant for interaction with clients.

  • CheckDBForTransfers

The first time a new client connects to the factory and creates its resource, the factory also starts a thread called CheckDBForTransfers, which checks the DB at a fixed time interval for transfers from the same submitter that need to start. If a transfer needs to take place, CheckDBForTransfers starts another thread called TransferHandler and passes the current information to it so that it manages this specific transfer.

  • CheckIS

The service needs to know about the existing agents and storages in the infrastructure. For this reason the Information System must be checked continuously and the DB updated accordingly. When the Scheduler Service starts running, it launches a new thread named CheckIS for this purpose. CheckIS checks the IS at a configurable time interval and updates the DB with the information found there.
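
One polling pass of CheckDBForTransfers could look roughly like the sketch below. It works on an in-memory list instead of the real DB; the field name `start`, the function names and the choice to mark a transfer as ONGOING before handing it over are assumptions. In the service each pass would be repeated at the time interval and each handler would run in its own TransferHandler thread.

```python
from datetime import datetime


def find_due_transfers(transfers, submitter, now):
    """Return the transfers of `submitter` that are waiting and due at `now`."""
    return [
        t for t in transfers
        if t["submitter"] == submitter
        and t["status"] == "STANDBY"
        and t["start"] <= now
    ]


def check_db_once(transfers, submitter, now, start_handler):
    """One polling pass: hand every due transfer to a handler."""
    due = find_due_transfers(transfers, submitter, now)
    for transfer in due:
        transfer["status"] = "ONGOING"   # assumption: marked before hand-over
        start_handler(transfer)          # stands in for starting TransferHandler
    return len(due)


db = [
    {"id": 1, "submitter": "alice", "status": "STANDBY", "start": datetime(2012, 8, 10, 14, 0)},
    {"id": 2, "submitter": "alice", "status": "STANDBY", "start": datetime(2012, 8, 11, 9, 0)},
    {"id": 3, "submitter": "bob", "status": "STANDBY", "start": datetime(2012, 8, 10, 13, 0)},
]
handled = []
started = check_db_once(db, "alice", datetime(2012, 8, 10, 14, 8), handled.append)
```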

Figure: Internal Operations Of Scheduler Service

Data Transfer Scheduler Library

Architecture

The Data Transfer Scheduler Library is the client library implementing the API for Data Transfer Scheduling. It consists of two separate APIs, one for management and one for submission.

  • The management API is responsible for:
    • Retrieving information about the users of the node on which the Scheduler service is running.
    • Setting permissions for these users regarding the transfers.
    • Storing and retrieving information about the scopes.
    • …(TODO)
  • The submitting API is responsible for submitting one or several transfer operations by providing the different parameters (source, destination, file, type of transfer, date of transfer, etc.) from XML (or JSON) files.

The library implements asynchronous operations for data transfer scheduling.

Figure: Data Transfer Scheduler Lib

Data Transfer Scheduler Database Library

The Data Transfer Scheduler Database Library implements the API through which the Data Transfer Scheduler Service (either the stateful or the stateless one) accesses the database. The main structure of the DB, without any details, is shown in the following picture.

Figure: The Main Structure Of DB

Each transfer entity is identified by a unique transfer id which is different from the transfer id that the Agent Service may keep for this transfer. Besides the unique id it keeps the following information:

  • submitter (The one who submits the transfer)
  • sourceId (The id of the DataSource)
  • storageId (The id of the DataStorage)
  • agentId (The id of the Agent)
  • status (The status point)
  • objectTrasferredIDs (The ids of the transfer objects that have been transferred successfully)
  • objectFailedIDs (The ids of the transfer objects that have not been transferred because of a failure)
  • transferError (The error that occurred in case of a failure)
  • transferIdOfAgent (The transfer identifier that the Agent keeps for this transfer)
  • typeOfScheduleId (The id of the TypeOfSchedule entity)

As we can see, the transfer entity holds no information about the objects that need to be transferred. This is because each transfer object keeps the id of its transfer, and not the other way round.
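
The direction of this relationship can be sketched with two plain data classes. The fields of `Transfer` mirror the list above; the names `transferId`, `objectId` and `uri`, the class `TransferObject` and the helper `objects_of` are hypothetical and do not describe the actual DB schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transfer:
    transferId: str                       # hypothetical name for the unique id
    submitter: str
    sourceId: Optional[str] = None
    storageId: Optional[str] = None
    agentId: Optional[str] = None
    status: str = "STANDBY"
    objectTrasferredIDs: List[str] = field(default_factory=list)
    objectFailedIDs: List[str] = field(default_factory=list)
    transferError: Optional[str] = None
    transferIdOfAgent: Optional[str] = None
    typeOfScheduleId: Optional[str] = None


@dataclass
class TransferObject:
    objectId: str                         # hypothetical
    transferId: str                       # points back to the owning transfer
    uri: str                              # hypothetical payload field


def objects_of(transfer: Transfer, objects: List[TransferObject]) -> List[TransferObject]:
    """Find the objects of a transfer by following the back-reference."""
    return [o for o in objects if o.transferId == transfer.transferId]


t1 = Transfer(transferId="t1", submitter="alice")
all_objects = [
    TransferObject("o1", "t1", "http://example.org/a"),
    TransferObject("o2", "t2", "http://example.org/b"),
    TransferObject("o3", "t1", "http://example.org/c"),
]
mine = objects_of(t1, all_objects)
```

Because the reference lives on the object side, listing the objects of a transfer is a lookup over the objects rather than a field read on the transfer.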

Data Transfer Scheduler IS Library

The Data Transfer Scheduler IS Library implements the API through which the Data Transfer Scheduler Service retrieves the information it needs from the Information System and stores it in the Database. The existing methods provide information about the agents.