GRS2


gRS

Introduction

The goal of the gRS framework is to enable point-to-point producer-consumer communication. To achieve this and connect collocated or remotely located actors, the framework shields the two parties from technology-specific knowledge and limitations. Additionally, it can provide, upon request, value-adding functionalities that enhance the clients' communication capabilities.

While offering these functionalities to its clients, the framework imposes minimal restrictions on the clients' design. Because it provides an interface similar to commonly used programming structures, such as a collection or a stream, client programs can easily incorporate it.

Furthermore, since one of the gRS's main goals is to provide location-independent services, it is built modularly to allow a number of different technologies to be used for transport and communication mediation, depending on the scenario. For communication over trusted channels, a plain TCP connection can be used for fast delivery. Authentication of the communicating parties is built into the framework and can be applied at the level of individual communication packets. In situations where more secure connections are required, the full communication stream can be encrypted. In cases of firewalled communication, an HTTP transport mechanism can be easily incorporated and supplied by the framework without a single change in the client's code. In the case of collocated actors, in-memory references can be passed directly from producer to consumer, again without either of them changing their behavior in any of the described cases.

In a disjoint environment, where the producer and consumer are not custom-made to interact only with each other, or where the needs of the consumer change from invocation to invocation, a situation may arise where the producer generates more data than the consumer needs, even at a per-record scope. The gRS enables fine-grained transport not just at the level of a record, but also at the level of a record field. A record containing more than one field may be needed in its entirety by a consumer, while at the next interaction only a single field of the record may be needed. In such a situation the consumer can retrieve only the payload he is interested in, without having to deal with any additional payload. Even at the level of a single field, the transported data can be further optimized to consume only the bytes actually needed, instead of transferring a large amount of data that will in the end be discarded unused.

In modern systems it frequently happens that the producer's output is already available for access through some protocol. The gRS has a very extensible design with respect to the way the producer serves its payload. A number of protocols can be used (FTP, HTTP, network storage, local filesystem), and many more can be implemented easily and modularly.

In all producer-consumer scenarios the common denominator is the synchronization of the two parties. The gRS handles this synchronization transparently to the clients, who can either use the framework's mechanisms to block while waiting for the desired circumstances, or be notified through events when some criterion is met. The whole design of the framework is event based. It is possible to base all the interactions of the two clients in a producer-consumer scenario solely on the events they exchange, attaching custom objects and information. In addition to making the framework more versatile at its core, events also allow a two-way flow of communication, not only for control purposes but also as feedback from the consumer to the producer and vice versa, beyond the usual control signals of common producer-consumer scenarios.

Overview

One can distinguish between two common cases in gRS usage, as will be detailed in later sections.


Records and Record Pools

Record Pool (gr.uoa.di.madgik.grs.pool)

The main data store of the gRS framework, and its main synchronization point, is the record pool. Any record pool implementation must conform to a specific interface. Different implementations of the record pool may use different technologies to store their payload. Currently there is one available implementation, which keeps all the records in an in-memory list. The records are consumed in the same order they are produced, although consumers can choose to consume them in a random order. In addition to the list-based storage of the records, a map-based interface is also provided for easy access to records based on their IDs, as sketched below.
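
As an illustration, the dual list / map access could look like the following minimal sketch; the class and method names are assumptions made for illustration, not the framework's actual API:

  import java.util.LinkedHashMap;
  import java.util.Map;

  // Minimal sketch: an insertion-ordered map provides both consumption in
  // production order and direct lookup of a record by its id.
  class MemoryListPoolSketch<R> {
      private final Map<String, R> records = new LinkedHashMap<>();

      void add(String id, R record) { records.put(id, record); }

      // Consume records in the same order they were produced ...
      Iterable<R> inProductionOrder() { return records.values(); }

      // ... or access a record directly through its id.
      R byId(String id) { return records.get(id); }
  }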

A record pool is associated with a number of management classes that observe, monitor, and facilitate the operations that can be performed on the pool.

  • DiskRecordBuffer – A disk buffer that can be used as temporary storage for records and fields, as will be explained in later sections.
  • HibernationObserver – An observer of events of a specific type that manages temporary storage in, and retrieval from, a disk buffer. More will be explained in later sections.
  • DefinitionManager – A common container class that manages the definitions of records and fields associated with the pool, as will be explained in later sections.
  • PoolState – A class holding the events that can be emitted by the pool or forwarded through it. More will be detailed in later sections.
  • PoolStatistics – A class holding and updating usage statistics of the record pool, as will be explained in later sections.
  • LifeCycleManager – Policies that govern the lifetime of the record pool can be defined and implemented through this class, as will be detailed in later sections.

The pool instantiation is handled internally by the framework and is driven by a set of configuration parameters. These parameters include the following information; a configuration sketch follows the list.

  • Transport directive – Whenever a pool is transported to a remote location, a directive can be set to describe the preferred transport method for the pool's items. This directive is used for all its records and their fields, unless some record or field has specifically requested some other type of preferred transport. The available directives are full and partial. The former implies that the payload of all records and fields is made available to the remote site as soon as they are transferred. The latter does not initially send any payload for any of the fields, unless the field attributes specify that some prefetching is to be done.
  • Pool Type – The pool type property indicates the type of pool to be instantiated. Since pool instantiation is done internally by the framework, the available pools must be known to the framework, so that it can perform their instantiation and handle differently any cases that may require different parametrization or configuration. The only currently available pool type is the memory list pool.
  • Pool Synchronization – When a record pool is consumed remotely, a synchronization interval can be set to avoid continuous unnecessary communication between the remote locations. This interval is used by the synchronization protocol, as will be explained in later sections. Depending on the protocol's implementation and synchronization capabilities, this interval may or may not be used. The synchronization hint is available to the protocols that need it.
  • Incomplete Pool Synchronization factor – During synchronization of remote pools, situations will arise where one of the two parties requests information that the other party cannot respond to immediately. In such cases one may need to alter the synchronization interval hint to require more or less frequent communication. This is done through the factor provided by this parameter.
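
Put together, the configuration could be sketched as follows; every name and default value below is an assumption made for illustration, not the framework's actual API:

  // Illustrative container for the four configuration parameters above.
  class PoolConfigSketch {
      enum TransportDirective { FULL, PARTIAL }
      enum PoolType { MEMORY_LIST } // the only pool type currently available

      TransportDirective transport = TransportDirective.PARTIAL; // payload on demand
      PoolType type = PoolType.MEMORY_LIST;
      long syncIntervalMillis = 500;     // hint used by the synchronization protocol
      double incompleteSyncFactor = 0.5; // scales the interval when a party cannot
                                         // respond immediately
  }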

Two cases can be distinguished for record pool consumption. The first is the degenerate one, where the producer and consumer are collocated within the boundaries of the same VM. This means that they share a memory address space and references to shared classes can be used. In this situation, for optimization purposes, the reference to the authored record pool can simply be passed to the consumer. The consumer directly sees the record pool the producer is authoring, accesses the references the producer creates, and no marshaling is performed. The synchronization is performed directly through the events producer and consumer emit, and there is a single instance of all the pool's configuration and management classes. The second case is that of a different address space. This may mean the same physical machine but a different process, or a completely different host altogether. In that case the reference to the authored record pool cannot be shared, and a new instance must be created. A synchronization protocol then stands between the two instances of the record pool and, based on the produced events, transparently handles the synchronization of the two instances. The two instances of the record pool are created with the same configuration parameters and therefore have exactly the same behavior. The mediation between the remote pools and their synchronization is detailed in depth in later sections. The following sketch outlines the two cases.
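
The distinction can be sketched as follows; all names are illustrative, and the framework performs this selection internally:

  // Illustrative sketch of the two consumption cases.
  class PoolAttachmentSketch {
      interface IRecordPoolSketch { /* pool operations omitted */ }

      static IRecordPoolSketch attach(IRecordPoolSketch authored, boolean sameAddressSpace) {
          if (sameAddressSpace) {
              // Degenerate case: producer and consumer share the VM, so the
              // consumer simply receives a reference to the authored pool.
              return authored;
          }
          // Remote case: a second instance is created with the same configuration,
          // and a synchronization protocol keeps the two instances consistent
          // through the events they exchange.
          IRecordPoolSketch mirror = newInstanceWithSameConfig(authored);
          startSynchronizationProtocol(authored, mirror);
          return mirror;
      }

      static IRecordPoolSketch newInstanceWithSameConfig(IRecordPoolSketch p) {
          throw new UnsupportedOperationException("out of scope for this sketch");
      }
      static void startSynchronizationProtocol(IRecordPoolSketch a, IRecordPoolSketch b) {
          throw new UnsupportedOperationException("out of scope for this sketch");
      }
  }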


Buffering (gr.uoa.di.madgik.grs.pool.buffering)

The gRS is meant to mediate between two sides: a producing and a consuming side. Either of the two sides can have inherent knowledge of its processing requirements and its production / consumption needs. A producer, for example, may know that in order to produce the next record he will require some processing that takes a certain amount of time. A consumer may likewise know that the processing of its input will require some amount of time. In order to better optimize their communication, either may deem it necessary that some amount of data is always ready and available for the other party. The Producer may request that at all times, no matter how many records the Consumer has actually requested thus far, some number of records produced by him is readily available in case the consumer requests them. The consumer respectively may want to specify that some number of records is already available on his side, whether or not he has specifically requested them, so that when he requests them he will not have to wait until they are transferred to him.


This is achieved through buffering policies that may be applied to either, both, or neither of the two sides. Each one of the actors can specify the buffering policy it wants to have applied. The framework will try to honor the actor's request by always having the specified number of records available for him, in the case of the Consumer, or already produced, in the case of the Producer. The buffer policy is configurable through the BufferPolicyConfig configuration class, which exposes three configurable values, sketched after the list below.

  • Use buffering – Whether or not the client wants buffering enabled. If buffering is disabled, the consumer has to explicitly request when and how many more items he wants delivered to him, and wait until they are produced (depending on the Producer's behavior) and then transferred to him (depending on the mediating proxy and the relative position of producer and consumer).
  • Number of records to have buffered – The number of records the buffer policy will try to keep available for each side. For the Writer side, this means that the buffer policy will emit item-requested events to the client so as to always keep him ahead of the number of items the Reader has requested by the specified amount. For the Reader side, it means that the policy will request more items from the Writer than the consumer has actually requested. Since the producer and consumer remain active while this process takes place, depending on the clients' usage the buffering policy may not always be fully satisfiable. This means that at some point the number of items the producer has produced will be greater or less than what the buffering policy needs, and / or the consumer will have more or fewer items available than the policy needs. The policy makes a best-effort attempt to stay within the specified bounds.
  • Flexibility Factor – Having an absolute number of items as a buffering policy attribute means that in some cases a lot of traffic can be produced from the Reader to the Writer and from the Writer to the producer client. If, for example, the Reader has requested a buffering of 20 records ahead of what he is consuming, then the moment he requests the next record, the request will be served by the buffered records and a request for a new item will immediately be sent to the Writer. This means a new event for each record the reader consumes. Analogous examples can be made for the Writer's side. The flexibility factor allows for some softening of the policy. It can be specified that some percentage of the buffer is allowed to be emptied without immediately requesting it to be refilled. For example, a flexibility factor of 0.5 lets the buffer become half empty before an event is emitted to cover the missing half.
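
BufferPolicyConfig is named in the text above, but its actual accessors are not shown here; the field names in the following sketch are assumptions for illustration:

  // Illustrative view of the three values exposed by BufferPolicyConfig.
  class BufferPolicyConfigSketch {
      boolean useBuffering = true;    // disable to request every item explicitly
      int recordsToBuffer = 20;       // records to keep produced / available ahead
      double flexibilityFactor = 0.5; // fraction of the buffer allowed to drain
                                      // before a refill request is emitted
  }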

Another interesting point is that of event flooding. Suppose the buffer becomes half empty and an event is emitted. Before the needed items are supplied, the consumer requests another record. The buffering policy is consulted and decides that half+1 items are now needed. This process continues, and by the time the initially requested items have been delivered, the Reader has already requested a large number of items, quite possibly far beyond the number actually needed. To protect itself from this situation, whenever a Reader or a Writer is instructed by the BufferPolicy to request some records, it temporarily suspends the buffer until the requested items have been made available. That way the client meanwhile works with the number of items still available in the buffer, without producing unnecessary requests, until the initially requested number of records has arrived, at which point the buffer rechecks what is now needed and may produce a new request event. A minimal sketch of this flood protection follows.
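
The combined behavior of the flexibility factor and the suspension could be sketched as follows; all names are illustrative, not the framework's actual classes:

  // Illustrative sketch: requests a refill only when the buffer has drained past
  // the flexibility threshold, and suspends further requests while one is pending.
  class BufferGuardSketch {
      private final int capacity;
      private final double flexibilityFactor; // e.g. 0.5 lets half the buffer drain
      private boolean requestPending = false;
      private int available;

      BufferGuardSketch(int capacity, double flexibilityFactor) {
          this.capacity = capacity;
          this.flexibilityFactor = flexibilityFactor;
          this.available = capacity;
      }

      // Called when the client consumes one buffered item; returns how many
      // items to request from the other side, or 0 while a request is pending.
      synchronized int onConsumed() {
          available--;
          if (!requestPending && available <= capacity * (1 - flexibilityFactor)) {
              requestPending = true;
              return capacity - available; // top the buffer back up
          }
          return 0; // suspended, or still within the flexibility bounds
      }

      // Called when previously requested items arrive; lifts the suspension.
      synchronized void onDelivered(int count) {
          available += count;
          requestPending = false;
      }
  }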


Definition Management (gr.uoa.di.madgik.grs.pool.defmanagment)

All records that are added to a record pool must always contain a definition. A record definition (RecordDef) contains information that enables the framework to handle the record in a uniform manner, regardless of its actual implementation. The contents of a record definition are discussed in a different section. Similarly, every field that may be contained in a record must also contain a field definition (FieldDef) for the same purpose.

The common case, as expected, is that most record pools will contain records that are largely similar in their definitions. Similarly, the records themselves are expected to contain fields that are largely similar in their definitions. Having each record and each field of a record nonetheless define its own definition is an obvious waste of resources and an overhead that can easily be dealt with. Each record pool therefore contains a DefinitionManager. This class is the placeholder of all definitions of records and fields within that pool.


Record definitions and field definitions can be added to and removed from the manager. Every time a definition is added to the manager, the existing definitions are checked. If the definition is found to be already contained in the manager's collection, a reference counter for this definition is incremented and a reference to the already stored definition is returned to the caller. This reference is then stored in the respective record or field, thereby sharing the definition with all other elements that have the same definition. When a definition must be disposed of, instead of destroying the actual definition, the operation is passed to the manager, which locates the referenced definition and decreases its reference counter. Once the reference counter reaches 0, the definition can be disposed of. A minimal sketch of this reference counting follows.
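
The following sketch illustrates the reference counting just described; the names are illustrative and the actual DefinitionManager API may differ:

  import java.util.HashMap;
  import java.util.Map;

  // Illustrative sketch of definition sharing through reference counting.
  class DefinitionManagerSketch<D> {
      private final Map<D, Integer> refCounts = new HashMap<>();

      // Returns the shared instance equal to the submitted definition,
      // incrementing its reference counter.
      synchronized D add(D definition) {
          for (D existing : refCounts.keySet()) {
              if (existing.equals(definition)) {
                  refCounts.put(existing, refCounts.get(existing) + 1);
                  return existing; // share the already stored definition
              }
          }
          refCounts.put(definition, 1);
          return definition;
      }

      // Decrements the counter; the definition is disposed of when it reaches 0.
      synchronized void remove(D definition) {
          Integer count = refCounts.get(definition);
          if (count == null) return;
          if (count <= 1) refCounts.remove(definition); // dispose
          else refCounts.put(definition, count - 1);
      }
  }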

Although this approach serves its purpose very well and very efficiently, it has two implications. Firstly, after a record has been added to a pool and its definition shared with the rest of the records, its definition cannot be modified. Although a side effect, this is not really a limitation, as the restriction must be imposed by the framework anyway: after a record is added to a pool it is immediately available to the pool's consumers, and changing any of its properties does not guarantee that the consumer will be notified of the change, as he might have already consumed the record and moved on. The second implication is the need to be able to identify when two definitions are considered equal. Equality is handled at the level of the RecordDef and FieldDef classes and is simply a matter of identifying the distinct values that make each definition unique, as sketched below.
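
Assuming, purely for illustration, that a field definition is characterized by a name and a preferred transport directive, the equality could look like this:

  import java.util.Objects;

  // Illustrative FieldDef equality over the values that make a definition unique.
  class FieldDefSketch {
      final String name;
      final String transportDirective;

      FieldDefSketch(String name, String transportDirective) {
          this.name = name;
          this.transportDirective = transportDirective;
      }

      @Override public boolean equals(Object o) {
          if (this == o) return true;
          if (!(o instanceof FieldDefSketch)) return false;
          FieldDefSketch other = (FieldDefSketch) o;
          return name.equals(other.name)
                  && transportDirective.equals(other.transportDirective);
      }

      @Override public int hashCode() {
          return Objects.hash(name, transportDirective);
      }
  }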


Pool Events (gr.uoa.di.madgik.grs.pool.events)

The whole function of the gRS is event based, and this package is the heart of the framework. The events declared here are the ones that define the flow of control governing the lifetime of a record pool: its creation, consumption, synchronization and destruction. Each record pool has an associated PoolState class, in which the events that the pool can emit are defined and through which an interested party can register for an event. The PoolState keeps the events in a named collection through which a specific event can be requested.

The event model used is a derivative of the Observer / Observable pattern. Each one of the events declared in the package and stored in the PoolState event collection extends java's Observable. Observers can register for state updates of the specific instance of the event. This, of course, means that there is a single point of update for each event, which automatically raises a lot of concurrency, synchronization and race condition issues. So the framework uses the Observables only as registration points for Observers; the actual state update is not carried by the Observable the Observers have registered with, but rather by an instance of the same event that is passed as an argument in the respective update call. That way the originally stored Observable instances in the PoolState act as the registration point, while a new instance of the Observable, created by the emitting party, is the one that actually carries the notification data. A minimal sketch of this pattern follows.
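
The pattern can be sketched with java.util.Observable as follows; the event class and its field are illustrative, not the framework's actual event types:

  import java.util.Observable;
  import java.util.Observer;

  // Illustrative sketch: the stored instance is only a registration point;
  // a fresh instance of the same event carries the notification data.
  class ItemAddedEventSketch extends Observable {
      final int recordsAdded;

      ItemAddedEventSketch(int recordsAdded) { this.recordsAdded = recordsAdded; }

      // The emitting party creates a new carrier instance and pushes it
      // through the registration point kept in the PoolState.
      void emit(ItemAddedEventSketch carrier) {
          setChanged();
          notifyObservers(carrier); // the carrier travels as the update argument
      }
  }

  class ReaderObserverSketch implements Observer {
      @Override public void update(Observable registrationPoint, Object arg) {
          // Read the data from the carrier, not from the registration point.
          ItemAddedEventSketch carrier = (ItemAddedEventSketch) arg;
          System.out.println("records now available: " + carrier.recordsAdded);
      }
  }

A reader would register once with the instance stored in the PoolState (registrationPoint.addObserver(...)), while writers emit fresh carrier instances, so concurrent emissions never mutate the shared object's state beyond its changed flag.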

All events extend PoolStateEvent, which in turn extends java's Observable and implements IProxySubPacket. The latter is used during transport to serialize and deserialize an event and is explained in later sections. The events that are defined, and that form the basis of the framework, are the following; a minimal sketch of the basic request / produce handshake follows the list:

  • MessageToReaderEvent – This event is sent with a defined recipient. It can be sent either by some component of the framework, to notify the consumer client of some condition such as an error that the consumer may want to be notified of, or by the client of the writer, the producer himself. The message is composed of two parts. One is a simple string payload that can contain any kind of information the sender may want to include. The other part is an attachment: the sender can attach an object implementing the IMarshlableObject interface, and the message will be delivered to the consumer along with the attachment object as submitted.
  • MessageToWriterEvent – This event is sent with a defined recipient. It can be sent either by some component of the framework, to notify the producer client of some condition such as an error that the producer may want to be notified of, or by the client of the reader, the consumer himself. The message is composed of two parts. One is a simple string payload that can contain any kind of information the sender may want to include. The other part is an attachment: the sender can attach an object implementing the IMarshlableObject interface, and the message will be delivered to the producer along with the attachment object as submitted.
  • ItemAddedEvent – This event is generated by the writers when a number of records are inserted in the record pool. Based on this event the reader can evaluate when there are more records for him to consume.
  • ItemRequestedEvent – This event is generated by the readers to request that a number of records be produced. Based on it the writer can evaluate when he should produce more records to be consumed.
  • ProductionCompletedEvent – This event is generated by the writers when they finish authoring the record pool. It can be used by the readers to identify whether they can continue requesting more records than the ones already available to them. This event does not imply that the producer is disposed of. The Writer instance is still available to respond to control signals as needed, and the producer can still use the record pool to exchange messages with the consumer.

  • ConsumptionCompletedEvent – This event is generated by the readers when they have finished consuming the record pool. It can be used by the producers to stop producing more records and stop authoring the record pool. This event does not imply that the consumer is disposed of. The Reader can still be used to access records already made available to it, and the records that have already been accessed remain usable. The producer and consumer can still exchange messages.
  • ItemTouchedEvent – This event is emitted whenever a record is accessed in some way: modified, transferred, read, some of its payload moved from the producer to the consumer, etc. It is used by the pool life cycle management for pool housekeeping, as explained in a later section.
  • ItemNotNeededEvent – This event is emitted by a consumer whenever he wants to declare that he is done accessing a record and all its payload and that the record is no longer needed. It is used by the pool life cycle management for pool housekeeping, as explained in a later section. After this event has been emitted for some record, that record is a candidate for disposal whenever the framework sees fit.
  • FieldPayloadRequestedEvent – The framework supports multiple transfer functionalities and parametrizations. In all but the simple cases of collocated producers and consumers, or of the full payload being immediately transferred across the wire to the consumer, the payload of a record's field may need to become gradually available to the consumer through partial transfers. For a payload part to be transferred, the consumer requests the additional payload either explicitly, through a call, or implicitly, through the use of one of the framework's mediation utilities. The request is made through this event, which asks, for a specific record and a specific field of that record, for a number of bytes to be made available. There are two ways this event can be handled by the framework. The first case is when the producer and consumer are connected through a LocalProxy. In this case both actors are collocated and the payload of every field is already available to the consumer. Each writer must then be able to catch these events and immediately respond with a FieldNoMorePayloadEvent, to indicate to the consumer that all the available payload for the field is already there for him to consume. The other case is to have some other type of mediation between the two actors, where some kind of transport is required. In this case the protocol connecting the producer side of the proxy must catch the event, retrieve the needed payload from the respective field, and respond with the appropriate event: FieldPayloadAddedEvent or FieldNoMorePayloadEvent, depending on whether more payload has been made available or no more payload can be made available because all of it has already been transferred.
  • FieldPayloadAddedEvent – This event is sent as a response to a FieldPayloadRequestedEvent when more data has been made available to the consumer, as per the request, as sketched above.
  • FieldNoMorePayloadEvent – This event is sent as a response to a FieldPayloadRequestedEvent when no more data can be sent to the consumer, because all the available data is already there, as sketched above.
  • LocalPoolItemAddedEvent – This event indicates that a new item has been added to the local pool, local meaning local to the component that catches the event. For example, a reader catching the event will interpret it as a new item created by the writer and made available to the reader side, as a response to an ItemRequestedEvent that he sent. This event is used by the reader and writer classes to update their buffer policies.
  • LocalPoolItemRemovedEvent – This event indicates that an item has been retrieved from the local pool, local to the component that catches the event. For example, a writer catching the event will interpret it as meaning that an item previously requested by the reader, and created and added to the pool by the writer, has been transferred to the reader, who consumed it. This event is used by the reader and writer classes to update their buffer policies.
  • ItemHibernateEvent – During its lifetime a record can be considered as needed for later use by a consumer, but not needed for the time being, and should be set aside to free up resources. For this purpose, any record already contained in a record pool, along with its included fields, can be hibernated. The record is persisted to disk and can subsequently be woken up in its initial state. When a record is marked to be hibernated, this event is emitted and both pools that may contain it, the producer and the consumer pool if the pools are remote and synchronized, will catch the event and hibernate the record. This event may be processed asynchronously.
  • ItemAwakeEvent – At some point a hibernated record may be needed again. At this point the record on both the consumer and the producer side must be woken and restored to the exact state it was in before hibernation. This event is emitted and both pools will resurrect the record and its fields from the disk buffer.
  • DisposePoolEvent – This event is emitted when the record pool and all its records are no longer needed. The dispose method is called either because the producer decided to revoke the produced record pool or because the reader decided that he is done consuming it. Since the communication is point to point, once the consumer stops consuming, the record pool is fully disposed of. The record pool disposal disposes of each record, and if a record is hibernated it is woken up before being disposed of, to make sure each field performs its cleanup. Once this event is emitted, all state associated with the record pool is cleaned up, whether it is data within the record pool itself, registry entries, open sockets, etc.
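
To illustrate the basic flow, the following self-contained sketch mimics the ItemRequestedEvent / ItemAddedEvent / ProductionCompletedEvent handshake with a plain queue; it is an analogy under stated assumptions, not the framework's API:

  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  // Illustrative analogy of the request / produce / complete handshake.
  public class HandshakeSketch {
      private static final String PRODUCTION_COMPLETED = "<production-completed>";

      public static void main(String[] args) throws InterruptedException {
          BlockingQueue<String> pool = new LinkedBlockingQueue<>();
          final int requested = 3; // stands in for an ItemRequestedEvent for 3 records

          // Writer side: reacts to the request by producing records (each add
          // stands in for an ItemAddedEvent), then signals completion.
          Thread writer = new Thread(() -> {
              for (int i = 0; i < requested; i++) pool.add("record-" + i);
              pool.add(PRODUCTION_COMPLETED); // stands in for ProductionCompletedEvent
          });
          writer.start();

          // Reader side: consumes until the completion signal arrives.
          String item;
          while (!(item = pool.take()).equals(PRODUCTION_COMPLETED)) {
              System.out.println("consumed " + item);
          }
          writer.join();
      }
  }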