Difference between revisions of "GCube Document Library (2.0)"
(→Preliminaries) |
|||
Line 10: | Line 10: | ||
= Preliminaries = | = Preliminaries = | ||
+ | |||
+ | The core functionality of the gDL lies in its operations to read and write documents. The operations trigger interactions with remote services and the movement of potentially large volumes of data across the infrastructure. This may have a non-trivial and combined impact on the responsiveness of clients and the overall load of the infrastructure. The operations have been designed to minimise this impact. In particular: | ||
+ | |||
+ | * when reading, clients can qualify the documents that are relevant to their queries, and indeed what properties of relevant documents should be actually retrieved. These retrieval directives are captured in the gDL by the notion of '''document projections'''. | ||
+ | |||
+ | * when reading and writing, clients can move large numbers of documents across the infrastructure. The gDL ''streams'' this I/O movements so as to make efficient use of local and remote resources. It then defines a facilities with which clients can conveniently consume input streams, produce output streams, and more generally filter one stream into an other regardless of their origin. The facilities are collected into the '''stream DSL''', an embedded domain-specific language for stream processing. | ||
+ | |||
+ | Understanding document projections and the stream DSL is key to reading and writing documents effectively. We discuss these preliminary concepts first, and then consider their use as input and outputs of the operations of the gDL. | ||
== Projections == | == Projections == |
Revision as of 14:10, 9 February 2011
The gCube Document Library (gDL) is a client library for storing, updating, deleting and retrieving document description in a gCube infrastructure.
The gDL is a high-level component of the subsystem of gCube Information Services and it interacts with lower-level components of the subsystem to support document management processes within the infrastructure:
- the gCube Document Model (gDM) defines the basic notion of document and the gCube Model Library (gML) implements that notion into objects;
- the objects of the gML can be exchanged in the infrastructure as edge-labelled trees, and the Content Manager Library (CML) can model such trees as objects and dispatch them to the read and write operations of the Content Manager (CM) service;
- the CM implements these operations by translating trees to and from the content models of diverse repository back-ends.
The gDL builds on the gML and the CML to implement a local interface of CRUD
operations that lift those of the CM to the domain of documents, efficiently and effectively.
Preliminaries
The core functionality of the gDL lies in its operations to read and write documents. The operations trigger interactions with remote services and the movement of potentially large volumes of data across the infrastructure. This may have a non-trivial and combined impact on the responsiveness of clients and the overall load of the infrastructure. The operations have been designed to minimise this impact. In particular:
- when reading, clients can qualify the documents that are relevant to their queries, and indeed what properties of relevant documents should be actually retrieved. These retrieval directives are captured in the gDL by the notion of document projections.
- when reading and writing, clients can move large numbers of documents across the infrastructure. The gDL streams this I/O movements so as to make efficient use of local and remote resources. It then defines a facilities with which clients can conveniently consume input streams, produce output streams, and more generally filter one stream into an other regardless of their origin. The facilities are collected into the stream DSL, an embedded domain-specific language for stream processing.
Understanding document projections and the stream DSL is key to reading and writing documents effectively. We discuss these preliminary concepts first, and then consider their use as input and outputs of the operations of the gDL.