Difference between revisions of "GCube Document Library (2.0)"
Revision as of 23:30, 9 February 2011
The gCube Document Library (gDL) is a client library for storing, updating, deleting and retrieving document descriptions in a gCube infrastructure.
The gDL is a high-level component of the subsystem of gCube Information Services and it interacts with lower-level components of the subsystem to support document management processes within the infrastructure:
- the gCube Document Model (gDM) defines the basic notion of document and the gCube Model Library (gML) implements that notion into objects;
- the objects of the gML can be exchanged in the infrastructure as edge-labelled trees, and the Content Manager Library (CML) can model such trees as objects and dispatch them to the read and write operations of the Content Manager (CM) service;
- the CM implements these operations by translating trees to and from the content models of diverse repository back-ends.
The gDL builds on the gML and the CML to implement a local interface of CRUD operations that lift those of the CM to the domain of documents, efficiently and effectively.
Preliminaries
The core functionality of the gDL lies in its operations to read and write documents. The operations trigger interactions with remote services and the movement of potentially large volumes of data across the infrastructure. This may have a non-trivial and combined impact on the responsiveness of clients and the overall load of the infrastructure. The operations have been designed to minimise this impact. In particular:
- when reading, clients can qualify the documents that are relevant to their queries, and indeed what properties of relevant documents should be actually retrieved. These retrieval directives are captured in the gDL by the notion of document projections.
- when reading and writing, clients can move large numbers of documents across the infrastructure. The gDL streams these I/O movements so as to make efficient use of local and remote resources. It then defines facilities with which clients can conveniently consume input streams, produce output streams, and more generally filter one stream into another regardless of their origin. The facilities are collected into the stream DSL, an embedded domain-specific language for stream processing.
Understanding document projections and the stream DSL is key to reading and writing documents effectively. We discuss these preliminary concepts first, and then consider their use as inputs and outputs of the operations of the gDL.
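The idea of filtering one stream into another without first loading it in memory can be illustrated with a minimal sketch. It does not use the stream DSL of the gDL: the class `LazyFilterSketch` and its `filter` method are hypothetical names introduced only for illustration, and the stream is modelled as a plain `java.util.Iterator`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical illustration of lazy stream filtering; NOT the gDL stream DSL.
class LazyFilterSketch {

    // Wraps an input iterator and yields only the elements accepted by the predicate.
    // Elements are pulled from the input one at a time, only when the consumer asks.
    static <T> Iterator<T> filter(final Iterator<T> input, final Predicate<T> accept) {
        return new Iterator<T>() {
            private T next;
            private boolean ready = false;

            @Override
            public boolean hasNext() {
                // Advance the input until an accepted element is found or the input ends.
                while (!ready && input.hasNext()) {
                    T candidate = input.next();
                    if (accept.test(candidate)) {
                        next = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                T result = next;
                next = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        // The identifiers below are made up for the example.
        Iterator<String> ids = Arrays.asList("doc-1", "tmp-2", "doc-3").iterator();
        Iterator<String> docs = filter(ids, id -> id.startsWith("doc-"));
        while (docs.hasNext()) {
            System.out.println(docs.next());
        }
    }
}
```

Because the output is itself an iterator, filters of this kind can be chained, and the consumer does not need to know whether the input originates locally or from a remote service.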
Projections
A projection is a set of constraints over the properties of documents in the gDM. It can be used to match documents, i.e. identify documents whose properties satisfy the constraints of the projection.
Projections and matching are used in the read operations of the gDL:
- as a means to characterise relevant documents (projections as types);
- as a means to specify what parts of relevant documents should be retrieved (projections as retrieval directives).
The constraints of a projection take accordingly two forms:
- include constraints apply to properties that must be matched and retrieved;
- filter constraints apply to properties that must be matched but not retrieved.
Note: in both cases, the constraints take the form of predicates of the Content Manager Library (CML). The projection itself converts into a complex predicate that is amenable to processing by the Content Manager service when it executes retrieval operations. In this sense, projections are a key part of the document-oriented layer that the gDL defines over lower-level components of the gCube Content Management architecture.
As a simple example of the implications, a projection may define an include constraint over the name of metadata elements and a filter constraint over the time of their last update.
It may then be used to:
- characterise documents with metadata elements that match both constraints;
- retrieve from those documents only the names of the matching metadata elements, excluding any other document property, including other inner elements and their properties.
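The difference between the two forms of constraint can be illustrated with a minimal, self-contained sketch. It does not use the gDL or CML APIs: the class `ProjectionSketch`, its `MetadataElement` type and the `project` method are hypothetical names introduced only for illustration, and constraints are modelled as plain `java.util.function.Predicate` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model, NOT the gDL API: illustrates include vs filter constraints.
class ProjectionSketch {

    // A toy metadata element with two properties.
    static final class MetadataElement {
        final String name;
        final long lastUpdate; // e.g. milliseconds since the epoch

        MetadataElement(String name, long lastUpdate) {
            this.name = name;
            this.lastUpdate = lastUpdate;
        }
    }

    // Returns the names of the elements that satisfy both constraints.
    // Include constraint on the name: the property is matched AND retrieved.
    // Filter constraint on the last update: the property is matched but NOT retrieved.
    static List<String> project(List<MetadataElement> elements,
                                Predicate<String> includeName,
                                Predicate<Long> filterLastUpdate) {
        List<String> retrieved = new ArrayList<>();
        for (MetadataElement e : elements) {
            if (includeName.test(e.name) && filterLastUpdate.test(e.lastUpdate)) {
                retrieved.add(e.name); // only the included property is returned
            }
        }
        return retrieved;
    }

    public static void main(String[] args) {
        // Names and timestamps are made up for the example.
        List<MetadataElement> elements = new ArrayList<>();
        elements.add(new MetadataElement("dc", 100L));
        elements.add(new MetadataElement("dc-old", 10L));
        elements.add(new MetadataElement("marc", 200L));

        List<String> names = project(elements, n -> n.startsWith("dc"), t -> t >= 50L);
        System.out.println(names);
    }
}
```

In the sketch the time of last update takes part in matching but never appears in the result, which is the behaviour described above for filter constraints.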
All projections in the gDL implement the Projection interface, which can be used in element-generic computations to access their constraints. To build projections, however, clients deal with one of the following implementations of the interface:
- DocumentProjection;
- MetadataProjection;
- AnnotationProjection;
- PartProjection;
- AlternativeProjection.
A further implementation of the interface, PropertyProjection, allows clients to express constraints on the generic properties of any of the elements of the gDM.
Simple Projections
Clients create projections with the factory methods of the Projections companion class (a static import improves legibility and is recommended):
<source lang="java5">
import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
// each factory method returns an empty projection for the corresponding element type
DocumentProjection dp = document();
MetadataProjection mp = metadata();
AnnotationProjection annp = annotation();
PartProjection pp = part();
AlternativeProjection altp = alternative();
</source>
The projections above do not specify any constraints on the elements of the corresponding type. For example, dp matches all documents, regardless of their properties, inner elements, and properties of their inner elements. Similarly, mp matches all metadata elements, regardless of their properties. Yet these empty projections have different implications in terms of retrieval: dp will retrieve all the documents that are targeted by queries, whereas mp will retrieve all documents that have at least one metadata element.
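The different retrieval implications of the two empty projections can be shown with a small sketch. It is a toy model and not the gDL API: the class `EmptyProjectionSketch`, its `Document` type and the two matching methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model, NOT the gDL API: contrasts the two empty projections.
class EmptyProjectionSketch {

    // A toy document with an identifier and the names of its metadata elements.
    static final class Document {
        final String id;
        final List<String> metadataNames;

        Document(String id, List<String> metadataNames) {
            this.id = id;
            this.metadataNames = metadataNames;
        }
    }

    // Models an empty document projection: no constraints, so every document matches.
    static List<String> matchEmptyDocumentProjection(List<Document> docs) {
        List<String> ids = new ArrayList<>();
        for (Document d : docs) {
            ids.add(d.id);
        }
        return ids;
    }

    // Models an empty metadata projection: any metadata element matches,
    // but a document is selected only if it has at least one such element.
    static List<String> matchEmptyMetadataProjection(List<Document> docs) {
        List<String> ids = new ArrayList<>();
        for (Document d : docs) {
            if (!d.metadataNames.isEmpty()) {
                ids.add(d.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Identifiers and metadata names are made up for the example.
        List<Document> docs = Arrays.asList(
                new Document("d1", Arrays.asList("dc")),
                new Document("d2", Collections.<String>emptyList()));

        System.out.println(matchEmptyDocumentProjection(docs));
        System.out.println(matchEmptyMetadataProjection(docs));
    }
}
```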