GDL Projections (2.0)

From Gcube Wiki

A projection is a set of constraints over the properties of document descriptions. It can be used in the read operations of the gDL to:

  • characterise relevant descriptions as those that match the constraints (projections as types);
  • specify what properties of relevant descriptions should be retrieved (projections as retrieval directives).

Accordingly, constraints take two forms:

  • include constraints apply to properties that must be matched and retrieved;
  • filter constraints apply to properties that must be matched but not retrieved.

In both cases, the constraints take the form of predicates of the Content Manager Library (CML). The projection itself converts into a complex predicate which is amenable for processing by the Content Manager service in the execution of its retrieval operations. In this sense, projections are a key part of the document-oriented layer that the gDL defines over lower-level components of the gCube subsystem dedicated to content management.

As a first example, a projection may specify an include constraint over the name of metadata elements and a filter constraint over the time of last update. It may then be used to:

  • characterise document descriptions with at least one metadata element that matches both constraints;
  • retrieve of those descriptions only the name of matching metadata elements, excluding the time of last update, any other metadata property, and any other document property, including other inner elements and their properties.
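
The two roles of a projection (matching and retrieval) can be illustrated with a small self-contained model. This is only a sketch under simplifying assumptions: the class ToyProjection and its representation of a description as a map of property names to values are hypothetical and are not part of the gDL, and constraints are reduced to plain existence checks.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT the gDL API: a description is a map from
// property names to values, and constraints are plain existence checks.
final class ToyProjection {
    private final Set<String> include = new LinkedHashSet<>(); // matched and retrieved
    private final Set<String> filter = new LinkedHashSet<>();  // matched, not retrieved

    ToyProjection with(String... properties) {
        for (String p : properties) include.add(p);
        return this;
    }

    ToyProjection where(String... properties) {
        for (String p : properties) filter.add(p);
        return this;
    }

    // Projection as a type: every constrained property must exist.
    boolean matches(Map<String, String> description) {
        return description.keySet().containsAll(include)
            && description.keySet().containsAll(filter);
    }

    // Projection as a retrieval directive: keep only the included properties.
    Map<String, String> retrieve(Map<String, String> description) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String p : include) result.put(p, description.get(p));
        return result;
    }
}
```

In this model, new ToyProjection().with("name").where("lastUpdate") matches only descriptions that have both properties, and retrieve() returns the name alone.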

Projections have the Projection interface, which can be used to access their constraints in element-generic computations. To build projections, however, clients deal with one of the following implementations of the interface:

  • DocumentProjection
  • MetadataProjection
  • AnnotationProjection
  • PartProjection
  • AlternativeProjection

A further implementation of the interface:

  • PropertyProjection

allows clients to express constraints on the generic properties of documents and their inner elements.

Empty Projections

Clients create projections with the factory methods of the Projections companion class. A static import improves legibility and is recommended:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document();
 
MetadataProjection mp = metadata();
 
AnnotationProjection annp = annotation();
 
PartProjection pp = part();
 
AlternativeProjection altp = alternative();

The projections above do not specify any include or filter constraints on the elements of the corresponding type. For example, dp matches all document descriptions, regardless of their properties, inner elements, and properties of their inner elements. Similarly, mp matches all metadata elements of any document description, regardless of their properties, and pp matches all the parts of any document description, regardless of their properties. In this sense, the factory methods of the Projections class return empty projections.

Include Constraints

Clients may add include constraints to a projection with the method with(). For document projections, for example:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(NAME);

With the above, the client adds the simplest form of constraint, an existence constraint that requires matching document descriptions to have given properties, here only a name. Since this is an include constraint, the client is expressing an interest only in this property, regardless of the existence and values of other properties. Used as a parameter in the read operations of the gDL, this projection is interpreted into a directive to retrieve only the names of document descriptions that have one. To reiterate this important point: any other descriptive property will not be retrieved. Thus, with() allows clients to specify precisely what they need to work with, no more and no less.

note: properties are conveniently represented by constants in the Projections class. The constants are not strings, however, but dedicated Property objects specific to the type of projection. Trying to use properties that are undefined for the type of elements targeted by the projection is illegal and the error is detected statically.

Note that existence constraints may be expressed at once on multiple properties, e.g.:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(NAME,LANGUAGE,BYTESTREAM);

Filter Constraints

Along with include constraints, clients may specify filter constraints with the method where(). Projection classes follow a builder pattern, i.e. their methods can be chained for increased readability. For example:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(NAME,LANGUAGE)
                                  .where(BYTESTREAM);

As in the previous example, the client requires document descriptions to have a name, a language, and to embed a bytestream. Used as a parameter in the read operations of the gDL, however, the projection is interpreted into a different retrieval directive: of the matching descriptions, retrieve only the name and the language, not their bytestream. Thus where() allows clients to specify what they need to be true but do not need to work with.

note: As for with(), where() accepts multiple properties as parameters.

note: Constraining the same property in with() and where() parameter lists, or else across methods, has a destructive effect: the constraint specified last overrides those specified earlier on the same property. This allows clients to stage the construction of a projection across multiple components, where a component may wish to override the constraints set by an upstream component. Clients should be careful to avoid this repetition in all other scenarios.
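
The overriding behaviour described in this note can be pictured with a minimal sketch. The class StagedProjection below is hypothetical and does not reflect how the gDL is actually implemented; it only assumes that each property holds at most one constraint, so that a later call replaces an earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT the gDL implementation: one map entry per
// property, so a later constraint replaces an earlier one.
final class StagedProjection {
    enum Kind { INCLUDE, FILTER }

    private final Map<String, Kind> constraints = new LinkedHashMap<>();

    StagedProjection with(String... properties) {
        for (String p : properties) constraints.put(p, Kind.INCLUDE);
        return this;
    }

    StagedProjection where(String... properties) {
        for (String p : properties) constraints.put(p, Kind.FILTER);
        return this;
    }

    // Returns the constraint currently in force, or null if there is none.
    Kind kindOf(String property) {
        return constraints.get(property);
    }
}
```

If an upstream component calls with("name", "language") and a downstream component later calls where("language") on the same object, the language ends up with a filter constraint while the name keeps its include constraint.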

Filter constraints are typically used in combination with include constraints, as in the example above. However, a projection may include only filter constraints, e.g.:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().where(NAME);

The projection requires document descriptions to have a name but it also indicates that none of their properties should be actually retrieved. Projections of this kind can be used to verify the existence of matching descriptions, or else to count the number of matching descriptions, whilst moving the minimum amount of data over the network.

Optional Modifiers

Another common requirement is to indicate the optionality of constraints. Clients may wish to retrieve certain properties only if they satisfy given constraints. In this case, clients can use the opt() method of the Projections class as a constraint modifier. Consider this variation on a previous example:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(NAME,opt(LANGUAGE))
                                  .where(BYTESTREAM);

This projection differs from the previous one only in the optional modifier on (the existence of) a language. Used as a parameter in the read operations of the gDL, this projection retrieves the name of all document descriptions that include a bytestream, but also their language if they do have one. If they do not have a language, only the name will be retrieved. In other words, name and bytestream are conditions that descriptions must match to be relevant, whereas the language is only optional.

A common use of the optional modifier is with bytestreams, which clients may wish either to find included in the document description or else referred to from within it:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(opt(BYTESTREAM),opt(BYTESTREAM_URI));

Used as a parameter in the read operations of the gDL, this projection retrieves at most the bytestream and its URI for those document descriptions that have both, only one of the two if the other is missing, and nothing at all if they are both missing.
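
The effect of optional include constraints on matching and retrieval can be sketched with a toy model. The class OptionalToyProjection and its method withOpt() are hypothetical names introduced for illustration only; they are not the gDL API, and descriptions are again assumed to be simple maps.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT the gDL API.
final class OptionalToyProjection {
    private final Set<String> required = new LinkedHashSet<>();
    private final Set<String> optional = new LinkedHashSet<>();

    OptionalToyProjection with(String property) { required.add(property); return this; }
    OptionalToyProjection withOpt(String property) { optional.add(property); return this; }

    // Optional constraints never exclude a description from the match.
    boolean matches(Map<String, String> description) {
        return description.keySet().containsAll(required);
    }

    // Required properties are always returned; optional ones only if present.
    Map<String, String> retrieve(Map<String, String> description) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String p : required) result.put(p, description.get(p));
        for (String p : optional) {
            if (description.containsKey(p)) result.put(p, description.get(p));
        }
        return result;
    }
}
```

With two optional constraints and no required ones, every description matches, and retrieve() returns both properties, one of them, or an empty map depending on what the description contains.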

note: Using optional modifiers in filter constraints, i.e. as arguments to the where() method, is nonsensical since an optional constraint does not discriminate any document description. Worse, optional filters can slow down the execution of retrieval as the service back-end may not be able to optimise them away.

Catch-All Constraints

Clients may combine include constraints, filter constraints, and optional modifiers to build any projection that can possibly be built with the gDL. With these constructs, they can pinpoint exactly what properties are to be retrieved and when they should be retrieved. This accuracy is a main goal of the gDL, but it may be inconvenient when clients wish to express existence constraints on a number of properties at once. Some common projections, in particular, cannot be conveniently built with these constructs alone.

For example, clients may wish to constrain only a few properties but retrieve them all. Consider the following example:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(NAME, opt(LANGUAGE), opt(BYTESTREAM), opt(METADATA), ...);

Here, the client requires document descriptions to have a name but wishes to retrieve any other property that they may have. To express this, the client must explicitly add optional existence constraints on all these properties. Clearly, this is cumbersome and will break if the model is extended in the future.

To improve matters, clients may use the method etc(), which adds such existence constraints automatically:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(NAME).etc();

Similarly, clients may wish to add catch-all existence constraints on all properties but for a few ones, which they do not wish to retrieve. For this, they can use the method allexcept():

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(NAME).allexcept(BYTESTREAM,PART);

Here, bytestreams and parts are excluded from retrieval, if they exist.

Note that explicit with() and where() constraints have precedence over those automatically generated by etc() and allexcept(), regardless of the order of method invocation. The following example illustrates the point:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp1 = document().with(NAME).etc().where(LANGUAGE);
DocumentProjection dp2 = document().etc().with(NAME).where(LANGUAGE);
DocumentProjection dp3 = document().with(NAME).where(LANGUAGE).etc();
 
assert(dp1.equals(dp2));
assert(dp2.equals(dp3));

A similar example could be repeated for allexcept().
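
One way to see why the order of invocation need not matter is to keep explicit constraints and the catch-all request in separate places. The sketch below does exactly that; the class CatchAllProjection is hypothetical and makes no claim about the actual gDL implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, NOT the gDL implementation: explicit constraints and
// the catch-all flag are stored separately, so call order cannot matter.
final class CatchAllProjection {
    private final Map<String, String> explicit = new LinkedHashMap<>();
    private boolean catchAll;

    CatchAllProjection with(String property)  { explicit.put(property, "include"); return this; }
    CatchAllProjection where(String property) { explicit.put(property, "filter");  return this; }
    CatchAllProjection etc()                  { catchAll = true; return this; }

    // Explicit constraints win; etc() only covers properties left unconstrained.
    String constraintOn(String property) {
        String kind = explicit.get(property);
        if (kind != null) return kind;
        return catchAll ? "optional include" : "none";
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof CatchAllProjection)) return false;
        CatchAllProjection other = (CatchAllProjection) o;
        return catchAll == other.catchAll && explicit.equals(other.explicit);
    }

    @Override public int hashCode() { return Objects.hash(explicit, catchAll); }
}
```

In this model the three call orders used above produce equal objects, while an empty projection and one built with etc() alone remain different.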

On the other hand, etc() and allexcept() are intended as mutually exclusive alternatives and should not be used together in a projection. Doing so may produce undesired effects:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
//silly projection: allexcept() has no effect
DocumentProjection dp1 = document().etc().allexcept(NAME);  
 
assert(dp1.equals(document().etc()));
 
 
//silly projection: etc() reintroduces name...
DocumentProjection dp2 = document().allexcept(NAME).etc();
 
assert(dp2.equals(document().etc()));

Finally, note that document() and document().etc() are different projections even if they have the same implications for retrieval:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
//different projections: document() is empty, document().etc() constrains every property
DocumentProjection dp1 = document();  
DocumentProjection dp2 = document().etc();
 
assert(!dp1.equals(dp2));

The difference is in fact substantial: document() adds no constraints on properties, while document().etc() adds an optional constraint on each and every property. Clients should prefer empty projections in all cases, as they travel faster over the network and are more likely to be executed faster by remote content manager services.

Deep Projections

In the examples above, we have considered existence constraints on simple element properties. The examples generalise easily to repeated structured properties, such as generic properties for all elements and inner element properties for document descriptions.

Consider the following example:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
DocumentProjection dp = document().with(PART, opt(METADATA), PROPERTY);

Here the client adds three include constraints to the projection, all three for the existence of repeated properties. Document descriptions that match this projection have at least one part, at least one generic property, and zero or more metadata elements. Used as a parameter in the read operations of the gDL, this projection retrieves all the parts and all the generic properties of descriptions that have at least one of each, as well as all of their metadata elements if they happen to have some.

Repeated properties such as generic properties and inner elements are also structured, i.e. have properties of their own. Clients that wish to constrain those properties too can use deep projections, i.e. embed within the projection of a given type one or more projections built for the structured properties of elements of that type. The following example illustrates the concept for metadata elements:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
MetadataProjection mp = metadata().with(LANGUAGE).where(BYTESTREAM);
 
DocumentProjection dp = document().with(NAME, PART)
                                  .with(METADATA,mp);

The first projection constrains the existence of language and bytestream for metadata elements. The second projection constrains the existence of name and parts for document descriptions, as well as the existence of metadata elements that match the constraints of the first projection. The usual implications of include constraints and filter constraints apply. Used as a parameter in the read operations of the gDL, this projection retrieves the name, parts, and metadata elements of document descriptions that have a name, at least one part, and at least one metadata element that includes a bytestream. For the metadata elements, in particular, it retrieves only the language property.
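
The behaviour of the inner projection on a list of inner elements can be sketched as follows. The class ToyInnerProjection is a hypothetical toy, not the gDL API: inner elements are modelled as maps of property names to values and constraints as plain existence checks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT the gDL API: an inner element is a map of
// properties, and the inner projection is two sets of property names.
final class ToyInnerProjection {
    private final Set<String> include; // matched and retrieved
    private final Set<String> filter;  // matched, not retrieved

    ToyInnerProjection(Set<String> include, Set<String> filter) {
        this.include = include;
        this.filter = filter;
    }

    boolean matches(Map<String, String> element) {
        return element.keySet().containsAll(include)
            && element.keySet().containsAll(filter);
    }

    // Keep only matching elements, and of those only the included properties.
    List<Map<String, String>> retrieve(List<Map<String, String>> elements) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> element : elements) {
            if (!matches(element)) continue;
            Map<String, String> kept = new LinkedHashMap<>();
            for (String p : include) kept.put(p, element.get(p));
            result.add(kept);
        }
        return result;
    }
}
```

In this model, an outer projection that requires matching metadata elements would treat a document as relevant only when retrieve() returns a non-empty list.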

Note that optionality constraints apply to deep projections as well as they apply to flat projections, as the following example shows:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
MetadataProjection mp = metadata().with(LANGUAGE).where(BYTESTREAM);
 
DocumentProjection dp = document().with(NAME, PART)
                                  .with(opt(METADATA,mp));

This projection differs from the previous one only because the existence of metadata elements that match the inner projection is optional. Document descriptions that have a name and at least one part match the outer projection even if they have no metadata elements that match the inner projection (or no metadata elements at all).

Projecting over Generic Properties

Generic properties are repeated and structured properties common to all elements. As for other properties with these characteristics, clients may wish to build deep projections that constrain their inner properties. For this purpose, the class Projections includes a dedicated factory method property(), as well as specialised methods to express constraints. The following example illustrates the approach:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
...
 
PropertyProjection pp = property().withKey("somekey").with(PROPERTY_TYPE);
 
DocumentProjection dp = document().with(NAME, PART)
                                  .with(PROPERTY,pp);

Here, the client creates a document projection and embeds in it an inner projection that constrains its generic properties. The inner projection uses the method with() to add an include constraint for the existence of a type for the generic property, as usual. It also adds an include constraint to specify an exact value for the key of a generic property of interest. This relies on a method withKey() which is specific to projections over generic properties of elements. The reason for this specific construct is that, differently from other constrainable properties of elements, the key of a generic property serves as its identifier.

For the rest, property projections behave like other projections (e.g. can be used with optional modifiers). Used as a parameter in the read operations of the gDL, the projection above matches document descriptions with a name, at least one part, and a property with key somekey and some type.

Equivalence Constraints

In all the examples above, we have relied on simple existence constraints to present the mechanisms available for building projections. Moving beyond the existence of properties, another common type of constraint is based on text equivalences over simple element properties, which form the majority of properties of document descriptions and inner elements. The gDL offers a dedicated mechanism to specify this type of constraint in the form of the whereValue() method of projection classes. The following example illustrates usage:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
import static java.util.Locale.*;
...
DocumentProjection p = document().with(NAME).
                                 whereValue(LANGUAGE,ENGLISH).
                                 whereValue(BYTESTREAM_URI,new URI("...."));

Here the client asks for the name of document descriptions only if their language matches the ISO639 code for the English language and only if they have a bytestream with the given URI. As the example shows, whereValue() accepts arbitrary objects and uses their toString() serialisations for the equivalence (the serialisations are in fact refined in special cases such as dates, so clients should always pass objects rather than invoke toString() upon them).
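
As a rough illustration of equivalence based on toString() serialisations, consider the sketch below. The helper TextEquivalence is hypothetical and is not part of the gDL; it also ignores the refinements that the library applies in special cases such as dates.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, NOT part of the gDL: compares a stored text value
// with the toString() serialisation of an arbitrary object.
final class TextEquivalence {
    static boolean sameText(String stored, Object expected) {
        return stored.equals(expected.toString());
    }

    public static void main(String[] args) {
        // Locale.ENGLISH serialises to the language code "en".
        System.out.println(sameText("en", Locale.ENGLISH));
        // A URI serialises to the string it was created from.
        System.out.println(sameText("ftp://host/file", URI.create("ftp://host/file")));
    }
}
```

The host name and path in the URI are placeholders chosen for the example.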

note: besides whereValue() clients may use withValue(). While it seems strange to ask for the retrieval of a known value, there are cases when the returned descriptions need to flow into contexts in which this knowledge is not available. In these cases, withValue() avoids the complication of staging these descriptions after-retrieval. As this comes at the expense of bandwidth and increased latencies, use with awareness of the implications.

Complex Constraints

In more advanced forms of projections, clients may wish to specify constraints on properties other than existence and equivalence. For these cases, gDL offers overloads of with() and where() that take as parameters Predicates that capture the desired constraints. As mentioned above, predicates are defined in the CML and gDL clients need to become acquainted with the range of available predicates and how to build them.

As an example of the possibilities, consider the following:

import static org.gcube.contentmanagement.gcubedocumentlibrary.projections.Projections.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;
...
Calendar from = ...
Calendar to = ....
DocumentProjection p = document().with(BYTESTREAM_URI,uri(matches("^ftp.*")))
                                 .where(CREATION_TIME,date(all(after(from),before(to))));

This projection is matched by document descriptions that were created at some point between two dates and that have a bytestream available at some ftp server. Used as a parameter in the read operations of the gDL, the projection would retrieve only the URI of (the bytestream of) matching descriptions. As documented in the CML, the client builds the predicates with the static methods of the Predicates and Constraints classes, which it previously imports.
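
For readers unfamiliar with predicate composition, the two constraints of the example can be approximated with the standard java.util.function.Predicate type. The class ToyPredicates and its methods are hypothetical stand-ins written for this illustration; they are not the Predicates and Constraints classes of the CML, whose exact semantics may differ.

```java
import java.net.URI;
import java.util.Calendar;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical stand-ins built on java.util.function.Predicate; these are
// NOT the CML Predicates/Constraints classes.
final class ToyPredicates {
    // True when the whole textual form of the URI matches the regular expression.
    static Predicate<URI> uriMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return uri -> pattern.matcher(uri.toString()).matches();
    }

    // True when the date lies strictly between the two bounds.
    static Predicate<Calendar> between(Calendar from, Calendar to) {
        Predicate<Calendar> afterFrom = c -> c.after(from);
        Predicate<Calendar> beforeTo = c -> c.before(to);
        return afterFrom.and(beforeTo);
    }
}
```

Here between(from, to) plays the role of the conjunction of the two date constraints, and uriMatches("^ftp.*") accepts any URI whose text starts with ftp.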

note: in building predicate expressions with the API of the CML, clients take responsibility for associating properties with predicates that are compatible with their type. In the example above, the creation time of an element is a temporal property and thus only date()-based predicates can successfully match it. The gDL relinquishes the ability to ensure the correct construction of projections so as to allow clients to use the full expressiveness of the predicate language of the CML.

note: Deep projections and overloads for equivalence constraints already make use of this customisability. When clients embed a projection into another, they constrain the corresponding structured property with the predicate into which the inner projection translates. Similarly, when they specify equivalence constraints they are implicitly using predicates of the form text(is(...)).