Content Manager
The Content Manager service provides its clients with uniform access to content served by a variety of back-ends, both inside and outside the system. It is the central component of the gCube subsystem that deals with the organisation of content and related data.
Design Overview
The Content Manager is designed as an OCMA service. In OCMA terms, it classifies as a multi-type, 1-N adapter service:
- it is a multi-type service because it supports two front types for, respectively, reading and writing content modelled as labelled trees.
- Collectively, the front types and the tree content model form the
gDoc
access type of the service.
- it is an adapter service because it adapts the
gDoc
access type to multiple back types, where each back type corresponds to the access type of a whole class of remote repositories.
- For this, the service employs an open architecture of type-specific plugins to which it delegates the creation and operation of its collection managers.
- Plugins are dynamically deployed within single instances of the services, and different instances may host different plugins. In addition, some plugins may support both service front types, i.e. grant read and write access to the corresponding repository. Others may instead support read-only access or, less commonly, write-only access.
The figure below overviews the design and use of the service in the context of one of its running instances. The instance exposes three stateful port-types:
- the
ReadManager
serves as the interface of collection managers that offer read-only operations over the content of the bound collection.
- The interface defines the
gDocRead
front type of the service.
- The front type and the identifier of the bound collection are published as Resource Properties of the manager, in accordance with OCMA patterns for publication and discovery of service state. A third Resource Property is the name of the bound plugin, i.e. the plugin to which the manager delegates the resolution of its requests.
- the
WriteManager
serves as the interface of collection managers that offer write-only operations over the content of the bound collection.
- The interface defines the
gDocWrite
front type of the service.
- Again, the type, the identifier of the bound collection, and the name of the bound plugin are published as Resource Properties of the manager.
- the
Factory
serves as the front-end of a single WS Resource that creates ReadManager
and WriteManager
resources.
- The resource is created at the activation of the service instance in the gCube Hosting Node.
- During its lifetime, it publishes creation requests as activation records. Conversely, it subscribes for the activation records that are published by other instances of the service, in line with OCMA patterns for replication of service state.
- The resource also publishes as a Resource Property the list of summary descriptions of the plugins that are hosted at the service instance.
Service plugins logically extend factory and collection manager resources with corresponding resource delegates. In particular:
- the factory delegate extends the
Factory
resource at plugin deployment time in order to handle requests that are specifically addressed to the plugin;
- at each such request, the factory delegate processes plugin-specific parameters to create one or more read delegates and/or write delegates, which the service instance uses to create and extend corresponding collection managers;
- future requests to the managers are then handled by their delegates, which translate the requests against the back-end repository that exposes the collection bound to the managers.
Finally, note that factory and collection managers are persistent resources and may thus be re-activated across restarts of the gCube Hosting Node:
- the factory persists the history of its activations, i.e. the activation records that it published and/or processed.
- the collection managers persist the name and state of their delegates.
Content Model
Architectural considerations aside, the most distinguished element in the design of the Content Manager is its content model. Rather than settle for a fixed set of document structures, the service adopts a generic structure that can act as a 'carrier' for an arbitrary number of concrete document models. In particular, the service deals with edge-labelled and node-attributed trees, the gDoc
trees.
The expectation here is that producers (service plugins) and consumers (service clients) will agree on concrete document models and exchange gDoc
trees with an agreed shape. The agreement may be bilateral or involve any number of parties, and it may apply to the entire document or to distinguished parts of it (e.g. document metadata, annotations, raw content packaging, etc). For maximum decoupling between consumers and producers, the agreement may reflect system-wide conventions and result in canonical tree forms.
gDoc Trees
A gDoc
tree has the following properties:
- its nodes may have an identifier and a number of uniquely named attributes;
- its edges have a label;
- its leaf nodes may have a value;
- its root may identify the collection of the corresponding document.
In particular:
- identifiers, attributes, and leaves have text values;
- attribute names and labels may be qualified with a namespace.
The figure below uses a graphical representation to show an example of a gDoc
tree.
gDoc
trees serialise to XML documents for exchange over the network. In particular:
- nodes serialise to elements and attributes serialise to element attributes
- elements are named like the edges that enter the corresponding nodes
- the document element is arbitrarily named
- the elements of inner nodes contain the elements of their children
- the elements of leaves contain their value
- node identifiers serialise to attributes called
http://gcube-system.org/namespaces/contentmanagement/gdoc:id
- node states serialise to attributes called
http://gcube-system.org/namespaces/contentmanagement/gdoc:state
- collection identifiers serialise to attributes called
http://gcube-system.org/namespaces/contentmanagement/gdoc:collID
For example, the gDoc
tree above may serialise as:
<g:gdoc xmlns:g="http://gcube-system.org/namespaces/contentmanagement/gdoc" g:id="1" x="..." y="..." g:collID="..."> <a g:id="2"> <b g:id="5"/> </a> <a g:id="3"> <c> <d >...</d> <d w="..">...</d> </c> </a> <b g:id="4" w="..." /> </g:gdoc>
Note that gDoc
trees inherit constraints from their XML serialisation. In particular, the names of edges, the names of attributes, the values of attributes, and the values of leaves are regulated by the definition of the format.
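Because the serialisation is plain, namespace-qualified XML, it can be inspected with the namespace-aware DOM parser of the Java platform alone. The following is a minimal sketch under that assumption: the class GDocXmlReader and its helpers are hypothetical names introduced for illustration and are not part of the service API, while the sample document is the serialisation shown above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads a serialised gDoc tree with plain JDK DOM.
class GDocXmlReader {

    static final String GDOC_NS = "http://gcube-system.org/namespaces/contentmanagement/gdoc";

    // The example serialisation from the text, written compactly.
    static final String SAMPLE =
        "<g:gdoc xmlns:g=\"" + GDOC_NS + "\" g:id=\"1\" x=\"...\" y=\"...\" g:collID=\"...\">"
      + "<a g:id=\"2\"><b g:id=\"5\"/></a>"
      + "<a g:id=\"3\"><c><d>...</d><d w=\"..\">...</d></c></a>"
      + "<b g:id=\"4\" w=\"...\"/>"
      + "</g:gdoc>";

    // Parses the XML with namespace awareness and returns the document element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required to resolve the g: prefix
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse gDoc serialisation", e);
        }
    }

    // Returns the node identifier, or null if the node has none.
    static String id(Element node) {
        return node.hasAttributeNS(GDOC_NS, "id") ? node.getAttributeNS(GDOC_NS, "id") : null;
    }

    // Returns the labels of the outgoing edges, i.e. the names of the child elements.
    static List<String> edgeLabels(Element node) {
        List<String> labels = new ArrayList<String>();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE)
                labels.add(children.item(i).getLocalName());
        return labels;
    }

    public static void main(String[] args) {
        Element root = parse(SAMPLE);
        System.out.println("root id: " + id(root));
        System.out.println("edge labels: " + edgeLabels(root));
    }
}
```

Note that identifiers are looked up by namespace and local name rather than by the g prefix, since the prefix is only a convention of the example.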
gDoc API
The XML serialisation of gDoc
trees is 'natural', in that it does not employ dedicated element structures for the representation of nodes, edges, attributes, etc. This streamlines its manipulation with standard XML technologies (e.g. XPath, XSLT, XQuery, DOM, SAX, etc.) and does not inhibit object binding technologies (e.g. JAXB, XStream, etc). As a native option, however, the service defines a bespoke object model and API for gDoc
trees which offer:
- dedicated support for tree processing requirements associated with the use of the service;
- transparencies and optimisations for tree storage, construction, deconstruction, and input/output.
While the model is available to service clients, it also forms the basis of the interface between the service and its plugins. For this reason, its main features are overviewed here while its client-oriented features are discussed later on.
As the figure below illustrates, the model is defined in org.gcube.contentmanagement.contentmanager.stubs.model.trees
in terms of the following components:
- Node: an abstract base for nodes with an identifier, a state, and a map of QName-ed attributes.
- State: an inner enumeration of Node for node states.
- Edge: a QName-ed edge to a target Node.
- InnerNode: a Node with a list of outgoing Edges.
- Leaf: a Node with a value.
- GDoc: an InnerNode with a collection identifier.
- Nodes: a collection of static utilities to generate Nodes and Edges.
- Bindings: a collection of static utilities to serialise and deserialise Nodes to and from DOM trees and/or character streams.
- NodeView: a base class for JAXB bindings to Nodes.
- GDocView: a NodeView for JAXB bindings to GDoc nodes.
The model API is illustrated by example in the rest of this Section. The full list of methods and their signatures can be found in the code documentation.
Building Trees
The first and obvious way to create gDoc
trees is with the constructors of the concrete node classes (GDoc
, InnerNode
, Leaf
).
As a first example, the following code illustrates the creation of a tree with an attributed root and two leaf nodes:
GDoc doc = new GDoc("someid");
doc.setAttribute(new QName("x"), "1");
doc.setAttribute(new QName("someNS","y"), new Date().toString());
doc.collectionID("...");

Leaf leaf1 = new Leaf(null,"2"); //no identifier
Leaf leaf2 = new Leaf(null,"true");

Edge e1 = new Edge(new QName("a"),leaf1);
Edge e2 = new Edge(new QName("someNS","b"),leaf2);

doc.add(e1,e2);
While already more convenient than cross-language and format-oriented tree APIs (e.g. DOM), step-by-step construction is verbose, even in the case of small trees.
For a first degree of improvement, the node classes offer rich suites of constructors and setter overloads that allow for more 'in-lined' tree constructions and absorb the creation of QName
s:
GDoc doc2 = new GDoc("someid",
    new Edge("a", new Leaf(null,"2")),
    new Edge("someNS","b", new Leaf(null,"true")));
doc2.setAttribute("x", "1");
doc2.setAttribute("someNS","y", new Date().toString());
doc2.collectionID("somecollID");
For additional convenience, the Nodes
class defines a large number of generators, i.e. factory methods that can be statically imported and then composed into a pseudo literal syntax for gDoc
trees:
import static org.gcube.contentmanagement.contentmanager.stubs.model.trees.Nodes.*;
...
GDoc doc3 = attr(
    gdoc("somecollID","someid",
        e("a",2),
        e("someNS","b",true)),
    a("x",1),a("someNS","y",new Date()));
Here, gdoc
, attr
, e
, a
are examples of node, attribute, and edge generators. Besides allowing fully in-lined tree expressions without the use of the new
operator, the generators offer QName
creation transparencies and object-to-string conversion transparencies (cf. the int
, boolean
, and Date
example above). The transparency of date conversions is particularly important here, as it ensures adherence to XML serialisation standards that are not natively adopted in Java (e.g. in the implementation of toString
). See the code documentation for the full list of available generators, as well as for the additional examples that follow:
doc = gdoc();
doc = gdoc("someid");
doc = gdoc("collectionid","someid");

doc = gdoc("1",
        e("a", n("2",                 //n() => inner node generator
            e("b",l("3",0)))));

doc = gdoc(                           //no identifier
        e("a", attr(
            n("2",
                e("b",l("3",0)),      //l() => explicit leaf generator for identity assignment
                e("a",l("4",0))),
            a("foo","0"))));

doc = attr(
        gdoc("1",
            e("a",l("2",5)),
            e("b",attr(
                n("3",e("c",4)),
                a("foo",0))),
            e("c",5)),
        a("x",0));

doc = attr(
        gdoc("1",
            e("a", n("2",
                e("b",n("$2")))),
            e("a",n("a1",
                e("c",n(
                    e("d","..."),
                    e("d",attr(       //l() => explicit leaf generator for attribute assignments
                        l("<xml>..</xml>"),
                        a("w",".."))))))),
            e("b",attr(
                n("1:/2"),
                a("w","...")))),
        a("x","http://org.acme:8080"),a("y","<a>...</a>"));
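The point about date conversions made above can be seen with the Java platform alone. The sketch below contrasts the default text form of a java.util.Date with an ISO-8601 form that is valid for the XML Schema xs:dateTime type. The source does not state which lexical form the generators emit, so the use of xs:dateTime in UTC is an assumption, and DateLexicalForms is a hypothetical name introduced for illustration.

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper: shows why Date.toString() is unsuitable for XML serialisation.
class DateLexicalForms {

    // Renders a date in the ISO-8601 instant form, which is a valid xs:dateTime in UTC.
    static String toXsdDateTime(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        // Default form depends on the JVM time zone and is not a valid xs:dateTime.
        System.out.println(epoch);
        // ISO-8601 form, independent of the JVM time zone.
        System.out.println(toXsdDateTime(epoch));
    }
}
```

A generator that accepts Date values directly spares clients from choosing and applying such a conversion on every call.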
The literal construction of trees is particularly convenient during testing, though it composes well with programmatic construction in the development of production code:
Edge edge = ....;
InnerNode node = ....;
attr(
    node.add(e("before","..."), edge, e("after","...")),
    a("newattr","..."));
note: the node classes override equals
for equivalence-based comparisons, and hashCode
for correct use as keys within hash-based data structures, and toString
for convenience of debugging.
Serialising and Deserialising Trees
The Bindings
class offers static facilities to transform native models of gDoc
trees into XML-based models. Two representations are supported natively, from which other XML-based representations can be produced using standard platform facilities (e.g. TRAX):
- Bindings.toElement(GDoc) converts native models of gDoc trees into equivalent DOM models.
- Bindings.fromElement(Element) converts DOM models of gDoc trees into equivalent native models.
- Bindings.toXML(GDoc, Writer, boolean?) converts native models into XML document streams, optionally excluding document declarations.
- Bindings.fromXML(Reader) converts XML document streams into gDoc trees.
note: DOM conversions of native models are implemented directly, as they are most commonly required for interactions with the Content Manager service. Stream conversions are instead derived from DOM conversions via TRAX, at an additional processing cost.
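The derivation of stream conversions from DOM conversions can be pictured with the standard TRAX API. The sketch below is not the implementation of the Bindings class: DomStreams and its methods are hypothetical names, and the code only illustrates the general technique of serialising a DOM element through an identity Transformer, with the optional exclusion of the document declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: DOM-to-stream and stream-to-DOM conversions with JDK facilities.
class DomStreams {

    // Serialises a DOM element through an identity transformation.
    static String toXml(Element element, boolean omitDeclaration) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
            StringWriter writer = new StringWriter();
            identity.transform(new DOMSource(element), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot serialise element", e);
        }
    }

    // Parses a character stream into a DOM element, with namespace awareness.
    static Element fromXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        Element element = fromXml("<r a=\"1\"><c>v</c></r>");
        System.out.println(toXml(element, true));
        System.out.println(toXml(element, false));
    }
}
```

The extra transformation step is where the additional processing cost mentioned in the note comes from.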
note: conversions from native models to XML-based models assign the conventional name http://gcube-system.org/namespaces/contentmanagement/gdoc:gdoc
(cf. Bindings.GDOC_NS
, and Bindings.GDOC_NAME
constants) to the document element. Vice versa, conversions from XML-based representations to native models discard the name of the document element.
Here is a usage example, which shows that equivalence of native models is preserved under round-trip conversions.
import static org.gcube.contentmanagement.contentmanager.stubs.model.trees.Bindings.*;
...
GDoc doc = ....

//DOM conversion
GDoc doc2 = fromElement(toElement(doc));
assert doc.equals(doc2); //true!

//stream conversion
StringWriter w = new StringWriter();
toXML(doc,w);
GDoc doc3 = fromXML(new StringReader(w.toString())); //fromXML() expects a Reader
assert doc.equals(doc3); //true!
note: due to the treatment of root element names, equivalence of XML-based representations is not necessarily preserved after round-trip conversion. It is preserved only if the XML-based representations have been previously produced with the conversion routines.
note: in all the conversions above, null
values in attribute and leaf values are serialised using a special constant (exposed programmatically as Node.NULL
).
note: the conversions are also available at arbitrary inner nodes, not only roots (cf. Bindings.nodeToElement(Node, QName?)
, Bindings.nodeFromElement(Element)
, Bindings.nodeToXML(Node, Writer, QName)
, and Bindings.nodeFromXML(Reader)
).
Consuming Trees
The gDoc
API offers simple means of procedural tree navigation. For declarative queries, clients can convert the model into an XML-based representation and leverage platform standards and popular offerings (e.g. XPath, XQuery, or XSLT implementations). If required, the gDoc
API can then be reasserted on query outputs.
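As an illustration of the declarative route, the sketch below runs XPath queries over a serialised tree using only JDK classes. It assumes the tree has already been converted to XML; GDocXPath, its select() method, and the leaf values first and second are hypothetical and chosen for illustration only. A NamespaceContext is needed because node identifiers live in the gDoc namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: declarative queries over the XML form of a gDoc tree.
class GDocXPath {

    static final String GDOC_NS = "http://gcube-system.org/namespaces/contentmanagement/gdoc";

    static final String SAMPLE =
        "<g:gdoc xmlns:g=\"" + GDOC_NS + "\" g:id=\"1\">"
      + "<a g:id=\"2\"><b g:id=\"5\"/></a>"
      + "<a g:id=\"3\"><c><d>first</d><d w=\"..\">second</d></c></a>"
      + "</g:gdoc>";

    // Binds the prefix g to the gDoc namespace inside XPath expressions.
    static final NamespaceContext G_PREFIX = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "g".equals(prefix) ? GDOC_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return GDOC_NS.equals(uri) ? "g" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return GDOC_NS.equals(uri)
                ? Collections.singletonList("g").iterator()
                : Collections.<String>emptyList().iterator();
        }
    };

    // Evaluates the expression and returns the text content of the selected nodes.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(G_PREFIX);
            NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<String>();
            for (int i = 0; i < hits.getLength(); i++)
                values.add(hits.item(i).getTextContent());
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("cannot evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // leaves under the c-edge of the node with identifier 3
        System.out.println(select(SAMPLE, "//*[@g:id='3']/c/d"));
        // d-leaves that carry a w attribute
        System.out.println(select(SAMPLE, "/*/a/c/d[@w]"));
    }
}
```

Edge labels without a namespace can be used directly as XPath steps, which is a consequence of the 'natural' serialisation of the trees.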
The Node
class defines methods to expose the state common to all nodes of a gDoc
tree:
- id(): returns the identifier.
- parent(): returns the parent.
- ancestors(): returns the list of all nodes from the parent to the root.
- ancestorsAndSelf(): behaves like ancestors() but the returned list includes and starts with the recipient node.
- attributes(): returns a copy of the attributes, indexed by name.
- attribute(QName): returns the value of an attribute with the given name (or fails).
- hasAttribute(QName): checks for the existence of an attribute with a given name.
note: identifiers can only be set at node creation time. Attributes can be added, modified, and removed at any point (cf. Node.state(Node.State)
, setAttribute(QName,String)
, removeAttribute(QName)
).
note: for convenience, all methods that take QName
s are overloaded to accept local names as well as (namespace,local name) pairs.
note: invoking methods that take node types is simplified by statically importing the class constants defined in the Nodes
class (cf. Nodes.N
for InnerNode.class
and Nodes.L
for Leaf.class
).
The Leaf
class adds methods to read and set the value (cf. value()
, value(String)
).
The InnerNode
class adds methods to navigate along edges and/or identifiers:
- children(): returns the list of children.
- children(QName): returns the list of children under edges with the given label.
- <T extends Node> children(Class<T>): returns the list of children of a given node type.
- <T extends Node> children(Class<T>, QName): returns the list of children of a given node type under edges with a given label.
- child(QName): returns the child under an edge with a given label (or fails if there are zero or more than one such children).
- <T extends Node> child(Class<T>, QName): returns the child of a given node type under an edge with a given label (or fails if there are zero or more than one such children).
- descendants(QName*): returns the list of descendants that can be reached following edges with given labels.
- <T extends Node> descendants(Class<T>,QName*): returns the list of descendants of a given type that can be reached following edges with given labels.
- edges(): returns the list of all the edges.
- edges(QName): returns the list of edges with a given label.
- edge(QName): returns the edge with a given label (or fails if there are zero or more than one such edges).
- hasEdge(QName): checks for the existence of an edge with a given label.
- labels(): returns the list of labels.
note: edges can be added or removed at any time (cf. add(Edge*)
, removeEdge(Edge*)
, removeEdge(QName)
).
note: as above, methods that take QName
s have overloads that accept local names and, where appropriate, overloads that accept (namespace,local name) pairs.
note: as above, invoking methods that take node types is simplified by statically importing the corresponding constants in Nodes
(cf. Nodes.N
, Nodes.L
).
The GDoc
class adds methods to read and set the collection identifier (cf. collectionID()
, collectionID(String)
).
Finally, the Edge
class exposes its label and target (cf. label()
, target()
).
The following example illustrates some of the supported idioms (see the code documentation for detailed information about method signatures):
import static org.gcube.contentmanagement.contentmanager.stubs.model.trees.Nodes.*;

GDoc doc = attr(
        gdoc("1",
            e("a",l("2",5)),
            e("b",attr(
                n("3",e("c",4)),
                a("foo",0))),
            e("c",5)),
        a("x",0));

//typed children
String val = doc.child(L,"a").value();

//typed descendant
String val2 = doc.descendant(N,"3").child(L,"c").value();

for (InnerNode node : doc.children(N))
    for (QName l : node.labels()) {
        //process label
    }

for (Node d : doc.descendants("b","e"))
    for (Edge siblingEdge : d.parent().edges())
        if (siblingEdge.target()!=d) {
            //process sibling of descendant
        }
Binding Trees
Clients that expect gDoc
trees of a given form may wish to bind them to objects. To streamline JAXB object bindings, the API includes two base classes for node and document bindings to XML serialisations of gDoc
trees:
- NodeView is a base class for node bindings. The view binds and exposes the identifier of the node as well as the URL of the node, if one exists (cf. getID(), getURL()). Node URLs are discussed later.
- GDocView extends NodeView as a base class for document bindings. The view binds and exposes the collection identifier of the root node (cf. getCollID()), in addition to what is already bound and exposed via its superclass.
Clients can extend these classes and the corresponding bindings. The following example illustrates:
@XmlRootElement(name=Bindings.GDOC_NAME,namespace=Bindings.GDOC_NS)
class MyDocView extends GDocView {

    @XmlElement(namespace="http://acme.org")
    int i;

    @XmlElement(namespace="http://acme.org")
    MyDocComponent c;
}

class MyDocComponent extends NodeView {

    @XmlElement
    Date date;
}
MyDocView
and MyDocComponent
are toy examples of user-defined views over gDoc
trees and tree nodes, and they should be familiar to JAXB users.
MyDocView
extends GDocView
and uses JAXB annotations to specify the qualified name of the document elements to which it will be bound. Here we have chosen a name that aligns with the serialisations produced by the Bindings
class, as shown above, but different names may be specified if the binding targets serialisations produced through different means. MyDocView
then includes two fields in its own namespace, an integer field and a MyDocComponent
field, both of which are bound to XML elements. MyDocComponent
extends NodeView
, specifies a single Date
, and uses JAXB annotations to bind it to an XML element. In both classes, we have chosen simple JAXB annotations. For example, we have assumed that the gDoc
trees that will come to binding have labels that match the field names. The full range of JAXB facilities is of course available to customise bindings to less aligned trees.
Suppose now MyDocView
is to be bound to the gDoc
tree below. We use the generators of the gDoc API to denote the tree, but this is just for convenience of exposition; the tree may have been generated through any suitable means.
GDoc doc = gdoc("collID","123",
    e(NS,"i",3),
    e(NS,"c", n("789",
        e("d",l("1",new Date())),
        e("b",l("2",15)))),
    e(NS,"d",new Date()),
    e(NS,"b",n("456")));
Clearly, the tree contains a subset that matches the binding expectations of the classes above. As with all JAXB clients, the binding would require steps similar to the following:
JAXBContext context = JAXBContext.newInstance(MyDocView.class);
...
//assuming a DOM binding to the tree has already occurred (other JAXB inputs could have been used instead, e.g. character streams)
Element docElement = ....

//bind
MyDocView mv = (MyDocView) context.createUnmarshaller().unmarshal(docElement);
...mv.id()...
...mv.collID()...
...mv.url()...
...mv.i...
...mv.c...
...mv.id()...
...mv.url()...
...mv.c.d...

//serialise (again to DOM)
Document dom = ....;
Marshaller m = context.createMarshaller();
m.marshal(mv,dom);
gDoc Predicates
The gDoc
model is untyped, in that neither the topology of trees nor the values of their attributes or leaves are subject to constraints (beside those dictated by the XML serialisation).
Types are reintroduced later, under the view that they can be projected on gDoc
trees at the point of consumption. Type projections serve two main purposes in the context of the Content Manager:
- to validate the content of
gDoc
trees.
- The main use case for validation is at the point of content ingestion through the write operations of the Content Manager. In particular, a plugin may project a type on incoming
gDoc
trees, with a view to rejecting those that fail the projection.
- to identify the data of interest within
gDoc
trees.
- The main use case for content identification is at the point of content retrieval through the read operations of the Content Manager. Through the service, in particular, a client may ask plugins to return only the portion of the data that satisfies the projection, and to discard the rest. Content pruning results in minimal bandwidth consumption and delivers content to clients in forms which are optimal for their processing or bindings.
Accordingly, support for type projections requires:
- a language of tree types with which clients and plugins can capture the required shape and content of
gDoc
trees. - the ability to project such types over
gDoc
trees with both validation and pruning semantics.
XML schema languages are natural candidates for the choice of tree types. However, they also introduce complexity - both conceptually and in terms of tooling - which is not required when working with the subset of XML that corresponds to the gDoc
model. As importantly, schema languages are strongly associated with validation and there are no implementations that use them towards document pruning (or indeed content extraction).
Accordingly, the tree API includes a native language of tree types, the gDoc
predicates, as well as support for projecting them over content for validation and pruning purposes. gDoc
predicates, in particular, can be used to constrain:
- the topology of
gDoc
trees, including the labels and cardinality of edges (e.g. the existence of at least one edge with a given label).
- the values of leaves, so that they conform to the textual literals of a range of atomic types (e.g. numbers or boolean values) or simply satisfy some type-specific predicate.
note: Support for predicates on attributes and wildcard expressions on edge labels is forthcoming.
Predicate API
gDoc
predicates are defined in the packages org.gcube.contentmanagement.contentmanager.stubs.model.predicates
and org.gcube.contentmanagement.contentmanager.stubs.model.constraints
, the main components of which are the following:
- Predicate: the interface of all node predicates, defines match(Node) and prune(Node) methods for validation-based and pruning-based projection semantics.
  - AnyPredicate: a Predicate that specifies no constraints on nodes, i.e. matches any node and prunes nothing from it.
  - TreePredicate: a Predicate that specifies a list of EdgePredicates on inner nodes.
  - LeafPredicate: a Predicate that specifies a Constraint on the value of leaf nodes.
    - Bool: a LeafPredicate that specifies a boolean Constraint on the value of leaf nodes.
    - Num: a LeafPredicate that specifies a numeric Constraint on the value of leaf nodes.
    - Text: a LeafPredicate that specifies a textual Constraint on the value of leaf nodes.
    - Date: a LeafPredicate that specifies a date Constraint on the value of leaf nodes.
    - URI: a LeafPredicate that specifies a URI Constraint on the value of leaf nodes.
- EdgePredicate: a predicate that specifies a node Predicate on the targets of edges with a given label.
  - One: an EdgePredicate that specifies the existence of a single edge with a given label.
  - Opt: an EdgePredicate that specifies the existence of at most one edge with a given label.
  - AtLeast: an EdgePredicate that specifies the existence of one or more edges with a given label.
  - Many: an EdgePredicate that specifies the existence of zero or more edges with a given label.
  - ID: an EdgePredicate that specifies a given identifier for the source.
- Predicates: a set of facilities to build tree predicates.
- Constraint: the interface of all constraints over values of leaf nodes.
  - Same: the Constraint that is satisfied by values that are equivalent to a given value.
  - Match: the Constraint that is satisfied by values that match a given regular expression.
  - More: the Constraint that is satisfied by values that are numbers strictly greater than a given number.
  - Less: the Constraint that is satisfied by values that are numbers strictly smaller than a given number.
  - Before: the Constraint that is satisfied by values that are earlier dates than a given date.
  - After: the Constraint that is satisfied by values that are later dates than a given date.
  - Not: the Constraint that is satisfied by values that do not satisfy another Constraint.
  - Either: the Constraint that is satisfied by values that satisfy at least one of a number of other Constraints.
  - All: the Constraint that is satisfied by values that satisfy a number of other Constraints.
- Constraints: a set of facilities to build Constraints.
Building Predicates
Similarly to gDoc
trees, gDoc
predicates may be built with classic constructor-based idioms or with predicate generators, a collection of factory methods in the Predicates
and Constraints
classes which can be statically imported and then composed into a pseudo literal syntax for gDoc predicates. We concentrate here on predicate generators as the preferred way to build gDoc
predicates. See the code documentation for the constructors available in predicate and constraint classes.
Consider this first example:
import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;

Predicate p = tree(
    one("a", num(more(6))));
Here, tree()
generates a TreePredicate
which characterises trees which satisfy a single EdgePredicate
. This latter predicate requires that the trees have exactly one outgoing edge with label a
and with a leaf target. This leaf must in turn satisfy a Num
predicate, i.e. its text value must represent a number and this number must satisfy a More
constraint, which requires it to be greater than 6
. In summary, we are characterising trees with a single a
-edge that ends in a leaf with a number greater than 6
.
The following example showcases a range of other predicates and constraints:
import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;

Predicate p = tree(
    one("a",any),
    one("b",text(either(is("abc"),is("efg")))),
    atleast("c",bool(is(true))),
    opt("d",tree()),
    many("e",date(future())),
    one("f", uri(matches("^http.*"))),
    many("g", num(all(more(5),less(10)))), //between 5 and 10
    one("h", text()),
    one("j",text(not(is("somestring")))),
    one("k",tree(id("12345"))));
Here, the predicate characterises trees with:
- a single a-edge that ends in any type of node, inner node or leaf, not characterised further;
- a single b-edge that ends in a leaf whose value is one of two strings;
- one or more c-edges that end in leaves with a boolean value of true;
- zero or one d-edges that end in inner nodes, not characterised further;
- zero or more e-edges that end in leaves whose values are dates in the future;
- a single f-edge that ends in a leaf whose value is an absolute http URI;
- zero or more g-edges that end in leaves whose values are numbers between 5 and 10;
- a single h-edge that ends in a leaf, not characterised further;
- a single j-edge that ends in a leaf whose value differs from a given string;
- a single k-edge that ends in an inner node with an identifier of 12345.
Clearly, predicates can nest recursively to match the structure of trees.
For a full list of available predicate and constraint generators, see the code documentation of the Predicates
and Constraints
classes.
Matching and Pruning
A gDoc
predicate can be projected over a gDoc
tree using the methods match()
and prune()
common to all Predicate
s. The first indicates whether the tree satisfies the predicate; the second prunes it of all the nodes that remain uncharacterised by the predicate.
Consider this simple example:
import static org.gcube.contentmanagement.contentmanager.stubs.model.constraints.Constraints.*;
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;

Date d = new Date();

GDoc doc = gdoc(
    e("a",-1),
    e("a",1),
    e("a",2),
    e("b","..."),
    e("b",n(
        e("b1","..."))),
    e("c",n(
        e("c1",d),
        e("c2","..."))),
    e("d","..."));

Predicate p = tree(
    many("a",num(more(0))),
    atleast("b",tree()),
    one("c", tree(
        one("c1",date()))));

assert p.match(doc)==true;

GDoc pruned = gdoc(
    e("a",1),
    e("a",2),
    e("b",n(
        e("b1","..."))),
    e("c",n(
        e("c1",d))));

assert pruned.equals(p.prune(doc));
Here, it is easy to see that the gDoc
tree satisfies the predicate. Accordingly, match()
returns true
and prune()
successfully reduces the tree to include only the paths from the root which are directly described by the predicate (i.e. a tree equivalent to pruned
in the example). If the tree had had a second c
-edge, for example, match
would have returned false
and prune()
would have failed with an exception. Note that if the tree had had no b
-edges, or if its b
-edges had all ended in leaves, or in fact under any of a number of alternative assumptions, the outcome would have been equally negative.
note: match()
never fails, as a gDoc
tree either satisfies the predicate or it does not. In contrast, prune
fails whenever the tree does not match the predicate. In other words, prune
subsumes match
and reacts with a failure to mismatches.
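The relationship between the two projection semantics can be illustrated with a toy model. The sketch below is not the implementation of the predicate classes: ProjectionSketch and all of its members are hypothetical, the tree model is reduced to labelled edges and text leaves, and it assumes that edge cardinalities are counted over the edges whose targets satisfy the nested predicate. It shows a prune() that first checks match() and fails on mismatches, then keeps only the characterised paths.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of validation-based and pruning-based projections (illustrative only).
class ProjectionSketch {

    // A leaf when value != null, otherwise an inner node with labelled edges.
    static final class Node {
        final String value;
        final List<String> labels = new ArrayList<String>();
        final List<Node> targets = new ArrayList<Node>();
        Node(String value) { this.value = value; }
        Node edge(String label, Node target) { labels.add(label); targets.add(target); return this; }
        @Override public String toString() { // compact rendering, used to compare trees
            if (value != null) return "'" + value + "'";
            StringBuilder sb = new StringBuilder("{");
            for (int i = 0; i < labels.size(); i++)
                sb.append(i > 0 ? "," : "").append(labels.get(i)).append(':').append(targets.get(i));
            return sb.append('}').toString();
        }
    }

    interface Predicate {
        boolean match(Node n); // never fails
        Node prune(Node n);    // fails on mismatch
    }

    // Leaf predicate: the value must be a number strictly greater than the bound.
    static Predicate numMoreThan(final double bound) {
        return new Predicate() {
            public boolean match(Node n) {
                if (n.value == null) return false;
                try { return Double.parseDouble(n.value) > bound; }
                catch (NumberFormatException e) { return false; }
            }
            public Node prune(Node n) {
                if (!match(n)) throw new IllegalArgumentException("leaf mismatch");
                return new Node(n.value);
            }
        };
    }

    // Edge rule: between min and max edges with the label must have matching targets.
    static final class EdgeRule {
        final String label; final int min; final int max; final Predicate target;
        EdgeRule(String label, int min, int max, Predicate target) {
            this.label = label; this.min = min; this.max = max; this.target = target;
        }
        boolean selects(Node n, int i) {
            return n.labels.get(i).equals(label) && target.match(n.targets.get(i));
        }
    }
    static EdgeRule one(String label, Predicate p) { return new EdgeRule(label, 1, 1, p); }
    static EdgeRule many(String label, Predicate p) { return new EdgeRule(label, 0, Integer.MAX_VALUE, p); }

    static Predicate tree(final EdgeRule... rules) {
        return new Predicate() {
            public boolean match(Node n) {
                if (n.value != null) return false;
                for (EdgeRule r : rules) {
                    int count = 0;
                    for (int i = 0; i < n.labels.size(); i++) if (r.selects(n, i)) count++;
                    if (count < r.min || count > r.max) return false;
                }
                return true;
            }
            public Node prune(Node n) {
                if (!match(n)) throw new IllegalArgumentException("tree mismatch");
                Node out = new Node(null);
                for (EdgeRule r : rules)
                    for (int i = 0; i < n.labels.size(); i++)
                        if (r.selects(n, i)) out.edge(r.label, r.target.prune(n.targets.get(i)));
                return out;
            }
        };
    }

    static Node sample() {
        return new Node(null).edge("a", new Node("-1")).edge("a", new Node("1"))
                             .edge("a", new Node("2")).edge("d", new Node("..."));
    }

    public static void main(String[] args) {
        Predicate p = tree(many("a", numMoreThan(0)));
        System.out.println(p.match(sample()));
        System.out.println(p.prune(sample()));
    }
}
```

In this toy reading, the a-edge with value -1 and the uncharacterised d-edge are dropped by pruning, in the same spirit as the example above.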
Serialising and Deserialising Predicates
The predicate and constraint classes are ready for JAXB binding to XML (i.e. contain appropriate JAXB annotations). In addition, the Predicates
class encapsulates a JAXB context and exposes javax.xml.bind.Marshaller
s and javax.xml.bind.Unmarshaller
s ready for client use (cf. getMarshaller()
, getUnmarshaller()
).
For example, a client that needs to work with character streams may operate as follows:
import static org.gcube.contentmanagement.contentmanager.stubs.model.predicates.Predicates.*;

//serialise predicate
Predicate p1 = ...;
Writer w = ...;
getMarshaller().marshal(p1,w);

//deserialise predicate
Reader r = ...;
Predicate p2 = (Predicate) getUnmarshaller().unmarshal(r);
note: clients who use the Content Management libraries do not explicitly need to worry about conversion to and from DOM representations of predicates. The libraries perform conversions on their behalf.
Interfaces
As overviewed above, the interface of the Content Manager is distributed across three stateful port-types:
- the Factory port-type, which is the interface of a single WS-Resource;
- the ReadManager port-type, which is the read-only interface of many, collection-specific WS-Resources;
- the WriteManager port-type, which is the write-only interface of many, collection-specific WS-Resources.
In this Section, we discuss in more detail the operations of the port-types and the Resource Properties of the corresponding WS-Resources. For clarity, we show and comment on the signatures of operations in terms of the underlying Java implementation, which mirrors the WSDL definitions. We point directly to the WSDL for the definition of auxiliary data structures for inputs and outputs (e.g. values of Resource Properties or records of ResultSets).
note: The service offers client-side libraries which operate at a higher level of abstraction than its public interface. The operations discussed below are of interest to clients that choose to bypass those facilities.
Factory
The Factory
resource exposes two operations, both of which are intended for clients that wish to create Collection Managers (ReadManager
s and/or WriteManager
s). The operations are not for generic use, in that clients are expected to target specifically the plugin to which the Factory
resource ought to delegate the creation of Collection Managers. The operations, their inputs, and their outputs are defined in the WSDL of the port-type.
The first operation is synchronous, in that clients block waiting for the creation of the Collection Managers:
-
public EprList create(CMSCreateParameters parameters) throws GCUBEFault
- the operation creates Collection Managers and returns a list of references to their endpoints, indexed by the name of the corresponding port-type. The creation of resources is parameterised by:
- the name of the plugin to which the service ought to dispatch the request, the target plugin.
- the directive to the service to broadcast the request for state replication, in line with OCMA patterns.
- an arbitrary DOM payload of parameters specific to the target plugin.
- these may vary from plugin to plugin, but typically will contain sufficient information to directly or indirectly identify one or more collections and the repository that hosts them.
The second operation is instead asynchronous, and is the recommended choice when target plugins declare long creation times in their documentation:
-
String createAsync(CMSCreateParameters parameters) throws GCUBEFault
- the operation takes the same input as the synchronous version but returns immediately with the locator of a ResultSet which will be populated by the service only when the Collection Managers are created by the target plugin. Specifically, the ResultSet will contain a single
CMSCreateOutcome
, which captures the outcome of the operation (see WSDL definition). This can be a success or else a failure. In the first case, clients obtain the EprList
described above. In the second case, they obtain the stack trace of the fault that occurred during the creation of the Collection Managers.
Finally, the Factory
resource publishes a multi-valued Resource Property Plugin
, which contains a PluginDescription
for each of the plugins hosted by the running instance (see WSDL definition).
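To make the two possible outcomes of the asynchronous operation concrete, the following sketch models a value that carries either a list of endpoint references or a stack trace. It is only an illustration: ToyCreateOutcome and its methods are hypothetical names, endpoint references are reduced to strings, and the actual shape of CMSCreateOutcome is the one given in the WSDL.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for CMSCreateOutcome: endpoint references or a stack trace. */
class ToyCreateOutcome {
    private final List<String> eprs;   // set on success only
    private final String stackTrace;   // set on failure only

    private ToyCreateOutcome(List<String> eprs, String stackTrace) {
        this.eprs = eprs;
        this.stackTrace = stackTrace;
    }

    static ToyCreateOutcome success(List<String> eprs) {
        return new ToyCreateOutcome(Collections.unmodifiableList(eprs), null);
    }

    static ToyCreateOutcome failure(String stackTrace) {
        return new ToyCreateOutcome(null, stackTrace);
    }

    boolean isSuccess() { return eprs != null; }

    List<String> getEprs() {
        if (!isSuccess()) throw new IllegalStateException("creation failed:\n" + stackTrace);
        return eprs;
    }

    String getStackTrace() { return stackTrace; }
}

class ToyCreateOutcomeDemo {
    /** Summarises an outcome the way a client might after reading it from the ResultSet. */
    static String describe(ToyCreateOutcome outcome) {
        return outcome.isSuccess()
            ? "created " + outcome.getEprs().size() + " manager(s)"
            : "creation failed: " + outcome.getStackTrace();
    }

    public static void main(String[] args) {
        System.out.println(describe(ToyCreateOutcome.success(Arrays.asList("epr-read", "epr-write"))));
        System.out.println(describe(ToyCreateOutcome.failure("java.io.IOException: repository unreachable")));
    }
}
```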
Read Managers
A ReadManager
resource exposes three read-only operations over the content of the bound collection. The operations are intended for generic use, in that clients are not required to know the repository that hosts the collection or the service plugin that mediates access to that repository. The operations, their inputs, and their outputs are defined in the WSDL of the port-type.
-
AnyHolder getByID(GetByIDParams params) throws GCUBEFault
- the operation returns (the DOM representation of) a gDoc tree. Invocations are parameterised by:
- the identifier of the tree root;
- [optional] (the DOM representation of) a tree predicate with respect to which the tree should be pruned before it is returned.
-
String getByIDs(GetByIDsParams params) throws GCUBEFault
- the operation returns the locator of a ResultSet of (DOM representations of) gDoc trees. Invocations are parameterised by:
- the locator of a ResultSet of (DOM representations of) tree root identifiers;
- [optional] (the DOM representation of) a tree predicate with respect to which the trees should be pruned before they are returned.
- note: tree predicates in input and gDoc trees in output are contained in an AnyHolder wrapper (see WSDL definition).
-
String get(GetParams params) throws GCUBEFault
- the operation returns the locator of a ResultSet of (DOM representations of) gDoc trees. Invocations are parameterised by:
- [optional] (the DOM representation of) a tree predicate with respect to which the trees should be pruned before they are returned.
- [optional] (the DOM representation of) a tree predicate with respect to which the trees should be filtered before they are returned.
- Filtering occurs before pruning and may concern parts of the trees that are subsequently pruned.
- note: invocations that specify no parameters effectively ask for the contents of the whole collection.
- note: tree predicates in input and gDoc trees in output are contained in an AnyHolder wrapper (see WSDL definition).
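The order in which filtering and pruning are applied matters, and a small sketch may help to see why. The code below is an assumption-laden illustration rather than the service logic: trees are flattened to maps from labels to values, the filter is an ordinary java.util.function.Predicate, the projection is a set of labels to keep, and FilterThenPrune is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class FilterThenPrune {
    /** Filters first, then prunes each surviving tree down to the projected labels. */
    static List<Map<String, Object>> get(List<Map<String, Object>> collection,
                                         Predicate<Map<String, Object>> filter,
                                         Set<String> projection) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> tree : collection) {
            if (!filter.test(tree)) continue;        // the filter sees the whole tree
            Map<String, Object> pruned = new LinkedHashMap<>(tree);
            pruned.keySet().retainAll(projection);   // pruning happens afterwards
            result.add(pruned);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> older = new LinkedHashMap<>();
        older.put("title", "A"); older.put("year", 2009);
        Map<String, Object> newer = new LinkedHashMap<>();
        newer.put("title", "B"); newer.put("year", 2011);

        // The filter inspects "year" even though the projection later drops it.
        List<Map<String, Object>> out = get(
            Arrays.asList(older, newer),
            tree -> ((Integer) tree.get("year")) >= 2010,
            new HashSet<>(Arrays.asList("title")));
        System.out.println(out);
    }
}
```

Had pruning come first, the filter in the example would have found no "year" value to test, which is why the filter may refer to parts of the trees that do not survive in the output.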
A ReadManager
resource publishes the following Resource Properties:
- CollectionID: the identifier of the bound collection, as per OCMA requirements;
- TypeID: the access type of the port-type, as per OCMA requirements. The value of this property is constant: gDocRead;
- Plugin: the name of the bound plugin.
Write Managers
A WriteManager
resource exposes four write-only operations over the content of the bound collection. The operations are intended for generic use, in that clients are not required to know the repository that hosts the collection or the service plugin that mediates access to that repository. The operations, their inputs, and their outputs are defined in the WSDL of the port-type.
-
String add(AnyHolder holder) throws InvalidDocumentFault, GCUBEFault
- the operation adds a gDoc tree to the bound collection and returns the identifier assigned to the tree root as a result.
- note: the operation assumes that all nodes of the input tree acquire identifiers as a result of a successful addition. Depending on the plugin that serves the request, the operation may fail if the nodes already have an identifier.
-
String addRS(String locator) throws GCUBEFault
- the operation adds zero or more gDoc trees to the bound collection and returns a locator of a ResultSet of AddOutcomes, one for each tree in input (see WSDL definition). In the case of a successful outcome, clients obtain the identifier assigned to the tree root. In the case of failure, they obtain the stack trace of the fault that prevented the addition of the tree.
- note: again, the operation assumes that all nodes of an input tree acquire identifiers as a result of a successful addition. Depending on the plugin that serves the request, the operation may fail if the nodes already have an identifier.
-
VOID update(AnyHolder holder) throws UnknownDocumentFault, InvalidDocumentFault, GCUBEFault
- the operation updates a gDoc tree in the bound collection (the target tree). Invocations are parameterised by (the DOM representation of) a delta tree, a gDoc tree that captures all the updates to be applied to the target tree, including addition or removal of attributes, changes to attribute values, addition and removal of nodes, and changes to leaf values. In more detail, the delta tree is comprised of nodes marked with an attribute http://gcube-system.org/namespaces/contentmanagement/gdoc:status and includes:
- all the nodes of the target tree which have changed, with the corresponding identifiers and with a status of MODIFIED or DELETED;
- note: attributes of the target tree that have been removed occur in the delta tree with a value of _null;
- note: DELETED nodes are either leaves with a value of _null_ or inner nodes with no children.
- note: nodes that have no identifiers cannot be referred to in the delta tree, hence cannot be updated.
- all the nodes that should be added to the target tree, without identifiers and with a status of NEW.
- note: NEW nodes are leaves or inner nodes with NEW descendants.
- note: depending on the bound plugin, NEW nodes may be assigned identifiers upon being added to the target tree.
- As an example, consider the following gDoc tree, where status attributes are depicted as node markers:
- Here, the delta tree captures a set of updates to the descendants of a gDoc tree:
- the root has acquired or changed an attribute x;
- node 2 has lost its child 5 and all its descendants;
- node 3 has acquired a child and its descendants;
- node 4 has lost the attribute z and the child 6, while the value of its child leaf 7 has changed.
- note: all the nodes of the target tree that have not changed do not occur in the delta tree.
-
String updateRS(String locator) throws GCUBEFault
- the operation takes the locator of a ResultSet with (DOM representations of) delta trees for zero or more target gDoc trees and returns the locator of a ResultSet of UpdateFailures (see WSDL definition), one for each delta tree that could not be processed into an update of the corresponding target tree. In particular, clients obtain the identifier of the root of the target tree and the stack trace of the error that occurred during its update.
- note: delta trees in input are contained in AnyHolder wrappers (see WSDL definition).
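The way a delta tree drives an update, and the way a batch update reports failures only, can be sketched as follows. This is a simplified model and not the behaviour of any particular plugin: ToyNode and ToyDelta are hypothetical names, attributes are left out, failures are reduced to strings, and a failing delta may leave its target partially updated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a gDoc node; attributes are omitted for brevity. */
class ToyNode {
    enum Status { NONE, NEW, MODIFIED, DELETED }

    final String id;     // null for nodes that have no identifier
    String value;        // non-null for leaves only
    Status status;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String id, String value, Status status) {
        this.id = id; this.value = value; this.status = status;
    }

    ToyNode add(ToyNode child) { children.add(child); return this; }

    ToyNode child(String childId) {
        for (ToyNode c : children) if (childId.equals(c.id)) return c;
        return null;
    }
}

class ToyDelta {
    /** Applies a delta node to the target node that carries the same identifier. */
    static void apply(ToyNode target, ToyNode delta) {
        if (delta.value != null) target.value = delta.value;      // changed leaf value
        for (ToyNode d : delta.children) {
            if (d.status == ToyNode.Status.NEW) {                 // no identifier: graft the subtree
                target.children.add(clean(d));
                continue;
            }
            ToyNode t = (d.id == null) ? null : target.child(d.id);
            if (t == null) throw new IllegalArgumentException("unknown node " + d.id);
            if (d.status == ToyNode.Status.DELETED) target.children.remove(t);
            else apply(t, d);                                     // MODIFIED: recurse
        }
    }

    /** Clears the status markers of a grafted subtree. */
    private static ToyNode clean(ToyNode node) {
        node.status = ToyNode.Status.NONE;
        for (ToyNode c : node.children) clean(c);
        return node;
    }

    /** Applies each delta to the tree with the same root identifier; reports failures only. */
    static List<String> updateAll(Map<String, ToyNode> collection, List<ToyNode> deltas) {
        List<String> failures = new ArrayList<>();
        for (ToyNode delta : deltas) {
            try {
                ToyNode target = collection.get(delta.id);
                if (target == null) throw new IllegalArgumentException("unknown document");
                apply(target, delta);
            } catch (IllegalArgumentException e) {
                failures.add(delta.id + ": " + e.getMessage());
            }
        }
        return failures;
    }
}
```

Note how successful updates leave no trace in the returned list, mirroring the fact that the ResultSet of the batch operation contains one entry per failed delta tree rather than one per input.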
A WriteManager
resource publishes the following Resource Properties:
- CollectionID: the identifier of the bound collection, as per OCMA requirements;
- TypeID: the access type of the port-type, as per OCMA requirements. The value of this property is constant: gDocWrite;
- Plugin: the name of the bound plugin.
Plugins
...coming soon...
Client Libraries
The Content Manager service is developed alongside a number of client libraries that simplify the task of interacting with the service.
The Stub Distribution of the service is the first and foremost client library. Its design goal is to offer abstractions over the content model and interface of the service. In this sense, the distribution acts both as a client library and as a service-side library for plugin development. In particular, it is a dependency of the service as well as a dependency of service clients.
The Content Management Library (CML) builds upon the stub distribution to support the gCube document model, a concrete document model made of canonical forms for metadata, annotations, document parts, and alternative representations.
Stub Distribution
The distribution includes:
- the API for gDoc trees;
- the API for gDoc tree predicates;
- the stubs of the service automatically generated from the WSDL definition of its port-types;
- the high-level calls, a set of abstractions over the service stubs;
- a Java protocol handler and associated facilities for deriving and resolving content URLs, i.e. URLs with a service-specific scheme cms.
We have previously presented most of the APIs for gDoc
trees and tree predicates. We concentrate here on high-level calls and content URLs, completing the presentation of the tree and tree predicate APIs in the process.
High-Level Calls
High-level calls are Java objects that model single-step or multi-step interactions with the Content Manager service. The objects encapsulate stub-based interactions behind local object-oriented interfaces that offer transparencies over the remote interfaces of the service port-types.
The local interfaces are based on language features that are not found in the service stubs, including high-level models of inputs and outputs, method overloading, parametric types, and asynchronous callbacks.
Behind these abstractions, the call objects engage in optimised and best-effort interactions with the WS-Resources of the service; in particular, they can hide from clients the complexity of resource discovery while keeping visible the remote nature of the interactions and the possibility of their failure.
High-level calls are defined in the package org.gcube.contentmanagement.contentmanager.stubs.calls
. The main components are depicted below:
- BaseCall: the base class for all high-level calls.
- FactoryCall: a BaseCall that represents calls to the Factory resource of the service.
- FactoryParams: used in FactoryCall to model the input of operations to the Factory resource of the service.
- FactoryConsumer: used in FactoryCall to call back invokers of the asynchronous operation of the Factory resource of the service.
- ManagerCall: an abstract extension of BaseCall for calls to the Collection Managers of the service.
- ReadManagerCall: a ManagerCall that represents calls to ReadManager resources of the service.
- WriteManagerCall: a ManagerCall that represents calls to WriteManager resources of the service.
- MappingRegistry: a central registry of type mappings for I/O.
- Constants: a collection of service-specific constants.
- Utils: a collection of utilities for I/O conversions.
In what follows, we exemplify the use of FactoryCall
s, ReadManagerCall
s, and WriteManagerCalls
.
Factory Calls
A FactoryCall
is created in a scope:
//some scope
GCUBEScope scope = .....
FactoryCall call = new FactoryCall(scope);
In a secure infrastructure, the call may also be created with a security manager:
//some scope
GCUBEScope scope = .....

//some security manager
GCUBESecurityManager manager = ....

FactoryCall call = new FactoryCall(scope,manager);
The call may then be issued, i.e. used to create Collection Managers. In line with the operations of the remote port-type, this can be done synchronously or asynchronously.
The synchronous invocation requires the preparation of FactoryParameters
:
FactoryParameters params = new FactoryParameters();
params.setPlugin("..somepluginname...");
params.setBroadcast(false);

//the DOM serialisation of plugin-specific creation parameters
org.w3c.dom.Element payload = ...
params.setPayload(payload);

//issue the call
List<EprPair> eprs = call.create(params);

//process the response
for (EprPair pair : eprs)
    .... pair.getPorttype() ... pair.getEpr() ...
note: typically, plugins will offer object bindings for the payloads that they support. The payload input to the create()
method will then be obtained by serialising the bound objects.
The asynchronous invocation requires the additional preparation of a FactoryConsumer
:
//prepare as above
FactoryParameters params = .....

//create consumer
FactoryConsumer consumer = new FactoryConsumer() {
    protected void onCompletion(List<EprPair> eprs) {
        .... process pairs as above
    }
    protected void onFailure(Exception e) {
        ... handle failure
    }
};

//issue the call
call.createASync(params,consumer);
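The callback pattern behind the asynchronous invocation can be illustrated independently of the service. The sketch below is a generic rendition built on java.util.concurrent and does not reflect how FactoryCall is implemented: ToyConsumer and ToyAsyncCall are hypothetical names, and the remote creation is replaced by an arbitrary Callable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical counterpart of FactoryConsumer: one callback per possible outcome. */
abstract class ToyConsumer<T> {
    protected abstract void onCompletion(T result);
    protected abstract void onFailure(Exception e);
}

/** Hypothetical call object that runs a creation task in the background. */
class ToyAsyncCall {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Returns immediately; the consumer is notified from a background thread. */
    <T> Future<?> createAsync(Callable<T> creation, ToyConsumer<T> consumer) {
        return executor.submit(() -> {
            try {
                consumer.onCompletion(creation.call());
            } catch (Exception e) {
                consumer.onFailure(e);
            }
        });
    }

    /** Releases the background thread once no further calls are expected. */
    void shutdown() { executor.shutdown(); }
}
```

The returned Future is a convenience of the sketch that lets callers wait for the notification if they need to; whether the actual call object exposes anything similar is not stated here.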
In both interactions above, the FactoryCall
will attempt to discover Factory
WS-Resources that host the plugin named in the parameters. It will then try to interact with each resource in turn, until one responds successfully or else indicates that continuing will be to no avail (by returning a GCUBEUnretrievableFault
).
note: clients can obtain and customise the query that underlies the strategy (cf. getQuery()
) and, if needed, reset it to its default (resetQuery()
).
note: while call objects are often created anew for individual calls to the remote port-type, clients can use the same object for multiple calls (though this is unlikely for FactoryCall
s). When this is the case, the calls occur in the same, initially configured scope and the second call 'sticks' to the resource used by the first. The best-effort strategy is intentionally limited to the first invocation only.
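The best-effort strategy and its 'sticky' behaviour can be summarised in a few lines of code. What follows is a sketch of the idea under our own assumptions, not the code of BaseCall: ToyBestEffortCall and ToyUnretrievableException are hypothetical names, endpoints are plain strings, and the list passed to the constructor stands in for the results of the discovery query.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical marker for faults that make further attempts pointless. */
class ToyUnretrievableException extends RuntimeException {
    ToyUnretrievableException(String message) { super(message); }
}

/** Hypothetical call object: tries endpoints in turn, then sticks to the one that answered. */
class ToyBestEffortCall<R> {
    private final List<String> discovered;   // stands in for the results of the discovery query
    private String sticky;                   // endpoint used by the first successful invocation

    ToyBestEffortCall(List<String> discovered) { this.discovered = discovered; }

    R invoke(Function<String, R> interaction) {
        if (sticky != null) return interaction.apply(sticky);   // later calls skip best-effort
        RuntimeException last = new IllegalStateException("no endpoint discovered");
        for (String endpoint : discovered) {
            try {
                R result = interaction.apply(endpoint);
                sticky = endpoint;                               // remember who answered
                return result;
            } catch (ToyUnretrievableException fatal) {
                throw fatal;                                     // continuing would be to no avail
            } catch (RuntimeException retriable) {
                last = retriable;                                // try the next endpoint
            }
        }
        throw last;
    }

    String stickyEndpoint() { return sticky; }
}
```

Once an endpoint has answered, later invocations in the sketch go straight to it and propagate its failures without trying the others, which is one reading of the statement that the strategy is limited to the first invocation.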
Clients who know and wish to target a specific Factory
resource can disable the best-effort strategy by configuring the call with a reference to its endpoint:
//a reference to the endpoint of a Content Manager RI
EndpointReferenceType epr = ...
call.setEndpointReferenceType(epr);

//alternatively:
call.setEndpoint("... somehostname ...",".. someport ..");
ReadManager Calls
A ReadManagerCall
gives high-level read access to the content of a given collection, as allowed by a ReadManager
resource bound to that collection.
It follows the same patterns already seen for FactoryCall
s. In particular, it is created in a scope and, optionally, with a security manager:
//some scope
GCUBEScope scope = .....
ReadManagerCall call = new ReadManagerCall(scope);

//some security manager
GCUBESecurityManager manager = ....
ReadManagerCall secureCall = new ReadManagerCall(scope,manager);
As a further option, it may be created with the identifier of the target collection:
//some scope
GCUBEScope scope = .....
ReadManagerCall call = new ReadManagerCall("... some collection identifier ...",scope);

//some security manager
GCUBESecurityManager manager = ....
ReadManagerCall secureCall = new ReadManagerCall("... some collection identifier ...",scope,manager);
note: the collection identifier may also be set after call construction (cf. setCollectionID(String)
).
The call may then be configured like a FactoryCall
, i.e. by setting a reference to a resource endpoint for targeted interactions (cf. setEndpointReference(EndpointReference)
), or else by relying on implicit discovery and the best-effort strategy; in the latter case, the query that underlies the strategy can be customised (cf. getQuery()
, resetQuery()
).
The call may then be used to retrieve individual or multiple gDoc
trees from the target collection. The available options may be illustrated as follows:
//some tree root identifier
String id = ...

/////////////////////////////////// Single-Valued Lookups

//return one gDoc tree
GDoc doc1 = call.get(id);

//some tree predicate to use for pruning
Predicate projection = ....

//prune and return one gDoc tree
GDoc doc2 = call.get(id,projection);

/////////////////////////////////// Multi-valued Lookups

//a locator to a local or remote ResultSet of tree root identifiers
RSLocator identifiers = ....

//returns a locator to a remote ResultSet of gDoc trees with given identifiers
RSLocator locator1 = call.get(identifiers);

//returns a locator to a remote ResultSet of gDoc trees with given identifiers, pruned by a tree predicate
RSLocator locator2 = call.get(identifiers,projection);

/////////////////////////////////// Queries

//return a locator to a remote ResultSet of many gDoc trees pruned by a tree predicate
RSLocator locator3 = call.get(projection);

//some tree predicate to use for filtering
Predicate filter = ....

//return a locator to a remote ResultSet of many pruned gDoc trees that satisfy a given filter
RSLocator locator4 = call.get(projection,filter);

//return a locator to a remote ResultSet of all the gDoc trees in the collection
RSLocator locator5 = call.get();
note: the single-valued lookup operations are overloaded to return gDoc
trees as DOM elements, so as to avoid the cost of parsing into a GDoc object when a different binding is required (cf. getAsElement(String)
, getAsElement(String, Predicate)
).
note: the ResultSets returned by the multi-valued lookup and query operations contain DOM representations of gDoc
trees.
WriteManager Calls
A WriteManagerCall
gives high-level write access to the content of a given collection, as allowed by a WriteManager
resource bound to that collection.
It follows the same patterns already seen for FactoryCall
s. In particular, it is created in a scope and, optionally, with a security manager:
//some scope
GCUBEScope scope = .....
WriteManagerCall call = new WriteManagerCall(scope);

//some security manager
GCUBESecurityManager manager = ....
WriteManagerCall secureCall = new WriteManagerCall(scope,manager);
As a further option, it may be created with the identifier of the target collection:
//some scope
GCUBEScope scope = .....
WriteManagerCall call = new WriteManagerCall("... some collection identifier ...",scope);

//some security manager
GCUBESecurityManager manager = ....
WriteManagerCall secureCall = new WriteManagerCall("... some collection identifier ...",scope,manager);
note: the collection identifier may also be set after call construction (cf. setCollectionID(String)
).
The call may then be configured like a FactoryCall
, i.e. by setting a reference to a resource endpoint for targeted interactions (cf. setEndpointReference(EndpointReference)
), or else by relying on implicit discovery and the best-effort strategy; in the latter case, the query that underlies the strategy can be customised (cf. getQuery()
, resetQuery()
).
The call may then be used to add or update individual or multiple gDoc
trees in the target collection. The available options may be illustrated as follows:
/////////////////////////////////// Additions

//A gDoc tree without identifiers
GDoc doc = ....

//adds a gDoc tree and receives the identifier assigned to its root
String rootID = call.add(doc);

//a locator to a ResultSet of gDoc trees without identifiers.
RSLocator locator1 = ....

//adds many gDoc trees and receives a locator to a remote ResultSet of AddOutcome objects (see WSDL)
RSLocator locator2 = call.add(locator1);

/////////////////////////////////// Updates

//A delta tree
GDoc delta = ....

//updates the document with a delta tree
call.update(delta);

//a locator to a ResultSet of DOM representations of delta trees.
RSLocator deltas = ...

//updates zero or more documents with corresponding delta trees and receives a locator to a ResultSet of UpdateFailure objects (see WSDL)
RSLocator locator3 = call.update(deltas);
note: the single-valued add()
operation is overloaded to accept a gDoc
tree as a DOM element, so as to avoid the cost of serialising to a GDoc object when a different binding is already available (cf. add(Element)
).