Content Management

From Gcube Wiki
Revision as of 16:46, 2 June 2009

While other infrastructures for the manipulation of content in Grid-based environments, like gLite, only provide basic file-system-like functionality for content manipulation, the Information Organization services aim to provide higher-level functionality that allows finer control of content. Building on the basic Info-Object data model, more sophisticated data models can be built and exposed. This is the case for the Content Management Service, described in this page, which exposes the gCube document model. The name of the service is historical; a more appropriate name would be Document Management Service.

Reference Model

The gCube (complex) document model is a data model built on top of the Information Object Model. It describes complex documents (as opposed to simple information objects). These documents are intrinsically hierarchical: they can be composed of several, possibly nested, parts, which are in turn complex documents. Each part can in turn expose several natures, or alternative representations. For instance, an HTML document that includes a number of images may be modelled as a complex object that provides references to Information Objects (containing the images). Each subpart of such a document can be available in different forms; for instance, an image can be provided in both .png and .jpeg format. A book can be described as a complex document whose parts (e.g. sections) can in turn be seen as complex documents (containing, e.g., chapters), and so on. A positioning attribute helps in representing aggregate objects made up of parts that have to be fitted together in a certain order.

Behind the scenes, complex documents are represented as chains of Information Objects linked via appropriate relationships. More specifically, any document in the document model is a tree rooted at one single information object. A parent is related to its children (e.g. its parts) through a relationship having primary role:

  • contentmanagement:has-part

Every node (part) in this tree can also have alternative representations. If two nodes A and B are alternative representations of each other, then there are relationships with primary role:

  • contentmanagement:is-represented-by

from A to B and vice versa. Finally, notice that each part (including the root) of a complex document is an information object. As such, it has properties and can have raw content.
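The relationship structure described above can be illustrated with a minimal Python sketch. It is a toy in-memory model, not the service's implementation: the classes InfoObject, Relationship and ObjectStore and their methods are hypothetical, and only the two primary role names come from this page. The example stores an HTML page with one image part, plus a .jpeg alternative representation of that image linked in both directions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Primary roles named on this page.
HAS_PART = "contentmanagement:has-part"
IS_REPRESENTED_BY = "contentmanagement:is-represented-by"


@dataclass
class InfoObject:
    """Hypothetical stand-in for an Information Object: ID, properties, optional raw content."""
    oid: str
    properties: dict = field(default_factory=dict)
    raw_content: Optional[bytes] = None


@dataclass
class Relationship:
    source: str
    target: str
    primary_role: str


class ObjectStore:
    """Toy in-memory store of information objects and typed relationships."""

    def __init__(self):
        self.objects = {}
        self.relationships = []

    def add(self, obj):
        self.objects[obj.oid] = obj
        return obj

    def relate(self, source, target, role):
        self.relationships.append(Relationship(source, target, role))

    def add_part(self, parent, child):
        # A parent points to each of its parts through a has-part relationship.
        self.relate(parent, child, HAS_PART)

    def add_representation(self, a, b):
        # Alternative representations are linked in both directions.
        self.relate(a, b, IS_REPRESENTED_BY)
        self.relate(b, a, IS_REPRESENTED_BY)

    def targets(self, source, role):
        return [r.target for r in self.relationships
                if r.source == source and r.primary_role == role]


store = ObjectStore()
store.add(InfoObject("page"))
store.add(InfoObject("img-png", {"format": "png"}, b"\x89PNG..."))
store.add(InfoObject("img-jpeg", {"format": "jpeg"}, b"\xff\xd8..."))
store.add_part("page", "img-png")
store.add_representation("img-png", "img-jpeg")

print(store.targets("page", HAS_PART))               # ['img-png']
print(store.targets("img-jpeg", IS_REPRESENTED_BY))  # ['img-png']
```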

The document model abstraction, however, encapsulates and hides these specific details related to how the model is built over the Information Object Model.

Detailed Service Description

The Content Management Service is a gCube service exposing the gCube document model. It exposes functionality to manipulate complex documents according to the model, and converts these manipulations into operations on the basic information object model (which is in turn exposed by the Storage Management Component). Its methods allow clients to create and destroy complex documents, to manipulate them by adding or removing parts and alternative representations, and to set and unset their properties.

It is important to observe that, although the document model is independent of other high-level models, the logic of this service is not completely independent of the Collection Management Service. More specifically, its interface enforces that every complex document belongs to at least one materialized collection, or is a part or alternative representation of a document that belongs to a collection. On the other hand, any document may belong to more than one collection. This choice is motivated by the need to keep every document stored in the architecture "reachable", in the sense that its identifier can be obtained simply by navigating relationships starting from a relatively small set of entry points (the collections).
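The reachability requirement can be sketched as a breadth-first traversal that starts from the collections and follows both kinds of relationship. This is a hypothetical illustration. It assumes that collection membership, parts and alternative representations are available as plain Python dictionaries of IDs, which is not how the Collection Management Service actually stores them. Any document the traversal does not return would be unreachable, and that is the situation the interface constraint prevents.

```python
from collections import deque


def reachable_documents(collections, has_part, represented_by):
    """Return every document ID reachable from the collection entry points.

    collections:    dict collection ID -> list of member document IDs
    has_part:       dict document ID -> list of part IDs
    represented_by: dict document ID -> list of alternative representation IDs
    """
    seen = set()
    queue = deque(doc for members in collections.values() for doc in members)
    while queue:
        doc = queue.popleft()
        if doc in seen:
            continue
        seen.add(doc)
        # Follow both kinds of relationship: parts and alternative representations.
        queue.extend(has_part.get(doc, []))
        queue.extend(represented_by.get(doc, []))
    return seen


collections = {"coll-1": ["book"], "coll-2": ["book", "article"]}
has_part = {"book": ["chapter-1", "chapter-2"]}
represented_by = {"chapter-1": ["chapter-1-pdf"]}

reachable = reachable_documents(collections, has_part, represented_by)
print(sorted(reachable))
# ['article', 'book', 'chapter-1', 'chapter-1-pdf', 'chapter-2']
print("orphan" in reachable)  # False
```

Note that "book" belongs to two collections but is visited only once, because the traversal skips IDs it has already seen.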

Resources and Properties

The Content Management Service is implemented as a stateless WSRF-compliant web service. It does not publish any resource, and depends only on the underlying Storage Management Service for its operation.

Functions

Through its methods, the service exposes the document model described in the previous section. Internally, these high-level operations are mapped onto generic Storage Management operations. For this reason, many of its operations accept non-functional parameters in the form of Storage Hints, which are passed to the underlying Storage Management Service. The functionality supported by the Content Management Service is briefly described next; methods are grouped by function for ease of reading.
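The sketch below shows the general idea of mapping one high-level operation onto lower-level storage operations while forwarding the storage hints unchanged. Everything in it is hypothetical. RecordingStorage stands in for the Storage Management Service and only records the calls it receives. The role name "collection:member" and the hint string "replicas=2" are invented for illustration, because this page does not specify how collection membership or hints are encoded.

```python
class RecordingStorage:
    """Hypothetical stand-in for the Storage Management Service: it only records calls."""

    def __init__(self):
        self.calls = []

    def create_object(self, oid, content, hints):
        self.calls.append(("create_object", oid, hints))

    def add_relationship(self, source, target, role, hints):
        self.calls.append(("add_relationship", source, target, role, hints))


class ContentManager:
    """Maps a high-level document operation onto generic storage operations."""

    def __init__(self, storage):
        self.storage = storage

    def store_document(self, doc_id, content, collection_id, hints=()):
        hints = list(hints)
        # 1. Store the document itself as an information object.
        self.storage.create_object(doc_id, content, hints)
        # 2. Link it to its collection so that it stays reachable
        #    (the role name is illustrative only).
        self.storage.add_relationship(collection_id, doc_id, "collection:member", hints)


storage = RecordingStorage()
ContentManager(storage).store_document("doc-1", b"hello", "coll-1", ["replicas=2"])
for call in storage.calls:
    print(call)
# ('create_object', 'doc-1', ['replicas=2'])
# ('add_relationship', 'coll-1', 'doc-1', 'collection:member', ['replicas=2'])
```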

Document Creation, Access and Deletion

  • storeDocument() – takes as input a message containing the document ID, either a URI from which the raw content can be fetched or the raw content itself, a collection ID and an array of storage hints, and stores the document in the given collection;
  • getDocument() – takes as input a message containing the document ID, the URI from which the raw content can be downloaded, a set of storage hints and the unroll level, i.e. an upper limit on the number of relations to traverse when retrieving a complex document composed of multiple nested subparts (an optimization feature), and returns the document description;
  • deleteDocument() – takes as input a message containing the document ID and deletes the document (including its raw content and references) from the set of documents managed by the service; the deletion may cascade to other documents according to propagation rules;
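A toy in-memory model can clarify two details of these operations. The unroll level bounds how deep getDocument() follows nested parts, and deleteDocument() cascades only along relations flagged for propagation. The class and method names below are hypothetical, and collection membership and storage hints are omitted. The per-part Boolean flag is an assumption about how propagation rules might be recorded; it mirrors the flag accepted by addPart().

```python
class DocumentStore:
    """Hypothetical in-memory model of document storage, retrieval and deletion."""

    def __init__(self):
        self.docs = {}   # document ID -> raw content (or None)
        self.parts = {}  # parent ID -> list of (part ID, delete_with_parent)

    def store_document(self, doc_id, content=None):
        self.docs[doc_id] = content
        self.parts.setdefault(doc_id, [])

    def link_part(self, parent, part, delete_with_parent):
        self.parts[parent].append((part, delete_with_parent))

    def get_document(self, doc_id, unroll_level):
        """Describe doc_id, following at most unroll_level levels of nested parts."""
        description = {"id": doc_id, "parts": []}
        if unroll_level > 0:
            for part, _ in self.parts[doc_id]:
                description["parts"].append(self.get_document(part, unroll_level - 1))
        return description

    def delete_document(self, doc_id):
        # Cascade only to the parts flagged for deletion together with their parent.
        for part, delete_with_parent in self.parts.pop(doc_id, []):
            if delete_with_parent:
                self.delete_document(part)
        # Drop references to the deleted document held by any remaining parent.
        for parent in self.parts:
            self.parts[parent] = [(p, flag) for p, flag in self.parts[parent] if p != doc_id]
        self.docs.pop(doc_id, None)


store = DocumentStore()
for doc_id in ("book", "chapter-1", "section-1.1", "cover"):
    store.store_document(doc_id)
store.link_part("book", "chapter-1", delete_with_parent=True)
store.link_part("chapter-1", "section-1.1", delete_with_parent=True)
store.link_part("book", "cover", delete_with_parent=False)

description = store.get_document("book", unroll_level=1)
print(description)
# {'id': 'book', 'parts': [{'id': 'chapter-1', 'parts': []}, {'id': 'cover', 'parts': []}]}

store.delete_document("book")
print(sorted(store.docs))  # ['cover']
```

With unroll_level=1 the description stops at the chapter, so section-1.1 is not expanded. Deleting the book removes the chapter and its section, which are flagged to propagate, but keeps the cover.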

Part Manipulation

  • addPart() – takes as input a message containing the parent document ID, the part document ID, the (optional) position the part should occupy in the parent document and a Boolean flag indicating whether the part must be removed when the parent document is removed, and attaches the part to the specified document (i.e. creates the part-of relation);
  • removePart() – takes as input a message containing the parent document ID and the part document ID, and removes the specified part from the selected document (i.e. removes the part-of relation);
  • getParts() – takes as input a message containing the parent document ID and returns the list of IDs of the documents that are (direct) parts of the specified document;
  • getDirectParents() – takes as input a message containing an Information Object ID and returns the list of IDs of all the Information Objects that are (direct) parents of the specified one;
  • getDirectParents() – takes as input a message containing the ID of an alternative representation and returns the list of Information Object IDs (OIDs) of the main representations of the document;
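The part-of relation can be sketched as an index from parent IDs to their parts. Each part carries an optional position and a flag saying whether it is removed together with its parent. This is a hypothetical Python illustration, not the service API. In particular, placing unpositioned parts after the existing ones is an assumption, because this page does not say what happens when no position is given.

```python
class PartIndex:
    """Hypothetical index of part-of relations between document IDs."""

    def __init__(self):
        # parent ID -> {part ID: (position, delete_with_parent)}
        self.parts = {}

    def add_part(self, parent, part, position=None, delete_with_parent=False):
        children = self.parts.setdefault(parent, {})
        if position is None:
            # Assumption: an unpositioned part goes after the existing ones.
            position = max((pos for pos, _ in children.values()), default=-1) + 1
        children[part] = (position, delete_with_parent)

    def remove_part(self, parent, part):
        # Removes only the relation; the part document itself is untouched.
        self.parts.get(parent, {}).pop(part, None)

    def get_parts(self, parent):
        # Direct parts only, ordered by their position attribute.
        children = self.parts.get(parent, {})
        return sorted(children, key=lambda part: children[part][0])

    def get_direct_parents(self, oid):
        return sorted(parent for parent, children in self.parts.items() if oid in children)


index = PartIndex()
index.add_part("book", "chapter-2", position=2)
index.add_part("book", "chapter-1", position=1, delete_with_parent=True)
index.add_part("anthology", "chapter-1")
print(index.get_parts("book"))                # ['chapter-1', 'chapter-2']
print(index.get_direct_parents("chapter-1"))  # ['anthology', 'book']

index.remove_part("book", "chapter-2")
print(index.get_parts("book"))                # ['chapter-1']
```

The example also shows that a document can be a direct part of more than one parent, which is why getDirectParents() returns a list.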

Alternative Representation Manipulation

  • removeAlternativeRepresentation() – takes as input a message containing the document ID and the representation ID, and removes the alternative representation from the selected document (i.e. removes the is-representation-of relation);
  • addAlternativeRepresentation() – takes as input a message containing the document ID, the ID of the Information Object that represents the alternative representation, the (optional) rank of this alternative representation and an (optional) Boolean flag indicating whether the alternative representation must be deleted when the main object is deleted, and assigns the representation to the specified document;
  • getAlternativeRepresentations() – takes as input a message containing the document ID and returns the list of all existing representations of the specified document;
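Alternative representations can be modelled in a similar way, with an optional rank and an optional delete-with-main flag for each representation. In this hypothetical sketch, lower ranks are assumed to be preferred and unranked representations are listed last. The page does not define how ranks are ordered, so both choices are assumptions, as are all the class and method names.

```python
class RepresentationIndex:
    """Hypothetical index of the alternative representations of documents."""

    def __init__(self):
        # document ID -> {representation ID: (rank, delete_with_main)}
        self.representations = {}

    def add_alternative_representation(self, doc_id, rep_id, rank=None, delete_with_main=False):
        # Assumption: unranked representations sort after every ranked one.
        rank = float("inf") if rank is None else rank
        self.representations.setdefault(doc_id, {})[rep_id] = (rank, delete_with_main)

    def remove_alternative_representation(self, doc_id, rep_id):
        self.representations.get(doc_id, {}).pop(rep_id, None)

    def get_alternative_representations(self, doc_id):
        reps = self.representations.get(doc_id, {})
        # Assumption: lower rank means preferred; ties keep insertion order.
        return sorted(reps, key=lambda rep: reps[rep][0])

    def deleted_with_main(self, doc_id):
        # Representations that a deletion of doc_id would take with it.
        return [rep for rep, (_, flag) in self.representations.get(doc_id, {}).items() if flag]


index = RepresentationIndex()
index.add_alternative_representation("figure-1", "figure-1.jpeg", rank=2)
index.add_alternative_representation("figure-1", "figure-1.png", rank=1, delete_with_main=True)
index.add_alternative_representation("figure-1", "figure-1.svg")
print(index.get_alternative_representations("figure-1"))
# ['figure-1.png', 'figure-1.jpeg', 'figure-1.svg']
print(index.deleted_with_main("figure-1"))  # ['figure-1.png']

index.remove_alternative_representation("figure-1", "figure-1.jpeg")
print(index.get_alternative_representations("figure-1"))  # ['figure-1.png', 'figure-1.svg']
```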

Document/Part Properties Manipulation

  • renameDocument() – takes as input a message containing the document ID and a document name, and attaches the name to the given document;
  • setDocumentProperty() – takes as input a message containing a document ID, a property name, a property type and a property value, and attaches the property to the given document;
  • getDocumentProperty() – takes as input a message containing a document ID and a property name, and returns the specified property as its ID, name, type and value;
  • unsetDocumentProperty() – takes as input a message containing a document ID and a property name, and removes the specified property from the document;
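A minimal sketch of the property operations, with properties keyed by document ID and property name. All names are hypothetical. The returned record assumes that the "ID" mentioned for getDocumentProperty() is the document ID, because the page does not say whether properties have identifiers of their own.

```python
from typing import Any, NamedTuple, Optional


class DocumentProperty(NamedTuple):
    doc_id: str      # assumption: the "ID" returned by getDocumentProperty()
    name: str
    prop_type: str
    value: Any


class PropertyStore:
    """Hypothetical per-document property table."""

    def __init__(self):
        self.names = {}       # document ID -> document name
        self.properties = {}  # (document ID, property name) -> DocumentProperty

    def rename_document(self, doc_id, name):
        self.names[doc_id] = name

    def set_document_property(self, doc_id, name, prop_type, value):
        # Setting a property that already exists overwrites it.
        self.properties[(doc_id, name)] = DocumentProperty(doc_id, name, prop_type, value)

    def get_document_property(self, doc_id, name) -> Optional[DocumentProperty]:
        return self.properties.get((doc_id, name))

    def unset_document_property(self, doc_id, name):
        self.properties.pop((doc_id, name), None)


props = PropertyStore()
props.rename_document("doc-1", "Annual report")
props.set_document_property("doc-1", "language", "string", "en")
prop = props.get_document_property("doc-1", "language")
print(prop)
# DocumentProperty(doc_id='doc-1', name='language', prop_type='string', value='en')

props.unset_document_property("doc-1", "language")
print(props.get_document_property("doc-1", "language"))  # None
```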

Document/Part Raw Content Manipulation

  • updateDocumentContent() – takes as input a message containing the document ID, either a URI from which the raw content can be fetched or the raw content itself, and an array of storage hints, and updates the content of the given document;
  • hasRawContent() – takes as input a message containing the document ID and returns a Boolean value indicating whether the document has raw content;
  • getContentLength() – takes as input a message containing the document ID and returns the size of the document's raw content;
  • getMimeType() – takes as input a message containing the document ID and returns the MIME type associated with the document's raw content;
  • setMimeType() – takes as input a message containing the document ID and a string representing a MIME type, and sets it as the MIME type of the document's raw content;
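Finally, a toy model of the raw-content operations. It is a hypothetical sketch. Its updateDocumentContent() counterpart accepts raw bytes only, so fetching from a URI and passing storage hints are omitted. Its getContentLength() counterpart is assumed to return the size in bytes, which is why the UTF-8 encoded string in the example has a length of 6 rather than 5.

```python
class RawContentStore:
    """Hypothetical store for the raw content and MIME type of documents."""

    def __init__(self):
        self.content = {}     # document ID -> bytes
        self.mime_types = {}  # document ID -> MIME type string

    def update_document_content(self, doc_id, data):
        self.content[doc_id] = data

    def has_raw_content(self, doc_id):
        return doc_id in self.content

    def get_content_length(self, doc_id):
        # Assumption: the size is measured in bytes of stored raw content.
        return len(self.content[doc_id])

    def set_mime_type(self, doc_id, mime_type):
        self.mime_types[doc_id] = mime_type

    def get_mime_type(self, doc_id):
        return self.mime_types.get(doc_id)


store = RawContentStore()
print(store.has_raw_content("doc-1"))  # False
store.update_document_content("doc-1", "héllo".encode("utf-8"))
store.set_mime_type("doc-1", "text/plain; charset=utf-8")
print(store.has_raw_content("doc-1"), store.get_content_length("doc-1"))  # True 6
print(store.get_mime_type("doc-1"))  # text/plain; charset=utf-8
```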