Tabular Data Service Model and Operations

Data Model

Tables in the Tabular Data system are entities made of two separate elements:

Raw data: This can be imagined as the data contained in user provided CSV files
Metadata
- Data structure: this metadata describes how data is structured (e.g.: columns number or column data type) and how raw data can be reached
- Enriching Metadata: This metadata adds information on top of raw data and provides some context or additional information on top of it.

Raw data is managed directly by leveraging relational database services (PostgreSQL with Postgis extension). Metadata is managed and represented through a metadata model library called tabular-model. Tabular Model provides

a description for tables entities covering the minimum table structure description requirements
elements that helps in enriching tables with additional metadata (column labels, descriptions, table version, etc.)

Tabular Model is GWT friendly, which means that it can be used in GWT Web application on client side, since it's java beans are translatable into javascript code.

Tabular Data Dataset

Tabular Data Codelist

Tabular Data Metadata

Lineage

Tabular Data Template

Operations

Operation modules are a group of Java classes that provide, each one, a single functionality to the system. Functionality provided by operation modules may fall under one of these categories:

Data Import: a set of operations supporting the ingestion of datasets in the Tabular Data service;
Data Validation: a set of operations supporting ...
Data Transformation: a set of operations supporting data manipulation, e.g. filtering, aggregation, etc.
Data Export: a set of operations supporting the transfer of Tabular Data products in a format that can be used by other systems;

Each Operation takes an input, which is a set of parameters. These parameters may include a tabular data table or a column of a tabular data table or none of them (like in the import case). Along with additional parameters, each operation must belong to one of these categories:

Void scoped: does not require a table to compute
Table scoped: requires a target table to compute
Column scope: requires a target table column to compute

Each operation produce, as a result of its computation, a table and zero or more collateral tables. The create tables are always a new table probably created by first cloning the input table, if any is provided.

Operation modules leverages Cube Manager capabilities in order to create new tables, clone existing ones or modify the structure or additional metadata of tables. Operation modules can work with raw data directly on the DB, therefore data experts can rely on their SQL knowledge.

Data Import

Delivery State		Operation	Description
Service	Portlet	Operation	Description
Ready	Ready	Import dataset from CSV	This operation supports the ingestion of a dataset from a CSV file. What about Template?
?	?	Import dataset from SDMX	This operation supports the ingestion of a dataset from an SDMX compliant Repository. What about Template?
?	?	Import dataset from JSON	This operation supports the ingestion of a dataset from a JSON compliant data source. What about Template?
Ready	Ready	Import codelist from CSV	This operation supports the ingestion of a codelist from a CSV file. What about Template?
Ready	Ready	Import codelist from SDMX	This operation supports the ingestion of a codelist from an SDMX compliant Repository.
?	?	Import codelist from JSON	This operation supports the ingestion of a codelist from a JSON compliant data source.

Data Validation

Delivery State		Operation	Description
Service	Portlet	Operation	Description
Ready	?	Column data cast Check	This operation enables to verify whether the values of a dataset column can be safely transformed in a time data type.
Ready	?	Column type cast Check	This operation enables to verify whether the values of a dataset column can be safely transformed in a target data type.
Ready	?	Column values uniqueness Check	This operation enables to verify whether the values of a dataset column are unique or not.
Ready	?	Row values uniqueness Check	This operation enables to verify whether the rows of a dataset are unique or not with respect to a uniqueness condition.
?	?	Specify Dataset Template
?	?	Specify Validation Rule
Ready		Validate Codelist	This operation enables to assess if the codelist is compliant with ???
?	?	Validate Dataset	This operation enables to assess if the dataset is compliant with the dataset characterization and constraints specified in the associated template.
Ready		Validate Rule	This operation enables to assess if the dataset is compliant with the specific rule. ... what happen to the data that are not compliant? Is it possible to have triggers and actions?

Data Transformation

Dataset Structure-related Operations

Delivery State		Operation	Description
Service	Portlet	Operation	Description
		Add a column
?	?	Add a geo-referenced column	This operations enables the creation of a new column containing square grid reference identifiers (e.g. c-square codes) by using latitude and longitude column values.
Ready		Remove a column	This operation enables the removal of an entire column from a tabular data dataset.
Ready		Set Column Name	This operation enablse the specification of the column name.
Ready		Set Column Type to Annotation	This operation enables ...
Ready		Set Column Type to Attribute	This operation enables ...
Ready		Set Column Type to Codelist	This operation enables ...
Ready		Set Column Type to Codelist name	This operation enables ...
Ready		Set Column Type to Codelist description	This operation enables ...
Ready		Set Column Type to Dimension	This operation enables ...
Ready		Set Column Type to Measure	This operation enables ...
Ready		Set Table Name	This operation enables ...

Dataset Content-related Operations

Delivery State		Operation	Description
Service	Portlet	Operation	Description
?		Dataset Aggregation
?		Dataset Denormalisation
?		Dataset Filtering
?		Dataset Normalisation
Ready		Dataset Unfolding	was "create dataset view"
Ready		Remove Duplicate Rows	This operation supports the ...

Data Export

Delivery State		Operation	Description
Service	Portlet	Operation	Description
?	?	Export dataset to CSV	This operation enables the production of a CSV file out of a tabular data dataset.
?	?	Export dataset to SDMX	This operation enables the publication of a tabular data dataset in an SDMX Registry.
?	?	Export dataset to JSON	This operation enables the production of a JSON file out of a tabular data dataset.
Ready	Ready	Export codelist to CSV	This operation enables the production of a CSV file out of a tabular data codelist.
Ready	Ready	Export codelist to SDMX	This operation enables the publication of a tabular data codelist in an SDMX Registry.
?	?	Export codelist to JSON	This operation enables the production of a JSON file out of a codelist data dataset.

Expressions

Tabular-model provides a simple model for describing conditions on table data. Conditions can be expressed as a set of minimal constructs chained together with logical connectors. The data model used to compose expression relies on the composite design pattern which allows to build tree of expressions. Logical connectors are OR and AND and can take any number of child expression. Leaf expression are expression that describe a particular conditions. Each Leaf Expression can take an arbitrary set of parameters in order to be defined.

Starting from release 3.0.0 of tabular model a set of leaf expressions have been defined:

IsNull, Value is null;
ValueIsIn: Value contained in another column of another table;
Conditions on comparable values: Equals, Greater than, Lesser than, Not Equals;
Conditions on text values: contains text, text equals, text mathes SQL regexp.

An expression can be evaluated by an Evaluator, which is simply an object that, by processing a given expression, performs some action and returns a custom result. Two evaluators have been provided:

evaluator-sql: produces a SQL WHERE clause given an expression;
evaluator-description: produces a human readable description of the expression.

A client can obtain an instance of a Evaluator using its related EvaluatorFactory. Usage of expressions and evaluators is explained in the following code snippet taken from a test case of the evaluator-description project:

	DescriptionExpressionEvaluatorFactory evaluatorFactory = getDescriptionEvaluatorFactory();
 
	ColumnReference targetColumnReference = createColumnReference();
	ColumnReference anotherTargetColumnReference = createColumnReference();
 
	// Other expression
	Expression isNull = new IsNull(targetColumnReference);
	Expression columnIsIn = new ValueIsIn(targetColumnReference, anotherTargetColumnReference);
 
	// Text expression
	Expression textContains = new TextContains(createColumnReference(), new TDText("test"));
	Expression textEquals = new TextEquals(createColumnReference(), new TDText("test"));
	Expression textMatchRegexp = new TextMatchSQLRegexp(targetColumnReference, new TDText("[a-b]*"));
 
	// Comparable
	Expression equals = new Equals(targetColumnReference, new TDDate(new Date()));
	Expression greaterThan = new Equals(targetColumnReference, new TDInteger(5));
	Expression lessThan = new LessThan(targetColumnReference, new TDNumeric(5.1f));
	Expression notEquals = new NotEquals(targetColumnReference, new TDBoolean(false));
 
	// Composite
	Expression and = new And(textContains, isNull, columnIsIn, textContains);
	Expression or = new Or(and, textEquals, textMatchRegexp, equals, greaterThan, lessThan, notEquals);
 
	String description = evaluatorFactory.getEvaluator(or).evaluate();

LeafExpression are usually parameterized in terms of column references or typed values. Column references are objects that allow to reference a column of a tabular data table. Typed values must comply with allowed data types for table columns, therefore boxed custom types are used for the representation of those values.

Operators

Arithmetic Operators

+	Addition - Adds values on either side of the operator
-	Subtraction - Subtracts right hand operand from left hand operand
*	Multiplication - Multiplies values on either side of the operator
/	Division - Divides left hand operand by right hand operand
%	Modulus - Divides left hand operand by right hand operand and returns remainder

Comparison Operators

=	Checks if the values of two operands are equal or not, if yes then condition becomes true.
!=	Checks if the values of two operands are equal or not, if values are not equal then condition becomes true.
<>	Checks if the values of two operands are equal or not, if values are not equal then condition becomes true.
>	Checks if the value of left operand is greater than the value of right operand, if yes then condition becomes true.
<	Checks if the value of left operand is less than the value of right operand, if yes then condition becomes true.
>=	Checks if the value of left operand is greater than or equal to the value of right operand, if yes then condition becomes true.
<=	Checks if the value of left operand is less than or equal to the value of right operand, if yes then condition becomes true. (a <= b) is true.
!<	Checks if the value of left operand is not less than the value of right operand, if yes then condition becomes true.
!>	Checks if the value of left operand is not greater than the value of right operand, if yes then condition becomes true.

Logical Operators

ALL	The ALL operator is used to compare a value to all values in another value set.
AND	The AND operator allows the existence of multiple conditions in an expression.
ANY	The ANY operator is used to compare a value to any applicable value in the list according to the condition.
BETWEEN	The BETWEEN operator is used to search for values that are within a set of values, given the minimum value and the maximum value.
EXISTS	The EXISTS operator is used to search for the presence of a row in a specified dataset that meets certain criteria.
IN	The IN operator is used to compare a value to a list of literal values that have been specified.
LIKE	The LIKE operator is used to compare a value to similar values using wildcard operators.
NOT	The NOT operator reverses the meaning of the logical operator with which it is used.
OR	The OR operator is used to combine multiple expressions.
IS NULL	The NULL operator is used to compare a value with a NULL value.
UNIQUE	The UNIQUE operator searches every row of a specified dataset for uniqueness (no duplicates).

Tabular Data Service Model and Operations

Contents

Data Model

Tabular Data Dataset

Tabular Data Codelist

Tabular Data Metadata

Lineage

Tabular Data Template

Operations

Data Import

Data Validation

Data Transformation

Dataset Structure-related Operations

Dataset Content-related Operations

Data Export

Expressions

Operators

Arithmetic Operators

Comparison Operators

Logical Operators

Navigation menu

Views

Personal tools

gCube Wiki

gCube features

gCube documentation

Integration and Distribution

Search

Tools