GCube SDMX Statistical Data Dissemination System

From Gcube Wiki
Revision as of 17:14, 21 December 2012

The SDMX Statistical Data Dissemination System publishes timeseries data in SDMX format. The SDMX standards focus primarily on the exchange and dissemination of statistical data and metadata: they define processes and a protocol for exchanging statistical data, together with standard formats for representing data and metadata.

Fundamental definitions

Data set: Statistical data is usually organized into discrete sets, each of which includes particular observations for a specific period of time. A data set can be understood as a collection of similar data sharing a structure and covering a fixed period of time.

Architecture

The SDMX architecture comprises several components:

  • SDMX Registry: The SDMX Registry is a service that holds SDMX Structural Metadata artifacts and provides SDMX SOAP and REST interfaces for retrieving and registering those artifacts. It also works as a reverse proxy between an SDMX client and the SDMX Datasource.
  • gCube SDMX Datasource: A generic SDMX Datasource is a service mainly responsible for providing access to statistical data represented in SDMX documents. Statistical data is usually stored in representation formats other than SDMX, potentially across a wide array of storage backends. An SDMX Datasource abstracts those aspects by providing a means of accessing the data backends and by transforming non-SDMX data formats into one of the standard SDMX formats. According to the SDMX specifications, an SDMX Datasource can be a service as simple as a web server or a more complex one delivering the standard SDMX REST or SOAP interfaces. The gCube SDMX Datasource specifically provides access to gCube Timeseries, available through the Timeseries Service.
  • Timeseries Service: A SOAP service that allows the retrieval of timeseries.

Use Cases

Publication of Timeseries data

In order to make timeseries data available to the public in SDMX format, the Timeseries Service must first register several SDMX Structural Metadata artifacts on the Registry, describing the data exchange setup and the related actors. The steps and the related Structural Metadata to be provided are, in order:

  • Register the Agency providing the SDMX artifacts on the Root Agency Scheme: The Agency provides information on the organization that is delivering Structural Metadata.
  • Register one or more Codelists: Each codelist enumerates a set of values to be used in the representation of dimensions, attributes, and other structural parts of SDMX. For example, a codelist may define how to identify a set of countries.
  • Register one or more Concept Schemes: A concept scheme is a maintained list of concepts that are used in data structure definitions and metadata structure definitions. For example, a concept scheme may define the concept of country.
  • Register a Data Structure Definition (DSD): Each data set has a set of structural metadata. These descriptions are referred to in SDMX as Data Structure Definitions, which include information about how concepts are associated with the measures, dimensions, and attributes of a data “cube,” along with information about the representation of data and related identifying and descriptive (structural) metadata.
  • Register a Dataflow: A Dataflow definition describes a statistical data set exchange in which multiple data providers can be involved, by referencing the related DSD and the constraints which apply to the dataflow (such as reporting periodicity).
  • Register a Data Provider within the Agency DataProviderScheme: Each actor providing data is identified as a Data Provider. A Data Provider Scheme is an Organisation Scheme (like the Agency Scheme), supplied by an agency, that describes a set of Data Providers.
  • Register a Provision Agreement: A Provision Agreement is the set of information which describes the way in which data sets and metadata sets are provided by a data provider. The term “agreement” is used because this information can be understood as the basis of a “service-level agreement”.

The information provided to the Registry in these steps completely describes who is providing statistical data and which domain the data covers. To let the Registry know where the data published by a data provider can be retrieved, the Timeseries Service must first register a timeseries on the SDMX Datasource, which gives the Datasource access to that particular timeseries data set. This step involves invoking a gCube-specific web service method made available by the SDMX Datasource. Once the timeseries is registered on the SDMX Datasource, a datasource registration can be safely sent to the Registry, holding the endpoint through which the SDMX Datasource service can be accessed.

The sdmx-publisher library registers all of the aforementioned Structural Metadata artifacts on the Registry and provides simplified methods for registering timeseries data sets on both the gCube SDMX Datasource and the SDMX Registry.
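The ordering constraint of the publication use case can be illustrated with a small sketch. The enum and class names below are hypothetical and are not part of sdmx-publisher; the sketch assumes a strictly linear sequence (Structural Metadata first, then the timeseries on the SDMX Datasource, then the datasource registration on the Registry), which is a simplification of what a real registry may accept.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the publication steps, listed in the order given in the text. */
enum PublicationStep {
    AGENCY,
    CODELISTS,
    CONCEPT_SCHEMES,
    DATA_STRUCTURE_DEFINITION,
    DATAFLOW,
    DATA_PROVIDER,
    PROVISION_AGREEMENT,
    TIMESERIES_ON_DATASOURCE,  // gCube-specific call on the SDMX Datasource
    DATASOURCE_REGISTRATION    // sent to the Registry last
}

/** Tracks completed steps and rejects any step performed out of order. */
final class PublicationWorkflow {
    private final List<PublicationStep> completed = new ArrayList<>();

    /** Returns the next step that must be performed, or null when all are done. */
    PublicationStep nextStep() {
        PublicationStep[] all = PublicationStep.values();
        return completed.size() < all.length ? all[completed.size()] : null;
    }

    /** Marks a step as done; throws if it is not the expected next step. */
    void complete(PublicationStep step) {
        if (step != nextStep()) {
            throw new IllegalStateException(
                    "Expected " + nextStep() + " but got " + step);
        }
        completed.add(step);
    }

    /** True once every step, including the datasource registration, is done. */
    boolean isFinished() {
        return nextStep() == null;
    }
}
```

A caller would invoke complete(...) after each successful registration; an attempt to register a Dataflow before the Agency, for instance, fails immediately instead of leaving the Registry in an inconsistent state.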

Retrieval of SDMX Structural Metadata or Data

SDMX data set retrieval

Retrieval of data sets

After timeseries data has been published, SDMX data set documents can be retrieved by querying the SDMX Registry, which should act as a proxy between the client and the SDMX Datasource. The reverse-proxy role of the Registry is justified by the fact that a given dataflow can be backed by data from different datasources. The Registry forwards the request to the SDMX Datasource, which provides the same Data Query REST interface as the Registry. The Datasource parses the query, retrieves the data by querying a timeseries and returns an SDMX document filled with the timeseries data set. This document is then forwarded to the SDMX client as the response to the original query.
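As an illustration of the kind of request the client sends, the following sketch builds a data query URL following the SDMX 2.1 REST conventions (data/{flowRef}/{key} with optional startPeriod and endPeriod parameters). The class name, the agency and dataflow identifiers and the assumption that the Registry exposes data queries under the REST base URL used elsewhere on this page are all illustrative; parameter values are not URL-encoded here.

```java
/**
 * Builds SDMX 2.1 REST data query URLs of the form
 * {base}/data/{agency},{flow},{version}/{key}?startPeriod=..&endPeriod=..
 * Hypothetical helper, not part of the gCube client libraries.
 */
final class SdmxDataQuery {

    /**
     * @param keyValues one value per dimension, in DSD order; an empty string
     *                  leaves that dimension unconstrained; no values means "all"
     */
    static String build(String baseUrl, String agencyId, String flowId, String version,
                        String startPeriod, String endPeriod, String... keyValues) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        String flowRef = agencyId + "," + flowId + "," + version;
        String key = keyValues.length == 0 ? "all" : String.join(".", keyValues);

        StringBuilder url = new StringBuilder(base)
                .append("data/").append(flowRef).append('/').append(key);
        String separator = "?";
        if (startPeriod != null) {
            url.append(separator).append("startPeriod=").append(startPeriod);
            separator = "&";
        }
        if (endPeriod != null) {
            url.append(separator).append("endPeriod=").append(endPeriod);
        }
        return url.toString();
    }
}
```

For example, SdmxDataQuery.build("http://localhost:8080/FusionRegistry/ws/rest", "MY_AGENCY", "MY_FLOW", "1.0", "2000", "2010", "A", "", "ITA") yields http://localhost:8080/FusionRegistry/ws/rest/data/MY_AGENCY,MY_FLOW,1.0/A..ITA?startPeriod=2000&endPeriod=2010, where the empty second key position leaves that dimension unconstrained.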

Retrieval of Structural Metadata

Structural Metadata is hosted on the SDMX Registry. This use case therefore involves only a simple SDMX Structural Metadata query from the SDMX client to the Registry, which in turn returns the requested metadata.
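A structural metadata query can be sketched in the same way. In the SDMX 2.1 REST conventions such a query has the form {resource}/{agencyID}/{resourceID}/{version}, where omitted identifiers default to "all" and an omitted version defaults to "latest". The helper class and the identifiers used in the example are hypothetical.

```java
/**
 * Builds SDMX 2.1 REST structure query URLs of the form
 * {base}/{resource}/{agencyID}/{resourceID}/{version}.
 * Hypothetical helper, not part of the gCube client libraries.
 */
final class SdmxStructureQuery {

    /**
     * @param resource   artifact type, e.g. "codelist", "conceptscheme",
     *                   "datastructure" or "dataflow"
     * @param agencyId   maintenance agency, or null for "all"
     * @param resourceId artifact id, or null for "all"
     * @param version    artifact version, or null for "latest"
     */
    static String build(String baseUrl, String resource,
                        String agencyId, String resourceId, String version) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + resource
                + "/" + orDefault(agencyId, "all")
                + "/" + orDefault(resourceId, "all")
                + "/" + orDefault(version, "latest");
    }

    private static String orDefault(String value, String fallback) {
        return (value == null || value.isEmpty()) ? fallback : value;
    }
}
```

Calling SdmxStructureQuery.build("http://localhost:8080/FusionRegistry/ws/rest/", "codelist", "MY_AGENCY", "CL_COUNTRY", null) produces http://localhost:8080/FusionRegistry/ws/rest/codelist/MY_AGENCY/CL_COUNTRY/latest.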

Developer libraries

Several libraries are provided to developers for accessing the SDMX Registry and Datasource functionality:

  • sdmx-registry-client and sdmx-registry-client-gcube
  • sdmx-datasource-client
  • sdmx-publisher

Since these libraries work with sdmxsource, which relies on the Spring dependency injection framework, the client must define a Spring beans configuration file. Configuration file examples are provided for each library.

sdmx-registry-client

sdmx-registry-client is a client library for accessing Registry services. The provided implementation can query a subset of the REST interface methods of the Metadata Technology Fusion Registry.

sdmx-registry-client-gcube is a support library that automatically retrieves the SDMX Registry instance associated with a scope.

Maven artifacts

Find the latest version of sdmx-registry-client on Nexus Repository Browser by looking for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-registry-client</artifactId>

Find the latest version of sdmx-registry-client-gcube on Nexus Repository Browser by looking for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-registry-client-gcube</artifactId>

Quickstart

Once the library jar is on the classpath, the client must obtain an implementation of SDMXRegistryClient through Spring injection.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
	xmlns:aop="http://www.springframework.org/schema/aop" xmlns:context="http://www.springframework.org/schema/context"
	xmlns:jee="http://www.springframework.org/schema/jee" xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="
			http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
			http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
			http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
			http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
			http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">
	<context:spring-configured />
	<context:annotation-config />
	
	<context:component-scan base-package="org.sdmxsource" />
	
	<bean id="registryDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.SDMXRegistryDescriptorImpl">
		<property name="rest_url_V1" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
		<property name="rest_url_V2" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
		<property name="rest_url_V2_1" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
	</bean>
	
	<bean id="registryClient"
		class="org.gcube.datapublishing.sdmx.impl.registry.FusionRegistryClient">
		<constructor-arg name="interfaceType" value="RESTV2_1" />
		<constructor-arg name="SDMXRegistry" ref="registryDescriptor"/>
	</bean>

</beans>

Here the registry client is injected with a registry descriptor providing all of the REST registry endpoints. SOAP endpoints could also be provided. The client is also initialized with an interfaceType parameter that instructs it to query the registry using the SDMX REST v2.1 message format.
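The relationship between the descriptor and the interfaceType parameter can be pictured with the following plain-Java sketch. It is a hypothetical stand-in and not the actual SDMXRegistryDescriptorImpl or FusionRegistryClient code: the enum constants RESTV1 and RESTV2 are assumed by analogy with the rest_url_V1 and rest_url_V2 properties, and only RESTV2_1 appears in the configuration above.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical interface types, mirroring the rest_url_* properties. */
enum InterfaceType { RESTV1, RESTV2, RESTV2_1 }

/** Hypothetical descriptor: one endpoint URL per supported interface type. */
final class RegistryEndpoints {
    private final Map<InterfaceType, String> urls = new EnumMap<>(InterfaceType.class);

    /** Registers the endpoint URL to use for the given interface type. */
    void setUrl(InterfaceType type, String url) {
        urls.put(type, url);
    }

    /** Returns the endpoint for the given type; fails if none is configured. */
    String urlFor(InterfaceType type) {
        String url = urls.get(type);
        if (url == null) {
            throw new IllegalStateException("No endpoint configured for " + type);
        }
        return url;
    }
}
```

In this picture the client keeps a reference to the descriptor and resolves the endpoint for its configured interface type each time it issues a query, so switching message format only requires changing the interfaceType value.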


If the client application wants to leverage sdmx-registry-client-gcube, use this Spring beans configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
	xmlns:aop="http://www.springframework.org/schema/aop" xmlns:context="http://www.springframework.org/schema/context"
	xmlns:jee="http://www.springframework.org/schema/jee" xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="
			http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
			http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
			http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
			http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
			http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">
	<context:spring-configured />
	<context:annotation-config />
	
	<context:component-scan base-package="org.sdmxsource" />
	
	<bean id="registryDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXRegistryDescriptor"/>
	
	<bean id="registryClient" class="org.gcube.datapublishing.sdmx.impl.registry.FusionRegistryClient">
		<constructor-arg name="interfaceType" value="RESTV2_1"/>
		<constructor-arg name="registry" ref="registryDescriptor"/>
	</bean>	

</beans>

Here the registryDescriptor is simply an instance of the GCubeSDMXRegistryDescriptor class, which is responsible for retrieving the SDMX Registry endpoints at runtime by querying the gCube Information System (IS) using the scope provided by ScopeProvider.

Important: in order to use sdmx-registry-client-gcube, the library client must set the right scope using the ScopeProvider facilities.

Supposing that the beans configuration file is named "applicationContext.xml" and placed on the classpath, an instance of SDMXRegistryClient can be obtained with the following lines of code:

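import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");
SDMXRegistryClient client = context.getBean(SDMXRegistryClient.class);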

sdmx-datasource-client

Maven artifacts

Quickstart

sdmx-publisher

sdmx-publisher wraps sdmx-registry-client and sdmx-datasource-client providing easier means for publishing gCube timeseries dataset related SDMX artifacts both on SDMX Registry and gCube SDMX Datasource.

Maven artifacts

sdmx-publisher

Quickstart