Difference between revisions of "GCube SDMX Statistical Data Dissemination System"
Revision as of 17:41, 21 December 2012
The SDMX Statistical Data Dissemination System allows timeseries data to be published in SDMX format. The SDMX standards are primarily focused on the exchange and dissemination of statistical data and metadata. They define processes and a protocol for exchanging statistical data, as well as standard formats for representing data and metadata.
Fundamental definitions
Data set: Statistical data is usually organized into discrete sets, which include particular observations for a specific period of time. A data set can be understood as a collection of similar data, sharing a structure, which covers a fixed period of time.
Architecture
SDMX architecture comprises several components:
- SDMX Registry: The SDMX registry is a service that holds SDMX Structural Metadata Artifacts and provides SDMX SOAP and REST interfaces for retrieving and registering those artifacts. It also works as a reverse proxy between an SDMX client and the SDMX Datasource.
- gCube SDMX Datasource: A generic SDMX Datasource is a service that is mainly responsible for providing access to statistical data represented in SDMX documents. Statistical data is usually stored in representation formats other than SDMX, potentially on a wide array of storage backends. An SDMX Datasource abstracts those aspects by providing a means of accessing the data backends, and it transforms non-SDMX data formats into one of the standard SDMX formats. According to the SDMX specifications, an SDMX Datasource can be a service as simple as a web server or a more complex one delivering the standard SDMX REST or SOAP interfaces. The gCube SDMX Datasource specifically provides access to gCube Timeseries, available through the Timeseries Service.
- Timeseries Service: It is a SOAP service that allows the retrieval of timeseries.
Use Cases
Publication of Timeseries data
To make timeseries data available to the public in SDMX format, the Timeseries Service must first register several SDMX Structural Metadata artifacts on the Registry, describing the data exchange setup and the related actors. The steps, and the Structural Metadata to be provided at each step, are, in order:
- Register the Agency providing the SDMX artifacts on the Root Agency Scheme: the Agency provides information on the organization that is delivering Structural Metadata.
- Register one or more Codelists: Each codelist enumerates a set of values to be used in the representation of dimensions, attributes, and other structural parts of SDMX. For example a codelist may define how to identify a set of countries.
- Register one or more Concept Schemes: A concept scheme is a maintained list of concepts that are used in data structure definitions and metadata structure definitions. For example a concept scheme may define the concept of country.
- Register a Data Structure Definition (DSD): Each data set has a set of structural metadata. These descriptions are referred to in SDMX as Data Structure Definitions, which include information about how concepts are associated with the measures, dimensions, and attributes of a data “cube,” along with information about the representation of data and related identifying and descriptive (structural) metadata.
- Register a Dataflow: A Dataflow definition describes a statistical data set exchange in which multiple data providers can be involved. It does so by referencing the related DSD and stating the constraints that apply to the dataflow (such as reporting periodicity).
- Register a Data Provider within the Agency DataProviderScheme: Each actor providing data is identified as a Data Provider. A Data Provider Agency Scheme is an Organisation Scheme (like the Agency Scheme), supplied by an agency, that describes a set of Data Providers.
- Register a Provision Agreement: The set of information which describes the way in which data sets and metadata sets are provided by a data provider. The term “agreement” is used because this information can be understood as the basis of a “service-level agreement”.
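The ordering of the steps above can be written down in a few lines of Java. This is only an illustrative sketch: the `RegistrationPlan` class, its `Step` enum and both helper methods are hypothetical names introduced here, not part of sdmx-publisher or any other gCube library, and the sketch assumes that the steps are always performed in exactly the order listed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the registration steps, not a gCube API.
final class RegistrationPlan {

    // The steps in the order given in the list above.
    enum Step {
        AGENCY,
        CODELISTS,
        CONCEPT_SCHEMES,
        DATA_STRUCTURE_DEFINITION,
        DATAFLOW,
        DATA_PROVIDER,
        PROVISION_AGREEMENT
    }

    // Returns the step that follows the given one, or null after the last step.
    static Step next(Step step) {
        Step[] all = Step.values();
        int i = step.ordinal() + 1;
        return i < all.length ? all[i] : null;
    }

    // Returns true only when the list contains every step, in the documented order.
    static boolean isCompleteAndOrdered(List<Step> steps) {
        return steps.equals(Arrays.asList(Step.values()));
    }

    public static void main(String[] args) {
        // Print the steps as a numbered checklist.
        for (Step step : Step.values()) {
            System.out.println((step.ordinal() + 1) + ". " + step);
        }
    }
}
```

A publisher implementation could use such a checklist to refuse, for example, a Dataflow registration before the Data Structure Definition it refers to has been registered.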
The information provided to the Registry with these steps completely describes who is providing statistical data and which domain the data covers. To let the Registry know where the data published by a data provider can be retrieved, the Timeseries Service must first register a timeseries on the SDMX Datasource, giving the Datasource access to that particular timeseries data set. This step involves invoking a gCube-specific web service method made available by the SDMX Datasource. Once the timeseries is registered on the SDMX Datasource, a datasource registration can safely be sent to the Registry, holding the endpoint through which the SDMX Datasource service can be accessed.
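The two-phase ordering described above (Datasource first, Registry second) can be sketched as follows. The `Datasource` and `Registry` interfaces, their method names and the example endpoint are all hypothetical stand-ins introduced for illustration; they are not the actual gCube SDMX Datasource or Registry client APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the publication order, not a gCube API.
final class PublicationOrderSketch {

    // Stand-in for the gCube-specific Datasource registration call.
    interface Datasource {
        // Registers a timeseries and returns the endpoint serving it.
        String registerTimeseries(String timeseriesId);
    }

    // Stand-in for sending a datasource registration to the Registry.
    interface Registry {
        void registerDatasource(String endpoint);
    }

    // The timeseries is registered on the Datasource first; only then
    // is the resulting endpoint announced to the Registry.
    static void publish(Datasource datasource, Registry registry, String timeseriesId) {
        String endpoint = datasource.registerTimeseries(timeseriesId);
        registry.registerDatasource(endpoint);
    }

    // Runs publish() against in-memory fakes and returns the call log.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Datasource datasource = id -> {
            log.add("datasource:" + id);
            return "http://example.org/datasource"; // made-up endpoint
        };
        Registry registry = endpoint -> log.add("registry:" + endpoint);
        publish(datasource, registry, "ts-1");
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```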
The sdmx-publisher library can register all of the aforementioned Structural Metadata artifacts on the Registry and provides simplified methods for registering timeseries data sets on both the gCube SDMX Datasource and the SDMX Registry.
Retrieval of SDMX Structural Metadata or Data
Retrieval of data sets
After publication of timeseries data, SDMX data set documents can be retrieved by querying the SDMX Registry, which acts as a reverse proxy between the client and the SDMX Datasource. The reverse-proxy role of the Registry is justified by the fact that a given dataflow can be backed by data from different datasources. The Registry forwards the request to the SDMX Datasource, which provides the same Data Query REST interface as the Registry. The Datasource parses the query, retrieves the data by querying a timeseries, and returns an SDMX document filled with the timeseries data set. That document is then forwarded to the SDMX client as the response to the original query.
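As an illustration of what such a data query looks like on the wire, the sketch below builds a URL following the SDMX 2.1 REST data query pattern `{base}/data/{flowRef}/{key}?startPeriod=...&endPeriod=...`. The `SdmxDataQuery` class is a hypothetical helper, the agency and dataflow identifiers are made up, and using the registry endpoint from the configuration examples on this page as the base URL is an assumption. For brevity the values are not URL-encoded.

```java
// Hypothetical helper that builds SDMX 2.1 REST data query URLs.
final class SdmxDataQuery {

    // flowRef is "agencyId,flowId,version"; key selects series ("all" for every series).
    // startPeriod and endPeriod are optional and may be null.
    static String build(String baseUrl, String agencyId, String flowId, String version,
                        String key, String startPeriod, String endPeriod) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        StringBuilder url = new StringBuilder(base)
                .append("data/")
                .append(agencyId).append(',').append(flowId).append(',').append(version)
                .append('/').append(key);
        char separator = '?';
        if (startPeriod != null) {
            url.append(separator).append("startPeriod=").append(startPeriod);
            separator = '&';
        }
        if (endPeriod != null) {
            url.append(separator).append("endPeriod=").append(endPeriod);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // EXAMPLE_AGENCY and EXAMPLE_FLOW are placeholder identifiers.
        System.out.println(build("http://localhost:8080/FusionRegistry/ws/rest/",
                "EXAMPLE_AGENCY", "EXAMPLE_FLOW", "1.0", "all", "2010", "2012"));
    }
}
```

Because the Datasource exposes the same Data Query interface as the Registry, the same path and parameters would apply after the Registry forwards the request.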
Retrieval of Structural Metadata
Structural Metadata is hosted on the SDMX Registry. This use case therefore involves only a simple SDMX Structural Metadata query from the SDMX client to the Registry, which in turn returns the requested metadata.
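A structural metadata query in the SDMX 2.1 REST syntax has the shape `{base}/{resource}/{agencyID}/{resourceID}/{version}`, optionally followed by a `references` parameter. The sketch below builds such URLs for some of the artifact types registered during publication. The `SdmxStructureQuery` class is a hypothetical helper, and the agency and codelist identifiers are made-up examples.

```java
// Hypothetical helper that builds SDMX 2.1 REST structure query URLs.
final class SdmxStructureQuery {

    // A few of the resource names defined by the SDMX 2.1 REST syntax.
    enum Resource {
        CODELIST("codelist"),
        CONCEPT_SCHEME("conceptscheme"),
        DATA_STRUCTURE("datastructure"),
        DATAFLOW("dataflow"),
        PROVISION_AGREEMENT("provisionagreement");

        final String path;

        Resource(String path) {
            this.path = path;
        }
    }

    // references is optional (null to omit), e.g. "none" or "children".
    static String build(String baseUrl, Resource resource, String agencyId,
                        String resourceId, String version, String references) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        String url = base + resource.path + "/" + agencyId + "/" + resourceId + "/" + version;
        return references == null ? url : url + "?references=" + references;
    }

    public static void main(String[] args) {
        // EXAMPLE_AGENCY and CL_COUNTRY are placeholder identifiers.
        System.out.println(build("http://localhost:8080/FusionRegistry/ws/rest/",
                Resource.CODELIST, "EXAMPLE_AGENCY", "CL_COUNTRY", "latest", "none"));
    }
}
```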
Developer libraries
Several libraries are provided to the developers in order to access SDMX Registry and Datasource functionalities:
- sdmx-registry-client and sdmx-registry-client-gcube
- sdmx-datasource-client
- sdmx-publisher
Since these libraries work with sdmxsource, which relies heavily on the autowiring capabilities of the Spring Dependency Injection framework, the client must define a Spring beans configuration file. Configuration file examples are provided for each library.
sdmx-registry-client
sdmx-registry-client is a client library for accessing Registry services. The provided implementation can query a subset of the REST interface methods of the Metadata Technology Fusion Registry.
sdmx-registry-client-gcube is a support library that allows the automatic retrieval of a scope related SDMX Registry instance.
Maven artifacts
Find sdmx-registry-client on the Nexus Repository Browser by looking for an artifact with the following coordinates:
<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-registry-client</artifactId>
Find the latest version of sdmx-registry-client-gcube on Nexus Repository Browser by looking for an artifact with the following coordinates:
<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-registry-client-gcube</artifactId>
Quickstart
Once the library jar is on the classpath the client must obtain an implementation of SDMXRegistryClient by leveraging Spring Injection.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:jee="http://www.springframework.org/schema/jee"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xsi:schemaLocation="
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
           http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">

    <context:spring-configured />
    <context:annotation-config />
    <context:component-scan base-package="org.sdmxsource" />

    <bean id="registryDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.SDMXRegistryDescriptorImpl">
        <property name="rest_url_V1" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
        <property name="rest_url_V2" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
        <property name="rest_url_V2_1" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
    </bean>

    <bean id="registryClient" class="org.gcube.datapublishing.sdmx.impl.registry.FusionRegistryClient">
        <constructor-arg name="interfaceType" value="RESTV2_1" />
        <constructor-arg name="SDMXRegistry" ref="registryDescriptor"/>
    </bean>

</beans>
Here the registry client is injected with a registry descriptor providing all of the REST registry endpoints. SOAP endpoints could also be provided. The client is also initialized with an interfaceType parameter that instructs it to query the registry using the SDMX REST v2.1 message format.
To use sdmx-registry-client-gcube, use this Spring beans configuration file instead:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:jee="http://www.springframework.org/schema/jee"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xsi:schemaLocation="
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
           http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">

    <context:spring-configured />
    <context:annotation-config />
    <context:component-scan base-package="org.sdmxsource" />

    <bean id="registryDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXRegistryDescriptor"/>

    <bean id="registryClient" class="org.gcube.datapublishing.sdmx.impl.registry.FusionRegistryClient">
        <constructor-arg name="interfaceType" value="RESTV2_1"/>
        <constructor-arg name="registry" ref="registryDescriptor"/>
    </bean>

</beans>
Here the registryDescriptor is simply an instance of the GCubeSDMXRegistryDescriptor class, which is responsible for retrieving the SDMX Registry endpoints at runtime by querying the IS using the scope provided by ScopeProvider.
Important: in order to use sdmx-registry-client-gcube the library client must set the right scope using ScopeProvider facilities. Example:
ScopeProvider.instance.set("/gcube/devsec");
Supposing that the beans configuration file is named "applicationContext.xml" and placed on the classpath, an instance of SDMXRegistryClient can be obtained with the following lines of code:
ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");
SDMXRegistryClient client = context.getBean(SDMXRegistryClient.class);
API documentation
TODO
sdmx-datasource-client
Maven artifacts
Find sdmx-datasource-client on the Nexus Repository Browser by looking for an artifact with the following coordinates:
<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-datasource-client</artifactId>
Quickstart
Spring Dependency Injection can be set using the following configuration:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:jee="http://www.springframework.org/schema/jee"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xsi:schemaLocation="
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
           http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">

    <context:spring-configured />
    <context:annotation-config />
    <context:component-scan base-package="org.sdmxsource" />

    <bean id="datasourceDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXDatasourceDescriptorIS" />

    <bean id="datasourceClient" class="org.gcube.datapublishing.sdmx.impl.datasource.GCubeSDMXDatasourceClientImpl" />

</beans>
The datasourceDescriptor bean is an instance of the Java class GCubeSDMXDatasourceDescriptorIS, which is responsible for retrieving the Datasource endpoints at runtime by querying the IS using the scope set with ScopeProvider facilities.
API documentation
TODO
sdmx-publisher
sdmx-publisher wraps sdmx-registry-client and sdmx-datasource-client, providing a simpler way to publish the SDMX artifacts related to gCube timeseries data sets on both the SDMX Registry and the gCube SDMX Datasource.
Maven artifacts
Find sdmx-publisher on the Nexus Repository Browser by looking for an artifact with the following coordinates:
<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-publisher</artifactId>
Quickstart
Spring injection (with the usage of sdmx-registry-client-gcube facility) can be set using this beans configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:jee="http://www.springframework.org/schema/jee"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xsi:schemaLocation="
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
           http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">

    <context:spring-configured />
    <context:annotation-config />
    <context:component-scan base-package="org.sdmxsource" />

    <bean id="registryDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXRegistryDescriptor" />

    <bean id="datasourceDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXDatasourceDescriptorIS" />

    <bean id="registryClient" class="org.gcube.datapublishing.sdmx.impl.registry.FusionRegistryClient">
        <constructor-arg name="interfaceType" value="RESTV2_1" />
    </bean>

    <bean id="datasourceClient" class="org.gcube.datapublishing.sdmx.impl.datasource.GCubeSDMXDatasourceClientImpl" />

    <bean id="publisherClient" class="org.gcube.datapublishing.sdmx.impl.publisher.GCubeSDMXPublisherImpl" />

</beans>
As you can see, both registry and datasource clients are injected with descriptors that retrieve service endpoints from the IS.
An instance of SDMXPublisher can be obtained with:
ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");
GCubeSDMXPublisher publisher = context.getBean(GCubeSDMXPublisher.class);
Important: remember to set the right scope using ScopeProvider facilities. Example:
ScopeProvider.instance.set("/gcube/devsec");
API documentation
TODO