Difference between revisions of "GCube SDMX Statistical Data Dissemination System"

From Gcube Wiki

Revision as of 09:52, 24 April 2013

The SDMX Statistical Data Dissemination System publishes timeseries data in SDMX format. The SDMX standards focus primarily on the exchange and dissemination of statistical data and metadata. They define processes and a protocol for exchanging statistical data, as well as standard formats for representing data and metadata.

Fundamental definitions

Data set: Statistical data is usually organized into discrete sets, which include particular observations for a specific period of time. A data set can be understood as a collection of similar data, sharing a structure, which covers a fixed period of time.

Architecture

SDMX architecture comprises several components:

  • SDMX Registry: The SDMX registry is a service that holds SDMX Structural Metadata Artifacts and provides SDMX SOAP and REST interfaces for retrieving and registering those artifacts. It also works as a reverse proxy between an SDMX client and the SDMX Datasource.
  • gCube SDMX Datasource: A generic SDMX Datasource is a service that is mainly responsible for providing access to statistical data represented in SDMX documents. Statistical data is usually stored in representation formats other than SDMX, potentially leveraging a wide array of storage backends. The SDMX Datasource abstracts those aspects by providing means of accessing data backends. Furthermore, it transforms non-SDMX data formats into one of the standard SDMX formats. According to the SDMX specifications, an SDMX Datasource can be a service as simple as a web server or a more complex one, delivering SDMX standard REST or SOAP interfaces. The gCube SDMX Datasource specifically provides access to gCube Timeseries, available through the Timeseries Service.
  • Timeseries Service: It is a SOAP service that allows the retrieval of timeseries.

Use Cases

Publication of Timeseries data

To make timeseries data available to the public in SDMX format, the Timeseries Service must first register several SDMX Structural Metadata artifacts on the Registry, describing the data exchange setup and the related actors. The steps and the related Structural Metadata to be provided are, in order:

  • Register the Agency providing the SDMX artifacts on the Root Agency Scheme: the Agency provides information on the organization that is delivering Structural Metadata.
  • Register one or more Codelists: Each codelist enumerates a set of values to be used in the representation of dimensions, attributes, and other structural parts of SDMX. For example a codelist may define how to identify a set of countries.
  • Register one or more Concept Schemes: A concept scheme is a maintained list of concepts that are used in data structure definitions and metadata structure definitions. For example a concept scheme may define the concept of country.
  • Register a Data Structure Definition (DSD): Each data set has a set of structural metadata. These descriptions are referred to in SDMX as Data Structure Definitions, which include information about how concepts are associated with the measures, dimensions, and attributes of a data “cube,” along with information about the representation of data and related identifying and descriptive (structural) metadata.
  • Register a Dataflow: A Dataflow definition describes a statistical data set exchange in which multiple data providers can be involved. It does so by describing the related DSD and the constraints that apply to the dataflow (such as reporting periodicity).
  • Register a Data Provider within the Agency DataProviderScheme: Each actor providing data is identified as a Data Provider. A Data Provider Agency Scheme is an Organisation Scheme (like the Agency Scheme), supplied by an agency, that describes a set of Data Providers.
  • Register a Provision Agreement: The set of information which describes the way in which data sets and metadata sets are provided by a data provider. The term “agreement” is used because this information can be understood as the basis of a “service-level agreement”.

The information provided to the Registry in these steps completely describes who is providing statistical data and which domain the data covers. To let the Registry know where the data published by a data provider can be retrieved, the Timeseries Service must first register a timeseries on the SDMX Datasource, which grants the Datasource access to that particular timeseries data set. This step involves invoking a gCube-specific web service method made available by the SDMX Datasource. Once the timeseries is registered on the SDMX Datasource, a datasource registration can be safely sent to the Registry, holding the endpoint through which the SDMX Datasource service can be accessed.

The sdmx-publisher library registers all of the aforementioned Structural Metadata artifacts on the Registry and provides simplified methods for registering timeseries data sets on both the gCube SDMX Datasource and the SDMX Registry.

Retrieval of SDMX Structural Metadata or Data

SDMX data set retrieval

Retrieval of data sets

After publication of timeseries data, SDMX data set documents can be retrieved by querying the SDMX Registry, which acts as a proxy between the client and the SDMX Datasource. The reverse-proxy role of the Registry is justified by the fact that a given dataflow can be backed by data from different datasources. The request is forwarded to the SDMX Datasource, which provides the same Data Query REST interface as the Registry. The datasource parses the query, retrieves the data by querying a timeseries, and returns an SDMX document filled with the timeseries data set. This document is then forwarded to the SDMX client in response to the original query.
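The forwarding step can be pictured with a small, self-contained sketch. It is a toy model, not the Registry implementation: the class name, the method names and the example hosts are all invented for illustration. It only shows the idea that several datasources may be registered for one dataflow and that the query path is reused unchanged when the request is forwarded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy model of the Registry's reverse-proxy lookup. */
class RegistryProxySketch {

    // dataflow id -> base URLs of the datasources registered for that dataflow
    private final Map<String, List<String>> registrations = new HashMap<>();

    /** Records that a datasource serves data for the given dataflow. */
    void register(String dataflowId, String datasourceBaseUrl) {
        registrations.computeIfAbsent(dataflowId, k -> new ArrayList<>()).add(datasourceBaseUrl);
    }

    /**
     * Returns the URLs a data query would be forwarded to. The query path is
     * appended as-is because the datasource is assumed to expose the same
     * Data Query REST interface as the Registry.
     */
    List<String> forwardTargets(String dataflowId, String queryPath) {
        List<String> targets = new ArrayList<>();
        for (String base : registrations.getOrDefault(dataflowId, Collections.<String>emptyList())) {
            String normalizedBase = base.endsWith("/") ? base : base + "/";
            targets.add(normalizedBase + queryPath);
        }
        return targets;
    }
}
```

An unknown dataflow yields an empty target list, which a real proxy would presumably turn into an error response.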

Retrieval of Structural Metadata

Structural Metadata is hosted on the SDMX Registry. Therefore this use case involves only a simple SDMX Structural Metadata query from the SDMX client to the Registry, which in turn returns the requested metadata.
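As a rough illustration of what such a query looks like on the wire, the sketch below builds a structure query URL following the SDMX 2.1 REST path layout (resource, agency id, resource id, version). The helper class and method are hypothetical and not part of any gCube library. The base URL and the identifiers are sample values, and it is an assumption that a given Registry deployment accepts exactly this form.

```java
/** Illustrative helper; not part of sdmx-registry-client. */
class StructureQueryUrlSketch {

    /**
     * Builds {base}/{resource}/{agencyId}/{resourceId}/{version},
     * the path layout used by SDMX 2.1 REST structure queries.
     */
    static String structureUrl(String baseUrl, String resource,
                               String agencyId, String resourceId, String version) {
        // Tolerate base URLs given with or without a trailing slash.
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + resource + "/" + agencyId + "/" + resourceId + "/" + version;
    }

    public static void main(String[] args) {
        System.out.println(structureUrl(
                "http://localhost:8080/FusionRegistry/ws/rest/",
                "codelist", "MYAGENCY", "TEST_CODELIST", "1.0"));
    }
}
```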

Developer libraries

Several libraries are provided to the developers in order to access SDMX Registry and Datasource functionalities:

  • sdmx-registry-client and sdmx-registry-client-gcube
  • sdmx-datasource-client
  • sdmx-publisher

Since those libraries work with sdmxsource, which relies heavily on the autowiring capabilities of the Spring Dependency Injection framework, the client must define a Spring beans configuration file. Configuration file examples are provided for each library.

sdmx-registry-client

sdmx-registry-client is a client library for accessing Registry services. The provided implementation can query a subset of the REST interface methods of Metadata Technology Fusion Registry.

sdmx-registry-client-gcube is a support library that allows the automatic retrieval of a scope related SDMX Registry instance.

Maven artifacts

Find the latest version of sdmx-registry-client on Nexus Repository Browser by looking for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-registry-client</artifactId>

Find the latest version of sdmx-registry-client-gcube on Nexus Repository Browser by looking for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-registry-client-gcube</artifactId>

Quickstart

Once the library jar is on the classpath the client must obtain an implementation of SDMXRegistryClient by leveraging Spring Injection.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
	xmlns:aop="http://www.springframework.org/schema/aop" xmlns:context="http://www.springframework.org/schema/context"
	xmlns:jee="http://www.springframework.org/schema/jee" xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="
			http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
			http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
			http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
			http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
			http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">
	<context:spring-configured />
	<context:annotation-config />
	
	<context:component-scan base-package="org.sdmxsource" />
	
	<bean id="registryDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.SDMXRegistryDescriptorImpl">
		<property name="rest_url_V1" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
		<property name="rest_url_V2" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
		<property name="rest_url_V2_1" value="http://localhost:8080/FusionRegistry/ws/rest/"/>
	</bean>
	
	<bean id="registryClient"
		class="org.gcube.datapublishing.sdmx.impl.registry.FusionRegistryClient">
		<constructor-arg name="interfaceType" value="RESTV2_1" />
		<constructor-arg name="SDMXRegistry" ref="registryDescriptor"/>
	</bean>

</beans>
	

Here the registry client is injected with a registry descriptor providing all of the REST registry endpoints. SOAP endpoints could also be provided. The client is also initialized with an interfaceType parameter that instructs it to query the registry using the SDMX REST v2.1 message format.

If the client application wants to leverage sdmx-registry-client-gcube, use this Spring beans configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
	xmlns:aop="http://www.springframework.org/schema/aop" xmlns:context="http://www.springframework.org/schema/context"
	xmlns:jee="http://www.springframework.org/schema/jee" xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="
			http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
			http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
			http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
			http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
			http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">
	<context:spring-configured />
	<context:annotation-config />
	
	<context:component-scan base-package="org.sdmxsource" />
	
	<bean id="registryDescriptor" class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXRegistryDescriptor"/>
	
	<bean id="registryClient" class="org.gcube.datapublishing.sdmx.impl.registry.FusionRegistryClient">
		<constructor-arg name="interfaceType" value="RESTV2_1"/>
		<constructor-arg name="registry" ref="registryDescriptor"/>
	</bean>	

</beans>

Here the registryDescriptor is simply an instance of the GCubeSDMXRegistryDescriptor class, which is responsible for retrieving the SDMX Registry endpoints at runtime by querying the IS using the scope provided by ScopeProvider.

Important: in order to use sdmx-registry-client-gcube the library client must set the right scope using ScopeProvider facilities. Example:

ScopeProvider.instance.set("/gcube/devsec");

Supposing that the beans configuration file is named "applicationContext.xml" and placed on the classpath, an instance of SDMXRegistryClient can be obtained with the following lines of code:

ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");
SDMXRegistryClient client = context.getBean(SDMXRegistryClient.class);

Usage

sdmx-registry-client provides methods for publishing SDMX Structural Metadata to the registry and retrieving it from the registry. The supported artifacts are:

  • AgencyScheme
  • Codelist
  • ConceptScheme
  • DataStructure (DSD)
  • Dataflow
  • DataProviderScheme
  • ProvisionAgreement
  • Registration

In order to publish data the client must create an sdmxsource bean and fill it with data. sdmxsource beans are divided into two main categories:

  • Mutable beans: Objects whose attribute values can be modified
  • Immutable beans: Objects whose attribute values cannot be modified

The client must first create mutable beans and then obtain the related immutable instance by using the method getImmutableInstance().
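The mutable/immutable split can be illustrated without sdmxsource on the classpath. The classes below are simplified stand-ins invented for this sketch. They are not the sdmxsource bean types, and only the getImmutableInstance() method name is taken from the text above. The point is that the immutable instance is a snapshot: later changes to the mutable bean do not affect it.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in classes for illustration; not the sdmxsource bean types. */
class MutableImmutableSketch {

    /** Immutable snapshot: fields are final and the name map is read-only. */
    static final class ImmutableCode {
        private final String id;
        private final Map<String, String> names;

        ImmutableCode(String id, Map<String, String> names) {
            if (id == null || id.isEmpty()) {
                throw new IllegalStateException("id is required");
            }
            this.id = id;
            // Defensive copy, so later edits to the mutable bean do not leak in.
            this.names = Collections.unmodifiableMap(new LinkedHashMap<>(names));
        }

        String getId() { return id; }

        Map<String, String> getNames() { return names; }
    }

    /** Mutable counterpart: filled step by step, then frozen. */
    static final class MutableCode {
        private String id;
        private final Map<String, String> names = new LinkedHashMap<>();

        void setId(String id) { this.id = id; }

        void addName(String locale, String name) { names.put(locale, name); }

        ImmutableCode getImmutableInstance() { return new ImmutableCode(id, names); }
    }
}
```

Validation happens when the immutable instance is created, which is why an incomplete mutable bean in this sketch fails at that point rather than at publication time.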

Once the bean is ready to be sent, the client can invoke the SDMXRegistryClient.publish() method to send the data to the registry. A SubmissionReport is returned, containing the status of the operation and, in case of failure, important information about the error.

In the following snippet of code a client creates an agency scheme (the root agency scheme) and a codelist and publishes them to the registry:

ApplicationContext context = new ClassPathXmlApplicationContext(
        "applicationContext-test.xml");
SDMXRegistryClient client = context.getBean(SDMXRegistryClient.class);

/** Create agency scheme **/
// Create Agency Scheme from scratch
AgencySchemeMutableBean agencyScheme = new AgencySchemeMutableBeanImpl();
agencyScheme.setId("AGENCIES");
agencyScheme.setAgencyId("SDMX");
agencyScheme.setVersion("1.0");
agencyScheme.addName("en", "SDMX Agency Scheme");
// ROOT Agency
AgencyMutableBean sdmxAgency = new AgencyMutableBeanImpl();
sdmxAgency.setId("SDMX");
sdmxAgency.addName("en", "SDMX");
agencyScheme.addItem(sdmxAgency);
// Custom agency
AgencyMutableBean myAgency = new AgencyMutableBeanImpl();
myAgency.setId("MYAGENCY");
myAgency.addName("en", "My Agency");
agencyScheme.addItem(myAgency);

/** Create codelist with two codes **/
CodelistMutableBean codelist = new CodelistMutableBeanImpl();
codelist.setAgencyId("MYAGENCY");
codelist.setId("TEST_CODELIST");
codelist.setVersion("1.0");
codelist.addName("en", "Test codelist");

//First code
CodeMutableBean code1 = new CodeMutableBeanImpl();
code1.addName("en", "Test code");
code1.setId("TEST_CODE_1");
code1.addDescription("en", "Test description");
AnnotationMutableBean annotation = new AnnotationMutableBeanImpl();
annotation.setTitle("Annotation title");
annotation.addText("en", "Annotation text");
code1.addAnnotation(annotation);

//Second code
CodeMutableBean code2 = new CodeMutableBeanImpl();
code2.addName("en", "Test code");
code2.setId("TEST_CODE_2");
code2.addDescription("en", "Test description");
code2.addDescription("it", "Descrizione di test");

codelist.addItem(code1);
codelist.addItem(code2);

/** Publish artifacts **/
try {
	client.publish(agencyScheme.getImmutableInstance());
} catch (SDMXRegistryClientException e) {
	// Handle SDMXRegistryClientException
	e.printStackTrace();
}
try {
	client.publish(codelist.getImmutableInstance());
} catch (SDMXRegistryClientException e) {
	// Handle SDMXRegistryClientException
	e.printStackTrace();
}

The root agency scheme lists all top level agencies.

An SDMXRegistryClientException may be thrown when querying the registry. The exception provides useful information on the error encountered.

A full example of artifact publication can be found in the sources package (tests). Check Nexus for the latest version of the artifact.

sdmx-datasource-client

Maven artifacts

Find sdmx-datasource-client on Nexus Repository Browser by looking for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-datasource-client</artifactId>

Quickstart

Spring Dependency Injection can be set using the following configuration:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
	xmlns:aop="http://www.springframework.org/schema/aop" xmlns:context="http://www.springframework.org/schema/context"
	xmlns:jee="http://www.springframework.org/schema/jee" xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="
			http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
			http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
			http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
			http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
			http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">
	
	<context:spring-configured />
	<context:annotation-config />

	<context:component-scan base-package="org.sdmxsource" />
	
	<bean id="datasourceDescriptor"
		class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXDatasourceDescriptorIS" />
	
	<bean id="datasourceClient"
		class="org.gcube.datapublishing.sdmx.impl.datasource.GCubeSDMXDatasourceClientImpl" />

</beans>

The datasourceDescriptor refers to the Java class GCubeSDMXDatasourceDescriptorIS, which is responsible for retrieving the Datasource endpoints at runtime by querying the IS using the scope set with ScopeProvider facilities.

Usage

sdmx-datasource-client can be used to:

  • register timeseries data available through a timeseries service
  • query a datasource for SDMX data documents

In order to register a timeseries the client must provide:

  • the Agency Id related to the dataflow being provided
  • the Dataflow Id of the dataflow being provided
  • the version of the Dataflow being provided
  • the Agency Id of the agency declaring the Provider Agency
  • the Provider Agency's Agency Id
  • a timeseries service scope
  • the timeseries id
  • the registry scope

TimeseriesRegistrations can also be queried or removed from the datasource.

In order to query the datasource for data the client must provide:

  • An ID identifying the dataflow of interest
  • a series of code values identifying the dimension of interest
  • an ID identifying the provider agency
  • an integer identifying the max number of observations requested
  • an enumerated value which allows filtering of the returned data. Admitted values:
    • full (default): all data and documentation, including annotations
    • dataonly: attributes and groups excluded
    • serieskeyonly: returns only the series elements and the dimensions that make up the series keys. This is useful for performance reasons, to return the series that match a certain query, without returning the actual data.
    • nodata: returns the groups and series, including attributes and annotations, without observations.
  • an enumerated value representing the requested sdmx document version
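These parameters line up closely with the SDMX 2.1 REST data query, where the path carries the dataflow, the dot-separated series key and the provider, and the query string carries the observation limit and the detail level. The sketch below builds such a URL. It is a hypothetical helper, not the sdmx-datasource-client API. Mapping the maximum number of observations to firstNObservations is an assumption, and the identifiers are sample values. Note that the SDMX 2.1 REST specification spells the third detail value serieskeysonly, which is what the enum below produces.

```java
import java.util.List;
import java.util.Locale;

/** Illustrative helper; not part of sdmx-datasource-client. */
class DataQueryUrlSketch {

    /** Detail levels as named in the SDMX 2.1 REST specification. */
    enum Detail {
        FULL, DATAONLY, SERIESKEYSONLY, NODATA;

        String paramValue() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    /**
     * Builds {base}/data/{flow}/{key}/{provider}?firstNObservations=N&detail=D.
     * The key is the list of dimension codes joined with dots.
     */
    static String dataUrl(String baseUrl, String flowId, List<String> dimensionCodes,
                          String providerId, int maxObservations, Detail detail) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        String key = String.join(".", dimensionCodes);
        return base + "data/" + flowId + "/" + key + "/" + providerId
                + "?firstNObservations=" + maxObservations
                + "&detail=" + detail.paramValue();
    }
}
```

The sample codes are plain identifiers, so no URL encoding is applied here. A production client would need to encode values that contain reserved characters.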

Examples of these use cases can be found in the sources package (tests). Check Nexus for the latest version of the artifact.

sdmx-publisher

sdmx-publisher wraps sdmx-registry-client and sdmx-datasource-client providing easier means for publishing gCube timeseries dataset related SDMX artifacts both on SDMX Registry and gCube SDMX Datasource.

Maven artifacts

Find sdmx-publisher on Nexus Repository Browser by looking for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-publisher</artifactId>

Quickstart

Spring injection (with the usage of sdmx-registry-client-gcube facility) can be set using this beans configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
	xmlns:aop="http://www.springframework.org/schema/aop" xmlns:context="http://www.springframework.org/schema/context"
	xmlns:jee="http://www.springframework.org/schema/jee" xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="
			http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
			http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
			http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
			http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
			http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">

	<context:spring-configured />
	<context:annotation-config />

	<context:component-scan base-package="org.sdmxsource" />

	<bean id="registryDescriptor"
		class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXRegistryDescriptor" />
	<bean id="datasourceDescriptor"
		class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXDatasourceDescriptorIS" />

	<bean id="registryClient"
		class="org.gcube.datapublishing.sdmx.impl.registry.FusionRegistryClient">
		<constructor-arg name="interfaceType" value="RESTV2_1" />
	</bean>

	<bean id="datasourceClient"
		class="org.gcube.datapublishing.sdmx.impl.datasource.GCubeSDMXDatasourceClientImpl" />

	<bean id="publisherClient"
		class="org.gcube.datapublishing.sdmx.impl.publisher.GCubeSDMXPublisherImpl" />

</beans>

As you can see, both registry and datasource clients are injected with descriptors that retrieve service endpoints from the IS.

An instance of GCubeSDMXPublisher can be obtained with:

ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");
GCubeSDMXPublisher publisher = context.getBean(GCubeSDMXPublisher.class);

Important: remember to set the right scope using ScopeProvider facilities. Example:

ScopeProvider.instance.set("/gcube/devsec");

Usage

sdmx-publisher allows a client to easily establish an SDMX data exchange based on available timeseries data.

The operations to be performed and their relative order are summarized in the following list:

  1. Publish a top level agency providing the Structural Metadata
  2. Publish one or more codelists
  3. Publish one or more concept schemes
  4. Publish a datastructure definition, describing the data structure for the dataset which is going to be provided
  5. Publish a dataflow, identifying a dataset instance and the datastructure definition that is going to be used
  6. Publish a Data Provider Scheme containing information about the data provider providing the data
  7. Publish a Provision Agreement which tells which Provider is going to participate for the dataflow
  8. Publish a Registration. This operation instructs both the datasource and the registry about the availability of data.
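Since the order of these operations matters, a client could guard its publication routine with a simple check such as the one sketched below. The enum and the method are hypothetical and not part of sdmx-publisher. The check also treats the relative order of adjacent steps (for example codelists before concept schemes) as strict, which is an assumption drawn from the numbering above rather than a documented requirement.

```java
import java.util.EnumSet;
import java.util.List;

/** Hypothetical ordering check; not part of sdmx-publisher. */
class PublicationOrderSketch {

    /** Publication steps, declared in the order given by the list above. */
    enum Step {
        AGENCY, CODELIST, CONCEPT_SCHEME, DATA_STRUCTURE,
        DATAFLOW, DATA_PROVIDER_SCHEME, PROVISION_AGREEMENT, REGISTRATION
    }

    /**
     * Returns true when every step occurs at least once and no step appears
     * after a later one. Repeats are allowed, e.g. several codelists.
     */
    static boolean isCompleteAndOrdered(List<Step> steps) {
        EnumSet<Step> seen = EnumSet.noneOf(Step.class);
        int lastOrdinal = -1;
        for (Step step : steps) {
            if (step.ordinal() < lastOrdinal) {
                return false; // this step comes after one that should follow it
            }
            lastOrdinal = step.ordinal();
            seen.add(step);
        }
        return seen.size() == Step.values().length;
    }
}
```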

A full example in which a client establishes an SDMX data exchange can be found in the sources package of sdmx-publisher (tests). Check Nexus for the latest version of the artifact.