GCube SDMX Statistical Data Dissemination System

From Gcube Wiki

SDMX Statistical Data Dissemination System allows publishing timeseries data in SDMX format. The SDMX standards focus primarily on the exchange and dissemination of statistical data and metadata. They define processes and a protocol for exchanging statistical data, as well as standard formats for representing data and metadata.

Fundamental definitions

Data set: Statistical data is usually organized into discrete sets, which include particular observations for a specific period of time. A data set can be understood as a collection of similar data that shares a structure and covers a fixed period of time.

Architecture

SDMX architecture comprises several components:

  • SDMX Registry: The SDMX registry is a service that holds SDMX Structural Metadata Artifacts and provides SDMX SOAP and REST interfaces for retrieving and registering those artifacts. It also works as a reverse proxy between an SDMX client and the SDMX Datasource.
  • gCube SDMX Datasource: A generic SDMX Datasource is a service that is mainly responsible for providing access to statistical data represented in SDMX documents. Statistical data is usually stored in representation formats other than SDMX, potentially on a wide array of storage backends. An SDMX Datasource abstracts those aspects by providing means of accessing the data backends. It also transforms non-SDMX data formats into one of the standard SDMX formats. According to the SDMX specifications, an SDMX Datasource can be as simple as a web server, or it can be a more complex service that delivers the standard SDMX REST or SOAP interfaces. The gCube SDMX Datasource specifically provides access to gCube timeseries, which are available through the Timeseries Service.
  • Timeseries Service: It is a SOAP service that allows the retrieval of timeseries.

Use Cases

Publication of Timeseries data

Figure: Publication of timeseries data

To make timeseries data available to the public in SDMX format, the Timeseries Service must first register several SDMX Structural Metadata artifacts on the Registry. These artifacts describe the data exchange setup and the related actors. The steps, and the Structural Metadata to be provided at each step, are, in order:

  • Register the Agency providing the SDMX artifacts on the Root Agency Scheme: the Agency provides information on the organization that is delivering Structural Metadata.
  • Register one or more Codelists: Each codelist enumerates a set of values to be used in the representation of dimensions, attributes, and other structural parts of SDMX. For example, a codelist may define how to identify a set of countries.
  • Register one or more Concept Schemes: A concept scheme is a maintained list of concepts that are used in data structure definitions and metadata structure definitions. For example, a concept scheme may define the concept of country.
  • Register a Data Structure Definition (DSD): Each data set has a set of structural metadata. These descriptions are referred to in SDMX as Data Structure Definitions, which include information about how concepts are associated with the measures, dimensions, and attributes of a data “cube,” along with information about the representation of data and related identifying and descriptive (structural) metadata.
  • Register a Dataflow: A Dataflow definition describes a statistical data set exchange that can involve multiple data providers. It does so by referencing the related DSD and the constraints that apply to the dataflow (such as reporting periodicity).
  • Register a Data Provider within the Agency DataProviderScheme: Each actor providing data is identified as a Data Provider. A Data Provider Scheme is an Organisation Scheme (like the Agency Scheme), maintained by an agency, that describes a set of Data Providers.
  • Register a Provision Agreement: A Provision Agreement is the set of information that describes how a data provider supplies data sets and metadata sets. The term “agreement” is used because this information can be understood as the basis of a “service-level agreement”.

The information provided to the Registry in these steps fully describes who is providing statistical data and which domain the data covers. The Registry must also know where the data published by a data provider can be retrieved. For this, the component holding the statistical data must first publish the data set on an SDMX Datasource, which involves invoking an interface that is not part of the SDMX standard. Once the statistical data is registered on the SDMX Datasource, a datasource registration can safely be published on the Registry. This registration holds the endpoint through which the SDMX Datasource service can be accessed.
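The ordering of these steps follows from the dependencies between the artifacts, since in SDMX later artifacts reference earlier ones:
- Codelists, concept schemes and data provider schemes are maintained by an agency.
- A DSD uses concepts and codelists.
- A dataflow points to a DSD.
- A provision agreement links a dataflow to a data provider.
- A datasource registration refers to a provision agreement.

The following Java sketch encodes these dependencies in simplified form. It checks that a planned publication sequence never publishes an artifact before the artifacts it needs. `PublicationOrder` and its `Step` names are hypothetical and not part of sdmx-registry-client. The table also omits optional references; for instance, a concept may additionally be represented by a codelist.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: checks a publication plan against simplified SDMX artifact dependencies. */
class PublicationOrder {

    /** Artifact kinds, declared in the order in which the steps above publish them. */
    enum Step {
        AGENCY_SCHEME, CODELIST, CONCEPT_SCHEME, DATA_STRUCTURE,
        DATAFLOW, DATA_PROVIDER_SCHEME, PROVISION_AGREEMENT, REGISTRATION;

        /** Artifacts that must already be on the registry before this one (simplified). */
        Set<Step> requires() {
            switch (this) {
                case CODELIST:
                case CONCEPT_SCHEME:
                case DATA_PROVIDER_SCHEME:
                    // Maintained by an agency declared in the agency scheme
                    return EnumSet.of(AGENCY_SCHEME);
                case DATA_STRUCTURE:
                    // Dimensions and attributes use concepts, coded with codelists
                    return EnumSet.of(CODELIST, CONCEPT_SCHEME);
                case DATAFLOW:
                    return EnumSet.of(DATA_STRUCTURE);
                case PROVISION_AGREEMENT:
                    // Links a dataflow to a data provider
                    return EnumSet.of(DATAFLOW, DATA_PROVIDER_SCHEME);
                case REGISTRATION:
                    // A datasource registration refers to a provision agreement
                    return EnumSet.of(PROVISION_AGREEMENT);
                default:
                    return EnumSet.noneOf(Step.class);
            }
        }
    }

    /** Returns true if every step appears only after all the steps it requires. */
    static boolean isValidOrder(List<Step> plan) {
        Set<Step> published = EnumSet.noneOf(Step.class);
        for (Step step : plan) {
            if (!published.containsAll(step.requires())) {
                return false;
            }
            published.add(step);
        }
        return true;
    }
}
```

With this table, `isValidOrder(Arrays.asList(Step.values()))` accepts the order listed above. A plan that publishes `DATAFLOW` before `DATA_STRUCTURE` is rejected.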

sdmx-registry-client can register all of the Structural Metadata artifacts above on the Registry. It also provides simplified methods for registering timeseries data sets on both the gCube SDMX Datasource and the SDMX Registry.

Retrieval of SDMX Structural Metadata or Data

Figure: SDMX data set retrieval

Retrieval of Structural Metadata

Structural Metadata is hosted on the SDMX Registry. To retrieve those documents, a client can use sdmx-registry-client. The currently supported structural metadata artifacts are:

  • Agency Scheme
  • Codelist
  • Concept scheme
  • Datastructure Definition (DSD)
  • Dataflow
  • DataProviderScheme
  • Provision Agreement
  • Datasource registration

Retrieval of data sets

After timeseries data has been published, SDMX data set documents can be retrieved by querying the SDMX Registry, which acts as a reverse proxy between the client and the SDMX Datasource. The Registry needs this reverse-proxy role because a given dataflow can be supported by data provided by different datasources. The Registry forwards the request to the SDMX Datasource, which provides the same Data Query REST interface as the Registry. The datasource parses the query, retrieves the data by querying the corresponding timeseries, and returns an SDMX document filled with the timeseries data set. The Registry then forwards this document to the SDMX client as the response to the original query. To query a datasource, a client can use sdmx-datasource-client.
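The datasource exposes the same Data Query REST interface as the Registry. Conceptually, forwarding a query therefore means keeping its path and parameters and swapping the base URL. The following Java sketch illustrates only that idea, using `java.net.URI`. It is not how the Registry actually implements proxying, which also has to pick the datasource(s) from the registrations of the requested dataflow. `QueryForwarding` and the example host names are hypothetical.

```java
import java.net.URI;

/** Hypothetical illustration of rebasing a data query from the registry onto a datasource. */
final class QueryForwarding {

    /**
     * Keeps the part of the request that follows the registry base (path and query string)
     * and resolves it against the datasource base. Both base URIs are expected to end with '/'.
     */
    static URI forward(URI registryBase, URI request, URI datasourceBase) {
        URI relative = registryBase.relativize(request);
        if (relative.isAbsolute()) {
            // relativize() returns the request unchanged when it is not under registryBase
            throw new IllegalArgumentException("Request is not addressed to the registry: " + request);
        }
        return datasourceBase.resolve(relative);
    }
}
```

For example, the Registry receives `http://registry.example.org/ws/rest/data/MYAGENCY,TEST_FLOW,1.0/A.ITA/MYPROVIDER?detail=dataonly`. With a datasource base of `http://datasource.example.org/sdmx/rest/`, the forwarded request becomes `http://datasource.example.org/sdmx/rest/data/MYAGENCY,TEST_FLOW,1.0/A.ITA/MYPROVIDER?detail=dataonly`.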

Developer libraries

The following libraries give developers access to the SDMX Registry and Datasource functionalities:

  • sdmx-registry-client and the utility library sdmx-registry-client-gcube
  • sdmx-datasource-client

sdmx-registry-client

sdmx-registry-client is a client library for accessing Registry services. Its implementation supports a subset of the REST interface methods of an SDMX Registry.

sdmx-registry-client-gcube is a support library that automatically retrieves the SDMX Registry instance related to the current scope.

Maven artifacts

Find sdmx-registry-client by searching the Nexus Repository Browser for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-registry-client</artifactId>

Find the latest version of sdmx-registry-client-gcube by searching the Nexus Repository Browser for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-registry-client-gcube</artifactId>

Initialization

Since version 1.1.0, sdmx-registry-client no longer requires Spring dependency injection to initialize a registry client instance.

The library provides the FusionRegistryClient implementation. This client must be initialized with two arguments:
- a registry descriptor holding the URLs of the SOAP/REST service endpoints;
- an enumerated value telling the client which protocol to use (for example REST v2.1).

SDMXRegistryDescriptorImpl descriptor = new SDMXRegistryDescriptorImpl();
// REST v2.1 endpoint of the SDMX Registry
descriptor.setRest_url_V2_1("http://pc-fortunati.isti.cnr.it:8080/FusionRegistry/ws/rest/");
FusionRegistryClient client = new FusionRegistryClient(descriptor, SDMXRegistryInterfaceType.RESTV2_1);

Alternatively, a client application can use sdmx-registry-client-gcube to retrieve the registry descriptor for a given scope directly from the IS:

ScopeProvider.instance.set("/gcube/devsec");
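// The descriptor resolves the registry endpoints from the IS, using the scope set above
FusionRegistryClient client = new FusionRegistryClient(new GCubeSDMXRegistryDescriptor(), SDMXRegistryInterfaceType.RESTV2_1);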

Here the registry descriptor is simply an instance of the GCubeSDMXRegistryDescriptor class. It retrieves the SDMX Registry endpoints lazily at runtime by querying the gCube Information System (IS) with the scope provided by ScopeProvider.

Important: to use sdmx-registry-client-gcube, the client application must set the right scope through the ScopeProvider facilities. Example:

ScopeProvider.instance.set("/gcube/devsec");
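"Lazily at runtime" means the endpoint lookup happens on first use rather than when the descriptor is created. This is why the scope must already be set when the client is first used. The following Java sketch shows that pattern with a plain resolver function and a per-scope cache. `LazyScopedEndpoint` is hypothetical, and the resolver stands in for an IS query. The real GCubeSDMXRegistryDescriptor reads the scope from ScopeProvider rather than taking it as a parameter. Whether it caches results is not stated here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical descriptor that resolves a REST endpoint per scope on first use. */
class LazyScopedEndpoint {
    private final Function<String, String> resolver; // stand-in for an IS lookup
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    LazyScopedEndpoint(Function<String, String> resolver) {
        // Nothing is resolved here: construction does not touch the IS
        this.resolver = resolver;
    }

    /** Returns the endpoint for the scope, invoking the resolver at most once per scope. */
    String restEndpoint(String scope) {
        if (scope == null) {
            // Mirrors the requirement above: the scope must be set before the client is used
            throw new IllegalStateException("Scope not set");
        }
        return cache.computeIfAbsent(scope, resolver);
    }
}
```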

Usage

sdmx-registry-client provides methods for publishing SDMX Structural Metadata to the registry and retrieving it from the registry. The supported artifacts are:

  • AgencyScheme
  • Codelist
  • ConceptScheme
  • DataStructure (DSD)
  • Dataflow
  • DataProviderScheme
  • ProvisionAgreement
  • Registration

Each method of sdmx-registry-client can throw an SDMXRegistryClientException that reports an error code and a message. Error types and codes are described in the web services guidelines document of the SDMX specification.
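As we read the SDMX 2.1 web services guidelines, they define a small set of standard error codes. Client errors fall in the 100 range and server errors in the 500 range. The following Java sketch maps those codes to an enum, so an application can, for example, decide whether to fix its request or retry later. The enum is hypothetical and not part of sdmx-registry-client. This page does not document how the code is read from an SDMXRegistryClientException, so that step is left to the application. The code list should be checked against the specification before relying on it.

```java
import java.util.Optional;

/** Standard SDMX 2.1 web service error codes (sketch; custom codes 1000+ are not modelled). */
enum SdmxErrorCode {
    NO_RESULTS_FOUND(100),
    UNAUTHORIZED(110),
    RESPONSE_TOO_LARGE_DUE_TO_CLIENT_REQUEST(130),
    SYNTAX_ERROR(140),
    SEMANTIC_ERROR(150),
    INTERNAL_SERVER_ERROR(500),
    NOT_IMPLEMENTED(501),
    SERVICE_UNAVAILABLE(503),
    RESPONSE_SIZE_EXCEEDS_SERVICE_LIMIT(510);

    private final int code;

    SdmxErrorCode(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    /** Client errors are in the 100 range; server errors are in the 500 range. */
    boolean isClientError() {
        return code >= 100 && code < 200;
    }

    /** Looks up a standard code; returns empty for unknown or custom codes. */
    static Optional<SdmxErrorCode> fromCode(int code) {
        for (SdmxErrorCode candidate : values()) {
            if (candidate.code == code) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```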

Publication of Structural Metadata

To publish structural metadata, the client must create an SdmxSource bean and fill it with data. SdmxSource beans fall into two main categories:

  • Mutable beans: objects whose attribute values can be modified
  • Immutable beans: objects whose attribute values cannot be modified

Before invoking the registry, a client must first create mutable beans and fill them with data. It then obtains the related immutable instances with the method getImmutableInstance().
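The mutable-to-immutable pattern can be illustrated without the SdmxSource library. In the following Java sketch, `MutableCode` and `ImmutableCode` are hypothetical stand-ins for a pair such as CodeMutableBean and its immutable counterpart:
- The mutable object is filled through setters.
- `getImmutableInstance()` produces a frozen, independent copy, so later changes to the mutable object do not affect what is sent to the registry.

The check in `getImmutableInstance()` is illustrative only and does not reproduce SdmxSource's own validation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical immutable code: final fields and an unmodifiable copy of the names. */
final class ImmutableCode {
    private final String id;
    private final Map<String, String> names;

    ImmutableCode(String id, Map<String, String> names) {
        this.id = id;
        // Defensive copy, then wrap it so callers cannot modify it
        this.names = Collections.unmodifiableMap(new LinkedHashMap<>(names));
    }

    String getId() {
        return id;
    }

    Map<String, String> getNames() {
        return names;
    }
}

/** Hypothetical mutable counterpart, mirroring the setId/addName style used by SdmxSource beans. */
final class MutableCode {
    private String id;
    private final Map<String, String> names = new LinkedHashMap<>();

    void setId(String id) {
        this.id = id;
    }

    void addName(String locale, String name) {
        names.put(locale, name);
    }

    /** Freezes the current state into an independent immutable instance. */
    ImmutableCode getImmutableInstance() {
        if (id == null) {
            throw new IllegalStateException("id must be set before creating the immutable instance");
        }
        return new ImmutableCode(id, names);
    }
}
```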

Once the bean is ready to be sent, the client can invoke the SDMXRegistry.publish() method to send it to the registry. The method returns a SubmissionReport that contains the status of the operation and, in case of failure, information about the error.

In the following snippet, a client creates an agency scheme (the root agency scheme) and a codelist, and publishes them to the registry:

SDMXRegistryDescriptorImpl descriptor = new SDMXRegistryDescriptorImpl();
descriptor.setRest_url_V2_1("http://pc-fortunati.isti.cnr.it:8080/FusionRegistry/ws/rest/");
FusionRegistryClient client = new FusionRegistryClient(descriptor, SDMXRegistryInterfaceType.RESTV2_1);
 
/** Create agency scheme **/
// Create Agency Scheme from scratch
AgencySchemeMutableBean agencyScheme = new AgencySchemeMutableBeanImpl();
agencyScheme.setId("AGENCIES");
agencyScheme.setAgencyId("SDMX");
agencyScheme.setVersion("1.0");
agencyScheme.addName("en", "SDMX Agency Scheme");
// ROOT Agency
AgencyMutableBean sdmxAgency = new AgencyMutableBeanImpl();
sdmxAgency.setId("SDMX");
sdmxAgency.addName("en", "SDMX");
agencyScheme.addItem(sdmxAgency);
// Custom agency (MYAGENCY), which maintains the codelist below
AgencyMutableBean myAgency = new AgencyMutableBeanImpl();
myAgency.setId("MYAGENCY");
myAgency.addName("en", "My Agency");
agencyScheme.addItem(myAgency);
 
/** Create codelist with two codes **/
CodelistMutableBean codelist = new CodelistMutableBeanImpl();
codelist.setAgencyId("MYAGENCY");
codelist.setId("TEST_CODELIST");
codelist.setVersion("1.0");
codelist.addName("en", "Test codelist");
 
//First code
CodeMutableBean code1 = new CodeMutableBeanImpl();
code1.addName("en", "Test code");
code1.setId("TEST_CODE_1");
code1.addDescription("en", "Test description");
AnnotationMutableBean annotation = new AnnotationMutableBeanImpl();
annotation.setTitle("Annotation title");
annotation.addText("en", "Annotation text");
code1.addAnnotation(annotation);
 
//Second code
CodeMutableBean code2 = new CodeMutableBeanImpl();
code2.addName("en", "Test code");
code2.setId("TEST_CODE_2");
code2.addDescription("en", "Test description");
code2.addDescription("it", "Descrizione di test");
 
codelist.addItem(code1);
codelist.addItem(code2);
 
/** Publish artifacts **/
try {
	client.publish(agencyScheme.getImmutableInstance());
} catch (SDMXRegistryClientException e) {
	// Handle SDMXRegistryClientException
	e.printStackTrace();
}
try {
	client.publish(codelist.getImmutableInstance());
} catch (SDMXRegistryClientException e) {
	// Handle SDMXRegistryClientException
	e.printStackTrace();
}

Any call to the registry may throw an SDMXRegistryClientException. The exception provides useful information on the error encountered.

A full example of artifact publication can be found in the tests of the sources package.

Retrieval of Structural Metadata

To retrieve structural metadata, the client can invoke one of the getter methods. Each getter is specific to one type of supported structural metadata.

SDMXRegistryDescriptorImpl descriptor = new SDMXRegistryDescriptorImpl();
descriptor.setRest_url_V2_1("http://pc-fortunati.isti.cnr.it:8080/FusionRegistry/ws/rest/");
FusionRegistryClient client = new FusionRegistryClient(descriptor, SDMXRegistryInterfaceType.RESTV2_1);
try {
	// Any agency, any codelist ID, latest version; stubs only, no referenced artifacts
	SdmxBeans beans = client.getCodelist("all", "all", "LATEST", Detail.allstubs, References.none);
} catch (SDMXRegistryClientException e) {
	// Handle SDMXRegistryClientException
	e.printStackTrace();
}

These methods follow the registry interface specification for their parameters. The client can either provide an agency ID together with an artifact ID and a specific version, or use the special keywords ("all", "latest"). The client can also specify the level of detail of the structural metadata to retrieve, and ask the registry to return the artifacts linked to the selected structural metadata.
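These parameters correspond to the SDMX 2.1 RESTful structure query, whose path has the form `{resource}/{agencyID}/{resourceID}/{version}`, with optional `detail` and `references` query parameters. The following Java sketch builds such a URL by hand, which can help when checking a query with a browser or a command-line HTTP client. `StructureQueryUrl` is hypothetical. It does not claim to be how FusionRegistryClient builds its requests.

```java
/** Hypothetical helper that builds an SDMX 2.1 REST structure query URL. */
final class StructureQueryUrl {

    /**
     * @param resource   artifact type in the REST path, e.g. "codelist" or "dataflow"
     * @param agencyId   maintaining agency, or "all"
     * @param resourceId artifact ID, or "all"
     * @param version    artifact version, "latest" or "all"
     * @param detail     e.g. "full", "allstubs" or "referencestubs"
     * @param references e.g. "none", "parents", "children", "descendants" or "all"
     */
    static String build(String baseUrl, String resource, String agencyId, String resourceId,
                        String version, String detail, String references) {
        // Ensure exactly one '/' between the base URL and the resource path
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + resource + "/" + agencyId + "/" + resourceId + "/" + version
                + "?detail=" + detail + "&references=" + references;
    }
}
```

For example, the getCodelist call above roughly corresponds to `build("http://pc-fortunati.isti.cnr.it:8080/FusionRegistry/ws/rest/", "codelist", "all", "all", "latest", "allstubs", "none")`. That call yields `.../ws/rest/codelist/all/all/latest?detail=allstubs&references=none`.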

sdmx-datasource-client

Maven artifacts

Find sdmx-datasource-client by searching the Nexus Repository Browser for an artifact with the following coordinates:

<groupId>org.gcube.datapublishing</groupId>
<artifactId>sdmx-datasource-client</artifactId>

Initialization

Spring dependency injection can be configured with the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
	xmlns:aop="http://www.springframework.org/schema/aop" xmlns:context="http://www.springframework.org/schema/context"
	xmlns:jee="http://www.springframework.org/schema/jee" xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="
			http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
			http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
			http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
			http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-3.0.xsd
			http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">
 
	<context:spring-configured />
	<context:annotation-config />
 
	<context:component-scan base-package="org.sdmxsource" />
 
	<bean id="datasourceDescriptor"
		class="org.gcube.datapublishing.sdmx.impl.model.GCubeSDMXDatasourceDescriptorIS" />
 
	<bean id="datasourceClient"
		class="org.gcube.datapublishing.sdmx.impl.datasource.GCubeSDMXDatasourceClientImpl" />
 
</beans>

The datasourceDescriptor bean is an instance of the Java class GCubeSDMXDatasourceDescriptorIS. This class retrieves the Datasource endpoints at runtime by querying the IS with the scope set through the ScopeProvider facilities.

Usage

sdmx-datasource-client can be used to:

  • register timeseries data available through a Timeseries Service
  • query a datasource for SDMX data documents

In order to register a timeseries the client must provide:

  • the Agency Id related to the dataflow being provided
  • the Dataflow Id of the dataflow being provided
  • the version of the Dataflow being provided
  • the Agency Id of the agency declaring the Provider Agency
  • the Provider Agency's Agency Id
  • a timeseries service scope
  • the timeseries id
  • the registry scope
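Grouping the items above into one value object makes it harder to pass them in the wrong order. The object can also derive the SDMX 2.1 REST references used in data queries: `AGENCY,ID,VERSION` for a dataflow and `AGENCY,PROVIDER_ID` for a data provider. The following Java sketch is hypothetical. `TimeseriesRegistrationRequest` and its field names are not part of sdmx-datasource-client. Mapping the two provider-related items to `providerSchemeAgencyId` and `providerId` is our interpretation of the list.

```java
import java.util.Objects;

/** Hypothetical value object grouping the information needed to register a timeseries. */
final class TimeseriesRegistrationRequest {
    final String dataflowAgencyId;
    final String dataflowId;
    final String dataflowVersion;
    final String providerSchemeAgencyId; // agency declaring the data provider (interpretation)
    final String providerId;             // ID of the data provider (interpretation)
    final String timeseriesScope;
    final String timeseriesId;
    final String registryScope;

    TimeseriesRegistrationRequest(String dataflowAgencyId, String dataflowId, String dataflowVersion,
                                  String providerSchemeAgencyId, String providerId,
                                  String timeseriesScope, String timeseriesId, String registryScope) {
        this.dataflowAgencyId = requireText(dataflowAgencyId, "dataflowAgencyId");
        this.dataflowId = requireText(dataflowId, "dataflowId");
        this.dataflowVersion = requireText(dataflowVersion, "dataflowVersion");
        this.providerSchemeAgencyId = requireText(providerSchemeAgencyId, "providerSchemeAgencyId");
        this.providerId = requireText(providerId, "providerId");
        this.timeseriesScope = requireText(timeseriesScope, "timeseriesScope");
        this.timeseriesId = requireText(timeseriesId, "timeseriesId");
        this.registryScope = requireText(registryScope, "registryScope");
    }

    /** Rejects null or blank values, since every item in the list above is required. */
    private static String requireText(String value, String name) {
        Objects.requireNonNull(value, name + " must not be null");
        if (value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be blank");
        }
        return value;
    }

    /** Dataflow reference in the SDMX 2.1 REST form AGENCY,ID,VERSION. */
    String dataflowRef() {
        return dataflowAgencyId + "," + dataflowId + "," + dataflowVersion;
    }

    /** Data provider reference in the SDMX 2.1 REST form AGENCY,PROVIDER_ID. */
    String providerRef() {
        return providerSchemeAgencyId + "," + providerId;
    }
}
```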

TimeseriesRegistrations can also be queried or removed from the datasource.

In order to query the datasource for data the client must provide:

  • An ID identifying the dataflow of interest
  • a series of code values identifying the dimensions of interest
  • an ID identifying the provider agency
  • an integer identifying the maximum number of observations requested
  • an enumerated value which allows filtering of the returned data. Admitted values:
    • full (default): all data and documentation, including annotations
    • dataonly: attributes and groups excluded
    • serieskeyonly: returns only the series elements and the dimensions that make up the series keys. This is useful for performance reasons, to return the series that match a certain query without returning the actual data.
    • nodata: returns the groups and series, including attributes and annotations, without observations.
  • an enumerated value representing the requested SDMX document version
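These inputs map quite directly onto an SDMX 2.1 RESTful data query of the form `data/{flowRef}/{key}/{providerRef}`. The key lists one element per dimension, in DSD order, joined with `.`. Alternative codes for one dimension are joined with `+`, and an empty element acts as a wildcard. The following Java sketch builds such a URL. `DataQueryUrl` is hypothetical, and several choices are assumptions:
- The maximum number of observations is mapped to `lastNObservations`; SDMX 2.1 also defines `firstNObservations`.
- The detail values use the spelling of the SDMX 2.1 specification (`serieskeysonly`).
- The document version is not modelled, because SDMX 2.1 typically negotiates it through the HTTP `Accept` header rather than the URL.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper that builds an SDMX 2.1 REST data query URL. */
final class DataQueryUrl {

    /** Values of the SDMX 2.1 REST "detail" parameter for data queries. */
    enum Detail {
        FULL("full"), DATA_ONLY("dataonly"), SERIES_KEYS_ONLY("serieskeysonly"), NO_DATA("nodata");

        final String value;

        Detail(String value) {
            this.value = value;
        }
    }

    /**
     * @param key one inner list per dimension; codes are OR-ed with '+', an empty list is a wildcard.
     *            An empty outer list selects all series ("all").
     * @param lastNObservations optional limit (assumed mapping); null omits the parameter
     */
    static String build(String baseUrl, String flowRef, List<List<String>> key,
                        String providerRef, Integer lastNObservations, Detail detail) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        StringJoiner keyJoiner = new StringJoiner(".");
        for (List<String> dimensionCodes : key) {
            keyJoiner.add(String.join("+", dimensionCodes));
        }
        String keyPart = key.isEmpty() ? "all" : keyJoiner.toString();
        StringBuilder url = new StringBuilder(base)
                .append("data/").append(flowRef)
                .append('/').append(keyPart)
                .append('/').append(providerRef)
                .append("?detail=").append(detail.value);
        if (lastNObservations != null) {
            url.append("&lastNObservations=").append(lastNObservations.intValue());
        }
        return url.toString();
    }
}
```

For example, a key of `[["A"], [], ["ITA", "FRA"]]` becomes the key segment `A..ITA+FRA`. This selects annual series for Italy or France with any value of the second dimension.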

Examples of these use cases can be found in the tests of the sources package. Check Nexus for the latest version of the artifact.