GCube SDMX Statistical Data Dissemination System

Revision as of 13:32, 21 December 2012

The SDMX Statistical Data Dissemination System publishes timeseries data in SDMX format. The SDMX standards focus primarily on the exchange and dissemination of statistical data and metadata. They define processes and a protocol for exchanging statistical data, as well as standard formats for representing data and metadata.

Fundamental definitions

Data set: Statistical data is usually organized into discrete sets, each of which includes particular observations for a specific period of time. A data set can be understood as a collection of similar data that shares a structure and covers a fixed period of time.

Architecture

The SDMX architecture comprises several components:

  • SDMX Registry: The SDMX Registry is a service that holds SDMX Structural Metadata artifacts and provides SDMX SOAP and REST interfaces for retrieving and registering those artifacts. It also acts as a reverse proxy between an SDMX client and the SDMX Datasource.
  • gCube SDMX Datasource: A generic SDMX Datasource is a service mainly responsible for providing access to statistical data represented as SDMX documents. Statistical data is usually stored in representation formats other than SDMX, potentially on a wide array of storage backends. An SDMX Datasource abstracts those aspects by providing a means of accessing the data backends, and it transforms non-SDMX data formats into one of the standard SDMX formats. According to the SDMX specifications, an SDMX Datasource can be as simple as a web server or a more complex service delivering standard SDMX REST or SOAP interfaces. The gCube SDMX Datasource specifically provides access to gCube Timeseries, available through the Timeseries Service.
  • Timeseries Service: A SOAP service that allows the retrieval of timeseries.
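The REST interface mentioned for the Registry can be illustrated with a small URL builder. This is only a sketch: it assumes the Registry follows the SDMX 2.1 RESTful path conventions (`/{resourceType}/{agencyID}/{resourceID}/{version}` for structural metadata and `/data/{flowRef}/{key}` for data), which this page does not confirm for the gCube deployment. The class name `SdmxRestUrls`, the base URL, the agency ID and the artifact IDs are hypothetical placeholders.

```java
import java.util.List;

/**
 * Builds query URLs using SDMX 2.1 RESTful path conventions.
 * Assumption: the target Registry exposes these conventions; all names are illustrative.
 */
final class SdmxRestUrls {
    private final String baseUrl;

    SdmxRestUrls(String baseUrl) {
        // Drop a trailing slash so that path segments can be joined uniformly.
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
    }

    /** Structure query; resourceType is e.g. "codelist", "conceptscheme" or "dataflow". */
    String structure(String resourceType, String agencyId, String resourceId, String version) {
        return String.join("/", baseUrl, resourceType, agencyId, resourceId, version);
    }

    /** Data query; an empty string in keyValues leaves that dimension unconstrained. */
    String data(String agencyId, String dataflowId, String version, List<String> keyValues) {
        String flowRef = String.join(",", agencyId, dataflowId, version);
        String key = String.join(".", keyValues);
        return String.join("/", baseUrl, "data", flowRef, key);
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and identifiers, used only to show the URL shapes.
        SdmxRestUrls urls = new SdmxRestUrls("http://registry.example.org/ws/rest/");
        System.out.println(urls.structure("codelist", "EXAMPLE_AGENCY", "CL_COUNTRY", "1.0"));
        System.out.println(urls.data("EXAMPLE_AGENCY", "EXAMPLE_FLOW", "1.0", List.of("IT", "", "A")));
    }
}
```

Because the Registry works as a reverse proxy, a client in this sketch would send both kinds of query to the same base URL and leave it to the Registry to forward data requests to the Datasource.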

Use Cases

Publication of Timeseries data

File:TimeseriesSDMXPublishing.jpg (Publication of timeseries data)

To make timeseries data available to the public in SDMX format, the Timeseries Service must first register several SDMX Structural Metadata artifacts on the Registry, describing the data exchange setup and the related actors. The steps, and the Structural Metadata to be provided at each one, are, in order:

  • Register the Agency providing the SDMX artifacts on the Root Agency Scheme: the Agency provides information on the organization that is delivering Structural Metadata.
  • Register one or more Codelists: Each codelist enumerates a set of values to be used in the representation of dimensions, attributes, and other structural parts of SDMX. For example a codelist may define how to identify a set of countries.
  • Register one or more Concept Schemes: A concept scheme is a maintained list of concepts that are used in data structure definitions and metadata structure definitions. For example, a concept scheme may define the concept of country.
  • Register a Data Structure Definition (DSD): Each data set has a set of structural metadata. These descriptions are referred to in SDMX as Data Structure Definitions, which include information about how concepts are associated with the measures, dimensions, and attributes of a data “cube,” along with information about the representation of data and related identifying and descriptive (structural) metadata.
  • Register a Dataflow: A Dataflow definition describes a statistical data set exchange, in which multiple data providers can be involved, by identifying the related DSD and the constraints that apply to the dataflow (such as reporting periodicity).
  • Register a Data Provider within the Agency DataProviderScheme: Each actor providing data is identified as a Data Provider. The Agency's Data Provider Scheme is an Organisation Scheme (like the Agency Scheme), supplied by an agency, that describes a set of Data Providers.
  • Register a Provision Agreement: A Provision Agreement is the set of information that describes the way in which data sets and metadata sets are provided by a data provider. The term “agreement” is used because this information can be understood as the basis of a “service-level agreement”.
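The ordering of the steps above can be sketched as a small guard that refuses a registration while any earlier step is still missing. The enum constants mirror the list; `RegistrationStep` and `RegistrationPlan` are hypothetical names introduced for illustration and are not part of any gCube library. The strict one-after-another ordering is an assumption taken from the phrase "in order" and may be stricter than the Registry actually requires.

```java
import java.util.EnumSet;
import java.util.Set;

/** The registration steps, declared in the order given in the list above. */
enum RegistrationStep {
    AGENCY,
    CODELISTS,
    CONCEPT_SCHEMES,
    DATA_STRUCTURE_DEFINITION,
    DATAFLOW,
    DATA_PROVIDER,
    PROVISION_AGREEMENT
}

/** Tracks completed steps and enforces the declared order (illustrative only). */
final class RegistrationPlan {
    private final Set<RegistrationStep> completed = EnumSet.noneOf(RegistrationStep.class);

    /** Marks a step as done; throws if any earlier step has not been completed. */
    void complete(RegistrationStep step) {
        for (RegistrationStep earlier : RegistrationStep.values()) {
            if (earlier.ordinal() < step.ordinal() && !completed.contains(earlier)) {
                throw new IllegalStateException(
                        step + " requires " + earlier + " to be registered first");
            }
        }
        completed.add(step);
    }

    /** True once every structural metadata step has been completed. */
    boolean isComplete() {
        return completed.containsAll(EnumSet.allOf(RegistrationStep.class));
    }

    public static void main(String[] args) {
        RegistrationPlan plan = new RegistrationPlan();
        for (RegistrationStep step : RegistrationStep.values()) {
            plan.complete(step); // a real client would call the Registry here
        }
        System.out.println("All structural metadata registered: " + plan.isComplete());
    }
}
```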

The information provided to the Registry in these steps fully describes who is providing statistical data and which domain the data covers. For the Registry to know where the data published by a data provider can be retrieved, the Timeseries Service must first register the timeseries on the SDMX Datasource, which gives the Datasource access to that particular timeseries data set. This step invokes a gCube-specific web service method exposed by the SDMX Datasource. Once the timeseries is registered on the SDMX Datasource, a datasource registration holding the endpoint through which the SDMX Datasource service can be accessed can safely be sent to the Registry.
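The two-phase sequence described here (Datasource first, Registry second) can be sketched as follows. The interfaces `DatasourceClient` and `RegistryClient`, their method names, and the `TimeseriesPublisher` class are hypothetical stand-ins and do not reflect the actual gCube or sdmx-publisher API; linking the datasource registration to a provision agreement ID is likewise an assumption made for the example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of the gCube-specific Datasource method; not the real service API. */
interface DatasourceClient {
    void registerTimeseries(String timeseriesId);
}

/** Hypothetical view of the Registry operation that records where data can be retrieved. */
interface RegistryClient {
    void registerDatasource(String provisionAgreementId, URI datasourceEndpoint);
}

/** Publishes a timeseries in the order described in the text (illustrative only). */
final class TimeseriesPublisher {
    private final DatasourceClient datasource;
    private final RegistryClient registry;
    private final URI datasourceEndpoint;

    TimeseriesPublisher(DatasourceClient datasource, RegistryClient registry, URI datasourceEndpoint) {
        this.datasource = datasource;
        this.registry = registry;
        this.datasourceEndpoint = datasourceEndpoint;
    }

    void publish(String timeseriesId, String provisionAgreementId) {
        // Phase 1: make the timeseries accessible through the SDMX Datasource.
        // If this call throws, the Registry is never told about an unreachable data set.
        datasource.registerTimeseries(timeseriesId);
        // Phase 2: only now send the datasource registration, carrying the endpoint.
        registry.registerDatasource(provisionAgreementId, datasourceEndpoint);
    }

    public static void main(String[] args) {
        // In-memory fakes that record the call order instead of contacting real services.
        List<String> calls = new ArrayList<>();
        TimeseriesPublisher publisher = new TimeseriesPublisher(
                id -> calls.add("datasource:" + id),
                (agreement, endpoint) -> calls.add("registry:" + agreement + "@" + endpoint),
                URI.create("http://datasource.example.org/sdmx"));
        publisher.publish("ts-001", "PA_EXAMPLE");
        System.out.println(calls);
    }
}
```

Keeping the Registry call last means that a failure on the Datasource side leaves the Registry untouched, which is the reason the text calls sending the registration "safe" only after the first phase has succeeded.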

The sdmx-publisher library registers all of the aforementioned Structural Metadata artifacts on the Registry and provides simplified methods for registering timeseries data sets on both the gCube SDMX Datasource and the SDMX Registry.

Retrieval of SDMX Structural Metadata or Data

File:SDMXRetrieval.png

sdmx-registry-client

TODO

  • Maven deps
  • Spring injection
  • API documentation

sdmx-datasource-client

TODO

  • Maven dep
  • Spring injection
  • API documentation

sdmx-publisher

TODO

  • Maven dep
  • Spring injection
  • API documentation