Difference between revisions of "GCat Background"

From Gcube Wiki
 
(433 intermediate revisions by 7 users not shown)
'''** THIS DOCUMENT IS A DRAFT **'''

{|align=right
||__TOC__
|}

A catalogue is a service that supports its users in publishing and searching collections of descriptive information (metadata) for items including data, services, and related information objects.

D4Science offers services for seamless access to, and analysis of, a wide spectrum of data, including biological and ecological data, geospatial data, statistical data and semi-structured data from multiple authoritative data providers and information systems. These services can be exploited both via web-based graphical user interfaces and via web-based protocols for programmatic access, e.g. OAI-PMH, CSW, WFS, SDMX. This offering nicely complements specific and community-specific applications.

The gCube Data Catalogue contains a wealth of resources resulting from several activities, projects and communities, including BlueBRIDGE (www.bluebridge-vres.eu/), i-Marine (www.i-marine.eu), SoBigData.eu (www.sobigdata.eu), and FAO (www.fao.org). All the products are accompanied by rich descriptions capturing general attributes, e.g. title and creator(s), as well as usage policies and licences.

The gCube Data Catalogue is built on, and extends, the CKAN platform. CKAN is an open-source DMS (data management system) for powering data hubs and data portals: it makes data accessible by providing tools to streamline publishing, sharing, finding and using data (see http://ckan.org/).

The CKAN model is made up of the following entities (and their relations):

[[File:ckan_entities.png|1000px|thumb|center|CKAN: 'Entities and Relations']]

== Available Catalogues and their public locations ==

''BLUEBRIDGE Catalogue''

* https://bluebridge.d4science.org/catalogue-bluebridge
* https://i-marine.d4science.org/catalogue-bluebridge

''D4Science Catalogue''

* https://services.d4science.org/catalogue-d4s1

== Metadata ==

Metadata in the gCube Data Catalogue consists of two parts: [[#CKAN's default metadata fields | CKAN's default metadata fields]] and [[#gCube Metadata Profile | gCube Metadata Profile]].

=== CKAN's default metadata fields ===
| 
| Owner of dataset
| The person who created the dataset, in the format: Surname, Name
| Bloggs, Joe
|-
| Author Contact
| 
| Maintainer of the dataset
| The person or the authority that maintains the dataset
| A person: Bloggs, Joe. An authority: D4Science
|-
| Maintainer

''mandatory fields are marked with an asterisk (*)''

=== gCube Metadata Profile ===

The gCube Metadata Profile defines an XML-based metadata schema for adding custom metadata fields.

A gCube Metadata Profile is composed of one Metadata Format (<metadataformat>) containing an ordered list of (at least one) '''Metadata Field''' (<metadatafield>).

From version 3, a Metadata Field can also contain a reference (categoryref="category_id_#") to a "Category" entity, using the Namespace of the Category (<namespace id="category_id_#">). Adding a Category Reference to a Metadata Field means that the field belongs to the Category identified by that Category Identifier (id="category_id_#").

See Metadata Profile v.3 for more details.
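To make the Category Reference concrete, the following Python sketch reads a small, made-up profile with the standard library's xml.etree.ElementTree and groups field names by their categoryref attribute. The helper fields_by_category and the field/category names in PROFILE are hypothetical illustrations, not part of gCube; fields without a categoryref are collected under None.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical profile: field names and category ids are illustrative only.
PROFILE = """<metadataformat type="example_type">
  <metadatafield categoryref="contact">
    <fieldName>Name</fieldName>
    <dataType>String</dataType>
  </metadatafield>
  <metadatafield categoryref="contact">
    <fieldName>Email</fieldName>
    <dataType>String</dataType>
  </metadatafield>
  <metadatafield>
    <fieldName>Keywords</fieldName>
    <dataType>String</dataType>
  </metadatafield>
</metadataformat>"""


def fields_by_category(profile_xml: str) -> dict:
    """Group fieldName values by the categoryref attribute of each metadatafield.

    A field can reference at most one Category; fields without a
    categoryref are grouped under the key None.
    """
    root = ET.fromstring(profile_xml)
    groups = defaultdict(list)
    for field in root.findall("metadatafield"):
        category = field.get("categoryref")  # None when the attribute is absent
        groups[category].append(field.findtext("fieldName"))
    return dict(groups)


if __name__ == "__main__":
    print(fields_by_category(PROFILE))
```

Running the module prints {'contact': ['Name', 'Email'], None: ['Keywords']}, i.e. the two "contact" fields would be shown together in the Category's area.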

===== Metadata Profile v.4 =====

A Metadata Profile v.4 is an XML file with the following format:
<pre>
<?xml version="1.0" encoding="UTF-8"?>
<metadataformat type="YOUR TYPE HERE">
    <metadatafield categoryref="category_id_#">
        <fieldId>ID of Metadata Field that identifies the field name in the Document (stored in the Service)</fieldId>
        <fieldName>Name of Metadata Field</fieldName>
        <mandatory>true|false</mandatory>
        <dataType>String|Time|Time_Interval|Times_ListOf|Text|Boolean|Number|GeoJSON</dataType>
        <maxOccurs>N|*</maxOccurs>
        <defaultValue>default value</defaultValue>
        <note>[the note is shown as a suggestion in the insert/update metadata form of Catalogue Publisher Widget]</note>
        <vocabulary isMultiSelection="true|false">
            <vocabularyField>field1</vocabularyField>
            <vocabularyField>field2</vocabularyField>
            <vocabularyField>field3</vocabularyField>
        </vocabulary>
        <validator>
            <regularExpression>a regular expression for validating values</regularExpression>
        </validator>
        <tagging create="true|false" separator="char_to_separate">onFieldName|onValue|onFieldName_onValue|onValue_onFieldName</tagging>
        <grouping create="true|false">onFieldName|onValue|onFieldName_onValue|onValue_onFieldName</grouping>
    </metadatafield>
    <!-- ... other metadata fields -->
</metadataformat>
</pre>
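A metadata field can also be generated programmatically. The Python sketch below uses xml.etree.ElementTree to build a <metadatafield> element from a plain dict, emitting the simple child elements in the order shown in the template above. The helper build_metadatafield, the category id "category_id_1" and the sample values are hypothetical; the sketch only covers the simple elements plus the vocabulary (no validator, tagging or grouping).

```python
import xml.etree.ElementTree as ET

# Order of the simple child elements, as listed in the v.4 template above.
FIELD_ORDER = ["fieldId", "fieldName", "mandatory", "dataType",
               "maxOccurs", "defaultValue", "note"]


def _to_text(value) -> str:
    """Render Python values as profile text (booleans become 'true'/'false')."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_metadatafield(values: dict, categoryref=None, vocabulary=None,
                        multi_selection=False) -> ET.Element:
    """Build a <metadatafield> element (hypothetical helper, not a gCube API)."""
    field = ET.Element("metadatafield")
    if categoryref is not None:
        field.set("categoryref", categoryref)
    # Emit the simple elements in template order, skipping those not given.
    for name in FIELD_ORDER:
        if name in values:
            ET.SubElement(field, name).text = _to_text(values[name])
    # The vocabulary follows <note>, as in the template.
    if vocabulary:
        vocab = ET.SubElement(field, "vocabulary",
                              isMultiSelection=_to_text(multi_selection))
        for term in vocabulary:
            ET.SubElement(vocab, "vocabularyField").text = term
    return field


if __name__ == "__main__":
    element = build_metadatafield(
        {"fieldName": "Language", "mandatory": False, "dataType": "String"},
        categoryref="category_id_1",
        vocabulary=["en", "it"],
    )
    print(ET.tostring(element, encoding="unicode"))
```

Keeping the order in a single list makes it easy to adjust if the profile version changes.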
  
The <fieldId> element is optional. If present in the profile, it declares the value that will be used as the field name in the Document (e.g. a JSON Document) passed to the Service that stores the resulting Document. If <fieldId> is absent, the value of <fieldName> (which is mandatory) is used as the field name in the Document.

The <fieldName> element contains the name of the metadata field.

The <mandatory> element declares whether the <metadatafield> is mandatory ('true') or not ('false').
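The fieldId/fieldName fallback and the mandatory flag can be sketched in Python as follows. The Document format and the storing Service are not specified here, so build_document is a hypothetical helper that simply produces a dict keyed as described above; the sample profile is made up. Treating a missing <mandatory> element as 'false' is an assumption (the v.3 schema declares <mandatory> with minOccurs="0").

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: the first field declares a fieldId, the second does not.
PROFILE = """<metadataformat type="example_type">
  <metadatafield>
    <fieldId>creator_name</fieldId>
    <fieldName>Creator Name</fieldName>
    <mandatory>true</mandatory>
  </metadatafield>
  <metadatafield>
    <fieldName>Notes</fieldName>
    <mandatory>false</mandatory>
  </metadatafield>
</metadataformat>"""


def document_keys(profile_xml):
    """Return (document_key, fieldName, mandatory) for every metadata field.

    The key is <fieldId> when present and non-empty, otherwise <fieldName>.
    """
    root = ET.fromstring(profile_xml)
    result = []
    for field in root.findall("metadatafield"):
        name = field.findtext("fieldName")
        field_id = (field.findtext("fieldId") or "").strip()
        key = field_id if field_id else name
        # Assumption: an absent <mandatory> element means 'false'.
        mandatory = (field.findtext("mandatory") or "false").strip().lower() == "true"
        result.append((key, name, mandatory))
    return result


def build_document(profile_xml, values_by_name):
    """Map user input (keyed by fieldName) to a document keyed by fieldId/fieldName.

    Raises ValueError when a mandatory field has no value.
    """
    document = {}
    for key, name, mandatory in document_keys(profile_xml):
        value = values_by_name.get(name)
        if value in (None, ""):
            if mandatory:
                raise ValueError(f"mandatory field missing: {name}")
            continue  # optional and empty: leave it out of the document
        document[key] = value
    return document
```

For example, build_document(PROFILE, {"Creator Name": "Bloggs, Joe"}) yields {"creator_name": "Bloggs, Joe"}: the first field is stored under its fieldId, while a value for "Notes" would be stored under its fieldName.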

'''DataType values''':

The <dataType> element specifies the kind of data. A valid dataType must be one of {String, Time, Time_Interval, Times_ListOf, Text, Boolean, Number, GeoJSON}; when the data type is not specified, the metadata field defaults to "String". '''Temporal types''' can be specified with Time, Time_Interval or Times_ListOf (based on [https://en.wikipedia.org/wiki/ISO_8601 ISO 8601]); the '''Spatial type''' can be specified with GeoJSON.

In detail:
* String: a string;
* Time: an instant in time, following the general format YYYY-MM-DD [HH:MM], where YYYY is a 4-digit year, MM a 2-digit month, DD a 2-digit day and, optionally, HH a 2-digit hour and MM a 2-digit minute (e.g. "2005-03-01");
* Time_Interval: a continuous interval instead of a single instant, given by a start and an end time separated by one '/' (slash) character (e.g. "2005-03-01/2006-05-11");
* Times_ListOf: a list of discrete time values separated by a ',' (comma) character (e.g. "2005-03-01, 2006-05-11, 2006-05-11-2007-04-12");
* Text: a text;
* Boolean: True/False;
* Number: a valid Java number (see Apache Commons NumberUtils.isNumber);
* GeoJSON: a string in GeoJSON format (in particular, it should contain a '''GeoJSON geometry'''). [http://geojson.org/geojson-spec.html GeoJSON] is a format for encoding a variety of geographic data structures.
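The value formats listed above can be checked with a small validator. This is a sketch under stated assumptions, not the catalogue's own validation code: the optional time part is assumed to be separated by a single space ("YYYY-MM-DD HH:MM"); Number uses Python's float() as a stand-in for Apache Commons NumberUtils.isNumber (the two do not accept exactly the same strings); and GeoJSON only checks the geometry 'type' plus the presence of 'coordinates' (or 'geometries' for a GeometryCollection).

```python
import json
import re

# Assumption: the optional [HH:MM] part follows the date after one space.
_TIME = re.compile(r"^\d{4}-\d{2}-\d{2}( \d{2}:\d{2})?$")
_GEOMETRY_TYPES = {"Point", "MultiPoint", "LineString", "MultiLineString",
                   "Polygon", "MultiPolygon", "GeometryCollection"}


def _is_time(value: str) -> bool:
    return bool(_TIME.match(value.strip()))


def _is_number(value: str) -> bool:
    # Approximation of NumberUtils.isNumber using Python float parsing.
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_geojson_geometry(value: str) -> bool:
    try:
        geometry = json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(geometry, dict) or geometry.get("type") not in _GEOMETRY_TYPES:
        return False
    if geometry["type"] == "GeometryCollection":
        return isinstance(geometry.get("geometries"), list)
    return isinstance(geometry.get("coordinates"), list)


def is_valid(data_type: str, value: str) -> bool:
    """Check a raw string value against one of the documented dataType values."""
    if data_type in ("String", "Text"):
        return True
    if data_type == "Time":
        return _is_time(value)
    if data_type == "Time_Interval":
        parts = value.split("/")  # start and end separated by one slash
        return len(parts) == 2 and all(_is_time(p) for p in parts)
    if data_type == "Times_ListOf":
        return all(_is_time(p) for p in value.split(","))  # comma-separated
    if data_type == "Boolean":
        return value.strip().lower() in ("true", "false")
    if data_type == "Number":
        return _is_number(value)
    if data_type == "GeoJSON":
        return _is_geojson_geometry(value)
    raise ValueError(f"unknown dataType: {data_type}")
```

For instance, is_valid("Time_Interval", "2005-03-01/2006-05-11") returns True, while is_valid("Time", "2019-7-29") returns False because the month is not written with two digits.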

: '''GeoSpatial Data (the ''spatial'' field)''':

: To make a metadata item GeoSpatial, and thus searchable by location via the GeoSpatial Search Widget (see [[#GeoSpatial_search_for_datasets:_via_API_or_Search_Widget]]), it must have a metadata field whose 'fieldName' is `spatial`, with 'dataType' GeoJSON and a valid GeoJSON geometry as value.

: E.g. a MetadataField with GeoSpatial data:
<pre>
    <metadatafield categoryref="category_id_#">
        <fieldName>spatial</fieldName> <!-- 'spatial' is the reserved field name that assigns a GeoSpatial dimension to metadata -->
        <dataType>GeoJSON</dataType>
        <defaultValue>{"type": "Point","coordinates": [-20.145,74.078]}</defaultValue>
        <note>Please, insert a valid GeoJSON</note>
    </metadatafield>
</pre>
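The rule for the reserved spatial field can be expressed in a few lines. The sketch below uses hypothetical helper names (is_spatial_field, point_lon_lat): it parses the example field above with xml.etree.ElementTree, checks fieldName and dataType, and reads the default Point. GeoJSON stores positions as [longitude, latitude], so the example point lies at longitude -20.145, latitude 74.078.

```python
import json
import xml.etree.ElementTree as ET

# The example field above (without the XML comment).
FIELD = """<metadatafield categoryref="category_id_#">
    <fieldName>spatial</fieldName>
    <dataType>GeoJSON</dataType>
    <defaultValue>{"type": "Point","coordinates": [-20.145,74.078]}</defaultValue>
    <note>Please, insert a valid GeoJSON</note>
</metadatafield>"""


def is_spatial_field(field: ET.Element) -> bool:
    """True when fieldName is the reserved name 'spatial' and dataType is GeoJSON."""
    return (field.findtext("fieldName", "").strip() == "spatial"
            and field.findtext("dataType", "").strip() == "GeoJSON")


def point_lon_lat(value: str):
    """Return (longitude, latitude) of a GeoJSON Point (longitude comes first)."""
    geometry = json.loads(value)
    if geometry.get("type") != "Point":
        raise ValueError("not a Point geometry")
    lon, lat = geometry["coordinates"][:2]
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("coordinates out of range")
    return lon, lat


if __name__ == "__main__":
    field = ET.fromstring(FIELD)
    print(is_spatial_field(field), point_lon_lat(field.findtext("defaultValue")))
```

Running the module prints True (-20.145, 74.078).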
  
=== SoBigData.eu: Dataset Metadata ===
+
: see more details about [[#Geo_Datasets]]
  
The current list of fields characterising a SoBigData resource is available at https://docs.google.com/spreadsheets/d/1kuhvmDVKpmqt2foyCB9wDo3HgzoAiCuRQ8CjRS-DVOM/edit?usp=sharing
+
: '''Temporal Data (the ''time_date'' field)''':
  
The following fields have been identified:
+
: In order to make a metadata a Temporal Data and searchable by time via Time Search Widget, it must have a 'fieldName' named `time_date` with 'dataType' Time and a valid ISO 8601 date as value.
  
{| class="wikitable"
+
: E.g. A MedataField with Temporal data:
! style="font-weight: bold;" | Field
+
! style="font-weight: bold;" | In Catalogue
+
|-
+
|colspan="2" align="center"|'''Internal Fields'''
+
|-
+
| Internal Identifier
+
| Automatically created
+
|-
+
| Creation Date
+
| Automatically created
+
|-
+
| Last Modification
+
| Automatically updated
+
|-
+
|colspan="2" align="center"|'''General Description'''
+
|-
+
| Title
+
| Title
+
|-
+
| Identifier
+
|
+
 
<pre>
 
<pre>
<fieldName>External Identifier</fieldName>
+
    <metadatafield idref="category_id_#">
<mandatory>false</mandatory>
+
        <fieldName>time_date</fieldName> <!--'time_date' is the reserved field name to assign a Temporal dimension to metadata -->
<isBoolean>false</isBoolean>
+
        <dataType>Time</dataType>
<defaulValue></defaulValue>
+
        <defaultValue>2019-7-29</defaultValue>
<note>This applies only to datasets that have been already published.
+
        <note>Please, insert a valid ISO 8601 date</note>
  Insert here a DOI, an handle, and any other Identifier assigned when
+
    </metadatafield>
  publishing the dataset alsewhere.</note>
+
<vocabulary></vocabulary>
+
<validator></validator>
+
 
</pre>
 
</pre>
|-
+
 
| Creators
+
: see more details about [[#Temporal_Datasets]]
| Author is there, unfortunately there is only one author per Dataset. Moreover, the technology supports only key value pairs ... no complex types.
+
 
 +
'''maxOccurs Indicator''':
 +
 
 +
The <maxOccurs> indicator specifies the maximum number of times that <metadatafield> can occur:
 +
* N (as number): if the field must appear N  times;
 +
* * (as char asterisk): if the field can appear an unlimited number of times.
 +
 
 +
'''Categories as "Namespaces"''':
 +
 
 +
* the Namespace of a Category declares a "class" for metadata fields having particular characteristics. It has been introduced in order to group metadata fields for categories and displaying them in a dedicated area through advanced GUI provided by CKAN D4Science plugin.
 +
 
 +
Namespaces (for Categories) are defined in an XML file made by one Namespaces element (<namespaces>) containing a list of (at least) one or many Namespace (<namespace>). The file has the format:
 
<pre>
 
<pre>
<fieldName>Creator</fieldName>
+
<?xml version="1.0" encoding="UTF-8"?>
<mandatory>true</mandatory>
+
<namespaces xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<isBoolean>false</isBoolean>
+
<namespace id="category_id_#">
<defaulValue></defaulValue>
+
<name>Category Name</name>
<note>The name of the creator, with email and ORCID. The format should be: family, given[, email][, ORCID].
+
<title>Category Title</title>
  Examples: Smith, John, js@acme.org, orcid.org/0000-0000-0000-0000; Miller, Elizabeth
+
<description>This section is about Category description</description>
</note>
+
</namespace>
<vocabulary></vocabulary>
+
</namespaces>
<validator></validator>
+
</pre> 
+
|-
+
| Creation Date
+
|
+
<pre>
+
<fieldName>CreationDate</fieldName>
+
<mandatory>true</mandatory>
+
<isBoolean>false</isBoolean>
+
<defaulValue></defaulValue>
+
<note>The date of creation of the dataset (different from the date of creation of the dataset automatically added by the system)
+
</note>
+
<vocabulary></vocabulary>
+
<validator></validator>
+
</pre>
+
|-
+
| Distributor
+
| Maintainer
+
|-
+
| Publisher
+
|
+
???
+
|-
+
| Publication Date
+
| when the dataset is published in the repository ... no field have to be specified;
+
|-
+
| Contact
+
| Go for Maintainer? I would go for Maintainer email
+
|-
+
| Thematic Cluster
+
|
+
Shall we go for a Topic too? I think so.
+
<pre>
+
<fieldName>ThematicCluster</fieldName>
+
<mandatory>true</mandatory>
+
<isBoolean>false</isBoolean>
+
<defaulValue></defaulValue>
+
<note>The SoBigData.eu Thematic Clusters
+
</note>
+
<vocabulary>
+
  <vocabularyField>Text and Social Media Mining</vocabularyField>
+
  <vocabularyField>Social Network Analysis</vocabularyField>
+
  <vocabularyField>Human Mobility Analytics</vocabularyField>
+
  <vocabularyField>Web Analytics</vocabularyField>
+
  <vocabularyField>Visual Analytics</vocabularyField>
+
  <vocabularyField>Social Data</vocabularyField>
+
</vocabulary>
+
<validator></validator>
+
</pre> 
+
|-
+
| Area
+
| Tag vs domain specific field
+
|-
+
| Semantic Coverage
+
| Tag vs domain specific field 
+
|-
+
| Time Coverage Start Date
+
|
+
<pre>
+
<fieldName>TimeCoverage</fieldName>
+
<mandatory>true</mandatory>
+
<isBoolean>false</isBoolean>
+
<defaulValue></defaulValue>
+
<note>List of time intervals, e.g. 1977-03-10T11:45:30 - 2005-01-15T09:10:00</note>
+
<vocabulary></vocabulary>
+
<validator></validator>
+
</pre> 
+
|-
+
| Time Coverage End Date
+
| not needed see above
+
|-
+
| Geo Location
+
|
+
<pre>
+
<fieldName>spatial</fieldName>
+
<mandatory>false</mandatory>
+
<isBoolean>false</isBoolean>
+
<defaulValue></defaulValue>
+
<note>The value must be a valid GeoJSON geometry, for example:
+
  {
+
      "type":"Polygon",
+
      "coordinates":[[[2.05827, 49.8625],[2.05827, 55.7447], [-6.41736, 55.7447], [-6.41736, 49.8625], [2.05827, 49.8625]]]
+
  }
+
  or:
+
  {
+
      "type": "Point",
+
      "coordinates": [-3.145,53.078]
+
  }
+
</note>
+
<vocabulary></vocabulary>
+
<validator></validator>
+
 
</pre>
 
</pre>
  
More on [http://geojson.org/ GeoJSON geometry].
+
A Namespace element (<namespace>) has an attribute (id) and three entities. The attribute "id" must be unique in the file [[#Namespaces_Categories_schema:_NamespacesCatalogueCategories.xsd]], it represents the category identifier for the Category. The elements are: name (is mandatory), title (is mandatory), description (is optional).
|-
+
 
| ProcessingDegree
+
Metadata Field and Category Reference (categoryref="category_id_#"):
|  
+
* categoryref is an optional attribute. It is a unique id (id="category_id_#"). A metadata field can belong to only one Namespace of a Category referring it via idref (categoryref="id category to which metadata field belongs one").
Shall we go for a Topic too? I think so.  
+
 
 +
Type of (meta)data (is Mandatory):
 +
* type: a Metadata Format (metadataformat) must have a unique 'type' (as a xml attribute) that declares a "type" for it. This mandatory information is saved as custom key (system:type="value of type") of the item stored in the Data Catalogue.
 +
 
 +
'''Tagging''':
 +
* It is used by gCube Data Catalogue front-end for adding a metadata field as a Tag of the metadata. A Tag is a string between 2 and 100 characters long containing only alphanumeric characters and '-' (hyphen), '_' (underscore), . (dot). Tagging element in the Metadata Profile schema v3 must have a value equal to one of the values: {onFieldName, onValue, onFieldName_onValue, onValue_onFieldName}. Tagging values meanings:
 +
** onFieldName: (only) the fieldName specified to metadata field must be added as a Tag;
 +
** onValue: (only) the value specified to metadata field must be added as a Tag;
 +
** onFieldName_onValue: both the fieldName and the value (in this order) specified to metadata field must be added as a Tag. They are separated by string used as separator (<tagging create="true|false" separator="char_to_separate">{onFieldName_onValue}</tagging>);
 +
** onValue_onFieldName: both the value and the fieldName (in this order) specified to metadata field  must be added as a Tag. They are separated by string used as separator (<tagging create="true|false" separator="char_to_separate">{onValue_onFieldName}</tagging>).
 +
* Moreover, Tagging has two (optional) attribute: 'create' and 'separator'. The first one (create="true"|"false") is used to mean: create the Tag if does not exist, no otherwise. The second one (separator="char_to_separate") is the string that will be used to separate the FieldName from its value. Default value for separator is the character '-' if it is not specified.
 +
 
 +
Tagging example: using following instance of metadata field
 
<pre>
 
<pre>
<fieldName>ProcessingDegree</fieldName>
+
<metadatafield categoryref="contact">
<mandatory>true</mandatory>
+
<fieldName>Name</fieldName>
<isBoolean>false</isBoolean>
+
<dataType>String</dataType>
<defaulValue></defaulValue>
+
<defaultValue>My Name</defaultValue>
<note>Whether primary or secondary dataset.
+
<note>Insert your Name</note>
</note>
+
<tagging create="true" separator="-">onFieldName_onValue</tagging>
<vocabulary>
+
</metadatafield>
  <vocabularyField>Primary</vocabularyField>  
+
  <vocabularyField>Secondary</vocabularyField>
+
</vocabulary>
+
<validator></validator>
+
 
</pre>
 
</pre>
|-
 
| ManifestationType
 
|
 
Shall we go for a Topic too? I think so.
 
<pre>
 
<fieldName>ManifestationType</fieldName>
 
<mandatory>true</mandatory>
 
<isBoolean>false</isBoolean>
 
<defaulValue></defaulValue>
 
<note>Virtual (accessible in streaming from remote sites), replica (copy of data in remote sites, e.g. DBPL),
 
  original (collection of data produced and kept in local infra by data provider).
 
</note>
 
<vocabulary>
 
  <vocabularyField>Virtual</vocabularyField>
 
  <vocabularyField>Replica</vocabularyField>   
 
  <vocabularyField>Original</vocabularyField>
 
</vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| Language
 
|
 
Shall we go for a Topic too? I think so.
 
<pre>
 
<fieldName>Language</fieldName>
 
<mandatory>false</mandatory>
 
<isBoolean>false</isBoolean>
 
<defaulValue></defaulValue>
 
<note>The primary language of the resource (use ISO 639-1).
 
</note>
 
<vocabulary></vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| Description
 
| Description
 
|-
 
| RelatedLiterature
 
|
 
<pre>
 
<fieldName>RelatedPaper</fieldName>
 
<mandatory>false</mandatory>
 
<isBoolean>false</isBoolean>
 
<defaulValue></defaulValue>
 
<note>Insert a complete reference to an associated work. 
 
</note>
 
<vocabulary></vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| RelatedDataset
 
| TBD
 
|-
 
|colspan="2" align="center"|'''Accessibility properties'''
 
|-
 
| Accessibility
 
|
 
<pre>
 
<fieldName>Accessibility</fieldName>
 
<mandatory>true</mandatory>
 
<isBoolean>false</isBoolean>
 
<defaulValue></defaulValue>
 
<note>How the access to the resource is regulated: VA (Virtual Access) or TNA (Trans National Access), Public vs Restricted. 
 
</note>
 
<vocabulary>
 
  <vocabularyField>VA/Public</vocabularyField>
 
  <vocabularyField>VA/Restricted</vocabularyField>
 
  <vocabularyField>TNA/Restricted</vocabularyField>
 
</vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| AccessibilityMode
 
|
 
<pre>
 
<fieldName>AccessibilityMode</fieldName>
 
<mandatory>true</mandatory>
 
<isBoolean>false</isBoolean>
 
<defaulValue></defaulValue>
 
<note>How the access to the resource is offered. 
 
</note>
 
<vocabulary>
 
  <vocabularyField>Programmatic (e.g. API)</vocabularyField>
 
  <vocabularyField>By file</vocabularyField>
 
  <vocabularyField>...</vocabularyField>
 
</vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| Privacy
 
| TBC
 
|-
 
|colspan="2" align="center"|'''Technical properties'''
 
|-
 
| Size
 
|-
 
| DiskSize
 
|-
 
| Format
 
|-
 
| FormatSchema
 
|-
 
| API
 
|-
 
|colspan="2" align="center"|'''Legally and Ethical Aspects'''
 
|-
 
| Personal Data
 
|
 
<pre>
 
<fieldName>PersonalData</fieldName>
 
<mandatory>true</mandatory>
 
<isBoolean>true</isBoolean>
 
<defaulValue></defaulValue>
 
<note>The dataset contains personal data?</note>
 
<vocabulary>
 
</vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| Personal sensitive data
 
|
 
<pre>
 
<fieldName>PersonalSensitiveData</fieldName>
 
<mandatory>false</mandatory>
 
<isBoolean>true</isBoolean>
 
<defaulValue></defaulValue>
 
<note>The dataset contains personal sensitive data?</note>
 
<vocabulary>
 
</vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| Data set contains data of children
 
|
 
<pre>
 
<fieldName>ChildrenData</fieldName>
 
<mandatory>true</mandatory>
 
<isBoolean>true</isBoolean>
 
<defaulValue></defaulValue>
 
<note>The dataset contains children data?</note>
 
<vocabulary>
 
</vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| Consent of the data subject
 
| TBD
 
|-
 
| Consent obtained also covers the envisaged transfer of the personal data outside the EU
 
| TBD
 
|-
 
| Personal data was manifestly made public by the data subject
 
| TBD
 
|-
 
| Data Protection Directive
 
|
 
<pre>
 
<fieldName>DataProtectionDirective</fieldName>
 
<mandatory>true</mandatory>
 
<isBoolean>false</isBoolean>
 
<defaulValue></defaulValue>
 
<note>Report the low or protocol number and the institution related to Data Protection.</note>
 
<vocabulary>
 
</vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
|colspan="2" align="center"|'''Intellectual properties'''
 
|-
 
| IP/Copyrights
 
|
 
|-
 
| Link to the source
 
| Resource
 
|-
 
| License
 
| License
 
|-
 
| Link to the license
 
| Automatic
 
|-
 
| Field/Scope of use
 
|-
 
| Basic rights
 
|-
 
| Restrictions on use
 
|-
 
| Prohibited actions
 
|-
 
| Sublicense rights
 
|-
 
| Attribution requirements
 
|-
 
| Display requirements
 
|-
 
| Distribution requirements
 
|-
 
| Territory of use
 
|-
 
| License term
 
|-
 
| Requirement of non-disclosure
 
(confidentiality mark)
 
|}
 
  
=== SoBigData.eu: Method Metadata ===
+
where My Name is "Francesco", gCube Data Catalogue adds the tag Name-Francesco to metadata field if it does not exist
  
The current list of fields characterising a SoBigData resource is available at https://docs.google.com/spreadsheets/d/1kuhvmDVKpmqt2foyCB9wDo3HgzoAiCuRQ8CjRS-DVOM/edit?usp=sharing
+
'''Grouping''':
 +
* It is used by Data Catalogue fron-end for adding a metadata field to a Group of Data Catalogue. Data Catalogue [https://ckan-d4s.d4science.org/group groups] are used for browsing. Grouping element in the Metadata Profile schema v3 must have a value equal to one of the values: {onFieldName, onValue, onFieldName_onValue, onValue_onFieldName}. The (optional) attribute create="true" is used to mean: create the Group if does not exist, no otherwise. Grouping values meanings:
 +
** onFieldName: (only) the fieldName specified to metadata field must be added to a Group;
 +
** onValue: (only) the value specified to metadata field must be added to a Group;
 +
** onFieldName_onValue: both the fieldName and the value (in this order) specified to metadata field must be added to a Group (<grouping create="true|false">{onFieldName_onValue}</grouping>);
 +
** onValue_onFieldName: both the value and the fieldName (in this order) specified to metadata field must be added to a Group (<grouping create="true|false">{onValue_onFieldName}</grouping>).
 +
* Moreover, Grouping has one (optional) attribute: 'propagateUp' This property will let a user specify if an item, that is going to be added to that group, must be also added to the hierarchical chain of groups involving it. For instance, if we have group B as child of group A, and item I is going to be added to B, then it will also be added to A.
  
The following fields have been identified:
+
====== Metadata Profile schema: gcdcmetadataprofilev3.xsd ======
 +
 
 +
The gCube Data Catalogue Metadata Profile (v.3) schema:
  
{| class="wikitable"
 
! style="font-weight: bold;" | Field
 
! style="font-weight: bold;" | In Catalogue
 
|-
 
|colspan="2" align="center"|'''Internal Fields'''
 
|-
 
| Internal Identifier
 
| Automatically created
 
|-
 
| Creation Date
 
| Automatically created
 
|-
 
| Last Modification
 
| Automatically updated
 
|-
 
|colspan="2" align="center"|'''General Description'''
 
|-
 
| Title
 
| Title
 
|-
 
| Identifier
 
|
 
 
<pre>
 
<pre>
<fieldName>External Identifier</fieldName>
+
<?xml version="1.0" encoding="UTF-8"?>
<mandatory>false</mandatory>
+
<xs:schema attributeFormDefault="unqualified"
<isBoolean>false</isBoolean>
+
elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<defaulValue></defaulValue>
+
<xs:include schemaLocation="NamespacesCatalogueCategories.xsd"/>
<note>This applies only to methods that have been already published.
+
<xs:element name="metadataformat">
  Insert here a DOI, an handle, and any other Identifier assigned when
+
<xs:complexType>
  publishing the dataset alsewhere.</note>
+
<xs:sequence>
<vocabulary></vocabulary>
+
<xs:element name="metadatafield" minOccurs="0" maxOccurs="unbounded">
<validator></validator>
+
<xs:complexType>
 +
<xs:sequence>
 +
<xs:element type="xs:string" name="fieldName" />
 +
<xs:element type="xs:boolean" name="mandatory"
 +
minOccurs="0" maxOccurs="1" />
 +
<xs:element name="dataType">
 +
<xs:simpleType>
 +
<xs:restriction base="xs:string">
 +
<xs:enumeration value="String" />
 +
<xs:enumeration value="Time" />
 +
<xs:enumeration value="Time_Interval" />
 +
<xs:enumeration value="Times_ListOf" />
 +
<xs:enumeration value="Text" />
 +
<xs:enumeration value="Boolean" />
 +
<xs:enumeration value="Number" />
 +
<xs:enumeration value="GeoJSON" />
 +
</xs:restriction>
 +
</xs:simpleType>
 +
</xs:element>
 +
<xs:element type="xs:string" name="maxOccurs"
 +
minOccurs="0" maxOccurs="1" />
 +
<xs:element type="xs:string" name="defaultValue"
 +
minOccurs="0" maxOccurs="1" />
 +
<xs:element type="xs:string" name="note" minOccurs="0"
 +
maxOccurs="1" />
 +
<xs:element name="vocabulary" minOccurs="0" maxOccurs="1">
 +
<xs:complexType>
 +
<xs:sequence>
 +
<xs:element type="xs:string" name="vocabularyField"
 +
minOccurs="1" maxOccurs="unbounded" />
 +
</xs:sequence>
 +
<xs:attribute type="xs:boolean" name="isMultiSelection" />
 +
</xs:complexType>
 +
</xs:element>
 +
<xs:element name="validator" minOccurs="0" maxOccurs="1">
 +
<xs:complexType>
 +
<xs:sequence>
 +
<xs:element type="xs:string" name="regularExpression" />
 +
</xs:sequence>
 +
</xs:complexType>
 +
</xs:element>
 +
<xs:element name="tagging" type="TaggingType"
 +
minOccurs="0" maxOccurs="1">
 +
</xs:element>
 +
<xs:element name="grouping" type="GroupingType"
 +
minOccurs="0" maxOccurs="1">
 +
</xs:element>
 +
</xs:sequence>
 +
<xs:attribute name="categoryref" use="optional" type="xs:string" />
 +
</xs:complexType>
 +
</xs:element>
 +
</xs:sequence>
 +
<xs:attribute type="NotEmpty" use="required" name="type" />
 +
</xs:complexType>
 +
</xs:element>
 +
<xs:simpleType name="TaggingGroupingValue">
 +
<xs:restriction base="xs:string">
 +
<xs:enumeration value="onFieldName" />
 +
<xs:enumeration value="onValue" />
 +
<xs:enumeration value="onFieldName_onValue" />
 +
<xs:enumeration value="onValue_onFieldName" />
 +
</xs:restriction>
 +
</xs:simpleType>
 +
<xs:complexType name="TaggingType">
 +
<xs:simpleContent>
 +
<xs:extension base="TaggingGroupingValue">
 +
<xs:attribute type="xs:boolean" name="create" />
 +
<xs:attribute type="NotEmpty" name="separator" />
 +
</xs:extension>
 +
</xs:simpleContent>
 +
</xs:complexType>
 +
<xs:complexType name="GroupingType">
 +
<xs:simpleContent id="TaggingGroupingValue">
 +
<xs:extension base="TaggingGroupingValue">
 +
<xs:attribute type="xs:boolean" name="create" />
 +
<xs:attribute type="xs:boolean" name="propagateUp" />
 +
</xs:extension>
 +
</xs:simpleContent>
 +
</xs:complexType>
 +
<xs:simpleType name="NotEmpty">
 +
<xs:restriction base="xs:string">
 +
<xs:minLength value="1" />
 +
</xs:restriction>
 +
</xs:simpleType>
 +
</xs:schema>
 
</pre>
 
</pre>
|-
+
 
| Creators
+
You can download it by clicking on [https://wiki.gcube-system.org/images_gcube/e/e8/Gcdcmetadataprofilev3.xsd Gcdcmetadataprofilev3.xsd]
| Author is there, unfortunately there is only one author per item. Moreover, the technology supports only key value pairs ... no complex types.  
+
 
 +
A "generic" example of MetadataProfile.xml:
 +
 
 
<pre>
 
<pre>
<fieldName>Creator</fieldName>
+
<?xml version="1.0" encoding="UTF-8"?>
<mandatory>true</mandatory>
+
<metadataformat type="the_metadata_type" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="gcdcmetadataprofilev3.xsd">
<isBoolean>false</isBoolean>
+
  <metadatafield categoryref="idvalue0">
<defaulValue></defaulValue>
+
    <fieldName>fieldName</fieldName>
<note>The name of the creator, with email and ORCID. The format should be: family, given[, email][, ORCID].
+
    <dataType>String</dataType>
  Examples: Smith, John, js@acme.org, orcid.org/0000-0000-0000-0000; Miller, Elizabeth
+
    <defaultValue>defaultValue</defaultValue>
</note>
+
    <note>note</note>
<vocabulary></vocabulary>
+
    <vocabulary isMultiSelection="true">
<validator></validator>
+
      <vocabularyField>vocabularyField</vocabularyField>
</pre>
+
    </vocabulary>
|-
+
    <validator>
| Creation Date
+
      <regularExpression>regularExpression</regularExpression>
|
+
    </validator>
<pre>
+
    <tagging create="true" separator="-">onFieldName_onValue</tagging>
<fieldName>CreationDate</fieldName>
+
    <grouping create="true">onFieldName</grouping>
<mandatory>true</mandatory>
+
  </metadatafield>
<isBoolean>false</isBoolean>
+
</metadataformat>
<defaulValue></defaulValue>
+
<note>The date of creation of the method (different from the date of creation of the dataset automatically added by the system)
+
</note>
+
<vocabulary></vocabulary>
+
<validator></validator>
+
 
</pre>
 
</pre>
|-
+
 
| Distributor
+
Another example of MetadataProfile.xml is the following:
| Maintainer
+
 
|-
+
| Owner
+
|
+
???
+
|-
+
| Publication Date
+
| when the method is published in the catalogue ... no field have to be specified;
+
|-
+
| Contact
+
| Go for Maintainer? I would go for Maintainer email
+
|-
+
| Thematic Cluster
+
|
+
Shall we go for a Topic too? I think so.
+
 
<pre>
 
<pre>
<fieldName>ThematicCluster</fieldName>
+
<?xml version="1.0" encoding="UTF-8"?>
<mandatory>true</mandatory>
+
<metadataformat type="the_metadata_type"
<isBoolean>false</isBoolean>
+
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
<defaulValue></defaulValue>
+
xsi:noNamespaceSchemaLocation="gcdcmetadataprofilev3.xsd">
<note>The SoBigData.eu Thematic Clusters
+
<metadatafield categoryref="contact">
</note>
+
<fieldName>Name</fieldName>
<vocabulary>
+
<dataType>String</dataType>
  <vocabularyField>Text and Social Media Mining</vocabularyField>
+
<defaultValue>My Name</defaultValue>
  <vocabularyField>Social Network Analysis</vocabularyField>
+
<note>Insert your Name</note>
  <vocabularyField>Human Mobility Analytics</vocabularyField>
+
<tagging create="true" separator="-">onFieldName_onValue</tagging>
  <vocabularyField>Web Analytics</vocabularyField>
+
</metadatafield>
  <vocabularyField>Visual Analytics</vocabularyField>
+
<metadatafield categoryref="contact">
  <vocabularyField>Social Data</vocabularyField>
+
<fieldName>Surname</fieldName>
</vocabulary>
+
<dataType>String</dataType>
<validator></validator>
+
<defaultValue>My Surname</defaultValue>
</pre> 
+
<note>Insert your Surname</note>
|-
+
</metadatafield>
| Area
+
</metadataformat>
| Tag vs domain specific field
+
|-
+
| Semantic Coverage
+
| Tag vs domain specific field 
+
|-
+
| Usage mode
+
|
+
<pre>
+
<fieldName>UsageMode</fieldName>
+
<mandatory>true</mandatory>
+
<isBoolean>false</isBoolean>
+
<defaulValue></defaulValue>
+
<note>How the method is expected to be accessed
+
</note>
+
<vocabulary>
+
  <vocabularyField>Download</vocabularyField>
+
  <vocabularyField>as-a-Service by SoBigData Infrastructure</vocabularyField>
+
  <vocabularyField>as-a-Service by third party infrastructure</vocabularyField>
+
</vocabulary>
+
<validator></validator>
+
</pre> 
+
|-
+
| methodURL
+
| Resource
+
|-
+
| documentationURL
+
| Resource
+
|-
+
| inputParametersType
+
|
+
<pre>
+
<fieldName>input</fieldName>
+
<mandatory>true</mandatory>
+
<isBoolean>false</isBoolean>
+
<defaulValue></defaulValue>
+
<note>See WPS
+
</note>
+
<vocabulary>
+
</vocabulary>
+
<validator></validator>
+
</pre> 
+
|-
+
| outputType
+
|
+
<pre>
+
<fieldName>output</fieldName>
+
<mandatory>true</mandatory>
+
<isBoolean>false</isBoolean>
+
<defaulValue></defaulValue>
+
<note>See WPS
+
</note>
+
<vocabulary>
+
</vocabulary>
+
<validator></validator>
+
</pre> 
+
|-
+
| Description
+
| Description
+
|-
+
| RelatedLiterature
+
|
+
<pre>
+
<fieldName>RelatedPaper</fieldName>
+
<mandatory>false</mandatory>
+
<isBoolean>false</isBoolean>
+
<defaulValue></defaulValue>
+
<note>Insert a complete reference to an associated work. 
+
</note>
+
<vocabulary></vocabulary>
+
<validator></validator>
+
 
</pre>
 
</pre>
|-
+
 
| RelatedDataset
+
====== Namespaces Categories schema: NamespacesCatalogueCategories.xsd ======
| TBD
+
 
|-
+
The Namespaces Catalogue Categories schema:
| RelatedMethod
+
 
| TBD
+
|-
+
|colspan="2" align="center"|'''Accessibility properties'''
+
|-
+
| Accessibility
+
|
+
 
<pre>
 
<pre>
<fieldName>Accessibility</fieldName>
+
<?xml version="1.0" encoding="UTF-8"?>
<mandatory>true</mandatory>
+
<xs:schema attributeFormDefault="unqualified"
<isBoolean>false</isBoolean>
+
elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"
<defaulValue></defaulValue>
+
xmlns:category="http://www.w3.org/TR/html4/">
<note>How the access to the resource is regulated: VA (Virtual Access) or TNA (Trans National Access), Public vs Restricted. 
+
<xs:element name="namespaces">
</note>
+
<xs:complexType>
<vocabulary>
+
<xs:sequence>
  <vocabularyField>VA/Public</vocabularyField>
+
<xs:element name="namespace" minOccurs="1" maxOccurs="unbounded">
  <vocabularyField>VA/Restricted</vocabularyField>
+
<xs:complexType>
  <vocabularyField>TNA/Restricted</vocabularyField>
+
<xs:sequence>
</vocabulary>
+
<xs:element type="xs:string" name="name" minOccurs="1"
<validator></validator>
+
maxOccurs="1" />
 +
<xs:element type="xs:string" name="title" minOccurs="1"
 +
maxOccurs="1" />
 +
<xs:element type="xs:string" name="description"
 +
minOccurs="0" maxOccurs="1" />
 +
</xs:sequence>
 +
<xs:attribute type="xs:string" name="id" use="required" />
 +
</xs:complexType>
 +
</xs:element>
 +
</xs:sequence>
 +
</xs:complexType>
 +
<xs:unique name="unique-namespace-id">
 +
<xs:selector xpath="namespace" />
 +
<xs:field xpath="@id" />
 +
</xs:unique>
 +
</xs:element>
 +
</xs:schema>
 
</pre>
 
</pre>
|-
+
 
| AccessibilityMode
+
You can download it by clicking on [https://wiki.gcube-system.org/images_gcube/d/d5/NamespacesCatalogueCategories.xsd NamespacesCatalogueCategories]
|
+
 
 +
An example of valid Namespaces.xml:
 +
 
 
<pre>
 
<pre>
<fieldName>AccessibilityMode</fieldName>
+
<?xml version="1.0" encoding="UTF-8"?>
<mandatory>true</mandatory>
+
<namespaces xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
<isBoolean>false</isBoolean>
+
xsi:noNamespaceSchemaLocation="NamespacesCatalogueCategories.xsd">
<defaulValue></defaulValue>
+
<namespace id="contact">
<note>How the access to the resource is offered. 
+
<name>Contact</name>
</note>
+
<title>Contact Title</title>
<vocabulary>
+
<description>This section is about Contact(s)</description>
  <vocabularyField>Programmatic (e.g. API)</vocabularyField>
+
</namespace>
  <vocabularyField>By file</vocabularyField>
+
<namespace id="developer_information">
  <vocabularyField>...</vocabularyField>
+
<name>Developer</name>
</vocabulary>
+
<title>Developer Title</title>
<validator></validator>
+
<description>This section is about Developer(s)</description>
 +
</namespace>
 +
<namespace id="extra_information">
 +
<name>Extra</name>
 +
<title>Extra Title</title>
 +
<description>This section is about Extra(s)</description>
 +
</namespace>
 +
</namespaces>
 
</pre>
 
</pre>
|-
 
|colspan="2" align="center"|'''Technical properties'''
 
|-
 
| Programming Language
 
|
 
<pre>
 
<fieldName>ProgrammingLanguage</fieldName>
 
<mandatory>false</mandatory>
 
<isBoolean>false</isBoolean>
 
<defaulValue></defaulValue>
 
<note>The primary language used to implement the method.   
 
</note>
 
<vocabulary></vocabulary>
 
<validator></validator>
 
</pre>
 
|-
 
| Hosting Environment
 
|-
 
| Source code
 
|-
 
| Artifact repository
 
|-
 
| Dependencies on Other SW
 
|-
 
|colspan="2" align="center"|'''Intellectual properties'''
 
|-
 
| IP/Copyrights
 
|
 
|-
 
| License
 
| License
 
|-
 
| Link to the license
 
| Automatic
 
|-
 
| Field/Scope of use
 
|-
 
| Basic rights
 
|-
 
| Restrictions on use
 
|-
 
| Prohibited actions
 
|-
 
| Sublicense rights
 
|-
 
| Attribution requirements
 
|-
 
| Display requirements
 
|-
 
| Distribution requirements
 
|-
 
| Territory of use
 
|-
 
| License term
 
|-
 
| Requirement of non-disclosure
 
(confidentiality mark)
 
|}
 
  
== gCube Data Catalogue: Ckan Connector ==
+
=== Instances of Metadata Profile ===
  
 +
The following list shows the instances of metadata profile created in the D4Science infrastructure and currently used in the gCube Data Catalogue:
  
 +
===== SoBigData.eu =====
  
== gCube Data Catalogue: Geo Harvesting ==
+
[[SoBigData.eu: Metadata Profile for gCube Data Catalogue]]
  
This extension contains plugins (ckanext-geonetwork and others) that add geospatial capabilities to CKAN (https://github.com/geosolutions-it/ckanext-geonetwork/wiki).
+
== Ckan Connector ==
  
Several harvesters to import geospatial metadata into CKAN from other sources in ISO 19139 format and others has been created in gCube Data Catalogue.
+
The Ckan Connector Service is a gCube RESTful service that allows infrastructure users to interact with CKAN.
In particular all metadata created into gCube Geonetwork (GeoNetwork is the catalog application to manage spatially referenced resources generated into D4Science Infrastructure) are harvested through the 'Geoentwork Resolver' a "middle tier" able to:
+
  
* use the Geonetwork Manager in order to harvest private metadata (via authentication) stored in gCube Geonetwork on CKAN Data Catalogue (ex. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE to harvest private metadata generated from scope /gcube/devsec/devVRE);
+
It implements two methods:
 +
* /connect: creates a new CKAN session. The user can interact with CKAN while the session is alive (not expired or destroyed by the user).
 +
* /disconnect: destroys the current CKAN session.
  
* create a CKAN Harvester that skip all public metadata via configuration during scope harvesting (ex. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE%23filterpublicids to filter public ids during harvesting of /gcube/devsec/devVRE);
+
== Geo Harvesting ==
  
* create a CKAN Harvester to harvest only public metadata (saved on Geonetwork) avoiding the Geonetwork authentication via configuration (ex. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE%23noauthentication).
+
This extension contains plugins like [https://github.com/geosolutions-it/ckanext-geonetwork/wiki ckanext-geonetwork] (and others) which add geospatial capabilities to CKAN.
  
Mapping (among fields) from an ISO19139 Metadata to Ckan Dataset via ckanext-geonetwork is showed in the following table:
+
Several harvesters to import geospatial metadata (e.g. in ISO 19139 format) into CKAN from other sources have been created in the gCube Data Catalogue.
 +
In particular, all metadata created in gCube GeoNetwork (GeoNetwork is the catalogue application managing the spatially referenced resources generated in the D4Science Infrastructure) are harvested through the '''Geonetwork Resolver'''.
 +
 
 +
Mapping (among fields) from an ISO19139 Metadata to Ckan Dataset via ckanext-geonetwork is shown in the following table:
  
 
{| class="wikitable"
 
{| class="wikitable"
 
! style="font-weight: bold;" | Ckan Dataset
 
! style="font-weight: bold;" | Ckan Dataset
 
|-
 
|-
| Title
+
| '''Title'''
 
| Title
 
| Title
 
|-
 
|-
| Description
+
| '''Description'''
 
| Description
 
| Description
 
|-
 
|-
 +
| '''bbox'''
 +
| spatial
 +
|-
 +
| style="font-style: italic;" | Descriptive Keywords
 
|  
 
|  
 +
|-
 +
| '''gmd:keyword'''
 +
| Tag
 +
|-
 +
|-
 
|  
 
|  
 +
| style="font-style: italic;" | Additional Info
 +
|-
 +
| metadata language, age,
 +
reference system, etc.
 +
| key/value
 +
|-
 
|-
 
|-
 
| style="font-style: italic;" | Digital Transfer Option
 
| style="font-style: italic;" | Digital Transfer Option
 
|  
 
|  
 
|-
 
|-
| style="padding-left: 20px;" | gmd:url
+
| '''gmd:url'''
 
| URL
 
| URL
 
|-
 
|-
| style="padding-left: 20px;" | gmd:name
+
| '''gmd:name'''
 
| Name
 
| Name
 
|-
 
|-
| style="padding-left: 20px;" | gmd:description
+
| '''gmd:description'''
 
| Description
 
| Description
|-
 
|
 
|
 
|-
 
| style="font-style: italic;" | Descriptive Keywords
 
|
 
|-
 
| style="padding-left: 20px;" | gmd:keyword
 
| Tag
 
|-
 
|
 
| style="font-style: italic;" | Additional Info
 
|-
 
| bbox, metadata language, age,
 
reference system, etc.
 
| key/value
 
 
|}
 
|}
  
== gCube Data Catalogue: Geo Datasets ==
+
=== Geonetwork harvester for CKAN  ===
 +
 
 +
The Geonetwork harvester for CKAN, based on the [https://github.com/geosolutions-it/ckanext-geonetwork ckanext-geonetwork] plugin, has been enhanced to meet the needs of D4Science and its communities.
 +
 
 +
The base configuration options controlling the behaviour of the CKAN harvesters are described at [https://github.com/ckan/ckanext-harvest#the-ckan-harvester the-ckan-harvester]
 +
 
 +
Moreover, the following configuration options have been added to support specific D4Science requirements.
 +
 
 +
'''Add the "ITEM URL" field by default to harvested items''':
 +
 
 +
TODO
 +
 
 +
=== Geonetwork Resolver ===
 +
 
 +
The 'Geonetwork Resolver' is a "middle tier" that authorizes CKAN's harvesters to perform CSW harvesting of the ISO 19139 items provided by the gCube GeoNetwork instances.
 +
 
 +
See more at https://wiki.gcube-system.org/gcube/URI_Resolver#Geonetwork_Resolver
  
In order to make a dataset queryable by location (geospatial dataset), a special extra must be defined, with its key named ‘spatial’. The value must be a valid GeoJSON geometry, for example:
+
== Geo Datasets ==
 +
 
 +
To make a dataset queryable by location (geospatial dataset), a reserved extra whose key (field name) is ‘spatial’ must be defined. Its value must be a '''valid GeoJSON geometry''', for example:
  
 
<pre>
 
<pre>
 
</pre>
 
</pre>
  
Otherwise default bounding box is 4326. CKAN Wiki page for [http://docs.ckan.org/projects/ckanext-spatial/en/latest/spatial-search.html#legacy-api Legacy API]
+
Otherwise, the bounding box is assumed to be in the default coordinate reference system, EPSG:4326. See the CKAN documentation page for the [http://docs.ckan.org/projects/ckanext-spatial/en/latest/spatial-search.html#legacy-api Legacy API]
  
Moreover, you can perform spatial queries using an integrated map widget in CKAN, which allows filtering results by an area of interest. You can try it on [https://ckan-d4s.d4science.org/dataset D4Science Data Catalogue]
+
Moreover, you can perform spatial queries using an integrated map widget available on CKAN, which allows filtering results by an area of interest. You can try it on [https://ckan-d4s.d4science.org/dataset D4Science Data Catalogue]
  
 
CKAN Wiki page for [http://docs.ckan.org/projects/ckanext-spatial/en/latest/spatial-search.html#spatial-search-widget Spatial Search Widget ]
 
CKAN Wiki page for [http://docs.ckan.org/projects/ckanext-spatial/en/latest/spatial-search.html#spatial-search-widget Spatial Search Widget ]
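As a minimal sketch of the ‘spatial’ extra described above, the helper below sets it on a CKAN-style dataset dictionary, assuming the usual CKAN representation of extras as a list of {"key": ..., "value": ...} entries. The function name and the dataset name are hypothetical; the geometry reuses the Point coordinates (longitude, latitude) shown in the profile example of this page.

```python
import json


def set_spatial_extra(dataset: dict, geometry: dict) -> dict:
    """Return a copy of a CKAN-style dataset dict holding exactly one
    'spatial' extra whose value is the geometry serialized as GeoJSON."""
    # Drop any previous 'spatial' entry so the dataset keeps a single one.
    extras = [e for e in dataset.get("extras", []) if e.get("key") != "spatial"]
    # CKAN extras store strings, so the geometry is serialized with json.dumps.
    extras.append({"key": "spatial", "value": json.dumps(geometry)})
    return {**dataset, "extras": extras}


# Hypothetical dataset that already carries an outdated 'spatial' extra.
dataset = {"name": "example-dataset", "extras": [{"key": "spatial", "value": "{}"}]}
point = {"type": "Point", "coordinates": [-20.145, 74.078]}
updated = set_spatial_extra(dataset, point)
```

The original dictionary is left untouched, which makes the helper safe to use on a record fetched from the catalogue before sending an update.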
 +
 +
== Temporal Datasets ==
 +
 +
To make a dataset queryable by time (temporal dataset), a reserved extra whose key (field name) is ‘time_date’ must be defined. Its value must be a valid '''ISO 8601 date'''.
 +
 +
== Users, Roles and Groups ==
 +
 +
Four roles are envisaged to capture the actions users are allowed to perform on the catalogue in the context of each VRE:
 +
 +
* '''''Catalogue_Member''''' - users with this role are allowed to:
 +
** View the organization’s private datasets.
 +
 +
* '''''Catalogue_Editor''''' - users with this role are allowed to:
 +
** View the organization’s private datasets;
 +
** Publish new datasets (into the organization);
 +
** Edit or delete the organization’s datasets the user is owner of.
 +
 +
* '''''Catalogue_Admin''''' - users with this role are allowed to:
 +
** View the organization’s private datasets;
 +
** Publish new datasets (into the organization);
 +
** Edit or delete any of the organization’s datasets;
 +
** Make datasets public or private.
 +
 +
* '''''Catalogue_Manager''''' - users with this role are allowed to:
 +
** View the organization’s private datasets;
 +
** Publish new datasets (into the organization);
 +
** Edit or delete any of the organization’s datasets;
 +
** Make datasets public or private;
 +
** Configure the catalogue.
 +
 
 +
The default role assigned to every VRE user is ''Catalogue_Member'', i.e. every user of a VRE is entitled to view the private datasets published in the VRE scope in addition to any public dataset. VRE Managers can assign other roles to selected users to enlarge their capabilities.
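The role list above can be read as a small permission table. The sketch below encodes it as a Python dictionary; the action names (view_private, publish, edit_own, edit_any, set_visibility, configure) are hypothetical labels for the listed capabilities, not identifiers used by the catalogue itself.

```python
# Hypothetical action names; the mapping mirrors the role list above.
ROLE_PERMISSIONS = {
    "Catalogue_Member": {"view_private"},
    "Catalogue_Editor": {"view_private", "publish", "edit_own"},
    "Catalogue_Admin": {"view_private", "publish", "edit_any", "set_visibility"},
    "Catalogue_Manager": {"view_private", "publish", "edit_any", "set_visibility",
                          "configure"},
}
DEFAULT_ROLE = "Catalogue_Member"  # role assigned to every VRE user


def can(role: str, action: str, is_owner: bool = False) -> bool:
    """Tell whether `role` may perform `action` on an organization dataset."""
    perms = ROLE_PERMISSIONS.get(role, set())
    if action == "edit":  # edit or delete a dataset of the organization
        # Editors may only touch datasets they own; Admins/Managers any dataset.
        return "edit_any" in perms or ("edit_own" in perms and is_owner)
    return action in perms
```

Keeping ownership as an explicit argument reflects the one rule that distinguishes Editors from Admins.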
 +
 +
'''CKAN Groups''' can be used to create and manage collections of datasets, e.g. to catalogue datasets for a particular project, team or theme, or as a very simple way to help people find and search your own published datasets.
 +
 +
== Dataset Update ==
 +
 +
As described above, only '''Catalogue_Editor''' (Editor) and '''Catalogue_Admin''' (Admin) users can update existing datasets. They can do so via the REST API (see: https://wiki.gcube-system.org/gcube/Catalogue_restful_service)
 +
 +
A dataset can also be edited through the CKAN GUI (via Manage); in this case, if an Editor or Admin needs to add or update resources of a record, they need to:
 +
 +
*1. store the file on the workspace;
 +
*2. get a public link to that file;
 +
*3. edit the record by adding that link.
 +
 +
== Access the Catalogue via RESTful service ==
 +
 +
You can find more details at [[gCat Service | gCat Service]] page.
 +
 +
== Complex Query to Catalogue via gCube Catalogue Portlet ==
 +
 +
You can use the two parameters `path` and `query` to perform, via the ''gCube Catalogue Portlet'', the complex HTTP queries provided by the CKAN engine.
 +
 +
You need to use:
 +
 +
* the ''path'' parameter to specify the location, i.e. a route/page rendered by the CKAN engine (e.g. dataset, organization, group);
 +
 +
* the ''query'' parameter to specify the query string (e.g. q=sarda). Its value must be encoded in BASE64.
 +
 +
For example:
 +
 +
* to refer to the catalogue items belonging to a catalogue group, specify path=/group/<<groupName>>
 +
 +
* to refer to all the catalogue items matching a given query, specify path=dataset and query=<<base64 query>>
 +
 +
A complete example follows.
 +
 +
Let's assume that:
 +
 +
* the CKAN instance is https://ckan-grsf.d4science.org
 +
* the 'gCube Catalogue Portlet' is in action at https://i-marine.d4science.org/web/grsf/data-catalogue
 +
 +
We want to perform the query https://ckan-grsf.d4science.org/dataset?q=sarda via ''gCube Catalogue Portlet''
 +
 +
* The ''path'' parameter must be: path=dataset
 +
* The ''query'' parameter must be: query=BASE64(q=sarda), that is query=cT1zYXJkYQ==
 +
 +
Thus the URL to perform the query via 'gCube Catalogue Portlet' will be:  https://i-marine.d4science.org/web/grsf/data-catalogue?path=dataset&query=cT1zYXJkYQ==
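The construction of that URL can be sketched in a few lines of Python using only the standard library. The function name is hypothetical; percent-encoding '+' and '/' (which Base64 output may contain) while leaving the '=' padding readable is a design choice of this sketch, assuming the portlet decodes percent-escapes in query parameters as servlet containers normally do.

```python
import base64
from urllib.parse import quote


def portlet_query_url(portlet_base: str, path: str, ckan_query: str) -> str:
    """Build a gCube Catalogue Portlet URL that forwards `ckan_query` to CKAN."""
    # The CKAN query string (e.g. "q=sarda") must be Base64-encoded.
    encoded = base64.b64encode(ckan_query.encode("utf-8")).decode("ascii")
    # Keep '=' padding as-is, but escape '+' and '/', which are special in URLs.
    return f"{portlet_base}?path={quote(path)}&query={quote(encoded, safe='=')}"


url = portlet_query_url("https://i-marine.d4science.org/web/grsf/data-catalogue",
                        "dataset", "q=sarda")
print(url)
# https://i-marine.d4science.org/web/grsf/data-catalogue?path=dataset&query=cT1zYXJkYQ==
```

For the group case, passing path="/group/<<groupName>>" keeps the slashes, since `quote` treats '/' as safe by default.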
 +
 +
== Widget to show Catalogue Statistics: Catalogue Badge ==
 +
 +
You can find how to configure the ''Catalogue Badge'' widget for a D4Science infra-gateway on the [https://wiki.gcube-system.org/gcube/ServiceManager_Guide#Catalogue_Badge Catalogue_Badge Service Manager] page.
 +
 +
== Related Issues ==
 +
 +
[1] gCube Data Catalogue for Global Record of Stocks and Fisheries - https://wiki.gcube-system.org/gcube/GCube_Data_Catalogue_for_GRSF
 +
 +
[2] Data Catalogue Resolver - https://wiki.gcube-system.org/gcube/URI_Resolver#CATALOGUE_Resolver
 +
 +
[3] How-to Purge a Catalogue Instance - https://wiki.gcube-system.org/gcube/How-to_purge_a_ckan_catalogue_instance

Latest revision as of 14:55, 9 March 2023

A catalogue is a service that enables its users to publish and search collections of descriptive information (metadata) about items such as data, services, and related information objects.

D4Science offers services for seamless access to, and analysis of, a wide spectrum of data, including biological and ecological data, geospatial data, statistical data and semi-structured data from multiple authoritative data providers and information systems. These services can be exploited both via web-based graphical user interfaces and via web-based protocols for programmatic access, e.g. OAI-PMH, CSW, WFS, SDMX. This offering complements specific and community-specific applications. The gCube Data Catalogue contains a wealth of resources resulting from several activities, projects and communities, including BlueBRIDGE (www.bluebridge-vres.eu/), i-Marine (www.i-marine.eu), SoBigData.eu (www.sobigdata.eu), and FAO (www.fao.org). All the products are accompanied by rich descriptions capturing general attributes, e.g. title and creator(s), as well as usage policies and licences.

The gCube Data Catalogue is built on, and extends, the CKAN platform. CKAN is an open-source data management system (DMS) for powering data hubs and data portals: it makes data accessible by providing tools to streamline publishing, sharing, finding and using data (see http://ckan.org/).

The CKAN model consists of the following entities and their relations:

CKAN: 'Entities and Relations'

Available Catalogues and their public locations

BLUEBRIDGE Catalogue

D4Science Catalogue

Metadata

A metadata record in the gCube Data Catalogue consists of two parts: CKAN's default metadata fields and the gCube Metadata Profile.

CKAN's default metadata fields

These metadata fields are common to all metadata types in the gCube Data Catalogue (and are used by default in the CKAN platform).

{| class="wikitable"
! Label !! Field Name (API) !! Definition !! Guidelines !! Example
|-
| Title* || title || Name given to the dataset. || Short phrase, written in plain language. Should be sufficiently descriptive to allow for search and discovery. || Aquaculture Production and Consumption in Africa (2011)
|-
| Description || description || Short description explaining the content and its origins. || A few sentences, written in plain language, providing a sufficiently comprehensive overview of the resource for anyone to understand its content, origins, and any continuing work on it. The description can be written last, since it summarizes key information from the other metadata fields. || This dataset contains attributes of aquaculture production and consumption for each of Africa’s provinces in 2011. The data was provided by………
|-
| Tags || tags || An array of taxonomic terms stored as tags. || Taxonomic terms || Access to education, Bamboo
|-
| License* || license_title || The license that applies to the published dataset. || ||
|-
| Organization* || organization || The organization the dataset belongs to. || See the list of organizations at https://ckan-d-d4s.d4science.org/organization || D4Science
|-
| Version || version || Version of the dataset. || Increase manually after editing. || 1.0
|-
| Author* || || Owner of the dataset. || The person who created the dataset, in the format: Surname, Name || Bloggs, Joe
|-
| Author Contact || || Contact details of the owner. || The email or other contact details of the person who created the dataset. || joe@example.com
|-
| Maintainer || || Maintainer of the dataset. || The person or the authority that maintains the dataset. || A person: Bloggs, Joe. An authority: D4Science
|-
| Maintainer Contact || || Contact details of the maintainer. || The email or other contact details of the person who maintains the dataset. || joe@example.com
|}

Mandatory fields are marked with an asterisk (*).

gCube Metadata Profile

A gCube Metadata Profile defines an XML-based metadata schema for adding custom metadata fields.

A gCube Metadata Profile is composed of one Metadata Format (<metadataformat>) containing an ordered list of at least one Metadata Field (<metadatafield>). From version 3, a Metadata Field can also contain a reference (categoryref="category_id_#") to a "Category" entity, using the Namespace of the Category (<namespace id="category_id_#">). Adding a Category Reference to a Metadata Field means that the field belongs to the Category identified by that Category Identifier (id="category_id_#"). See Metadata Profile v.3 for more details.

Metadata Profile v.4

A Metadata Profile v.4 is an XML file with the following format:

<?xml version="1.0" encoding="UTF-8"?>
<metadataformat type="YOUR TYPE HERE">
    <metadatafield categoryref="category_id_#">
        <fieldId>ID of Metadata Field that identifies the field name in the Document (stored in the Service)</fieldId>
        <fieldName>Name of Metadata Field</fieldName>
        <mandatory>true|false</mandatory>
        <dataType>String|Time|Time_Interval|Times_ListOf|Text|Boolean|Number|GeoJSON</dataType>
        <maxOccurs>N|*</maxOccurs>
        <defaultValue>default value</defaultValue>
        <note>[the note is shown as a suggestion in the insert/update metadata form of Catalogue Publisher Widget]</note>
        <vocabulary isMultiSelection="true|false">
            <vocabularyField>field1</vocabularyField>
            <vocabularyField>field2</vocabularyField>
            <vocabularyField>field3</vocabularyField>
        </vocabulary>
        <validator>
            <regularExpression>a regular expression for validating values</regularExpression>
        </validator>
        <tagging create="true|false" separator="char_to_separate">onFieldName|onValue|onFieldName_onValue|onValue_onFieldName</tagging> 
        <grouping create="true|false">onFieldName|onValue|onFieldName_onValue|onValue_onFieldName</grouping>
    </metadatafield>
</metadataformat>


The <fieldId> element is optional. If present, it declares the name under which the field is stored in the Document (e.g. a JSON Document) passed to the Service that stores it. If <fieldId> is absent, the value of <fieldName> (which is mandatory) is used as the field name in the Document.

The <fieldName> field contains the name of the metadata field.

The <mandatory> element declares whether the <metadatafield> is mandatory ('true') or optional ('false').
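The rules for <fieldId>, <fieldName> and <mandatory> can be illustrated with a short parser built on Python's standard `xml.etree.ElementTree`. This is a sketch, not the Catalogue's own implementation: the function name and the two-field example profile are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-field profile: the first field declares a <fieldId>,
# the second one relies on its <fieldName> only.
EXAMPLE_PROFILE = """
<metadataformat type="example_type">
    <metadatafield>
        <fieldId>creator_name</fieldId>
        <fieldName>Creator Name</fieldName>
        <mandatory>true</mandatory>
    </metadatafield>
    <metadatafield>
        <fieldName>Keywords</fieldName>
        <mandatory>false</mandatory>
    </metadatafield>
</metadataformat>
"""


def document_fields(profile_xml: str) -> list:
    """Return (document_key, is_mandatory) for each <metadatafield>."""
    root = ET.fromstring(profile_xml.strip())
    fields = []
    for field in root.findall("metadatafield"):
        name = (field.findtext("fieldName") or "").strip()
        if not name:
            raise ValueError("each <metadatafield> needs a non-empty <fieldName>")
        # <fieldId>, when present, overrides <fieldName> as the Document key.
        field_id = (field.findtext("fieldId") or "").strip()
        mandatory = (field.findtext("mandatory") or "false").strip() == "true"
        fields.append((field_id or name, mandatory))
    return fields
```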

DataType values:

The <dataType> element specifies the kind of data. A valid dataType must be one of {String, Time, Time_Interval, Times_ListOf, Text, Boolean, Number, GeoJSON}. When the data type is not specified, the metadata field defaults to "String". Temporal types are specified with Time, Time_Interval or Times_ListOf (based on ISO 8601); the spatial type is specified with GeoJSON.

In detail:

  • String: a string;
  • Time: a time instant in the general format YYYY-MM-DD [HH:MM], where YYYY is a 4-digit year, MM a 2-digit month, DD a 2-digit day, and the optional HH and MM are a 2-digit hour and a 2-digit minute (e.g. "2005-03-01");
  • Time_Interval: a continuous interval rather than a single instant, given as a start and an end time separated by one '/' (slash) character (e.g. "2005-03-01/2006-05-11");
  • Times_ListOf: a list of discrete time values, separated by a ',' (comma) character (e.g. "2005-03-01, 2006-05-11, 2006-05-11-2007-04-12");
  • Text: a text;
  • Boolean: True or False;
  • Number: a valid Java number, see: Apache Commons NumberUtils.isNumber;
  • GeoJSON: a string in GeoJSON format (in particular, it should contain a GeoJSON geometry). GeoJSON is a format for encoding a variety of geographic data structures.
GeoSpatial Data (the spatial field):
To make a metadata record GeoSpatial Data, searchable by location via the GeoSpatial Search Widget (see #GeoSpatial_search_for_datasets:_via_API_or_Search_Widget), it must have a 'fieldName' named `spatial` with 'dataType' GeoJSON and a valid GeoJSON geometry as its value.
E.g. a MetadataField with GeoSpatial data:
    <metadatafield categoryref="category_id_#">
        <fieldName>spatial</fieldName>  <!--'spatial' is the reserved field name to assign a GeoSpatial dimension to metadata  -->
        <dataType>GeoJSON</dataType>
        <defaultValue>{"type": "Point","coordinates": [-20.145,74.078]}</defaultValue>
        <note>Please, insert a valid GeoJSON</note>
    </metadatafield>
See #Geo_Datasets for more details.
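Before saving a `spatial` value, a client may want to check that it is a GeoJSON geometry rather than arbitrary JSON. The sketch below performs only a shallow structural check based on the geometry types of the GeoJSON specification (RFC 7946); it does not validate coordinate values, and the function names are hypothetical.

```python
import json

_COORDINATE_TYPES = ("Point", "MultiPoint", "LineString", "MultiLineString",
                     "Polygon", "MultiPolygon")


def _is_geometry(obj) -> bool:
    if not isinstance(obj, dict):
        return False
    kind = obj.get("type")
    if isinstance(kind, str) and kind in _COORDINATE_TYPES:
        # Shallow check: 'coordinates' must be a (possibly nested) list.
        return isinstance(obj.get("coordinates"), list)
    if kind == "GeometryCollection":
        members = obj.get("geometries")
        return isinstance(members, list) and all(_is_geometry(m) for m in members)
    return False  # Feature, FeatureCollection or anything else is not a geometry


def is_geojson_geometry(text: str) -> bool:
    """Shallow check that `text` parses to a GeoJSON geometry object."""
    try:
        return _is_geometry(json.loads(text))
    except json.JSONDecodeError:
        return False
```

Rejecting a Feature is deliberate: the field expects the bare geometry, as in the default value of the example above.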
Temporal Data (the time_date field):
To make a metadata record Temporal Data, searchable by time via the Time Search Widget, it must have a 'fieldName' named `time_date` with 'dataType' Time and a valid ISO 8601 date as its value.
E.g. a MetadataField with Temporal data:
    <metadatafield categoryref="category_id_#">
        <fieldName>time_date</fieldName>  <!--'time_date' is the reserved field name to assign a Temporal dimension to metadata -->
        <dataType>Time</dataType>
        <defaultValue>2019-07-29</defaultValue>
        <note>Please, insert a valid ISO 8601 date</note>
    </metadatafield>
See #Temporal_Datasets for more details.
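The temporal data types can be checked client-side with a few regular expressions plus `datetime.strptime`. This sketch assumes a strict reading of the documented "YYYY-MM-DD [HH:MM]" format (2-digit month and day, optional space-separated hour and minute); the Catalogue Publisher Widget may be more or less lenient, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed reading of "YYYY-MM-DD [HH:MM]": 2-digit month/day, optionally
# followed by a single space and a 2-digit hour and minute.
_TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}( \d{2}:\d{2})?", re.ASCII)


def is_time(value: str) -> bool:
    """True if `value` is a single Time instant such as "2005-03-01"."""
    value = value.strip()
    if not _TIME_RE.fullmatch(value):
        return False
    fmt = "%Y-%m-%d %H:%M" if " " in value else "%Y-%m-%d"
    try:
        datetime.strptime(value, fmt)  # rejects impossible dates, e.g. 2005-02-30
    except ValueError:
        return False
    return True


def is_time_interval(value: str) -> bool:
    """True for "start/end", both parts being valid Time values."""
    parts = value.split("/")
    return len(parts) == 2 and all(is_time(part) for part in parts)


def is_times_list(value: str) -> bool:
    """True for comma-separated Time values, e.g. "2005-03-01, 2006-05-11"."""
    return all(is_time(item) for item in value.split(","))
```

The regular expression enforces the field widths, while `strptime` catches calendar errors that a pattern alone cannot.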

maxOccurs Indicator:

The <maxOccurs> indicator specifies the maximum number of times that <metadatafield> can occur:

  • N (a number): the field can appear at most N times;
  • * (the asterisk character): the field can appear an unlimited number of times.
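A form or client library could enforce <maxOccurs> with a check like the following. Treating N as an upper bound follows the definition above; also requiring at least one value when <mandatory> is 'true' is an assumption of this sketch, and the function name is hypothetical.

```python
def occurrence_count_ok(count: int, max_occurs: str, mandatory: bool = False) -> bool:
    """Check how many values a field has against <maxOccurs> ('N' or '*').

    Assumption: a mandatory field needs at least one value.
    """
    if count < 0:
        raise ValueError("count cannot be negative")
    if mandatory and count == 0:
        return False
    limit = max_occurs.strip()
    if limit == "*":  # unbounded
        return True
    return count <= int(limit)
```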

Categories as "Namespaces":

  • the Namespace of a Category declares a "class" for metadata fields having particular characteristics. It was introduced to group metadata fields by category and display them in a dedicated area through the advanced GUI provided by the CKAN D4Science plugin.

Namespaces (for Categories) are defined in an XML file made of one Namespaces element (<namespaces>) containing one or more Namespace elements (<namespace>). The file has the format:

<?xml version="1.0" encoding="UTF-8"?>
<namespaces xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<namespace id="category_id_#">
		<name>Category Name</name>
		<title>Category Title</title>
		<description>This section is about Category description</description>
	</namespace>
</namespaces>

A Namespace element (<namespace>) has one attribute (id) and three child elements. The "id" attribute must be unique within the file (see #Namespaces_Categories_schema:_NamespacesCatalogueCategories.xsd) and is the identifier of the Category. The child elements are: name (mandatory), title (mandatory) and description (optional).
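The constraints on a Namespaces file (unique id, mandatory name and title, optional description, at least one namespace) can be checked with a small loader. This is a sketch using the standard library; the XSD remains the authoritative definition, and the function name and trimmed example are hypothetical.

```python
import xml.etree.ElementTree as ET

EXAMPLE_NAMESPACES = """
<namespaces>
    <namespace id="contact">
        <name>Contact</name>
        <title>Contact Title</title>
        <description>This section is about Contact(s)</description>
    </namespace>
    <namespace id="extra_information">
        <name>Extra</name>
        <title>Extra Title</title>
    </namespace>
</namespaces>
"""


def load_namespaces(xml_text: str) -> dict:
    """Map each namespace id to its name, title and (optional) description."""
    root = ET.fromstring(xml_text.strip())
    categories = {}
    for ns in root.findall("namespace"):
        ns_id = ns.get("id")
        if not ns_id:
            raise ValueError("<namespace> lacks the required 'id' attribute")
        if ns_id in categories:
            raise ValueError(f"duplicate namespace id: {ns_id!r}")
        name, title = ns.findtext("name"), ns.findtext("title")
        if not name or not title:
            raise ValueError(f"namespace {ns_id!r}: <name> and <title> are mandatory")
        categories[ns_id] = {"name": name, "title": title,
                             "description": ns.findtext("description")}  # may be None
    if not categories:
        raise ValueError("at least one <namespace> is required")
    return categories
```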

Metadata Field and Category Reference (categoryref="category_id_#"):

  • categoryref is an optional attribute whose value is the unique id of a Category Namespace (id="category_id_#"). A metadata field can belong to only one Category, which it references via categoryref (categoryref="id of the category the metadata field belongs to").
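To display fields by category, a front-end has to group them by their categoryref and make sure every reference points to a declared Namespace id. The sketch below does both; fields without categoryref are collected under `None`. The function name and example profile are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

EXAMPLE_PROFILE = """
<metadataformat type="the_metadata_type">
    <metadatafield categoryref="contact"><fieldName>Name</fieldName></metadatafield>
    <metadatafield categoryref="contact"><fieldName>Surname</fieldName></metadatafield>
    <metadatafield><fieldName>Notes</fieldName></metadatafield>
</metadataformat>
"""


def fields_by_category(profile_xml: str, category_ids: set) -> dict:
    """Group field names by categoryref, rejecting unknown category ids."""
    root = ET.fromstring(profile_xml.strip())
    grouped = defaultdict(list)
    for field in root.findall("metadatafield"):
        ref = field.get("categoryref")  # optional: None when the attribute is absent
        if ref is not None and ref not in category_ids:
            raise ValueError(f"categoryref {ref!r} matches no <namespace id>")
        grouped[ref].append(field.findtext("fieldName"))
    return dict(grouped)
```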

Type of (meta)data (mandatory):

  • type: a Metadata Format (metadataformat) must have a unique 'type' XML attribute declaring its type. This mandatory information is saved as a custom key (system:type="value of type") of the item stored in the Data Catalogue.
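As an illustration of the system:type key, the sketch below reads the mandatory 'type' attribute from a profile and returns it as a key/value entry. Representing it as a CKAN-style extra ({"key": ..., "value": ...}) is an assumption of this sketch, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def system_type_extra(profile_xml: str) -> dict:
    """Return the 'system:type' entry derived from <metadataformat type="...">."""
    root = ET.fromstring(profile_xml.strip())
    if root.tag != "metadataformat":
        raise ValueError("root element must be <metadataformat>")
    mtype = (root.get("type") or "").strip()
    if not mtype:
        raise ValueError("<metadataformat> must declare a non-empty 'type' attribute")
    # Assumed representation: a CKAN-style extra holding the profile type.
    return {"key": "system:type", "value": mtype}
```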

Tagging:

  • It is used by the gCube Data Catalogue front-end to add a metadata field as a Tag of the item. A Tag is a string between 2 and 100 characters long containing only alphanumeric characters, '-' (hyphen), '_' (underscore) and '.' (dot). In the Metadata Profile schema v3, the Tagging element must have one of the values {onFieldName, onValue, onFieldName_onValue, onValue_onFieldName}, with the following meanings:
    • onFieldName: only the fieldName of the metadata field is added as a Tag;
    • onValue: only the value of the metadata field is added as a Tag;
    • onFieldName_onValue: the fieldName and the value (in this order) of the metadata field are added as a single Tag, joined by the separator string (<tagging create="true|false" separator="char_to_separate">{onFieldName_onValue}</tagging>);
    • onValue_onFieldName: the value and the fieldName (in this order) of the metadata field are added as a single Tag, joined by the separator string (<tagging create="true|false" separator="char_to_separate">{onValue_onFieldName}</tagging>).
  • Moreover, Tagging has two optional attributes: 'create' and 'separator'. The first one (create="true"|"false") states whether the Tag must be created if it does not exist. The second one (separator="char_to_separate") is the string used to separate the fieldName from its value; it defaults to the character '-' if not specified.

Tagging example: given the following metadata field

	<metadatafield categoryref="contact">
		<fieldName>Name</fieldName>
		<dataType>String</dataType>
		<defaultValue>My Name</defaultValue>
		<note>Insert your Name</note>
		<tagging create="true" separator="-">onFieldName_onValue</tagging>
	</metadatafield>

if the value entered for Name (default "My Name") is "Francesco", the gCube Data Catalogue adds the Tag Name-Francesco to the item, creating the Tag if it does not exist (create="true").
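The composition rules above can be sketched in Python. This is an illustrative helper, not the front-end's actual code: `build_tag`, `TAG_PATTERN` and `TAGGING_VALUES` are hypothetical names, and the pattern assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Tag rules stated above: 2-100 characters; letters, digits, '-', '_' and '.'.
# Assumption: "alphanumeric" is read as ASCII letters and digits.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{2,100}")

TAGGING_VALUES = {"onFieldName", "onValue", "onFieldName_onValue", "onValue_onFieldName"}


def build_tag(field_name: str, value: str, tagging: str, separator: str = "-") -> str:
    """Hypothetical helper: compose the Tag for one metadata field."""
    if tagging not in TAGGING_VALUES:
        raise ValueError(f"unknown tagging value: {tagging!r}")
    if tagging == "onFieldName":
        tag = field_name
    elif tagging == "onValue":
        tag = value
    elif tagging == "onFieldName_onValue":
        tag = f"{field_name}{separator}{value}"
    else:  # onValue_onFieldName
        tag = f"{value}{separator}{field_name}"
    # The sketch rejects invalid tags instead of rewriting them.
    if TAG_PATTERN.fullmatch(tag) is None:
        raise ValueError(f"invalid tag: {tag!r}")
    return tag


if __name__ == "__main__":
    # fieldName "Name", value "Francesco", separator "-" (the example above)
    print(build_tag("Name", "Francesco", "onFieldName_onValue"))  # Name-Francesco
```

Rejecting an invalid tag, rather than silently rewriting it, is a choice made for this sketch only; how the front-end treats a value containing a space (such as the default "My Name") is not described here.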

Grouping:

  • It is used by the Data Catalogue front-end to add a metadata field to a Group of the Data Catalogue. Data Catalogue groups are used for browsing. The grouping element in the Metadata Profile schema v3 must have one of the following values: {onFieldName, onValue, onFieldName_onValue, onValue_onFieldName}. The optional attribute create="true" states that the Group must be created if it does not exist. Meaning of the grouping values:
    • onFieldName: only the fieldName of the metadata field is added to a Group;
    • onValue: only the value of the metadata field is added to a Group;
    • onFieldName_onValue: both the fieldName and the value (in this order) of the metadata field are added to a Group (<grouping create="true|false">{onFieldName_onValue}</grouping>);
    • onValue_onFieldName: both the value and the fieldName (in this order) of the metadata field are added to a Group (<grouping create="true|false">{onValue_onFieldName}</grouping>).
  • Moreover, grouping has a second optional attribute, 'propagateUp'. It lets a user specify whether an item added to that group must also be added to the whole hierarchical chain of groups above it. For instance, if group B is a child of group A and item I is added to B, then I is also added to A.
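A minimal sketch of the propagateUp behaviour described above, assuming each group has at most one parent. `groups_for_item` and the `parents` mapping are hypothetical names used only for illustration; this is not the catalogue's implementation.

```python
from typing import Dict, List, Optional


def groups_for_item(target: str, parents: Dict[str, Optional[str]], propagate_up: bool) -> List[str]:
    """Return the groups an item is added to, starting from `target`.

    `parents` maps each group name to its parent group (None for a root group).
    """
    groups = [target]
    if not propagate_up:
        return groups
    seen = {target}
    parent = parents.get(target)
    # Walk up the hierarchy; the `seen` set guards against accidental cycles.
    while parent is not None and parent not in seen:
        groups.append(parent)
        seen.add(parent)
        parent = parents.get(parent)
    return groups


if __name__ == "__main__":
    hierarchy = {"A": None, "B": "A"}  # group B is a child of group A
    print(groups_for_item("B", hierarchy, propagate_up=True))   # ['B', 'A']
    print(groups_for_item("B", hierarchy, propagate_up=False))  # ['B']
```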
Metadata Profile schema: gcdcmetadataprofilev3.xsd

The gCube Data Catalogue Metadata Profile (v.3) schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified"
	elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
	<xs:include schemaLocation="NamespacesCatalogueCategories.xsd"/>
	<xs:element name="metadataformat">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="metadatafield" minOccurs="0" maxOccurs="unbounded">
					<xs:complexType>
						<xs:sequence>
							<xs:element type="xs:string" name="fieldName" />
							<xs:element type="xs:boolean" name="mandatory"
								minOccurs="0" maxOccurs="1" />
							<xs:element name="dataType">
								<xs:simpleType>
									<xs:restriction base="xs:string">
										<xs:enumeration value="String" />
										<xs:enumeration value="Time" />
										<xs:enumeration value="Time_Interval" />
										<xs:enumeration value="Times_ListOf" />
										<xs:enumeration value="Text" />
										<xs:enumeration value="Boolean" />
										<xs:enumeration value="Number" />
										<xs:enumeration value="GeoJSON" />
									</xs:restriction>
								</xs:simpleType>
							</xs:element>
							<xs:element type="xs:string" name="maxOccurs"
								minOccurs="0" maxOccurs="1" />
							<xs:element type="xs:string" name="defaultValue"
								minOccurs="0" maxOccurs="1" />
							<xs:element type="xs:string" name="note" minOccurs="0"
								maxOccurs="1" />
							<xs:element name="vocabulary" minOccurs="0" maxOccurs="1">
								<xs:complexType>
									<xs:sequence>
										<xs:element type="xs:string" name="vocabularyField"
											minOccurs="1" maxOccurs="unbounded" />
									</xs:sequence>
									<xs:attribute type="xs:boolean" name="isMultiSelection" />
								</xs:complexType>
							</xs:element>
							<xs:element name="validator" minOccurs="0" maxOccurs="1">
								<xs:complexType>
									<xs:sequence>
										<xs:element type="xs:string" name="regularExpression" />
									</xs:sequence>
								</xs:complexType>
							</xs:element>
							<xs:element name="tagging" type="TaggingType"
								minOccurs="0" maxOccurs="1">
							</xs:element>
							<xs:element name="grouping" type="GroupingType"
								minOccurs="0" maxOccurs="1">
							</xs:element>
						</xs:sequence>
						<xs:attribute name="categoryref" use="optional" type="xs:string" />
					</xs:complexType>
				</xs:element>
			</xs:sequence>
			<xs:attribute type="NotEmpty" use="required" name="type" />
		</xs:complexType>
	</xs:element>
	<xs:simpleType name="TaggingGroupingValue">
		<xs:restriction base="xs:string">
			<xs:enumeration value="onFieldName" />
			<xs:enumeration value="onValue" />
			<xs:enumeration value="onFieldName_onValue" />
			<xs:enumeration value="onValue_onFieldName" />
		</xs:restriction>
	</xs:simpleType>
	<xs:complexType name="TaggingType">
		<xs:simpleContent>
			<xs:extension base="TaggingGroupingValue">
				<xs:attribute type="xs:boolean" name="create" />
				<xs:attribute type="NotEmpty" name="separator" />
			</xs:extension>
		</xs:simpleContent>
	</xs:complexType>
	<xs:complexType name="GroupingType">
		<xs:simpleContent id="TaggingGroupingValue">
			<xs:extension base="TaggingGroupingValue">
				<xs:attribute type="xs:boolean" name="create" />
				<xs:attribute type="xs:boolean" name="propagateUp" />
			</xs:extension>
		</xs:simpleContent>
	</xs:complexType>
	<xs:simpleType name="NotEmpty">
		<xs:restriction base="xs:string">
			<xs:minLength value="1" />
		</xs:restriction>
	</xs:simpleType>
</xs:schema>

You can download it by clicking on Gcdcmetadataprofilev3.xsd

A "generic" example of MetadataProfile.xml:

<?xml version="1.0" encoding="UTF-8"?>
<metadataformat type="the_metadata_type" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="gcdcmetadataprofilev3.xsd">
  <metadatafield categoryref="idvalue0">
    <fieldName>fieldName</fieldName>
    <dataType>String</dataType>
    <defaultValue>defaultValue</defaultValue>
    <note>note</note>
    <vocabulary isMultiSelection="true">
      <vocabularyField>vocabularyField</vocabularyField>
    </vocabulary>
    <validator>
      <regularExpression>regularExpression</regularExpression>
    </validator>
    <tagging create="true" separator="-">onFieldName_onValue</tagging>
    <grouping create="true">onFieldName</grouping>
  </metadatafield>
</metadataformat>

Another example of MetadataProfile.xml:

<?xml version="1.0" encoding="UTF-8"?>
<metadataformat type="the_metadata_type"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="gcdcmetadataprofilev3.xsd">
	<metadatafield categoryref="contact">
		<fieldName>Name</fieldName>
		<dataType>String</dataType>
		<defaultValue>My Name</defaultValue>
		<note>Insert your Name</note>
		<tagging create="true" separator="-">onFieldName_onValue</tagging>
	</metadatafield>
	<metadatafield categoryref="contact">
		<fieldName>Surname</fieldName>
		<dataType>String</dataType>
		<defaultValue>My Surname</defaultValue>
		<note>Insert your Surname</note>
	</metadatafield>
</metadataformat>
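Full validation against gcdcmetadataprofilev3.xsd needs an XSD-aware validator (for example the XMLSchema class of the third-party lxml library), which the Python standard library does not provide. As a lighter sketch, the hypothetical `check_profile` helper below re-checks by hand a few of the schema's rules (non-empty type, dataType and tagging/grouping enumerations); it does not verify element order or any other constraint.

```python
import xml.etree.ElementTree as ET

# Enumerations copied from gcdcmetadataprofilev3.xsd (dataType and TaggingGroupingValue).
DATA_TYPES = {"String", "Time", "Time_Interval", "Times_ListOf",
              "Text", "Boolean", "Number", "GeoJSON"}
TAGGING_GROUPING_VALUES = {"onFieldName", "onValue",
                           "onFieldName_onValue", "onValue_onFieldName"}


def check_profile(xml_text: str) -> list:
    """Return the problems found in a MetadataProfile document (empty list if none)."""
    # Parse bytes so that an XML declaration with an encoding is accepted.
    root = ET.fromstring(xml_text.encode("utf-8"))
    problems = []
    if root.tag != "metadataformat":
        problems.append(f"root element is {root.tag!r}, expected 'metadataformat'")
    if not root.get("type"):
        problems.append("the 'type' attribute is missing or empty")
    for index, field in enumerate(root.findall("metadatafield")):
        name = field.findtext("fieldName")
        if name is None:
            problems.append(f"field #{index}: missing fieldName")
        data_type = field.findtext("dataType")
        if data_type not in DATA_TYPES:
            problems.append(f"field {name!r}: invalid dataType {data_type!r}")
        for element_name in ("tagging", "grouping"):
            element = field.find(element_name)
            # Like the XSD enumeration, the comparison is whitespace-sensitive.
            if element is not None and (element.text or "") not in TAGGING_GROUPING_VALUES:
                problems.append(f"field {name!r}: invalid {element_name} value {element.text!r}")
    return problems


if __name__ == "__main__":
    profile = """<?xml version="1.0" encoding="UTF-8"?>
<metadataformat type="the_metadata_type">
  <metadatafield categoryref="contact">
    <fieldName>Name</fieldName>
    <dataType>String</dataType>
    <tagging create="true" separator="-">onFieldName_onValue</tagging>
  </metadatafield>
</metadataformat>"""
    print(check_profile(profile))  # []
```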
Namespaces Categories schema: NamespacesCatalogueCategories.xsd

The Namespaces Catalogue Categories schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified"
	elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:category="http://www.w3.org/TR/html4/">
	<xs:element name="namespaces">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="namespace" minOccurs="1" maxOccurs="unbounded">
					<xs:complexType>
						<xs:sequence>
							<xs:element type="xs:string" name="name" minOccurs="1"
								maxOccurs="1" />
							<xs:element type="xs:string" name="title" minOccurs="1"
								maxOccurs="1" />
							<xs:element type="xs:string" name="description"
								minOccurs="0" maxOccurs="1" />
						</xs:sequence>
						<xs:attribute type="xs:string" name="id" use="required" />
					</xs:complexType>
				</xs:element>
			</xs:sequence>
		</xs:complexType>
		<xs:unique name="unique-namespace-id">
			<xs:selector xpath="namespace" />
			<xs:field xpath="@id" />
		</xs:unique>
	</xs:element>
</xs:schema>

You can download it by clicking on NamespacesCatalogueCategories

An example of valid Namespaces.xml:

<?xml version="1.0" encoding="UTF-8"?>
<namespaces xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="NamespacesCatalogueCategories.xsd">
	<namespace id="contact">
		<name>Contact</name>
		<title>Contact Title</title>
		<description>This section is about Contact(s)</description>
	</namespace>
	<namespace id="developer_information">
		<name>Developer</name>
		<title>Developer Title</title>
		<description>This section is about Developer(s)</description>
	</namespace>
	<namespace id="extra_information">
		<name>Extra</name>
		<title>Extra Title</title>
		<description>This section is about Extra(s)</description>
	</namespace>
</namespaces>
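The schema enforces unique namespace ids within Namespaces.xml, but it cannot check that each categoryref in a Metadata Profile points to one of those ids, since the two live in different files. The sketch below performs that cross-check. It assumes, as the examples suggest (categoryref="contact" and <namespace id="contact">), that categoryref is resolved against the namespace id attribute; `namespace_ids` and `dangling_categoryrefs` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def namespace_ids(namespaces_xml: str) -> set:
    """Collect namespace ids, failing on duplicates (mirrors the unique-namespace-id constraint)."""
    root = ET.fromstring(namespaces_xml.encode("utf-8"))
    ids = [ns.get("id") for ns in root.findall("namespace")]
    duplicates = {i for i in ids if ids.count(i) > 1}
    if duplicates:
        raise ValueError(f"duplicate namespace ids: {sorted(duplicates, key=str)}")
    return set(ids)


def dangling_categoryrefs(profile_xml: str, namespaces_xml: str) -> list:
    """Return (fieldName, categoryref) pairs whose categoryref matches no namespace id."""
    known = namespace_ids(namespaces_xml)
    profile = ET.fromstring(profile_xml.encode("utf-8"))
    dangling = []
    for field in profile.findall("metadatafield"):
        ref = field.get("categoryref")  # optional attribute: None if absent
        if ref is not None and ref not in known:
            dangling.append((field.findtext("fieldName"), ref))
    return dangling


if __name__ == "__main__":
    namespaces = """<namespaces>
  <namespace id="contact"><name>Contact</name><title>Contact Title</title></namespace>
</namespaces>"""
    profile = """<metadataformat type="the_metadata_type">
  <metadatafield categoryref="contact"><fieldName>Name</fieldName><dataType>String</dataType></metadatafield>
  <metadatafield categoryref="missing"><fieldName>Surname</fieldName><dataType>String</dataType></metadatafield>
</metadataformat>"""
    print(dangling_categoryrefs(profile, namespaces))  # [('Surname', 'missing')]
```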

Instances of Metadata Profile

The following list shows the instances of the metadata profile created in the D4Science infrastructure and currently used in the gCube Data Catalogue:

SoBigData.eu

SoBigData.eu: Metadata Profile for gCube Data Catalogue

Ckan Connector

The Ckan Connector Service is a gCube RESTful service that allows infrastructure users to interact with CKAN.

It implements two methods:

  • /connect: creates a new CKAN session. The user can interact with CKAN while the session is alive (i.e. not expired or destroyed by the user).
  • /disconnect: destroys the current CKAN session.

Geo Harvesting

This extension includes plugins such as ckanext-geonetwork (among others) that add geospatial capabilities to CKAN.

Several harvesters that import geospatial metadata (e.g. in ISO 19139 format) from other sources into CKAN have been created in the gCube Data Catalogue. In particular, all metadata created in the gCube GeoNetwork instances (GeoNetwork is the catalogue application used to manage the spatially referenced resources generated in the D4Science Infrastructure) are harvested through the Geonetwork Resolver.

The following table shows how ckanext-geonetwork maps the fields of an ISO19139 metadata record to a CKAN dataset:

ISO19139                                      | CKAN Dataset
----------------------------------------------+------------------
Title                                         | Title
Description                                   | Description
bbox                                          | spatial
Descriptive Keywords (gmd:keyword)            | Tag
Additional Info (metadata language, age,      | key/value
  reference system, etc.)                     |
Digital Transfer Option (CI_OnlineResource)   | Data and Resource
  gmd:url                                     | URL
  gmd:name                                    | Name
  gmd:description                             | Description

Geonetwork harvester for CKAN

The Geonetwork harvester for CKAN, based on the ckanext-geonetwork plugin, has been enhanced to meet the needs of D4Science and its communities.

The base configuration options controlling the behaviour of the CKAN harvesters are described at the-ckan-harvester.

Moreover, the following configuration options have been added to support specific D4Science requirements.

Add the "ITEM URL" field by default to harvested items:

TODO

Geonetwork Resolver

The 'Geonetwork Resolver' is a "middle tier" that authorizes the CKAN harvesters to harvest, via CSW, the ISO19139 items provided by the gCube GeoNetwork instances.

See more at https://wiki.gcube-system.org/gcube/URI_Resolver#Geonetwork_Resolver
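For orientation, a CSW harvest of ISO19139 items boils down to requests like the one built below: a standard OGC CSW 2.0.2 GetRecords request in KVP encoding that asks for records in the ISO 19139 output schema. The endpoint is a hypothetical placeholder (the real resolver URL is documented on the page linked above), and whether the resolver forwards every one of these parameters unchanged is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the actual resolver URL is documented on the URI Resolver page.
CSW_ENDPOINT = "https://resolver.example.org/geonetwork/csw"


def csw_get_records_url(endpoint: str, start_position: int = 1, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request (KVP encoding) asking for full ISO 19139 records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "gmd:MD_Metadata",
        # Declares the 'gmd' prefix used in typeNames.
        "namespace": "xmlns(gmd=http://www.isotc211.org/2005/gmd)",
        # Ask for records encoded as ISO 19139.
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "elementSetName": "full",
        "resultType": "results",
        # Paging through the catalogue is done by increasing startPosition.
        "startPosition": start_position,
        "maxRecords": max_records,
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(csw_get_records_url(CSW_ENDPOINT))
```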

Geo Datasets

To make a dataset queryable by location (geospatial dataset), a reserved extra whose field name (the key) is ‘spatial’ must be defined. Its value must be a valid GeoJSON geometry, for example:

{
  "type":"Polygon",
  "coordinates":[[[2.05827, 49.8625],[2.05827, 55.7447], [-6.41736, 55.7447], [-6.41736, 49.8625], [2.05827, 49.8625]]]
}

[Note: the polygon must be closed, i.e. its first and last positions must be identical]

or

{
  "type": "Point",
  "coordinates": [-3.145,53.078]
}

The GeoJSON format specification is available at http://geojson.org/geojson-spec.html. Datasets with spatial values are automatically geo-indexed, so that they can be searched using spatial filters.
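A minimal sketch of preparing the reserved 'spatial' extra, assuming the dataset is created or updated through CKAN's action API, where extras are passed as a list of {"key": ..., "value": ...} objects with string values. `spatial_extra` is a hypothetical helper; it only checks Point and Polygon geometries, and only superficially (ring closure and minimum sizes), so it is no substitute for a full GeoJSON validator.

```python
import json


def spatial_extra(geometry: dict) -> dict:
    """Build the reserved 'spatial' extra for a dataset from a GeoJSON geometry."""
    gtype = geometry.get("type")
    coords = geometry.get("coordinates")
    if not isinstance(coords, list):
        raise ValueError("'coordinates' must be a list")
    if gtype == "Point":
        if len(coords) < 2:
            raise ValueError("a Point needs at least [longitude, latitude]")
    elif gtype == "Polygon":
        for ring in coords:
            # A linear ring has at least 4 positions and its first and last positions are equal.
            if len(ring) < 4 or ring[0] != ring[-1]:
                raise ValueError("every Polygon ring must be closed")
    else:
        raise ValueError(f"geometry type {gtype!r} is not handled by this sketch")
    # Extra values are strings, so the geometry is serialized to JSON text.
    return {"key": "spatial", "value": json.dumps(geometry)}


if __name__ == "__main__":
    polygon = {
        "type": "Polygon",
        "coordinates": [[[2.05827, 49.8625], [2.05827, 55.7447], [-6.41736, 55.7447],
                         [-6.41736, 49.8625], [2.05827, 49.8625]]],
    }
    print(spatial_extra(polygon)["key"])  # spatial
```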

GeoSpatial search for datasets: via API or Search Widget

Once your datasets are geo-indexed, you can perform spatial queries by bounding box (coordinates are given as [LONG, LAT]) via the following API call:

/api/2/search/dataset/geo?bbox={minx,miny,maxx,maxy}[&crs={srid}]

If the bounding box coordinates are not in the same projection as the one defined in the database, a CRS must be provided, in one of the following forms:

    urn:ogc:def:crs:EPSG::4326
    EPSG:4326
    4326

Otherwise, the bounding box is assumed to be in the default CRS, EPSG:4326 (see the CKAN Wiki page for the Legacy API).
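The call above can be composed as follows. This sketch assumes the catalogue still exposes the legacy /api/2 endpoint; the base URL is a hypothetical placeholder and `geo_search_url` is an illustrative helper, not part of CKAN.

```python
from typing import Optional
from urllib.parse import urlencode


def geo_search_url(base_url: str, minx: float, miny: float, maxx: float, maxy: float,
                   crs: Optional[str] = None) -> str:
    """Build a legacy-API spatial search URL; x is longitude and y is latitude."""
    if minx > maxx or miny > maxy:
        raise ValueError("expected minx <= maxx and miny <= maxy")
    params = {"bbox": f"{minx},{miny},{maxx},{maxy}"}
    if crs is not None:
        params["crs"] = crs  # e.g. "EPSG:4326"; when omitted, EPSG:4326 is assumed
    # Keep ',' and ':' unescaped so the URL stays readable.
    query = urlencode(params, safe=",:")
    return f"{base_url.rstrip('/')}/api/2/search/dataset/geo?{query}"


if __name__ == "__main__":
    # Hypothetical catalogue URL; bounding box of the Polygon example above.
    print(geo_search_url("https://ckan.example.org", -6.41736, 49.8625, 2.05827, 55.7447))
```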

Moreover, you can perform spatial queries using the map widget integrated in CKAN, which allows filtering results by an area of interest. You can try it on the D4Science Data Catalogue.

CKAN Wiki page for Spatial Search Widget

Temporal Datasets

To make a dataset queryable by time (temporal dataset), a reserved extra whose field name (the key) is ‘time_date’ must be defined. Its value must be a valid ISO 8601 date.
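A minimal sketch of the reserved 'time_date' extra, assuming the date is written in ISO 8601 form (for example 2017-05-01) and that extras are passed as key/value pairs. `time_date_extra` is a hypothetical helper. Note that before Python 3.11, `datetime.fromisoformat` accepts only a subset of ISO 8601 (for instance, no trailing 'Z'), so this check is stricter than the standard itself.

```python
from datetime import date, datetime
from typing import Union


def time_date_extra(value: Union[date, str]) -> dict:
    """Build the reserved 'time_date' extra, with the date serialized in ISO 8601 form."""
    if isinstance(value, date):  # also true for datetime, a subclass of date
        text = value.isoformat()
    else:
        text = value
        # Raises ValueError if the string is not in a form fromisoformat understands.
        datetime.fromisoformat(text)
    return {"key": "time_date", "value": text}


if __name__ == "__main__":
    print(time_date_extra(date(2017, 5, 1)))  # {'key': 'time_date', 'value': '2017-05-01'}
```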

Users, Roles and Groups

Four roles are envisaged to capture the actions users are allowed to perform in the catalogue in the context of each VRE:

  • Catalogue_Member - users with this role are allowed to:
    • View the organization’s private datasets.
  • Catalogue_Editor - users with this role are allowed to:
    • View the organization’s private datasets;
    • Publish new datasets (into the organization);
    • Edit or delete the organization’s datasets the user is owner of.
  • Catalogue_Admin - users with this role are allowed to:
    • View the organization’s private datasets;
    • Publish new datasets (into the organization);
    • Edit or delete any of the organization’s datasets;
    • Make datasets public or private.
  • Catalogue_Manager - users with this role are allowed to:
    • View the organization’s private datasets;
    • Publish new datasets (into the organization);
    • Edit or delete any of the organization’s datasets;
    • Make datasets public or private;
    • Configure the catalogue.

The default role assigned to every VRE user is Catalogue_Member, i.e. every user of a VRE is entitled to view the private datasets published in the VRE scope in addition to any public dataset. VRE Managers can assign other roles to selected users to enlarge their capabilities.
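The role list above can be transcribed as data, for example to drive client-side checks or tests. This is only a sketch: `Action`, `ROLE_ACTIONS` and `can_edit` are hypothetical names, and permissions are actually enforced by the catalogue service, not by code like this.

```python
from enum import Enum, auto


class Action(Enum):
    VIEW_PRIVATE = auto()    # view the organization's private datasets
    PUBLISH = auto()         # publish new datasets into the organization
    EDIT_OWN = auto()        # edit or delete datasets the user owns
    EDIT_ANY = auto()        # edit or delete any of the organization's datasets
    SET_VISIBILITY = auto()  # make datasets public or private
    CONFIGURE = auto()       # configure the catalogue


_ADMIN = {Action.VIEW_PRIVATE, Action.PUBLISH, Action.EDIT_ANY, Action.SET_VISIBILITY}

# Transcription of the role list above.
ROLE_ACTIONS = {
    "Catalogue_Member": {Action.VIEW_PRIVATE},
    "Catalogue_Editor": {Action.VIEW_PRIVATE, Action.PUBLISH, Action.EDIT_OWN},
    "Catalogue_Admin": _ADMIN,
    "Catalogue_Manager": _ADMIN | {Action.CONFIGURE},
}

DEFAULT_ROLE = "Catalogue_Member"  # role assigned to every VRE user


def can_edit(role: str, user: str, dataset_owner: str) -> bool:
    """True if a user with `role` may edit or delete a dataset owned by `dataset_owner`."""
    actions = ROLE_ACTIONS.get(role, set())
    if Action.EDIT_ANY in actions:
        return True
    return Action.EDIT_OWN in actions and user == dataset_owner


if __name__ == "__main__":
    print(can_edit("Catalogue_Editor", "alice", "bob"))  # False
```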

CKAN Groups can be used to create and manage collections of datasets. This could be to catalogue datasets for a particular project or team, or on a particular theme, or as a very simple way to help people find and search your own published datasets.

Dataset Update

As described above, only users with the Catalogue_Editor (Editor), Catalogue_Admin (Admin) or Catalogue_Manager role can update existing datasets; Editors can update only the datasets they own. The update can be performed via the REST API (see: https://wiki.gcube-system.org/gcube/Catalogue_restful_service).

A dataset can also be edited through the CKAN GUI (via Manage). In this case, to add or update the resources of a record, users entitled to edit it need to:

  1. store the file in the workspace;
  2. get a public link to that file;
  3. edit the record by adding that link.

Access the Catalogue via RESTful service

More details are available on the gCat Service page.

Complex Query to Catalogue via gCube Catalogue Portlet

The gCube Catalogue Portlet accepts two parameters, `path` and `query`, to perform via HTTP the complex queries supported by the CKAN engine.

You need to use:

  • the path parameter to specify the location, i.e. a route/page rendered by the CKAN engine (e.g. dataset, organization, group);
  • the query parameter to specify the query string (e.g. q=sarda). Its value must be encoded in Base64.

For example:

  • to refer to the catalogue items belonging to a catalogue group, specify path=/group/<<groupName>>;
  • to refer to all the catalogue items matching a given query, specify path=dataset and query=<<base64 query>>.

Below is a complete example.

Let's assume we want to perform the query https://ckan-grsf.d4science.org/dataset?q=sarda via the gCube Catalogue Portlet:

  • The path parameter must be: path=dataset
  • The query parameter must be: query=BASE64(q=sarda), that is query=cT1zYXJkYQ==

Thus, the URL to perform the query via the gCube Catalogue Portlet is: https://i-marine.d4science.org/web/grsf/data-catalogue?path=dataset&query=cT1zYXJkYQ==
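The encoding steps of this example can be reproduced with a short sketch; `portlet_query_url` is a hypothetical helper, and the portlet URL is the one used above. Because Base64 output may contain '+' and '/', which are unsafe in a query string, the sketch percent-encodes them while leaving the '=' padding as in the example URL; that the portlet decodes percent-escapes before Base64-decoding is an assumption (it is the usual behaviour of servlet containers).

```python
import base64
from typing import Optional
from urllib.parse import quote

# Portlet URL taken from the example above.
PORTLET_URL = "https://i-marine.d4science.org/web/grsf/data-catalogue"


def portlet_query_url(portlet_url: str, path: str, ckan_query: Optional[str] = None) -> str:
    """Build a gCube Catalogue Portlet URL from a CKAN route and an optional CKAN query string."""
    url = f"{portlet_url}?path={quote(path, safe='/')}"
    if ckan_query is not None:
        encoded = base64.b64encode(ckan_query.encode("utf-8")).decode("ascii")
        # Percent-encode '+' and '/', keep the '=' padding unescaped.
        url += f"&query={quote(encoded, safe='=')}"
    return url


if __name__ == "__main__":
    print(portlet_query_url(PORTLET_URL, "dataset", "q=sarda"))
    # https://i-marine.d4science.org/web/grsf/data-catalogue?path=dataset&query=cT1zYXJkYQ==
```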

Widget to show Catalogue Statistics: Catalogue Badge

Instructions for configuring the Catalogue Badge Widget for a D4Science infra-gateway are available on the Catalogue_Badge Service Manager page.

Related Issues

[1] gCube Data Catalogue for Global Record of Stocks and Fisheries - https://wiki.gcube-system.org/gcube/GCube_Data_Catalogue_for_GRSF

[2] Data Catalogue Resolver - https://wiki.gcube-system.org/gcube/URI_Resolver#CATALOGUE_Resolver

[4] How-to Purge a Catalogue Instance - https://wiki.gcube-system.org/gcube/How-to_purge_a_ckan_catalogue_instance