GCat Background
Revision as of 11:06, 13 September 2016
** THIS DOCUMENT IS A DRAFT **
The gCube Data Catalogue is built using CKAN.
CKAN is an open-source data management system (DMS) for powering data hubs and data portals. It makes data accessible by providing tools to streamline publishing, sharing, finding and using data. See http://ckan.org/
gCube Data Catalogue: Metadata
A metadata record in the gCube Data Catalogue is made up of two parts: CKAN's default metadata fields and a gCube Metadata Profile.
CKAN's default metadata fields
These metadata fields are common to all metadata types in the gCube Data Catalogue (and are used by default in the CKAN platform).
Label | Field Name (API) | Definition | Guidelines | Example |
---|---|---|---|---|
Title* | title | Name given to the dataset. | Short phrase, written in plain language. Should be sufficiently descriptive to allow for search and discovery. | Aquaculture Production and Consumption in Africa (2011) |
Description | description | Short description explaining the content and its origins. | Description of a few sentences, written in plain language. Should provide a sufficiently comprehensive overview of the resource for anyone to understand its content, origins, and any continuing work on it. The description can be written at the end, since it summarizes key information from the other metadata fields. | This dataset contains attributes of aquaculture production and consumption for each of Africa’s provinces in 2011. The data was provided by……… |
Tags | tags | An array of Taxonomic terms stored as tags | Taxonomic terms | Access to education, Bamboo |
License* | license_title | The license that applies to the published dataset. | | |
Organization* | organization | Organization the dataset belongs to | See list of organizations on | D4Science |
Version | version | Version of dataset | Increase manually after editing | 1.0 |
Author* | | Owner of dataset | The person who created the dataset, in the format: Surname, Name | Bloggs, Joe |
Author Contact | | Contact details of owner | The email or other contact details of the person who created the dataset. | joe@example.com |
Maintainer | | Maintainer of the dataset | The person or the authority that maintains the dataset | A person: Bloggs, Joe. An authority: D4Science |
Maintainer Contact | | Contact details of maintainer | The email or other contact details of the person who maintains the dataset. | joe@example.com |
Mandatory fields are marked with an asterisk (*).
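As a rough illustration of the asterisk rule above, the following sketch checks that a dataset dictionary carries the mandatory fields. The function name `missing_mandatory_fields` and the sample dictionary are hypothetical. The key `license_title` is assumed to be the API name of the License field, and Author is left out because the table lists no API name for it.

```python
# Hypothetical helper, not part of gCube or CKAN.
# API names of the fields marked with an asterisk in the table above,
# limited to those for which the table lists an API name.
# "license_title" is an assumption about the License field's API name.
MANDATORY_FIELDS = ("title", "license_title", "organization")


def missing_mandatory_fields(dataset: dict) -> list:
    """Return the mandatory field names that are absent or blank in `dataset`."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = dataset.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Sample values taken from the Example column of the table.
dataset = {
    "title": "Aquaculture Production and Consumption in Africa (2011)",
    "organization": "D4Science",
    "version": "1.0",
}
print(missing_mandatory_fields(dataset))
```

With the sample above, only the license is reported as missing.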
gCube Metadata Profile v.1
A gCube Metadata Profile defines an XML-based metadata schema for adding custom metadata fields.
A gCube Metadata Profile is composed of one Metadata Format (<metadataformat>) that contains one or more Metadata Fields (<metadatafield>). The schema is the following:
Metadata Profile v.1 (ongoing)
<?xml version="1.0" encoding="UTF-8"?>
<metadataformat>
  <metadatafield>
    <fieldName>Name</fieldName>
    <mandatory>true</mandatory>
    <isBoolean>false</isBoolean>
    <defaulValue>default value</defaulValue>
    <note>shown as suggestions in the insert/update metadata form of CKAN</note>
    <vocabulary>
      <vocabularyField>field1</vocabularyField>
      <vocabularyField>field2</vocabularyField>
      <!-- ... other vocabulary fields -->
    </vocabulary>
    <validator>
      <regularExpression>a regular expression for validating values</regularExpression>
    </validator>
  </metadatafield>
  <!-- ... other metadata fields -->
</metadataformat>
A Metadata Format document can be validated against the following DTD (v1):
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT metadataformat (metadatafield+)>
<!ELEMENT metadatafield (fieldName, mandatory, isBoolean?, defaulValue?, note?, vocabulary?, validator?)>
<!ELEMENT fieldName (#PCDATA)>
<!ELEMENT mandatory (#PCDATA)>
<!ELEMENT isBoolean (#PCDATA)> <!-- MUST BE (true|false) -->
<!ELEMENT defaulValue (#PCDATA)>
<!ELEMENT note (#PCDATA)>
<!ELEMENT vocabulary (vocabularyField+)>
<!ELEMENT vocabularyField (#PCDATA)>
<!ELEMENT validator (regularExpression)>
<!ELEMENT regularExpression (#PCDATA)>
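Python's standard library does not validate against a DTD, so the sketch below hand-checks only the content model of <metadatafield> from DTD v1 (the order of children and which ones are required). The names `check_field` and `check_profile` and the two sample documents are hypothetical. It is a simplified stand-in for a real DTD validator, not the catalogue's own validation.

```python
import xml.etree.ElementTree as ET

# Children of <metadatafield> in the order required by DTD v1,
# paired with whether each one is required.
FIELD_CHILDREN = [
    ("fieldName", True),
    ("mandatory", True),
    ("isBoolean", False),
    ("defaulValue", False),
    ("note", False),
    ("vocabulary", False),
    ("validator", False),
]


def check_field(field: ET.Element) -> list:
    """Return the problems found in the children of one <metadatafield>."""
    problems = []
    tags = [child.tag for child in field]
    position = 0
    for name, required in FIELD_CHILDREN:
        if position < len(tags) and tags[position] == name:
            position += 1
        elif required:
            problems.append(f"missing or misplaced <{name}>")
    if position < len(tags):
        problems.append(f"unexpected <{tags[position]}>")
    return problems


def check_profile(xml_text: str) -> list:
    """Return the problems found in a whole <metadataformat> document."""
    root = ET.fromstring(xml_text)
    if root.tag != "metadataformat":
        return [f"root element is <{root.tag}>, expected <metadataformat>"]
    fields = list(root)
    if not fields:
        return ["<metadataformat> needs at least one <metadatafield>"]
    problems = []
    for index, field in enumerate(fields, start=1):
        if field.tag != "metadatafield":
            problems.append(f"child {index}: unexpected <{field.tag}>")
            continue
        problems += [f"field {index}: {p}" for p in check_field(field)]
    return problems


good = ("<metadataformat><metadatafield><fieldName>Area</fieldName>"
        "<mandatory>false</mandatory></metadatafield></metadataformat>")
bad = ("<metadataformat><metadatafield><mandatory>true</mandatory>"
       "<fieldName>Area</fieldName></metadatafield></metadataformat>")
print(check_profile(good))
print(check_profile(bad))
```

The second document is rejected because <fieldName> must come before <mandatory>.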
A possible instance of Metadata Field (<metadatafield>):
<metadatafield>
  <fieldName>Accessibility</fieldName>
  <mandatory>true</mandatory>
  <defaulValue>virtual/public</defaulValue>
  <vocabulary>
    <vocabularyField>virtual/public</vocabularyField>
    <vocabularyField>virtual/private</vocabularyField>
    <vocabularyField>transactional</vocabularyField>
  </vocabulary>
</metadatafield>
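To show how such a field definition might be applied to a user-supplied value, here is a minimal sketch that parses the Accessibility instance above and checks candidate values against it. The class `MetadataField` and the functions `parse_field` and `is_valid` are hypothetical. The interpretation is also an assumption: an empty value falls back to the default, a vocabulary restricts the value to its listed terms, and a regular expression must match the whole value. The page does not spell out how the catalogue applies these elements.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataField:
    name: str
    mandatory: bool
    default: Optional[str] = None
    vocabulary: List[str] = field(default_factory=list)
    regex: Optional[str] = None


def parse_field(element: ET.Element) -> MetadataField:
    """Read one <metadatafield> element into a MetadataField."""
    return MetadataField(
        name=element.findtext("fieldName", default="").strip(),
        mandatory=element.findtext("mandatory", default="false").strip() == "true",
        default=element.findtext("defaulValue") or None,
        vocabulary=[v.text or "" for v in element.findall("vocabulary/vocabularyField")],
        regex=element.findtext("validator/regularExpression") or None,
    )


def is_valid(spec: MetadataField, value: Optional[str]) -> bool:
    """Check a value against a field definition (assumed semantics, see above)."""
    if not value:
        value = spec.default  # assumed: empty input falls back to the default
    if not value:
        return not spec.mandatory
    if spec.vocabulary and value not in spec.vocabulary:
        return False
    if spec.regex and re.fullmatch(spec.regex, value) is None:
        return False
    return True


SAMPLE = """
<metadatafield>
  <fieldName>Accessibility</fieldName>
  <mandatory>true</mandatory>
  <defaulValue>virtual/public</defaulValue>
  <vocabulary>
    <vocabularyField>virtual/public</vocabularyField>
    <vocabularyField>virtual/private</vocabularyField>
    <vocabularyField>transactional</vocabularyField>
  </vocabulary>
</metadatafield>
"""

spec = parse_field(ET.fromstring(SAMPLE))
print(spec.name, is_valid(spec, "transactional"), is_valid(spec, "open"))
```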
Metadata Profile v.2 (coming soon)
In this version:
- Added the datatype field (<datatype>). A valid datatype must be one of the values {String, Time, Time_Interval, Times_ListOf, Text, Boolean, Number}. When no datatype is specified, the metadata field defaults to "String". Datatype values:
  - String: a string;
  - Time: an instant in time in the general format YYYY-MM-DD [HH:MM], where YYYY is the 4-digit year, MM the 2-digit month, DD the 2-digit day, and the optional HH:MM the 2-digit hour and 2-digit minute (e.g. "2005-03-01");
  - Time_Interval: a continuous interval rather than a single instant, given as a start time and an end time separated by a '/' (slash) character (e.g. "2005-03-01/2006-05-11");
  - Times_ListOf: a list of discrete time values separated by a ',' (comma) character (e.g. "2005-03-01, 2006-05-11, 2006-05-11-2007-04-12");
  - Text: a text;
  - Boolean: True/False;
  - Number: a valid Java number, see Apache Commons NumberUtils.isNumber.
- Added the multi-selection attribute (isMultiSelection="true|false") to vocabulary.
<?xml version="1.0" encoding="UTF-8"?>
<metadataformat>
  <metadatafield>
    <fieldName>Name</fieldName>
    <mandatory>true</mandatory>
    <datatype>String|Time|Time_Interval|Times_ListOf|Text|Boolean|Number</datatype>
    <defaulValue>default value</defaulValue>
    <note>shown as suggestions in the insert/update metadata form of CKAN</note>
    <vocabulary isMultiSelection="true|false">
      <vocabularyField>field1</vocabularyField>
      <vocabularyField>field2</vocabularyField>
      <vocabularyField>field3</vocabularyField>
    </vocabulary>
    <validator>
      <regularExpression>a regular expression for validating values</regularExpression>
    </validator>
  </metadatafield>
</metadataformat>
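The two v.2 additions can be read from a field element as sketched below. The function name `read_v2_extras` and the sample field are hypothetical. The "String" default comes from the list above, and the "false" default for isMultiSelection is taken from the attribute declaration in DTD v2 on this page. Python's ElementTree does not apply DTD defaults, so the sketch applies both in code.

```python
import xml.etree.ElementTree as ET

DATATYPES = {"String", "Time", "Time_Interval", "Times_ListOf",
             "Text", "Boolean", "Number"}


def read_v2_extras(element: ET.Element) -> tuple:
    """Return (datatype, is_multi_selection) for one v.2 <metadatafield>."""
    # A missing or empty <datatype> defaults to "String".
    datatype = (element.findtext("datatype") or "String").strip()
    if datatype not in DATATYPES:
        raise ValueError(f"unknown datatype: {datatype!r}")
    vocabulary = element.find("vocabulary")
    # isMultiSelection defaults to "false" when the attribute is absent.
    multi = (vocabulary is not None
             and vocabulary.get("isMultiSelection", "false") == "true")
    return datatype, multi


sample = ET.fromstring(
    "<metadatafield>"
    "<fieldName>Topics</fieldName>"
    "<mandatory>false</mandatory>"
    "<vocabulary isMultiSelection='true'>"
    "<vocabularyField>field1</vocabularyField>"
    "<vocabularyField>field2</vocabularyField>"
    "</vocabulary>"
    "</metadatafield>"
)
print(read_v2_extras(sample))
```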
A Metadata Format document can be validated against the following DTD (v2):
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT metadataformat (metadatafield+)>
<!ELEMENT metadatafield (fieldName, mandatory, datatype?, defaulValue?, note?, vocabulary?, validator?)>
<!ELEMENT fieldName (#PCDATA)>
<!ELEMENT mandatory (#PCDATA)>
<!ELEMENT datatype (#PCDATA)>
<!ELEMENT defaulValue (#PCDATA)>
<!ELEMENT note (#PCDATA)>
<!ELEMENT vocabulary (vocabularyField+)>
<!ATTLIST vocabulary isMultiSelection (true|false) "false">
<!ELEMENT vocabularyField (#PCDATA)>
<!ELEMENT validator (regularExpression)>
<!ELEMENT regularExpression (#PCDATA)>
<!--
  Where datatype element is the enum:
  {String, Time, Time_Interval, Times_ListOf, Text, Boolean, Number}
  Your xml schema:
  <xs:element name="datatype">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="String"/>
        <xs:enumeration value="Time"/>
        <xs:enumeration value="Time_Interval"/>
        <xs:enumeration value="Times_ListOf"/>
        <xs:enumeration value="Text"/>
        <xs:enumeration value="Boolean"/>
        <xs:enumeration value="Number"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
-->
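The datatype rules listed for Metadata Profile v.2 could be checked along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, Boolean is assumed to be case-insensitive, and Number is approximated by a plain decimal pattern rather than the full behaviour of NumberUtils.isNumber (hexadecimal values and type suffixes are not covered). Under this reading, every item of a Times_ListOf must be a single instant, so the last item of the example given in the list, "2006-05-11-2007-04-12", is rejected.

```python
import re
from datetime import datetime

_TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}( \d{2}:\d{2})?")
# Simplified decimal pattern; an approximation of a "valid Java number".
_NUMBER_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def is_time(value: str) -> bool:
    """True if value is YYYY-MM-DD or YYYY-MM-DD HH:MM and a real date."""
    if _TIME_RE.fullmatch(value) is None:
        return False
    fmt = "%Y-%m-%d %H:%M" if " " in value else "%Y-%m-%d"
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def is_valid_value(datatype: str, value: str) -> bool:
    """Check a value against one of the v.2 datatypes."""
    if datatype in ("String", "Text"):
        return True
    if datatype == "Time":
        return is_time(value)
    if datatype == "Time_Interval":
        parts = value.split("/")
        return len(parts) == 2 and all(is_time(p.strip()) for p in parts)
    if datatype == "Times_ListOf":
        return all(is_time(p.strip()) for p in value.split(","))
    if datatype == "Boolean":
        return value.strip().lower() in ("true", "false")
    if datatype == "Number":
        return _NUMBER_RE.fullmatch(value.strip()) is not None
    raise ValueError(f"unknown datatype: {datatype}")


print(is_valid_value("Time", "2005-03-01"))
print(is_valid_value("Time_Interval", "2005-03-01/2006-05-11"))
print(is_valid_value("Times_ListOf", "2005-03-01, 2006-05-11"))
```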
SoBigData.eu: Dataset Metadata
The current list of fields characterising a SoBigData resource is available at https://docs.google.com/spreadsheets/d/1kuhvmDVKpmqt2foyCB9wDo3HgzoAiCuRQ8CjRS-DVOM/edit?usp=sharing
The following fields have been identified:
Field | In Catalogue |
---|---|
Internal Fields | |
Internal Identifier | Automatically created |
Creation Date | Automatically created |
Last Modification | Automatically updated |
General Description | |
Title | Title |
Identifier |
<fieldName>External Identifier</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>This applies only to datasets that have been already published. Insert here a DOI, a handle, or any other identifier assigned when publishing the dataset elsewhere.</note> <vocabulary></vocabulary> <validator></validator> |
Creators | Author is there; unfortunately, there is only one author per dataset. Moreover, the technology supports only key-value pairs, not complex types.
<fieldName>Creator</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The name of the creator, with email and ORCID. The format should be: family, given[, email][, ORCID]. Example: Smith, John, js@acme.org, orcid.org//0000-0002-1825-0097 </note> <vocabulary></vocabulary> <validator> <regularExpression>^[a-zA-Z .'-]+, [a-zA-Z .'-]+[, ]*([a-zA-Z0-9_!#$%’*+=?`{|}~^.-]+@[a-zA-Z0-9.-]+)?[, ]*(orcid.org\/\/0000-000(1-[5-9]|2-[0-9]|3-[0-4])\d\d\d-\d\d\d[\dX])?$</regularExpression> </validator> |
Creation Date |
<fieldName>CreationDate</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The date of creation of the dataset (different from the date of registration of the dataset automatically added by the system). Use ISO 8601 Date Format: YYYY-MM-DD[ HH:MM] Ex. 1998-11-10 or 2015-05-29 11:55 </note> <vocabulary></vocabulary> <validator> <regularExpression>^(\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01]))+([ ]+(\d{2}(:?\d{2})?)?)?$</regularExpression> </validator> |
Distributor | Maintainer |
Publisher | Author |
Publication Date | Set when the dataset is published in the repository; no field has to be specified. |
Contact | Maintainer email |
Thematic Cluster |
<fieldName>ThematicCluster</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The SoBigData.eu Thematic Clusters </note> <vocabulary> <vocabularyField>Text and Social Media Mining</vocabularyField> <vocabularyField>Social Network Analysis</vocabularyField> <vocabularyField>Human Mobility Analytics</vocabularyField> <vocabularyField>Web Analytics</vocabularyField> <vocabularyField>Visual Analytics</vocabularyField> <vocabularyField>Social Data</vocabularyField> </vocabulary> <validator></validator> |
Area |
<fieldName>Area</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Sub-community specific</note> <vocabulary></vocabulary> <validator></validator> |
Semantic Coverage |
<fieldName>Semantic Coverage</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Tagging e.g. people, cities, transports...</note> <vocabulary></vocabulary> <validator></validator> |
Time Coverage Start Date |
<fieldName>TimeCoverage</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>List of time intervals, e.g. 1977-03-10 11:45 - 2005-01-15 09:10; 2010-03-10 00:00 - 2015-01-15 10:00</note> <vocabulary></vocabulary> <validator></validator> |
Time Coverage End Date | Not needed; see above |
Geo Location |
<fieldName>spatial</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The value must be a valid GeoJSON geometry, for example: { "type":"Polygon", "coordinates":[[[2.05827, 49.8625],[2.05827, 55.7447], [-6.41736, 55.7447], [-6.41736, 49.8625], [2.05827, 49.8625]]] } or: { "type": "Point", "coordinates": [-3.145,53.078] } </note> <vocabulary></vocabulary> <validator></validator> More on GeoJSON geometry. |
ProcessingDegree |
Shall we go for a Topic too? I think so. <fieldName>ProcessingDegree</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Whether primary or secondary dataset. </note> <vocabulary> <vocabularyField>Primary</vocabularyField> <vocabularyField>Secondary</vocabularyField> </vocabulary> <validator></validator> |
ManifestationType |
<fieldName>ManifestationType</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Virtual (accessible in streaming from remote sites), replica (copy of data in remote sites, e.g. DBPL), original (collection of data produced and kept in local infra by data provider). </note> <vocabulary> <vocabularyField>Virtual</vocabularyField> <vocabularyField>Replica</vocabularyField> <vocabularyField>Original</vocabularyField> </vocabulary> <validator></validator> |
Language |
<fieldName>Language</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The primary language of the resource (using ISO 639-3). You can see ISO 639 Code Tables here: http://www-01.sil.org/iso639-3/codes.asp </note> <vocabulary> <vocabularyField>Abai Sungai, abf</vocabularyField> <vocabularyField>Abanyom, abm</vocabularyField> <vocabularyField>Abar, mij</vocabularyField> <vocabularyField>Abau, aau</vocabularyField> <vocabularyField>Abaza, abq</vocabularyField> <vocabularyField>Abé, aba</vocabularyField> <vocabularyField>Abellen Ayta, abp</vocabularyField> <vocabularyField>Abidji, abi</vocabularyField> <vocabularyField>Abinomn, bsa</vocabularyField> <vocabularyField>Abipon, axb</vocabularyField> <vocabularyField>Abishira, ash</vocabularyField> <vocabularyField>Abkhazian, abk</vocabularyField> <vocabularyField>Abom, aob</vocabularyField> <vocabularyField>Abon, abo</vocabularyField> <vocabularyField>Abron, abr</vocabularyField> <vocabularyField>Abu, ado</vocabularyField> <vocabularyField>Abu' Arapesh, aah</vocabularyField> <vocabularyField>Abua, abn</vocabularyField> <!-- ... other ISO 639-3 languages --> </vocabulary> <validator></validator> |
Description | Description |
RelatedLiterature |
<fieldName>RelatedPaper</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Insert a complete reference to an associated work. </note> <vocabulary></vocabulary> <validator></validator> |
RelatedDataset | TBD |
Accessibility properties | |
Accessibility |
<fieldName>Accessibility</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>How the access to the resource is regulated: Virtual Access or Trans National Access. </note> <vocabulary> <vocabularyField>Both</vocabularyField> <vocabularyField>Virtual Access</vocabularyField> <vocabularyField>Trans National Access</vocabularyField> </vocabulary> <validator></validator> |
AccessibilityMode |
<fieldName>AccessibilityMode</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>How the access to the resource is offered. </note> <vocabulary> <vocabularyField>OnLine Access</vocabularyField> <vocabularyField>API Access</vocabularyField> <vocabularyField>Download</vocabularyField> </vocabulary> <validator></validator> |
Privacy | TBD |
Technical properties | |
Size |
<fieldName>Size</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Whatever “size” means in your domain/mind </note> <vocabulary> </vocabulary> <validator></validator> |
DiskSize |
<fieldName>DiskSize</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>In MB </note> <vocabulary> </vocabulary> <validator></validator> |
Format |
<fieldName>Format</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>MIME or extension </note> <vocabulary> </vocabulary> <validator></validator> |
FormatSchema |
<fieldName>FormatSchema</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Link to Schema </note> <vocabulary> </vocabulary> <validator></validator> |
API | |
Legal and Ethical Aspects | |
Personal Data |
<fieldName>Personal Data</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The dataset contains personal data?</note> <vocabulary> <vocabularyField>No</vocabularyField> <vocabularyField>Yes</vocabularyField> </vocabulary> <validator></validator> |
Personal sensitive data |
<fieldName>Personal Sensitive Data</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The dataset contains personal sensitive data?</note> <vocabulary> <vocabularyField>No</vocabularyField> <vocabularyField>Yes</vocabularyField> <vocabularyField>N/A (Not Available)</vocabularyField> </vocabulary> <validator></validator> |
Data set contains data of children |
<fieldName>ChildrenData</fieldName> <mandatory>true</mandatory> <isBoolean>true</isBoolean> <defaulValue></defaulValue> <note>The dataset contains children data?</note> <vocabulary> </vocabulary> <validator></validator> |
Consent of the data subject |
<fieldName>Consent of the data subject</fieldName> <mandatory>true</mandatory> <isBoolean>true</isBoolean> <defaulValue></defaulValue> <note>Consent of the data subject. Data subject signifies his agreement to personal data relating to him being processed</note> <vocabulary> </vocabulary> <validator></validator> |
Consent obtained also covers the envisaged transfer of the personal data outside the EU |
<fieldName>Consent obtained also covers...</fieldName> <mandatory>true</mandatory> <isBoolean>true</isBoolean> <defaulValue></defaulValue> <note>Consent obtained also covers the envisaged transfer of the personal data outside the EU</note> <vocabulary> </vocabulary> <validator></validator> |
Personal data was manifestly made public by the data subject |
<fieldName>Personal data was manifestly...</fieldName> <mandatory>true</mandatory> <isBoolean>true</isBoolean> <defaulValue></defaulValue> <note>Personal data was manifestly made public by the data subject</note> <vocabulary> </vocabulary> <validator></validator> |
Data Protection Directive |
<fieldName>Data Protection Directive</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Report the law or protocol number and the institution related to Data Protection</note> <vocabulary> </vocabulary> <validator></validator> |
Intellectual properties | |
IP/Copyrights |
<fieldName>IP/Copyrights</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Whether dataset is covered by any rights: copyright, related rights, database right, know how, proprietary, etc.</note> <vocabulary> </vocabulary> <validator></validator> |
Link to the source | Resource |
License | License |
Link to the license | Automatic |
Field/Scope of use |
<fieldName>Field/Scope of use</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note></note> <vocabulary> <vocabularyField>Any use</vocabularyField> <vocabularyField>Non-commercial only</vocabularyField> <vocabularyField>Research only</vocabularyField> <vocabularyField>Non-commercial research only</vocabularyField> <vocabularyField>Private use</vocabularyField> <vocabularyField>Use for developing and providing a service</vocabularyField> </vocabulary> <validator></validator> |
Basic rights |
<fieldName>Basic rights</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note></note> <vocabulary> <vocabularyField>Temporary download of a single copy only</vocabularyField> <vocabularyField>Download</vocabularyField> <vocabularyField>Copying</vocabularyField> <vocabularyField>Distribution</vocabularyField> <vocabularyField>Modification</vocabularyField> <vocabularyField>Communication</vocabularyField> <vocabularyField>Making available to the public</vocabularyField> <vocabularyField>Other rights</vocabularyField> </vocabulary> <validator></validator> |
Restrictions on use |
<fieldName>Restrictions on use</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Any restrictions on how or where the dataset may be used </note> <vocabulary> </vocabulary> <validator></validator> |
Prohibited actions | |
Sublicense rights |
<fieldName>Sublicense rights</fieldName> <mandatory>true</mandatory> <isBoolean>true</isBoolean> <defaulValue></defaulValue> <note>Any restrictions on how or where the dataset may be used</note> <vocabulary> <vocabularyField>No</vocabularyField> <vocabularyField>Yes</vocabularyField> </vocabulary> <validator></validator> |
Attribution requirements |
<fieldName>Attribution requirements</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The text explaining how the user needs to acknowledge the source when using/distributing data/developing service</note> <vocabulary> </vocabulary> <validator></validator> |
Display requirements |
<fieldName>Display requirements</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Whether the user, when displaying the dataset in any media or form, must follow certain display requirements, e.g. attach copyright notice</note> <vocabulary> </vocabulary> <validator></validator> |
Distribution requirements |
<fieldName>Distribution requirements</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Whether the user, when distributing the dataset, if allowed, must follow certain requirements</note> <vocabulary> </vocabulary> <validator></validator> |
Territory of use |
<fieldName>Territory of use</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>In what territory dataset may be used</note> <vocabulary> <vocabularyField>World Wide</vocabularyField> <vocabularyField>Europe</vocabularyField> <vocabularyField>Albania</vocabularyField> <vocabularyField>Andorra</vocabularyField> <vocabularyField>Austria</vocabularyField> <vocabularyField>Belarus</vocabularyField> <vocabularyField>Belgium</vocabularyField> <vocabularyField>Bosnia and Herzegovina</vocabularyField> <vocabularyField>Bulgaria</vocabularyField> <vocabularyField>Croatia</vocabularyField> <vocabularyField>Cyprus</vocabularyField> <vocabularyField>Czech Republic</vocabularyField> <vocabularyField>Denmark</vocabularyField> <vocabularyField>Estonia</vocabularyField> <vocabularyField>Faroe Is.</vocabularyField> <vocabularyField>Finland</vocabularyField> <vocabularyField>France</vocabularyField> <vocabularyField>Germany</vocabularyField> <vocabularyField>Gibraltar</vocabularyField> <vocabularyField>Greece</vocabularyField> <vocabularyField>Guernsey</vocabularyField> <vocabularyField>Hungary</vocabularyField> <vocabularyField>Iceland</vocabularyField> <vocabularyField>Ireland</vocabularyField> <vocabularyField>Italy</vocabularyField> <vocabularyField>Latvia</vocabularyField> <vocabularyField>Liechtenstein</vocabularyField> <vocabularyField>Lithuania</vocabularyField> <vocabularyField>Luxembourg</vocabularyField> <vocabularyField>Macedonia</vocabularyField> <vocabularyField>Malta</vocabularyField> <vocabularyField>Moldova</vocabularyField> <vocabularyField>Monaco</vocabularyField> <vocabularyField>Montenegro</vocabularyField> <vocabularyField>Netherlands</vocabularyField> <vocabularyField>Norway</vocabularyField> <vocabularyField>Poland</vocabularyField> <vocabularyField>Portugal</vocabularyField> <vocabularyField>Romania</vocabularyField> <vocabularyField>San Marino</vocabularyField> <vocabularyField>Serbia</vocabularyField> 
<vocabularyField>Slovakia</vocabularyField> <vocabularyField>Slovenia</vocabularyField> <vocabularyField>Spain</vocabularyField> <vocabularyField>Sweden</vocabularyField> <vocabularyField>Switzerland</vocabularyField> <vocabularyField>United Kingdom</vocabularyField> <vocabularyField>Ukraine</vocabularyField> <vocabularyField>Vatican City</vocabularyField> </vocabulary> <validator></validator> |
License term |
<fieldName>License term</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Period of time during which the dataset may be used. Use ISO 8601 Date Format: YYYY-MM-DD[ HH:MM] Ex. 1998-11-10 or 2015-05-29 11:55</note> <vocabulary></vocabulary> <validator> <regularExpression>^(\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01]))+([ ]+(\d{2}(:?\d{2})?)?)?$</regularExpression> </validator> |
Requirement of non-disclosure (confidentiality mark) |
<fieldName>Requirement of non-disclosure (confidentiality mark)</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Requirement of non-disclosure (confidentiality mark). Whether the dataset bears confidentiality mark/may be used and shared subject to the obligation of non-disclosure</note> <vocabulary> </vocabulary> <validator></validator> |
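The two regular expressions used in the table (for Creator and for CreationDate / License term) can be exercised as in the sketch below. The patterns are copied verbatim from the table, including the typographic apostrophe in the e-mail character class. The constant names and the sample values other than those quoted in the notes are hypothetical.

```python
import re

# Pattern from the Creator row, split over three lines for readability only.
CREATOR_RE = re.compile(
    r"^[a-zA-Z .'-]+, [a-zA-Z .'-]+[, ]*"
    r"([a-zA-Z0-9_!#$%’*+=?`{|}~^.-]+@[a-zA-Z0-9.-]+)?[, ]*"
    r"(orcid.org\/\/0000-000(1-[5-9]|2-[0-9]|3-[0-4])\d\d\d-\d\d\d[\dX])?$"
)

# Pattern from the CreationDate and License term rows.
DATE_RE = re.compile(
    r"^(\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01]))+"
    r"([ ]+(\d{2}(:?\d{2})?)?)?$"
)

creators = [
    "Smith, John, js@acme.org, orcid.org//0000-0002-1825-0097",  # example from the note
    "Smith, John",   # e-mail and ORCID are optional
    "Smith",         # rejected: the given name is missing
]
for value in creators:
    print(repr(value), bool(CREATOR_RE.match(value)))

dates = ["1998-11-10", "2015-05-29 11:55", "2015-13-01"]
for value in dates:
    print(repr(value), bool(DATE_RE.match(value)))
```

The date pattern checks only the shape of the value: the month must be 01 to 12 and the day 01 to 31, but it does not check that the day exists in the given month.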
SoBigData.eu: Method Metadata
The current list of fields characterising a SoBigData resource is available at https://docs.google.com/spreadsheets/d/1kuhvmDVKpmqt2foyCB9wDo3HgzoAiCuRQ8CjRS-DVOM/edit?usp=sharing
The following fields have been identified:
Field | In Catalogue |
---|---|
Internal Fields | |
Internal Identifier | Automatically created |
Creation Date | Automatically created |
Last Modification | Automatically updated |
General Description | |
Title | Title |
Identifier |
<fieldName>External Identifier</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>This applies only to methods that have been already published. Insert here a DOI, a handle, or any other identifier assigned when publishing the dataset elsewhere.</note> <vocabulary></vocabulary> <validator></validator> |
Creators | Author is there; unfortunately, there is only one author per item. Moreover, the technology supports only key-value pairs, not complex types.
<fieldName>Creator</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The name of the creator, with email and ORCID. The format should be: family, given[, email][, ORCID]. Example: Smith, John, js@acme.org, orcid.org//0000-0002-1825-0097 </note> <vocabulary></vocabulary> <validator> <regularExpression>^[a-zA-Z .'-]+, [a-zA-Z .'-]+[, ]*([a-zA-Z0-9_!#$%’*+=?`{|}~^.-]+@[a-zA-Z0-9.-]+)?[, ]*(orcid.org\/\/0000-000(1-[5-9]|2-[0-9]|3-[0-4])\d\d\d-\d\d\d[\dX])?$</regularExpression> </validator> |
Creation Date |
<fieldName>CreationDate</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The date of creation of the dataset (different from the date of registration of the dataset automatically added by the system). Use ISO 8601 Date Format: YYYY-MM-DD[ HH:MM] Ex. 1998-11-10 or 2015-05-29 11:55 </note> <vocabulary></vocabulary> <validator> <regularExpression>^(\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01]))+([ ]+(\d{2}(:?\d{2})?)?)?$</regularExpression> </validator> |
Distributor | Maintainer |
Owner |
<fieldName>Owner</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The name of the owner, with email and ORCID. The format should be: family, given[, email][, ORCID]. Example: Smith, John, js@acme.org, orcid.org//0000-0002-1825-0097 </note> <vocabulary></vocabulary> <validator> <regularExpression>^[a-zA-Z .'-]+, [a-zA-Z .'-]+[, ]*([a-zA-Z0-9_!#$%’*+=?`{|}~^.-]+@[a-zA-Z0-9.-]+)?[, ]*(orcid.org\/\/0000-000(1-[5-9]|2-[0-9]|3-[0-4])\d\d\d-\d\d\d[\dX])?$</regularExpression> </validator> |
Publication Date | Set when the method is published in the catalogue; no field has to be specified. |
Contact | Maintainer email |
Thematic Cluster |
Shall we go for a Topic too? I think so. <fieldName>ThematicCluster</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The SoBigData.eu Thematic Clusters </note> <vocabulary> <vocabularyField>Text and Social Media Mining</vocabularyField> <vocabularyField>Social Network Analysis</vocabularyField> <vocabularyField>Human Mobility Analytics</vocabularyField> <vocabularyField>Web Analytics</vocabularyField> <vocabularyField>Visual Analytics</vocabularyField> <vocabularyField>Social Data</vocabularyField> </vocabulary> <validator></validator> |
Area |
<fieldName>Area</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Sub-community specific</note> <vocabulary></vocabulary> <validator></validator> |
Semantic Coverage |
<fieldName>Semantic Coverage</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Tagging e.g. people, cities, transports...</note> <vocabulary></vocabulary> <validator></validator> |
Usage mode |
<fieldName>UsageMode</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>How the method is expected to be accessed </note> <vocabulary> <vocabularyField>Download</vocabularyField> <vocabularyField>as-a-Service by SoBigData Infrastructure</vocabularyField> <vocabularyField>as-a-Service by third party infrastructure</vocabularyField> </vocabulary> <validator></validator> |
methodURL | As a Resource |
documentationURL | As a Resource |
inputParametersType |
<fieldName>input</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>See WPS </note> <vocabulary> </vocabulary> <validator></validator> |
outputType |
<fieldName>output</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>See WPS </note> <vocabulary> </vocabulary> <validator></validator> |
Description | Description |
RelatedLiterature |
<fieldName>RelatedPaper</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Insert a complete reference to an associated work. </note> <vocabulary></vocabulary> <validator></validator> |
RelatedDataset | TBD |
RelatedMethod | TBD |
Accessibility properties | |
Accessibility |
<fieldName>Accessibility</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>How the access to the resource is regulated: Virtual Access or Trans National Access. </note> <vocabulary> <vocabularyField>Both</vocabularyField> <vocabularyField>Virtual Access</vocabularyField> <vocabularyField>Trans National Access</vocabularyField> </vocabulary> <validator></validator> |
AccessibilityMode |
<fieldName>AccessibilityMode</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>How the access to the resource is offered. </note> <vocabulary> <vocabularyField>OnLine Access</vocabularyField> <vocabularyField>API Access</vocabularyField> <vocabularyField>Download</vocabularyField> </vocabulary> <validator></validator> |
Technical properties | |
Programming Language |
<fieldName>ProgrammingLanguage</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The primary language used to implement the method. </note> <vocabulary></vocabulary> <validator></validator> |
Hosting Environment |
<fieldName>Hosting Environment</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>E.g. Linux, Microsoft Azure, Amazon EC2 </note> <vocabulary></vocabulary> <validator></validator> |
Source code | As a Resource |
Artifact repository | As a Resource |
Dependencies on Other SW |
<fieldName>Dependencies on Other SW</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>E.g. this software requires a Hadoop cluster to run </note> <vocabulary></vocabulary> <validator></validator> |
Intellectual properties | |
IP/Copyrights |
<fieldName>IP/Copyrights</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Whether software is covered by any rights: copyright, related rights, know how, proprietary, etc.</note> <vocabulary> </vocabulary> <validator></validator> |
License | License |
Link to the license | Automatic |
Field/Scope of use |
<fieldName>Field/Scope of use</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note></note> <vocabulary> <vocabularyField>Any use</vocabularyField> <vocabularyField>Non-commercial only</vocabularyField> <vocabularyField>Research only</vocabularyField> <vocabularyField>Non-commercial research only</vocabularyField> <vocabularyField>Private use</vocabularyField> <vocabularyField>Use for developing and providing a service</vocabularyField> </vocabulary> <validator></validator> |
Basic rights |
<fieldName>Basic rights</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note></note> <vocabulary> <vocabularyField>Temporary download of a single copy only</vocabularyField> <vocabularyField>Download</vocabularyField> <vocabularyField>Copying</vocabularyField> <vocabularyField>Distribution</vocabularyField> <vocabularyField>Modification</vocabularyField> <vocabularyField>Communication</vocabularyField> <vocabularyField>Making available to the public</vocabularyField> <vocabularyField>Other rights</vocabularyField> </vocabulary> <validator></validator> |
Restrictions on use |
<fieldName>Restrictions on use</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Any restrictions on how or where the dataset may be used </note> <vocabulary> </vocabulary> <validator></validator> |
Prohibited actions | |
Sublicense rights |
<fieldName>Sublicense rights</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Any restrictions on how or where the dataset may be used</note> <vocabulary> <vocabularyField>No</vocabularyField> <vocabularyField>Yes</vocabularyField> </vocabulary> <validator></validator> |
Attribution requirements |
<fieldName>Attribution requirements</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>The text explaining how the user needs to acknowledge the source when using or distributing the data or developing a service</note> <vocabulary> </vocabulary> <validator></validator> |
Display requirements | |
Distribution requirements |
<fieldName>Distribution requirements</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Whether the user, when distributing the dataset, if allowed, must follow certain requirements</note> <vocabulary> </vocabulary> <validator></validator> |
Territory of use |
<fieldName>Territory of use</fieldName> <mandatory>true</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>In what territory the dataset may be used</note> <vocabulary> <vocabularyField>World Wide</vocabularyField> <vocabularyField>Europe</vocabularyField> <vocabularyField>Albania</vocabularyField> <vocabularyField>Andorra</vocabularyField> <vocabularyField>Austria</vocabularyField> <vocabularyField>Belarus</vocabularyField> <vocabularyField>Belgium</vocabularyField> <vocabularyField>Bosnia and Herzegovina</vocabularyField> <vocabularyField>Bulgaria</vocabularyField> <vocabularyField>Croatia</vocabularyField> <vocabularyField>Cyprus</vocabularyField> <vocabularyField>Czech Republic</vocabularyField> <vocabularyField>Denmark</vocabularyField> <vocabularyField>Estonia</vocabularyField> <vocabularyField>Faroe Is.</vocabularyField> <vocabularyField>Finland</vocabularyField> <vocabularyField>France</vocabularyField> <vocabularyField>Germany</vocabularyField> <vocabularyField>Gibraltar</vocabularyField> <vocabularyField>Greece</vocabularyField> <vocabularyField>Guernsey</vocabularyField> <vocabularyField>Hungary</vocabularyField> <vocabularyField>Iceland</vocabularyField> <vocabularyField>Ireland</vocabularyField> <vocabularyField>Italy</vocabularyField> <vocabularyField>Latvia</vocabularyField> <vocabularyField>Liechtenstein</vocabularyField> <vocabularyField>Lithuania</vocabularyField> <vocabularyField>Luxembourg</vocabularyField> <vocabularyField>Macedonia</vocabularyField> <vocabularyField>Malta</vocabularyField> <vocabularyField>Moldova</vocabularyField> <vocabularyField>Monaco</vocabularyField> <vocabularyField>Montenegro</vocabularyField> <vocabularyField>Netherlands</vocabularyField> <vocabularyField>Norway</vocabularyField> <vocabularyField>Poland</vocabularyField> <vocabularyField>Portugal</vocabularyField> <vocabularyField>Romania</vocabularyField> <vocabularyField>San Marino</vocabularyField> <vocabularyField>Serbia</vocabularyField> 
<vocabularyField>Slovakia</vocabularyField> <vocabularyField>Slovenia</vocabularyField> <vocabularyField>Spain</vocabularyField> <vocabularyField>Sweden</vocabularyField> <vocabularyField>Switzerland</vocabularyField> <vocabularyField>United Kingdom</vocabularyField> <vocabularyField>Ukraine</vocabularyField> <vocabularyField>Vatican City</vocabularyField> </vocabulary> <validator></validator> |
License term |
<fieldName>License term</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Period of time during which the dataset may be used. Use ISO 8601 Date Format: YYYY-MM-DD[ HH:MM] Ex. 2016-07-31 or 2015-05-10 12:00</note> <vocabulary></vocabulary> <validator> <regularExpression>^(\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01]))+([ ]+(\d{2}(:?\d{2})?)?)?$</regularExpression> </validator> |
Requirement of non-disclosure (confidentiality mark) |
<fieldName>Requirement of non-disclosure (confidentiality mark)</fieldName> <mandatory>false</mandatory> <isBoolean>false</isBoolean> <defaulValue></defaulValue> <note>Requirement of non-disclosure (confidentiality mark). Whether the dataset bears a confidentiality mark/may be used and shared subject to the obligation of non-disclosure</note> <vocabulary> </vocabulary> <validator></validator> |
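The regular expression in the License term validator can be tried out before it is used in a profile. The following is a minimal sketch, assuming that Python's `re` module interprets the pattern in the same way as the catalogue's validator (this page does not say which regular expression engine the catalogue uses). The function name `is_valid_license_term` is hypothetical.

```python
import re

# Pattern copied verbatim from the <regularExpression> element of the
# "License term" field. Assumption: the catalogue's regex engine treats it
# the same way as Python's re module.
LICENSE_TERM_PATTERN = re.compile(
    r"^(\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01]))+([ ]+(\d{2}(:?\d{2})?)?)?$"
)


def is_valid_license_term(value: str) -> bool:
    """Return True if value matches the License term pattern (hypothetical helper)."""
    return LICENSE_TERM_PATTERN.match(value) is not None


if __name__ == "__main__":
    # The first two values are the examples given in the field's note.
    for candidate in ("2016-07-31", "2015-05-10 12:00", "2016-13-01", "31/07/2016"):
        print(candidate, is_valid_license_term(candidate))
```

Under this assumption the pattern checks only the shape of the value: it also accepts single-digit months and days such as 2016-7-3, and it does not reject impossible calendar dates such as 2016-02-31.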
gCube Data Catalogue: Ckan Connector
gCube Data Catalogue: Geo Harvesting
This extension contains plugins like ckanext-geonetwork (and others) which add geospatial capabilities to CKAN.
Several harvesters have been created in the gCube Data Catalogue to import geospatial metadata (for example in ISO 19139 format) into CKAN from other sources. In particular, all metadata created in gCube Geonetwork (GeoNetwork is the catalogue application that manages the spatially referenced resources generated in the D4Science Infrastructure) are harvested through the 'Geonetwork Resolver', a "middle tier" able to:
- use the Geonetwork Manager to harvest private metadata (via authentication) stored in gCube Geonetwork into the CKAN Data Catalogue (e.g. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE to harvest private metadata generated from scope /gcube/devsec/devVRE);
- create a CKAN Harvester that, via configuration, skips all public metadata during scope harvesting (e.g. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE%23filterpublicids to filter public ids during harvesting of /gcube/devsec/devVRE);
- create a CKAN Harvester that, via configuration, harvests only public metadata (saved on Geonetwork), avoiding the Geonetwork authentication (e.g. http://data-d.d4science.org/geonetwork/gcube_devsec_devVRE%23noauthentication).
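The three example URLs above follow a visible pattern: the scope path with its slashes replaced by underscores, optionally followed by an encoded `#` (`%23`) and an option name. The following is a minimal sketch of that pattern. The rule is inferred from the examples only and is not a documented API of the resolver; the function name `harvest_url` is hypothetical.

```python
from typing import Optional

# Base URL taken from the examples above; other deployments will differ.
RESOLVER_BASE = "http://data-d.d4science.org/geonetwork"


def harvest_url(scope: str, option: Optional[str] = None) -> str:
    """Build a resolver URL for a gCube scope such as /gcube/devsec/devVRE.

    Assumption: the scope-to-path rule is inferred from the examples on
    this page, not taken from resolver documentation.
    """
    # The examples replace the slashes of the scope with underscores.
    scope_id = scope.strip("/").replace("/", "_")
    url = f"{RESOLVER_BASE}/{scope_id}"
    if option:
        # The option follows an encoded '#' (%23), as in the examples.
        url += f"%23{option}"
    return url


if __name__ == "__main__":
    print(harvest_url("/gcube/devsec/devVRE"))
    print(harvest_url("/gcube/devsec/devVRE", "filterpublicids"))
    print(harvest_url("/gcube/devsec/devVRE", "noauthentication"))
```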
The mapping of fields from ISO19139 metadata to a CKAN dataset via ckanext-geonetwork is shown in the following table:
ISO19139 | Ckan Dataset |
---|---|
Title | Title |
Description | Description |
Digital Transfer Option | Data and Resource |
CI_OnlineResource | |
gmd:url | URL |
gmd:name | Name |
gmd:description | Description |
Descriptive Keywords | |
gmd:keyword | Tag |
Additional Info | |
bbox, metadata language, age, reference system, etc. | key/value |
gCube Data Catalogue: Geo Datasets
In order to make a dataset queryable by location (geospatial dataset), a special extra must be defined, with its key named ‘spatial’. The value must be a valid GeoJSON geometry, for example:
{ "type":"Polygon", "coordinates":[[[2.05827, 49.8625],[2.05827, 55.7447], [-6.41736, 55.7447], [-6.41736, 49.8625], [2.05827, 49.8625]]] }
[Note: the polygon must be closed]
or
{ "type": "Point", "coordinates": [-3.145,53.078] }
The GeoJSON format specification is available here: http://geojson.org/geojson-spec.html
Datasets with spatial values are automatically geo-indexed so that they can, for example, be searched using spatial filters.
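A closed polygon of the kind shown above can be generated from a bounding box. The following is a minimal sketch, assuming the extra is passed in CKAN's usual key/value form for dataset extras. How the extra is attached to a dataset through the gCube Data Catalogue is not described here, and the function names are hypothetical.

```python
import json


def is_closed_ring(ring) -> bool:
    """A linear ring is closed when its first and last positions are equal."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def bbox_to_spatial_extra(min_lon, min_lat, max_lon, max_lat) -> dict:
    """Return a 'spatial' extra whose value is a GeoJSON Polygon string.

    Hypothetical helper: the key/value layout mirrors CKAN dataset extras.
    """
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first position so the ring is closed
    ]
    geometry = {"type": "Polygon", "coordinates": [ring]}
    return {"key": "spatial", "value": json.dumps(geometry)}


if __name__ == "__main__":
    # Same extent as the polygon example above, in [LONG, LAT] order.
    extra = bbox_to_spatial_extra(-6.41736, 49.8625, 2.05827, 55.7447)
    print(extra["value"])
```

The positions are listed in [LONG, LAT] order, and the first position is repeated at the end so that the polygon is closed, as the note above requires.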
GeoSpatial search for datasets: via API or Search Widget
Once your datasets are geo-indexed, you can perform spatial queries by bounding box (coordinates format is [LONG, LAT]), via the following API call:
/api/2/search/dataset/geo?bbox={minx,miny,maxx,maxy}[&crs={srid}]
If the bounding box coordinates are not in the same projection as the one defined in the database, a CRS must be provided, in one of the following forms:
urn:ogc:def:crs:EPSG::4326 EPSG:4326 4326
Otherwise the bounding box is assumed to be in EPSG:4326. See the CKAN Wiki page for Legacy API.
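The request URL for such a query can be assembled as follows. This is a minimal sketch: the host `http://ckan.example.org` is a placeholder, the function name `geo_search_url` is hypothetical, and only the path and the `bbox` and `crs` parameters come from the call shown above.

```python
from urllib.parse import urlencode

# Endpoint path as quoted above.
SEARCH_PATH = "/api/2/search/dataset/geo"


def geo_search_url(base_url, minx, miny, maxx, maxy, crs=None) -> str:
    """Build a bounding-box search URL; coordinates are in [LONG, LAT] order."""
    params = {"bbox": f"{minx},{miny},{maxx},{maxy}"}
    if crs is not None:
        # Only needed when the bbox is not in the database projection.
        params["crs"] = crs
    # Keep ',' and ':' unescaped so the URL reads like the form shown above.
    query = urlencode(params, safe=",:")
    return f"{base_url.rstrip('/')}{SEARCH_PATH}?{query}"


if __name__ == "__main__":
    # Placeholder host; replace it with the address of the CKAN instance.
    print(geo_search_url("http://ckan.example.org", -6.41736, 49.8625, 2.05827, 55.7447))
    print(geo_search_url("http://ckan.example.org", -6.41736, 49.8625, 2.05827, 55.7447, crs="EPSG:4326"))
```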
Moreover, you can perform spatial queries using an integrated map widget available on CKAN, which allows filtering results by an area of interest. You can try it on D4Science Data Catalogue
CKAN Wiki page for Spatial Search Widget