Darwin Core Terms
Revision as of 16:25, 11 September 2013
This quick guide lists all the Darwin Core terms we currently use in the Species Product Discovery (SPD) Service.
Darwin Core Archive is used to export taxonomic (species) data, while Simple Darwin Core is used to export species occurrences.
Darwin Core Archive
DarwinCore Archive (DwC-A) is a biodiversity informatics data standard that uses the DarwinCore terms to produce a single, self-contained dataset for taxonomic (species) data. The GBIF GNA format consists of a set of files in which one (or more) files represent the 'core' taxonomic data, with each row representing a single taxon reference. The DarwinCore Taxon class provides most of the concepts supported in the format, allowing taxonomic and nomenclatural semantics and syntax (classification, taxonomic and nomenclatural synonymy, status, etc.) to be expressed.
Other files represent "extensions" to this core table and allow additional data elements to be linked to a taxon in the core table through a many-to-one relationship. The overall topology of one or more of these extensions around the core table is referred to as a "star schema". It offers a compromise between an overly simple flat-file representation of data and a more complex set of interrelated files. In addition to these files, a descriptor file serves as a key to the other files. Collectively, these files can be zipped into a single compressed archive file for portability. This compressed file is known as a DarwinCore Archive (DwC-A) file.
Example: Sarda from OBIS
This example represents the checklist for the genus Sarda from OBIS.
The process to create a Darwin Core Archive is simple:
- Identify the standard terms and extensions required to map the biodiversity data you wish to share in your archive;
- Export your data as a set of one or more text (CSV) files;
taxonID | parentNameUsageID | scientificName | scientificNameAuthorship | kingdom | phylum | class | order | taxonRank |
---|---|---|---|---|---|---|---|---|
OBIS:506000 | OBIS:758818 | Sarda | Cuvier, 1829 | Animalia | Chordata | Actinopterygii | Perciformes | genus |
OBIS:758818 | OBIS:755264 | Scombridae | | Animalia | Chordata | Actinopterygii | Perciformes | family |
OBIS:755264 | OBIS:737387 | Perciformes | | Animalia | Chordata | Actinopterygii | | order |
OBIS:737387 | OBIS:755917 | Actinopterygii | | Animalia | Chordata | | | class |
...
- Add XML descriptor files to inform others how your files are organized;
- Zip the folder into a single archive and you are done!
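The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the SPD Service's exporter. It makes these assumptions:
- a five-column core file built from the first two Sarda rows;
- a hand-written meta.xml that omits the eml.xml metadata reference;
- the names `to_tab_text`, `build_archive` and `META_XML` are hypothetical.

```python
import io
import zipfile

# Hypothetical five-column core, using the first two rows of the Sarda example.
HEADER = ["taxonID", "parentNameUsageID", "scientificName",
          "scientificNameAuthorship", "taxonRank"]
ROWS = [
    ["OBIS:506000", "OBIS:758818", "Sarda", "Cuvier, 1829", "genus"],
    ["OBIS:758818", "OBIS:755264", "Scombridae", "", "family"],
]

# Step 3: a descriptor whose field indexes follow the column order of HEADER.
# "\\n" and "\\t" put the literal escape sequences into the XML, as in meta.xml.
META_XML = (
    "<?xml version='1.0' encoding='utf-8'?>\n"
    '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
    '  <core encoding="UTF-8" linesTerminatedBy="\\n" fieldsTerminatedBy="\\t"'
    ' fieldsEnclosedBy="" ignoreHeaderLines="1"'
    ' rowType="http://rs.tdwg.org/dwc/terms/Taxon">\n'
    "    <files><location>taxa.txt</location></files>\n"
    '    <id index="0"/>\n'
    + "".join(
        f'    <field index="{i}" term="http://rs.tdwg.org/dwc/terms/{name}"/>\n'
        for i, name in enumerate(HEADER)
    )
    + "  </core>\n</archive>\n"
)


def to_tab_text(header, rows):
    """Step 2: serialise rows as tab-delimited text with a header line."""
    lines = []
    for row in [header, *rows]:
        for value in row:
            # No enclosing quotes are used, so tabs/newlines cannot be protected.
            if "\t" in value or "\n" in value:
                raise ValueError(f"value {value!r} contains a delimiter")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


def build_archive(header, rows):
    """Step 4: return the bytes of a zip holding meta.xml and taxa.txt."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)  # str members are stored as UTF-8
        zf.writestr("taxa.txt", to_tab_text(header, rows))
    return buf.getvalue()
```

The sketch rejects embedded tabs and newlines instead of quoting them. That matches the `fieldsEnclosedBy=""` setting used in this page's meta.xml, which leaves no quote character to protect such values.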
Files
A DarwinCore Archive file generated by the SPD Service (in our case representing Darwin Core taxa) contains the following files:
- a metadata file (eml.xml) that describes the data resource;
- a metafile (meta.xml) that describes the content of the text data files and the relationships between them;
- a tab-delimited core data file (taxa.txt) consisting of a standard set of DarwinCore terms;
- a vernacular names extension file (VernacularName.txt) that supports the description of common name properties that might be related to a species described in the core data file.
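You can confirm that an exported archive contains the four files listed above by comparing the zip's member names against them. The sketch assumes the files sit at the root of the zip and carry exactly the names used by the SPD Service. `missing_files` is a hypothetical helper.

```python
import io
import zipfile

# Names taken from the list above; meta.xml, not a fixed naming scheme,
# is what formally declares each file's role.
EXPECTED_FILES = {"eml.xml", "meta.xml", "taxa.txt", "VernacularName.txt"}


def missing_files(archive_bytes):
    """Return the expected file names absent from the zip, sorted."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        present = set(zf.namelist())
    return sorted(EXPECTED_FILES - present)
```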
Metafile (meta.xml)
The DarwinCore Archive format relies on a special file - an XML descriptor file, called the "metafile" (typically named meta.xml). The metafile is used as a map to describe the core taxon file and any extensions that collectively form the specific data profile that will be produced by the user.
```xml
<?xml version='1.0' encoding='utf-8'?>
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" linesTerminatedBy="\n" fieldsTerminatedBy="\t" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files>
      <location>taxa.txt</location>
    </files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/acceptedNameUsageID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/scientificNameAuthorship"/>
    <field index="5" term="http://rs.tdwg.org/dwc/terms/nameAccordingTo"/>
    <field index="6" term="http://rs.tdwg.org/dwc/terms/kingdom"/>
    <field index="7" term="http://rs.tdwg.org/dwc/terms/phylum"/>
    <field index="8" term="http://rs.tdwg.org/dwc/terms/class"/>
    <field index="9" term="http://rs.tdwg.org/dwc/terms/order"/>
    <field index="10" term="http://rs.tdwg.org/dwc/terms/family"/>
    <field index="11" term="http://rs.tdwg.org/dwc/terms/genus"/>
    <field index="12" term="http://rs.tdwg.org/dwc/terms/subgenus"/>
    <field index="13" term="http://rs.tdwg.org/dwc/terms/specificEpithet"/>
    <field index="14" term="http://rs.tdwg.org/dwc/terms/infraspecificEpithet"/>
    <field index="15" term="http://rs.tdwg.org/dwc/terms/verbatimTaxonRank"/>
    <field index="16" term="http://rs.tdwg.org/dwc/terms/taxonRank"/>
    <field index="17" term="http://rs.tdwg.org/dwc/terms/taxonomicStatus"/>
    <field index="18" term="http://purl.org/dc/terms/modified"/>
    <field index="19" term="http://purl.org/dc/terms/bibliographicCitation"/>
    <field index="20" term="http://rs.tdwg.org/dwc/terms/taxonRemarks"/>
    <field index="21" term="http://rs.tdwg.org/dwc/terms/scientificNameID"/>
  </core>
  <extension encoding="UTF-8" linesTerminatedBy="\n" fieldsTerminatedBy="\t" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.gbif.org/terms/1.0/VernacularName">
    <files>
      <location>VernacularName.txt</location>
    </files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/>
    <field index="2" term="http://purl.org/dc/terms/language"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/locality"/>
  </extension>
</archive>
```
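The metafile above can be read with Python's `xml.etree.ElementTree`. For the core and each extension, this recovers the data file location and the mapping from column index to term. This is a minimal sketch:
- the `dwc` prefix is only a local alias for the `http://rs.tdwg.org/dwc/text/` namespace declared on `<archive>`;
- `read_meta` and `describe_section` are hypothetical names.

```python
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def describe_section(section):
    """Summarise one <core> or <extension> element of meta.xml."""
    # The core declares <id>; extensions point back to it with <coreid>.
    id_elem = section.find("dwc:id", NS)
    if id_elem is None:
        id_elem = section.find("dwc:coreid", NS)
    return {
        "rowType": section.get("rowType"),
        "location": section.findtext("dwc:files/dwc:location", namespaces=NS),
        "idIndex": int(id_elem.get("index")) if id_elem is not None else None,
        "fields": {int(f.get("index")): f.get("term")
                   for f in section.findall("dwc:field", NS)
                   if f.get("index") is not None},
    }


def read_meta(xml_bytes):
    """Return (core summary, list of extension summaries)."""
    root = ET.fromstring(xml_bytes)
    core = describe_section(root.find("dwc:core", NS))
    extensions = [describe_section(e) for e in root.findall("dwc:extension", NS)]
    return core, extensions
```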
Core Data File (taxa.txt)
A required core data file consisting of a standard set of DarwinCore terms.
A data file is formatted as fielded text: data records are expressed as rows of text, and data fields (columns) are separated by a standard delimiter (here, a tab).
In a DarwinCore data file the first row of the file contains the names of the DarwinCore terms represented in the succeeding rows of the data.
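This archive's meta.xml leaves `fieldsEnclosedBy` empty, so quote characters carry no special meaning and a reader should disable quote handling. The minimal sketch below uses Python's `csv` module. It assumes the UTF-8 text has already been decoded and that the header row holds plain term names. `read_core_rows` is a hypothetical helper.

```python
import csv
import io


def read_core_rows(text):
    """Parse tab-delimited DwC text whose first line holds the term names."""
    # QUOTE_NONE: a literal " in a value is kept as data, not treated as a quote.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)
```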
Repository URL: http://rs.gbif.org/core/dwc_taxon.xml
Field | Description | DwC term |
---|---|---|
taxonID | A ‘taxonID’ value may be any string; it is not required to be numeric. An accepted name should have a unique ‘taxonID’ value. A synonym (or similar name linked to a taxon) should ideally have an identifier in the ‘taxonID’ field.
Data Type: string |
http://rs.tdwg.org/dwc/terms/taxonID |
acceptedNameUsageID | The field ‘acceptedNameUsageID’ should be used to link a synonym record to its corresponding accepted name (which will have a matching ‘taxonID’ value).
An accepted name should have an empty ‘acceptedNameUsageID’ field. Data Type: string |
http://rs.tdwg.org/dwc/terms/acceptedNameUsageID |
parentNameUsageID | The field ‘parentNameUsageID’ of the accepted name record for a taxon is used to refer to the ‘taxonID’ value of the parent taxon at the next higher taxonomic rank included in the checklist.
If there is no parent included in the checklist, because the “top of the tree” has been reached, then this field should be empty to indicate this. Data Type: string |
http://rs.tdwg.org/dwc/terms/parentNameUsageID |
scientificName | The scientific name of the taxon, with or without authorship information depending on the format of the source database.
Examples: "Coleoptera", "Vespertilionidae", "Manis", "Ctenomys sociabilis", "Ambystoma tigrinum diaboli", "Quercus agrifolia var. oxyadenia (Torr.)" Data Type: string |
http://rs.tdwg.org/dwc/terms/scientificName |
scientificNameAuthorship | If the authority is known and can be separated from the rest of the scientific name, the authority string should be placed in the ‘scientificNameAuthorship’ field.
If authorship is already included in the scientificName field, this field is optional. Examples: "(Torr.) J.T. Howell", "(Martinovský) Tzvelev", "(Linnaeus 1768)" Data Type: string |
http://rs.tdwg.org/dwc/terms/scientificNameAuthorship |
nameAccordingTo | A citation representing the concept or sense in which the name is used.
Data Type: string |
http://rs.tdwg.org/dwc/terms/nameAccordingTo |
kingdom | The full scientific name of the kingdom in which the taxon is classified.
Example: "Animalia", "Plantae" Data Type: string |
http://rs.tdwg.org/dwc/terms/kingdom |
phylum | The full scientific name of the phylum in which the taxon is classified.
Example: "Chordata" (phylum), "Bryophyta" (division) Data Type: string |
http://rs.tdwg.org/dwc/terms/phylum |
class | The full scientific name of the class in which the taxon is classified.
Example: "Mammalia", "Hepaticopsida" Data Type: string |
http://rs.tdwg.org/dwc/terms/class |
order | The full scientific name of the order in which the taxon is classified.
Example: "Carnivora", "Monocleales" Data Type: string |
http://rs.tdwg.org/dwc/terms/order |
family | The full scientific name of the family in which the taxon is classified.
Example: "Felidae", "Monocleaceae" Data Type: string |
http://rs.tdwg.org/dwc/terms/family |
genus | The full scientific name of the genus in which the taxon is classified.
Example: "Puma", "Monoclea" Data Type: string |
http://rs.tdwg.org/dwc/terms/genus |
subgenus | The full scientific name of the subgenus in which the taxon is classified. Values should include the genus to avoid homonym confusion.
Example: Puma (Puma); Loligo (Amerigo); Hieracium subgen. Pilosella Data Type: string |
http://rs.tdwg.org/dwc/terms/subgenus |
specificEpithet | The second word of a species name; e.g., in Acer saccharum, saccharum is the specificEpithet.
Example: scientificName: Carex viridula subsp. brachyrrhyncha var. elatior (Schltdl.) Crins specificEpithet: viridula Data Type: string |
http://rs.tdwg.org/dwc/terms/specificEpithet |
infraspecificEpithet | The name of the lowest or terminal infraspecific epithet of the scientificName, excluding any rank designation.
Example: scientificName: Carex viridula subsp. brachyrrhyncha var. elatior (Schltdl.) Crins infraspecificEpithet: elatior Data Type: string |
http://rs.tdwg.org/dwc/terms/infraspecificEpithet |
verbatimTaxonRank | The taxonomic rank of the most specific name in the scientificName, as it appears in the original record.
Example: scientificName: Carex viridula subsp. brachyrrhyncha var. elatior (Schltdl.) Crins verbatimTaxonRank: var. Data Type: string |
http://rs.tdwg.org/dwc/terms/verbatimTaxonRank |
taxonRank | The taxonomic rank of the most specific name in the scientificName. Recommended best practice is to use a controlled vocabulary: http://rs.gbif.org/vocabulary/gbif/rank.xml.
Examples: "subspecies", "varietas", "forma", "species", "genus". Data Type: string |
http://rs.tdwg.org/dwc/terms/taxonRank |
taxonomicStatus | The status of the use of the scientificName as a label for a taxon. Controlled vocabulary:
"accepted", "invalid", "misapplied", "provisional", "synonym", "valid". "unknown" has also been suggested, but an empty value is often expected to indicate an unknown status. Data Type: string |
http://rs.tdwg.org/dwc/terms/taxonomicStatus |
modified | The most recent date-time on which the resource was changed.
Recommended format: "YYYY-MM-DD". Data Type: date |
http://purl.org/dc/terms/modified |
bibliographicCitation | Citation information specified by the data publisher.
Data Type: string |
http://purl.org/dc/terms/bibliographicCitation |
taxonRemarks | Comments or notes about the taxon or name.
Data Type: string |
http://rs.tdwg.org/dwc/terms/taxonRemarks |
scientificNameID | Exclusively used to reference an external and resolvable identifier that returns nomenclatural (not taxonomic) details of a name. Use taxonID to refer to taxa. Use to explicitly refer to an external nomenclatural record.
Example: “urn:lsid:ipni.org:names:37829-1:1.3” Data Type: string |
http://rs.tdwg.org/dwc/terms/scientificNameID |
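The parentNameUsageID and acceptedNameUsageID rules above can be illustrated by walking a checklist held in memory. The sketch uses the Sarda rows from the OBIS example plus one invented synonym record (`SYN:1`). It stops climbing when the parent is empty or not present in the checklist. The function names are hypothetical.

```python
# In-memory checklist keyed by taxonID. The OBIS rows come from the Sarda example;
# "SYN:1" is a made-up synonym record added only to exercise acceptedNameUsageID.
TAXA = {
    "OBIS:506000": {"parentNameUsageID": "OBIS:758818", "scientificName": "Sarda"},
    "OBIS:758818": {"parentNameUsageID": "OBIS:755264", "scientificName": "Scombridae"},
    "OBIS:755264": {"parentNameUsageID": "OBIS:737387", "scientificName": "Perciformes"},
    "OBIS:737387": {"parentNameUsageID": "OBIS:755917", "scientificName": "Actinopterygii"},
    "SYN:1": {"acceptedNameUsageID": "OBIS:506000", "scientificName": "Example synonym"},
}


def resolve_accepted(taxa, taxon_id):
    """Return a synonym's acceptedNameUsageID, or the ID itself if accepted."""
    accepted = taxa[taxon_id].get("acceptedNameUsageID", "")
    return accepted or taxon_id


def classification(taxa, taxon_id):
    """Climb parentNameUsageID links, returning scientific names upwards."""
    path, seen = [], set()
    current = resolve_accepted(taxa, taxon_id)
    # Stop at an empty parent ("top of the tree") or one missing from the checklist.
    while current and current in taxa:
        if current in seen:
            raise ValueError(f"cycle in parentNameUsageID at {current}")
        seen.add(current)
        path.append(taxa[current]["scientificName"])
        current = taxa[current].get("parentNameUsageID", "")
    return path
```

In this excerpt the walk ends at Actinopterygii, because its parent (OBIS:755917) is not among the rows shown.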
Vernacular Names Extension File (VernacularName.txt)
Extension files are also simple text files that can be visualised as a spreadsheet. They are tied to the core taxon file through a copy of the core taxonID, repeated in each row of the extension file, in a manner similar to foreign keys in a relational database. An extension file may include Darwin Core terms as well as terms defined through other means.
Extension files allow checklist information to be represented as a one-to-many relationship between the core taxon file and the extension. For example, the Vernacular Names extension provides the means to share common (vernacular) names linked to taxa in the core data file. Multiple vernacular names can be linked to the same taxon via the taxonID.
Repository: http://rs.gbif.org/extension/gbif/1.0/vernacular.xml
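The one-to-many link works like a foreign-key join. This is a minimal sketch under these assumptions:
- extension rows have already been parsed into dicts, with the identifier column named `taxonID` (as in the table below);
- the taxon IDs come from the Sarda example, but the vernacular names and the orphan row are illustrative values, not OBIS data.

```python
from collections import defaultdict

CORE = [
    {"taxonID": "OBIS:506000", "scientificName": "Sarda"},
    {"taxonID": "OBIS:758818", "scientificName": "Scombridae"},
    {"taxonID": "OBIS:755264", "scientificName": "Perciformes"},
]
VERNACULAR = [
    {"taxonID": "OBIS:506000", "vernacularName": "bonitos", "language": "en"},
    {"taxonID": "OBIS:506000", "vernacularName": "bonitos", "language": "es"},
    {"taxonID": "OBIS:758818", "vernacularName": "mackerels", "language": "en"},
    {"taxonID": "OBIS:000000", "vernacularName": "orphan", "language": "en"},
]


def group_by_taxon(extension_rows):
    """Group extension rows by the core taxonID they point to (many-to-one)."""
    grouped = defaultdict(list)
    for row in extension_rows:
        grouped[row["taxonID"]].append(row)
    return grouped


def vernacular_names(core_rows, extension_rows):
    """Map each core taxonID to its (vernacularName, language) pairs."""
    grouped = group_by_taxon(extension_rows)
    return {
        core["taxonID"]: [(v["vernacularName"], v["language"])
                          for v in grouped.get(core["taxonID"], [])]
        for core in core_rows
    }


def orphan_ids(core_rows, extension_rows):
    """Extension taxonIDs with no matching core record (broken links)."""
    core_ids = {core["taxonID"] for core in core_rows}
    return sorted(set(group_by_taxon(extension_rows)) - core_ids)
```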
Field | Description | DwC term |
---|---|---|
taxonID | The first field in the data file should be the taxonID representing the taxon in the core data file to which this vernacular name points. This identifier provides the link between the core data record and the extension record.
Data Type: string |
http://rs.tdwg.org/dwc/terms/taxonID |
vernacularName | A common or vernacular name.
Example: "Andean Condor", "Condor Andino", "American Eagle", "Gönsegeier" Data Type: string |
http://rs.tdwg.org/dwc/terms/vernacularName |
language | ISO 639-1 language code used for the vernacular name value.
Example: “ES”, “Spanish”, “Español” Data Type: string |
http://purl.org/dc/terms/language |
locality | The specific description of the area from which the vernacular name usage originates. Vernacular names may have very specific regional contexts. A name used for a species in one area may refer to a different species in another.
Example: "Southeastern coastal New England from Buzzards Bay through Rhode Island" Data Type: string |
http://rs.tdwg.org/dwc/terms/locality |
Validating
You can test your archive for structural or data problems by checking it with the GBIF Darwin Core Archive Validator.
This tool will perform several tests:
- ensure the archive is valid by decompressing it and confirming the presence of a meta.xml descriptor file;
- validate the meta.xml file according to the DwC-A meta file XML Schema;
- verify that required fields are included;
- validate certain field values such as data types and licenses;
- perform some data type-specific validation, such as verifying that text objects contain descriptions and images contain accessURIs.
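A few of the structural checks listed above can be approximated locally before you upload to the GBIF validator. This is a hypothetical pre-check, not the validator itself. It does not validate meta.xml against the XML Schema or check data types. It also assumes tab-delimited UTF-8 data files instead of reading `fieldsTerminatedBy` and `encoding` from meta.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def precheck(archive_bytes):
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        zf = zipfile.ZipFile(io.BytesIO(archive_bytes))
    except zipfile.BadZipFile:
        return ["not a valid zip file"]
    problems = []
    with zf:
        names = set(zf.namelist())
        if "meta.xml" not in names:
            return ["meta.xml descriptor not found"]
        try:
            root = ET.fromstring(zf.read("meta.xml"))
        except ET.ParseError as exc:
            return [f"meta.xml is not well-formed XML: {exc}"]
        cores = root.findall("dwc:core", NS)
        if not cores:
            problems.append("meta.xml declares no <core> file")
        for section in cores + root.findall("dwc:extension", NS):
            location = section.findtext("dwc:files/dwc:location", namespaces=NS)
            if location not in names:
                problems.append(f"declared file {location!r} is missing")
                continue
            # Every data row must be wide enough for the highest declared index.
            indexes = [int(f.get("index")) for f in section.findall("dwc:field", NS)
                       if f.get("index") is not None]
            needed = max(indexes, default=-1) + 1
            skip = int(section.get("ignoreHeaderLines", "0"))
            lines = zf.read(location).decode("utf-8").splitlines()
            for lineno, line in enumerate(lines[skip:], start=skip + 1):
                if line and len(line.split("\t")) < needed:
                    problems.append(f"{location} line {lineno}: fewer than {needed} columns")
    return problems
```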
Simple Darwin Core
Simple Darwin Core has minimal restrictions on which fields are required (in fact, none). Because no field is required, Simple Darwin Core can be used to share any meaningful combination of fields: in our case, occurrences.
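Because no field is required, a Simple Darwin Core writer only needs to leave unknown values empty. This is a minimal sketch with `csv.DictWriter`. It assumes tab-delimited output and an arbitrary subset of the Simple Darwin Core terms. `write_simple_dwc` and the sample identifiers are hypothetical.

```python
import csv
import io

# An arbitrary subset of the Simple Darwin Core terms described on this page.
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName",
          "eventDate", "decimalLatitude", "decimalLongitude"]


def write_simple_dwc(records, fields=FIELDS):
    """Write occurrence dicts as tab-delimited text; missing fields stay empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t",
                            lineterminator="\n",
                            restval="",             # absent terms become empty cells
                            extrasaction="raise")   # unknown keys are an error
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```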
The following schema specifies the fields contained in a Simple Darwin Core file.
Field | Description | DwC term |
---|---|---|
occurrenceID | The ID is supposed to (globally) uniquely identify an occurrence record, whether it is a specimen-based occurrence, a one-time observation of a species at a location, or one of many occurrences of an individual who is being tracked, monitored, or recaptured. Making it globally unique is quite a trick, one for which we don't really have good solutions in place yet, but one which ontologists insist is essential.
Data Type: string |
http://rs.tdwg.org/dwc/terms/occurrenceID |
scientificNameAuthorship | The authorship information for the scientificName formatted according to the conventions of the applicable nomenclaturalCode.
Example: "(Torr.) J.T. Howell", "(Martinovský) Tzvelev", "(Györfi, 1952)". Data Type: string |
http://rs.tdwg.org/dwc/terms/scientificNameAuthorship |
language | The language of the parent resource. Recommended best practice is to use a controlled vocabulary such as ISO 639.
Example: "eng" Data Type: string |
http://purl.org/dc/terms/language |
modified | The most recent date-time on which the resource was changed. Recommended format: "yyyy-MM-dd'T'HH:mm:ss"
Data Type: string |
http://purl.org/dc/terms/modified |
basisOfRecord | The specific nature of the data record - a subtype of the dcterms:type. Recommended best practice is to use a controlled vocabulary such as the Darwin Core Type Vocabulary (http://rs.tdwg.org/dwc/terms/type-vocabulary/index.htm).
Examples: "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", "HumanObservation", "MachineObservation". Data Type: string |
http://rs.tdwg.org/dwc/terms/basisOfRecord |
institutionCode | The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record.
Examples: "MVZ", "FMNH", "AKN-CLO", "University of California Museum of Paleontology (UCMP)". Data Type: string |
http://rs.tdwg.org/dwc/terms/institutionCode |
collectionCode | The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.
Examples: "Mammals", "Hildebrandt", "eBird". Data Type: string |
http://rs.tdwg.org/dwc/terms/collectionCode |
catalogNumber | An identifier (preferably unique) for the record within the data set or collection.
Examples: "2008.1334", "145732a", "145732". Data Type: string |
http://rs.tdwg.org/dwc/terms/catalogNumber |
identifiedBy | A list (concatenated and separated) of names of people, groups, or organizations who assigned the Taxon to the subject.
Example: "Oliver P. Pearson; Anita K. Pearson". Data Type: string |
http://rs.tdwg.org/dwc/terms/identifiedBy |
scientificName | The scientific name of taxon with or without authorship information depending on the format of the source database.
Examples: "Coleoptera", "Vespertilionidae". Data Type: string |
http://rs.tdwg.org/dwc/terms/scientificName |
kingdom | The full scientific name of the kingdom in which the taxon is classified.
Example: "Animalia", "Plantae". Data Type: string |
http://rs.tdwg.org/dwc/terms/kingdom |
family | The full scientific name of the family in which the taxon is classified.
Example: "Felidae", "Monocleaceae". Data Type: string |
http://rs.tdwg.org/dwc/terms/family |
locality | The specific description of the place.
Example: "Bariloche, 25 km NNE via Ruta Nacional 40". Data Type: string |
http://rs.tdwg.org/dwc/terms/locality |
eventDate | The date-time or interval during which an Event occurred. For occurrences, this is the date-time when the event was recorded.
Examples: "1963-03-08T14:07-0600". Data Type: date |
http://rs.tdwg.org/dwc/terms/eventDate |
year | The four-digit year in which the Event occurred, according to the Common Era Calendar.
Example: "2008". Data Type: date |
http://rs.tdwg.org/dwc/terms/year |
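decimalLatitude | The geographic latitude, in decimal degrees. Positive values are north of the Equator; legal values lie between -90 and 90, inclusive.
Data Type: float |
http://rs.tdwg.org/dwc/terms/decimalLatitude |
decimalLongitude | The geographic longitude, in decimal degrees. Positive values are east of the Greenwich Meridian; legal values lie between -180 and 180, inclusive.
Data Type: float |
http://rs.tdwg.org/dwc/terms/decimalLongitude |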
coordinateUncertaintyInMeters | The horizontal distance (in meters) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the location.
Data Type: float |
http://rs.tdwg.org/dwc/terms/coordinateUncertaintyInMeters |
maximumDepthInMeters | The greater depth of a range of depth below the local surface, in meters.
Data Type: float |
http://rs.tdwg.org/dwc/terms/maximumDepthInMeters |
minimumDepthInMeters | The lesser depth of a range of depth below the local surface, in meters.
Data Type: float |
http://rs.tdwg.org/dwc/terms/minimumDepthInMeters |
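Several of these fields have ranges that can be checked mechanically. The sketch below is a hypothetical record check. It assumes values arrive as strings (as when read from a text file) and that an empty string means "not provided". The latitude and longitude bounds follow the Darwin Core definitions; the other rules are simple consistency checks.

```python
def check_occurrence(record):
    """Return problems found in one occurrence record (a dict of strings)."""
    problems = []

    def as_float(name):
        # Empty or absent values mean "not provided" and are not errors.
        value = record.get(name, "")
        if value == "":
            return None
        try:
            return float(value)
        except ValueError:
            problems.append(f"{name} is not a number: {value!r}")
            return None

    lat = as_float("decimalLatitude")
    lon = as_float("decimalLongitude")
    if lat is not None and not -90 <= lat <= 90:
        problems.append("decimalLatitude outside [-90, 90]")
    if lon is not None and not -180 <= lon <= 180:
        problems.append("decimalLongitude outside [-180, 180]")
    # A coordinate pair is only meaningful when both halves are present.
    if bool(record.get("decimalLatitude")) != bool(record.get("decimalLongitude")):
        problems.append("only one of decimalLatitude/decimalLongitude given")
    uncertainty = as_float("coordinateUncertaintyInMeters")
    if uncertainty is not None and uncertainty < 0:
        problems.append("coordinateUncertaintyInMeters is negative")
    min_depth = as_float("minimumDepthInMeters")
    max_depth = as_float("maximumDepthInMeters")
    if min_depth is not None and max_depth is not None and min_depth > max_depth:
        problems.append("minimumDepthInMeters is greater than maximumDepthInMeters")
    year = record.get("year", "")
    if year and not (len(year) == 4 and year.isascii() and year.isdigit()):
        problems.append(f"year is not a four-digit year: {year!r}")
    return problems
```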