...
Core Metadata Schema Specification
Editors
...
Bruna dos Santos Vieira
Dena Tahvidalri (until 29th Feb 2024)
Ana Konrad
Repository
...
Latest published version
...
https://github.com/Health-RI/health-ri-metadata/tree/master/Formalisation(shacl)/Core/PiecesShape
Purpose of this document
This document outlines the Plateau 1 Core Metadata Schema, detailing the classes and entities involved and offering implementation guidance (usage notes) for developers at regional nodes. It specifically addresses the schema's design and application but excludes discussion of the national catalog, its onboarding process, and future schema expansions. Additional information and versions are available on GitHub: https://github.com/Health-RI/health-ri-metadata/
Intended Audience
Technical audience tasked with implementing the metadata schema and stakeholders interested in a detailed understanding of the core schema.
Notes on Alignment
To create the current core metadata schema, we examined existing metadata from the COVID-19 national portal, metadata schemas provided by Health-RI nodes (e.g., ABC metadata), and standards used in portals across Europe and beyond (e.g., W3C, DCAT, DCAT-AP). With the help of metadata specialists, we mapped their classes and properties and decided to reuse DCAT and DCAT-AP for implementation. The Core Metadata Schema includes DCAT v3 and selected DCAT-AP mandatory classes, ensuring compatibility with international catalogs. DCAT-AP covers the identified requirements for exchanging information about datasets and services in Europe.
Scope
...
Terminology
According to DCAT-AP:
An Application Profile defines the mandatory, recommended, and optional components for a specific use case by leveraging terminology from foundational standards. Additionally, it suggests standardized vocabularies to maintain consistency in the use of terms and data.
A Dataset is a self-contained set of data produced by a specific organization, which can be accessed or downloaded for various uses.
A Data Portal is an online platform that offers a catalog of datasets and tools to help users locate and utilize these datasets effectively.
Introduction
Scope
To make it easier to share, find and reuse data, the Health-RI nodes decided to list resources in a national directory that can be accessed internationally. They all agreed on what basic information should be included, and that the catalog should be interoperable with other EU portals, which led to the creation of the Core Metadata Schema.
This schema describes the minimum amount of information that should be used to describe resources across Health-RI nodes through the national directory, which is in line with what Plateau 1 offers. The schema can be changed or extended to meet the needs of different areas, and new versions will be released in the future. Apart from this metadata documentation, users can look for the onboarding documents with details on how to implement and connect, and the Plateau 1 documents which explain what each feature can do.
...
Diagram
An overview of the Core Metadata Schema is presented in the UML diagram depicted below (Fig 1). The UML diagram showcases the primary classes (entities), excluding detailed definitions such as rdfs:label and rdfs:comment. Each block denotes a class and comprises a list of its attributes (properties). If a class is connected to another class by a closed arrow, this indicates that it inherits all properties from the other class. For example, dcat:DatasetSeries inherits from dcat:Dataset, which inherits from dcat:Resource. The other arrows represent relations and are labelled with the type of relation (for example, dcat:Dataset connects to a dcat:DatasetSeries via the predicate dcat:inSeries) and the cardinality (for example, a dcat:Dataset can be connected via dcat:inSeries to zero or more dcat:DatasetSeries).
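The inheritance arrows described above can be pictured with a small Python sketch. It is illustrative only: the class names come from the text, but the `SUBCLASS_OF` mapping, the helper functions, and the short per-class property lists are assumptions made for this example and do not reproduce the schema's full property set.

```python
# Illustrative sketch only. Class names are taken from the diagram text;
# the property lists are hypothetical and deliberately incomplete.
SUBCLASS_OF = {
    "dcat:DatasetSeries": "dcat:Dataset",
    "dcat:Dataset": "dcat:Resource",
}

OWN_PROPERTIES = {
    "dcat:Resource": ["dct:title", "dct:description"],
    "dcat:Dataset": ["dcat:distribution", "dcat:inSeries"],
    "dcat:DatasetSeries": [],
}

# One relation arrow: (source class, predicate, target class, cardinality).
RELATIONS = [("dcat:Dataset", "dcat:inSeries", "dcat:DatasetSeries", "0..*")]


def ancestors(cls):
    """Return the chain of superclasses, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def all_properties(cls):
    """Own properties plus everything inherited along the closed arrows."""
    props = list(OWN_PROPERTIES.get(cls, []))
    for parent in ancestors(cls):
        props.extend(OWN_PROPERTIES.get(parent, []))
    return props


print(ancestors("dcat:DatasetSeries"))
# ['dcat:Dataset', 'dcat:Resource']
```

Under these assumptions, `all_properties("dcat:DatasetSeries")` returns the properties listed for dcat:Dataset followed by those listed for dcat:Resource, which is what a closed arrow in the diagram expresses.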
...
...
Mandatory and Recommended
...
Following the DCAT-AP specification, we distinguish between mandatory and recommended components (classes and properties). The subsequent sections list these components under 'mandatory' and 'recommended' headings. A third category, 'optional', may be introduced in the future.
In the context of data exchange, the following definitions apply:
Mandatory class: a receiver of data MUST be able to process information about instances of the class; a sender of data MUST provide information about instances of the class.
Recommended class: a sender of data SHOULD provide information about instances of the class; a sender of data MUST provide information about instances of the class if such information is available; a receiver of data MUST be able to process information about instances of the class.
Optional class: a receiver MUST be able to process information about instances of the class; a sender MAY provide the information but is not obliged to do so.
Mandatory property: a receiver MUST be able to process the information for that property; a sender MUST provide the information for that property.
Recommended property: a receiver MUST be able to process the information for that property; a sender SHOULD provide the information for that property if it is available.
Optional property: a receiver MUST be able to process the information for that property; a sender MAY provide the information for that property but is not obliged to do so.
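The sender-side obligations in these definitions can be sketched as a simple check. This is a minimal illustration, not part of the schema: the `LEVELS` mapping, the function name, and the requirement level assigned to each property here are hypothetical, and real levels should be taken from the schema tables.

```python
# Hypothetical requirement levels for a handful of properties; the actual
# levels are defined by the schema tables, not by this sketch.
LEVELS = {
    "dct:title": "mandatory",
    "dcat:accessURL": "mandatory",
    "dcat:downloadURL": "recommended",
    "dct:language": "optional",
}


def check_sender_record(record, levels=LEVELS):
    """Split missing properties into errors (MUST) and warnings (SHOULD)."""
    errors, warnings = [], []
    for prop, level in levels.items():
        if record.get(prop):
            continue  # at least one value was provided
        if level == "mandatory":
            errors.append(prop)
        elif level == "recommended":
            warnings.append(prop)
        # optional: a sender MAY omit the property, so nothing is reported
    return errors, warnings


record = {"dct:title": ["Example distribution"], "dct:language": []}
errors, warnings = check_sender_record(record)
print(errors)    # ['dcat:accessURL']
print(warnings)  # ['dcat:downloadURL']
```

The check covers only the sender. On the receiving side the definitions make no distinction between levels: a receiver MUST be able to process mandatory, recommended, and optional information alike.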
Prefixes
Prefix | Namespace IRI | Source | |
---|---|---|---|
|
| ||
|
| [DCT] | |
|
| [FOAF] | |
|
|
| |
|
| [DCT] | |
|
| [FOAF] | |
| |||
|
| ||
|
| ||
|
| ||
|
| [OWL-TIME] | |
|
|
...
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
title | A name given to the distribution. | | | The name of the dataset in combination with the format of the distribution can be used. | 1..* |
access URL | A URL of the resource that gives access to a distribution of the dataset. E.g., landing page, feed, SPARQL endpoint. | | | This property contains a URL that gives access to a Distribution of the Dataset. The resource at the access URL may contain information about how to get the Dataset. | 1..* |
media type | The media type of the distribution as defined by IANA [IANA-MEDIA-TYPES]. | | | This property SHOULD be used when the media type of the distribution is defined in IANA [IANA-MEDIA-TYPES], otherwise | 1..* |
description | A unique identifier of the resource being described or catalog. | | | NA | 1..* |
...
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
access service | A data service that gives access to the distribution of the dataset | | | | 0..* |
download URL | The URL of the downloadable file in a given format. E.g., CSV file or RDF file. The format is indicated by the distribution's | | | NA | 0..* |
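The Cardinality column can be checked mechanically. The sketch below is one possible approach, assuming a record is held as a mapping from property name to a list of values. The function names and the example record are hypothetical; the cardinalities are copied from the two tables above and keyed by property name.

```python
import math


def parse_cardinality(text):
    """Turn a string such as '1..*' or '0..1' into (minimum, maximum)."""
    low, high = text.split("..")
    return int(low), math.inf if high == "*" else int(high)


def violations(values_by_property, cardinalities):
    """Return properties whose number of values is outside the allowed range."""
    bad = []
    for prop, card in cardinalities.items():
        low, high = parse_cardinality(card)
        count = len(values_by_property.get(prop, []))
        if not low <= count <= high:
            bad.append((prop, card, count))
    return bad


# Cardinalities as listed in the two property tables above.
DISTRIBUTION_CARDINALITIES = {
    "title": "1..*",
    "access URL": "1..*",
    "media type": "1..*",
    "description": "1..*",
    "access service": "0..*",
    "download URL": "0..*",
}

# Hypothetical record with no media type.
distribution = {
    "title": ["Example dataset (CSV)"],
    "access URL": ["https://example.org/data"],
    "description": ["An example distribution."],
}
print(violations(distribution, DISTRIBUTION_CARDINALITIES))
# [('media type', '1..*', 0)]
```

A minimum of 1 means at least one value must be present, while `0..*` never produces a violation because any number of values, including none, is allowed.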
Agent
An entity that is associated with Catalogs and/or Datasets. Agents can be individuals or organisations. If the Agent is an organisation, the use of the Organization Ontology is recommended.
...
All things described by RDF are called resources, and they are instances of the class rdfs:Resource. This is the class of everything. All other classes are subclasses of this class. To read more, go to https://www.w3.org/TR/rdf12-schema/#ch_resource
Feedback and model extension - Git Issues
Feedback
If you wish to extend the model, such as with Resource, and/or create a new concept, please open an issue in Health-RI’s GitHub repository https://github.com/Health-RI/health-ri-metadata/tree/master and provide a clear explanation for the extension. Assign the issue to either ‘brunasv’ or ‘xiaofengleo’, and we will work with you to implement the addition in the next release.
...
When creating custom classes, it is essential to provide detailed metadata for each type of resource. This will enable users and systems to distinguish between them and comprehend their subtle differences. For instance, consider the distinction between a collection and a dataset. Therefore, it is crucial to provide specific and unambiguous information to ensure complete understanding.