STATUS: IN DEVELOPMENT
Details about this document
Editors
Bruna dos Santos Vieira
Dena Tahvidalri (until 29th Feb 2024)
Ana Konrad
Repository
Latest published version
https://github.com/Health-RI/health-ri-metadata/tree/master/Formalisation(shacl)/Core/PiecesShape
Purpose of this document
This document outlines the Plateau 1 Core Metadata Schema, detailing the classes and entities involved and offering implementation guidance (usage notes) for developers at regional nodes. It specifically addresses the schema's design and application but does not discuss the national catalog or its onboarding process. Additional information and versions are available on GitHub: https://github.com/Health-RI/health-ri-metadata/
Intended Audience
Technical audience tasked with implementing the metadata schema and stakeholders interested in a detailed understanding of the core schema.
Introduction
Scope
To make it easier to share, find and reuse data, the Health-RI nodes decided to list resources in a national directory that can be accessed internationally. They all agreed on what basic information should be included, and that the catalog should be interoperable with other EU portals, which led to the creation of the Core Metadata Schema.
This schema describes the minimum amount of information that should be used to describe resources across Health-RI nodes through the national directory, which is in line with what Plateau 1 offers. The schema can be changed or extended to meet the needs of different areas, and new versions will be released in the future. Apart from this metadata documentation, users can look for the onboarding documents with details on how to implement and connect, and the Plateau 1 documents which explain what each feature can do.
Mandatory and Recommended
Following the DCAT-AP specification, we categorize components into 'mandatory' and 'recommended' classes and properties. A potential third category, 'Optional,' may be introduced in the future.
In the context of data exchange:
Mandatory Class: Senders MUST provide information about instances of the class; Receivers MUST process information about instances of the class.
Recommended Class: Senders SHOULD provide information about instances of the class if available; Receivers MUST process information about instances of the class.
Optional Class: Senders MAY provide the information but are not obliged to do so; Receivers MUST process information about instances of the class.
Mandatory property: Senders MUST provide the information for that property; Receivers MUST process the information for that property.
Recommended property: Senders SHOULD provide the information if available; Receivers MUST process the information for that property.
Optional property: Senders MAY provide the information but are not obliged to do so; Receivers MUST process the information for that property.
Terminology
According to DCAT-AP:
An Application Profile defines the mandatory, recommended, and optional components for a specific use case by leveraging terminology from foundational standards. Additionally, it suggests standardized vocabularies to maintain consistency in the use of terms and data.
A Dataset is a self-contained set of data produced by a specific organization, which can be accessed or downloaded for various uses. A Data Portal is an online platform that offers a catalog of datasets and tools to help users locate and utilize these datasets effectively.
Used Prefixes
Prefix | Namespace IRI | Source |
---|---|---|
 | | |
 | | [DCT] |
 | | [FOAF] |
 | | |
 | | |
 | | |
 | | |
 | | [OWL-TIME] |
 | | |
Overview and Diagram
An overview of the core metadata schema is presented in the UML diagram depicted below (Fig 1). The diagram shows the primary classes (entities) and omits detailed definitions such as rdfs:label and rdfs:comment. Each block denotes a class and lists its attributes (properties). A closed arrow from one class to another indicates that the first class inherits all properties from the second. For example, dcat:DatasetSeries inherits from dcat:Dataset, which inherits from dcat:Resource. The other arrows represent relations. Each is labelled with the type of relation (for example, dcat:Dataset connects to a dcat:DatasetSeries via the predicate dcat:inSeries) and with its cardinality (for example, a dcat:Dataset can be connected via dcat:inSeries to zero or more dcat:DatasetSeries).
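The relation described above can be sketched in Turtle. This is a minimal illustration, not part of the schema itself: the `ex:` namespace, the resource names, and the titles are hypothetical, and `dct:title` is assumed to be the title property.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .

# Hypothetical series that groups datasets published separately.
ex:registry-series a dcat:DatasetSeries ;
    dct:title "Annual registry extracts"@en .

# dcat:inSeries has cardinality 0..*, so a dataset may belong to
# no series, one series, or several series.
ex:registry-2023 a dcat:Dataset ;
    dct:title "Registry extract 2023"@en ;
    dcat:inSeries ex:registry-series .
```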
Main Classes
Mandatory Classes
Class name | Definition | Usage Note | URI |
---|---|---|---|
Dataset | A resource type. | Used to describe one or more datasets and their details. A single dataset can be made available to potential users in different ways; how the data in a dataset can be accessed is defined in the Distribution. | |
Catalog | A catalog that is listed in the National catalog. | Used to describe a bundle of datasets, data services, biobanks, patient registries, or guidelines together under a single title. | |
Agent | An entity that is associated with Catalogs and/or Datasets. | If the Agent is an organisation, the use of the Organization Ontology is recommended. | |
Resource | Resource published or curated by a single agent. | This is an abstract class. We do not use it directly; instead we use specializations of it (e.g. Dataset). It serves mainly for high-level grouping and the reuse of properties. | |
Recommended Classes
Class name | Definition | Usage Note | URI |
---|---|---|---|
Distribution | An available distribution of the dataset. | Used to describe the different ways in which a single dataset can be made available, i.e., it can be downloaded or accessed online in one or more distributions (e.g. one as a downloadable .csv file, another as an access or query webpage). | |
Dataset Series | A resource type. Dataset series are defined in [ISO-19115] as a collection of datasets […] sharing common characteristics. However, their use is not limited to geospatial data, although in other domains they can be named differently (e.g., time series, data slices) and defined more or less strictly (see, e.g., the notion of "dataset slice" in VOCAB-DATA-CUBE). | With Dataset Series we refer to data, somehow interrelated, that are published separately. An example is budget data split by year and/or country, instead of being made available in a single dataset. | |
Data Service | A Resource type. | The kind of service can be indicated using the DRAFT EXAMPLE: | |
Project | A project (a collective endeavour of some kind). | A resource type. Used to describe a project that is connected to one or more datasets. | |
Abstract Class
Resource is a generic concept from the DCAT vocabulary that is rarely used directly; it is normally used indirectly through its extensions. We recommend that you avoid using dcat:Resource directly in your document. If the type or class you need is not in this schema, request a model extension or update instead.
Class name | Definition | Usage Note | URI |
---|---|---|---|
Resource | | This class is for grouping and class hierarchy relation purposes. | |
Main Properties per Class
Catalog
A curated collection of metadata about resources. A web-based data catalog is typically represented as a single instance of this class.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
title | A name given to the resource. | | | The name of the catalog. This is a required field and needs to be unique. | 1..* |
description | A free-text account of the record. | | | A brief description of the catalog. It can consist of multiple strings. For example: "This catalog describes breast cancer imaging datasets." | 1..* |
publisher | The entity responsible for making the resource available. | | | The organisation or person that has published the catalog. | 1..* |
dataset | Relates every catalog to the datasets it contains. | | | The connection to the one or more datasets that this catalog describes. | 1..* |
Recommended Properties
No recommended properties are identified for this release.
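As an illustration, a catalog carrying the four mandatory properties might look like the sketch below. The property IRIs (`dct:title`, `dct:description`, `dct:publisher`, `dcat:dataset`) are assumed from the DCAT and Dublin Core Terms vocabularies because this section does not state them; the `ex:` namespace and all values are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .

# Hypothetical catalog; every mandatory property appears at least once (1..*).
ex:catalog a dcat:Catalog ;
    dct:title "Example imaging catalog"@en ;
    dct:description "This catalog describes breast cancer imaging datasets."@en ;
    dct:publisher ex:org ;          # an Agent, described elsewhere
    dcat:dataset ex:dataset-1 .     # repeat for each dataset in the catalog
```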
Dataset
A collection of data, published or curated by a single agent, and available for access or download in one or more representations.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
contact point | Relevant contact information for the cataloged resource. | | | Contact information that can be used, for example, for sending requests for further information or access to the Dataset. | 1..* |
creator | The entity responsible for producing the resource. | | | An agent (person or organisation) responsible for producing the dataset. | 1..* |
description | A free-text account of the record. | | | A free-text description of the Dataset. This property can be repeated for parallel language versions of the description. | 1..* |
issued | Date of formal issuance (e.g., publication) of the resource. | | | NA | 1..1 |
identifier | A unique identifier of the resource being described or cataloged. | | | The main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the catalog. | 1..1 |
modified | Most recent date on which the catalog entry was changed, updated or modified. | | | The most recent date on which the Dataset was changed or modified. | 1..1 |
publisher | The entity responsible for making the resource available. | | | An agent (organisation or person) responsible for making the Dataset available. | 1..* |
theme | A main category of the resource. A resource can have multiple themes. | | | One or more IRIs (links), separated by commas. When set, they specify relevant ontology concepts that classify the dataset. Typically, these can be looked up using the Ontology Lookup Service (OLS) or Bioportal. | 1..* |
title | A name given to the record. | | | A name given to the Dataset. This property can be repeated for parallel language versions of the name. | 1..* |
type | The nature or genre of the resource. | | | A type of the Dataset. A recommended controlled vocabulary data-type is foreseen. | 1..* |
license | A legal document under which the resource is made available. | | | This should contain a URL that provides details regarding the license that is applicable to this dataset. | 1..1 |
relation | Defines a relation. | | foaf:Project | | 1..* |
Recommended Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
distribution | An available distribution of the dataset. | | | Use this property to point to the distribution of this dataset when a distribution is available. | 0..* |
project | Connects a dataset to the corresponding projects. | | | Use this property to point to the related project of this dataset when a project is available. | 0..* |
has version | This resource has a more specific, versioned resource [PAV]. | | | This property refers to a related Dataset that is a version, edition, or adaptation of the described Dataset. | 0..* |
in series | A dataset series of which the dataset is part. | | | NA | 0..* |
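A dataset description covering the mandatory properties, plus one recommended property, could be sketched as follows. The property IRIs are assumed from DCAT and Dublin Core Terms, and the datatypes (`xsd:dateTime`, plain string identifier) are assumptions as well, since this section does not state URIs or ranges. The `ex:` namespace, the theme and type concepts, and all values are hypothetical.

```turtle
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:    <https://example.org/> .

ex:dataset-1 a dcat:Dataset ;
    # Cardinality 1..*: may be repeated, e.g. once per language.
    dct:title "Example imaging dataset"@en ;
    dct:description "De-identified imaging data collected during routine care."@en ;
    dct:creator ex:org ;
    dct:publisher ex:org ;
    dcat:contactPoint [ a vcard:Kind ; vcard:hasEmail <mailto:data@example.org> ] ;
    dcat:theme ex:concept-breast-cancer ;   # hypothetical ontology concept IRI
    dct:type ex:type-imaging ;              # hypothetical type concept
    dct:relation ex:project-1 ;             # the related foaf:Project
    # Cardinality 1..1: exactly one value each.
    dct:identifier "https://example.org/dataset-1" ;
    dct:issued "2024-01-15T00:00:00Z"^^xsd:dateTime ;
    dct:modified "2024-03-01T00:00:00Z"^^xsd:dateTime ;
    dct:license <https://creativecommons.org/licenses/by/4.0/> ;
    # Recommended (0..*): include when a distribution exists.
    dcat:distribution ex:dataset-1-csv .
```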
Data Service
A collection of operations that provides access to one or more datasets or data processing functions.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
end point URL | The root location or primary endpoint of the service (a Web-resolvable IRI). | | Or | NA | 1..* |
title | A name given to the data service. | | | NA | 1..* |
Recommended Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
end point description | A description of the services available via the end-points, including their operations, parameters etc. | | | An endpoint description may be expressed in a machine-readable form, such as an OpenAPI (Swagger) description [OpenAPI], an OGC | 0..* |
serves dataset | A collection of data that this data service can distribute. | | | NA | 0..* |
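A data service with its mandatory and recommended properties might be described as in this sketch. The property IRIs are assumed from the DCAT and Dublin Core Terms vocabularies; the `ex:` namespace, the URLs, and the title are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .

ex:service a dcat:DataService ;
    # Mandatory (1..*)
    dct:title "Example query service"@en ;
    dcat:endpointURL <https://example.org/api> ;
    # Recommended (0..*): a machine-readable description and the datasets served
    dcat:endpointDescription <https://example.org/api/openapi.json> ;
    dcat:servesDataset ex:dataset-1 .
```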
Distribution
An available distribution of the dataset.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
title | A name given to the distribution. | | | The name of the dataset in combination with the format of the distribution can be used. | 1..* |
access URL | A URL of the resource that gives access to a distribution of the dataset. E.g., landing page, feed, SPARQL endpoint. | | | This property contains a URL that gives access to a Distribution of the Dataset. The resource at the access URL may contain information about how to get the Dataset. | 1..* |
media type | The media type of the distribution as defined by IANA [IANA-MEDIA-TYPES]. | | | This property SHOULD be used when the media type of the distribution is defined in IANA [IANA-MEDIA-TYPES], otherwise | 1..* |
description | A free-text account of the distribution. | | | NA | 1..* |
Recommended Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
access service | A data service that gives access to the distribution of the dataset. | | | | 0..* |
download URL | The URL of the downloadable file in a given format. E.g., CSV file or RDF file. The format is indicated by the distribution's | | | NA | 0..* |
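The sketch below shows one way a downloadable CSV distribution could be expressed. The property IRIs are assumed from DCAT and Dublin Core Terms, and the media type is given as an IANA media-type IRI, which is one common convention rather than a requirement stated here. The `ex:` namespace and all URLs are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .

ex:dataset-1-csv a dcat:Distribution ;
    # Mandatory (1..*): title combines dataset name and format
    dct:title "Example imaging dataset (CSV)"@en ;
    dct:description "Tabular export of the example imaging dataset."@en ;
    dcat:accessURL <https://example.org/dataset-1/access> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv> ;
    # Recommended (0..*)
    dcat:downloadURL <https://example.org/dataset-1/data.csv> ;
    dcat:accessService ex:service .
```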
Agent
An entity that is associated with Catalogs and/or Datasets. Agents can be individuals or organisations. If the Agent is an organisation, the use of the Organization Ontology is recommended.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
name | A name for some thing. | | | This property contains a name of the agent. This property can be repeated for different versions of the name (e.g. the name in different languages). | 1..* |
identifier | A unique identifier of the resource being described or cataloged. | | | | 1..1 |
Recommended Properties
No recommended properties are identified for this release.
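A minimal agent record could look like the sketch below. The class `foaf:Agent` and the properties `foaf:name` and `dct:identifier` are assumptions based on FOAF and Dublin Core Terms, since this section does not state URIs; the organisation, its names, and its identifier are hypothetical.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .

ex:org a foaf:Agent ;
    # name is 1..*: repeated here for two languages
    foaf:name "Example University Medical Center"@en ,
              "Voorbeeld Universitair Medisch Centrum"@nl ;
    # identifier is 1..1: exactly one value
    dct:identifier "https://example.org/org" .
```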
Project
A project (a collective endeavour of some kind).
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
description | A description of the project. | | | NA | 1..* |
identifier | A unique identifier of the resource being described or cataloged. | | | NA | 1..1 |
title | A name given to the resource. | | | NA | 1..* |
funded by | An organization funding a project or person. | | | NA | 1..* |
relation | Link to the project datasets. | | | NA | 1..* |
Recommended Properties
No recommended properties are identified for this release.
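A project with its five mandatory properties might be sketched as follows. The property IRIs (`dct:title`, `dct:description`, `dct:identifier`, `foaf:fundedBy`, `dct:relation`) are assumed from Dublin Core Terms and FOAF; the `ex:` namespace, the funder, and all values are hypothetical.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .

ex:project-1 a foaf:Project ;
    dct:title "Example imaging project"@en ;
    dct:description "A project that collects and curates imaging data."@en ;
    dct:identifier "https://example.org/project-1" ;   # 1..1
    foaf:fundedBy ex:funder ;                           # the funding organisation
    dct:relation ex:dataset-1 .                         # link to a project dataset
```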
Resource
In RDF Schema, all things described by RDF are called resources, and they are instances of the class rdfs:Resource. This is the class of everything: all other classes, including dcat:Resource, are subclasses of it. To read more, go to https://www.w3.org/TR/rdf12-schema/#ch_resource
Feedback, Support and Implementation
Feedback - Git Issues
If you wish to extend the model, such as with Resource, and/or create a new concept, please open an issue in Health-RI’s GitHub repository https://github.com/Health-RI/health-ri-metadata/tree/master and provide a clear explanation for the extension. Assign the issue to either ‘brunasv’ or ‘xiaofengleo’, and we will work with you to implement the addition in the next release.
Model extension
Within DCAT and DCAT-AP, the term "resource" generally encompasses all objects that can be described using RDF. However, there are specific categories and attributes used to indicate the different types of resources:
dcat:Dataset is a type of dcat:Resource representing a collection of data.
dcat:Distribution is a type of dcat:Resource representing an available form or representation of a dataset.
dcat:Catalog is a type of dcat:Resource representing a collection of datasets.
dcat:DataService, introduced in DCAT version 2, is a type of Resource representing a service for accessing data.
foaf:Project is a type of dcat:Resource representing project-level information.
In DCAT and DCAT-AP, the vocabulary is focused on datasets. Nonetheless, users may need to portray a variety of resources specific to certain domains, like biobanks or patient registries. In such cases, we propose potential scenarios for modifying or augmenting DCAT to accurately depict your resource type:
Use dcat:Resource directly: If the asset you are dealing with does not fit the dcat:Dataset definition, you can use the broader term dcat:Resource. This term allows you to represent almost any type of asset. However, this approach may not make the nature of the asset clear to users. We can define the asset type further with specific vocabularies over time.
Expand with Personalised Classes: If there is a need to represent specific resources, such as biobanks or patient registries, it may be beneficial to supplement the foundational DCAT vocabulary with custom classes. For example:
:Collection a rdfs:Class ;
rdfs:subClassOf dcat:Resource .
and
:PatientRegistry a rdfs:Class ;
rdfs:subClassOf dcat:Dataset .
When creating custom classes, it is essential to provide detailed metadata for each type of resource so that users and systems can distinguish between them, for instance between a collection and a dataset. Specific, unambiguous information is what makes such subtle differences clear.
Notes on Alignment
To create the current core metadata schema, we examined existing metadata from the COVID-19 national portal, metadata schema provided by Health-RI nodes (e.g., ABC metadata), and standards used in portals across Europe and beyond (e.g., W3C, DCAT, DCAT-AP). With the help of metadata specialists, we mapped their classes and properties and decided to reuse DCAT and DCAT-AP for implementation. The Core metadata schema includes DCAT v3 and selected DCAT-AP mandatory classes, ensuring compatibility with international catalogs. DCAT-AP covers the identified requirements for exchanging information about datasets and services in Europe.
Implementation
For the SHACL files, which can be used in, for example, a FAIR Data Point, see the latest published version:
https://github.com/Health-RI/health-ri-metadata/tree/master/Formalisation(shacl)/Core/PiecesShape