Core Metadata Schema Specification

Editors

...

  • Bruna dos Santos Vieira

  • Dena Tahvidalri (until 29th Feb 2024)

  • Ana Konrad

Repository

...

Latest published version

...

https://github.com/Health-RI/health-ri-metadata/tree/master/Formalisation(shacl)/Core/PiecesShape

Purpose of this document

This document outlines the Plateau 1 Core Metadata Schema, detailing the classes and entities involved and offering implementation guidance (usage notes) for developers at regional nodes. It addresses the schema's design and application; it does not cover the national catalog, its onboarding process, or future schema expansions. Additional information and versions are available on GitHub: https://github.com/Health-RI/health-ri-metadata/.

Intended Audience

This document is intended for a technical audience tasked with implementing the metadata schema, and for stakeholders interested in a detailed understanding of the core schema.

Notes on Alignment

To create the current core metadata schema, we examined existing metadata from the COVID-19 national portal, metadata schemas provided by Health-RI nodes (e.g., ABC metadata), and standards used in portals across Europe and beyond (e.g., W3C, DCAT, DCAT-AP). With the help of metadata specialists, we mapped their classes and properties and decided to reuse DCAT and DCAT-AP for implementation. The core metadata schema includes DCAT v3 and selected DCAT-AP mandatory classes, ensuring compatibility with international catalogs. DCAT-AP covers the identified requirements for exchanging information about datasets and services in Europe.

Terminology

According to DCAT-AP:

  • An Application Profile defines the mandatory, recommended, and optional components for a specific use case by leveraging terminology from foundational standards. Additionally, it suggests standardized vocabularies to maintain consistency in the use of terms and data.

  • A Dataset is a self-contained set of data produced by a specific organization, which can be accessed or downloaded for various uses.

  • A Data Portal is an online platform that offers a catalog of datasets and tools to help users locate and utilize these datasets effectively.

Introduction

Scope

To make it easier to share, find and reuse data, the Health-RI nodes decided to list resources in a national directory that can be accessed internationally. They all agreed on what basic information should be included, and that the catalog should be interoperable with other EU portals, which led to the creation of the Core Metadata Schema.

This schema describes the minimum amount of information that should be used to describe resources across Health-RI nodes through the national directory, which is in line with what Plateau 1 offers. The schema can be changed or extended to meet the needs of different areas, and new versions will be released in the future. Apart from this metadata documentation, users can look for the onboarding documents with details on how to implement and connect, and the Plateau 1 documents which explain what each feature can do.

...

Diagram

An overview of the core metadata schema is presented in the UML diagram below (Fig 1). The diagram shows the primary classes (entities) and omits detailed definitions such as rdfs:label and rdfs:comment. Each block denotes a class and lists its attributes (properties). A closed arrow from one class to another indicates that the first class inherits all properties of the second. For example, dcat:DatasetSeries inherits from dcat:Dataset, which inherits from dcat:Resource. The other arrows represent relations. Each is labelled with the type of relation (for example, dcat:Dataset connects to dcat:DatasetSeries via the predicate dcat:inSeries) and with its cardinality (for example, a dcat:Dataset can be connected via dcat:inSeries to zero or more dcat:DatasetSeries).
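The inheritance reading of the closed arrows can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SUBCLASS_OF` and `OWN_PROPERTIES` tables and both helper functions are hypothetical names, and the property lists are a deliberately abbreviated subset, not the full schema.

```python
# Hypothetical, abbreviated model of the class hierarchy named in the text.
SUBCLASS_OF = {
    "dcat:DatasetSeries": "dcat:Dataset",
    "dcat:Dataset": "dcat:Resource",
}

# Illustrative subset of properties declared directly on each class (assumption).
OWN_PROPERTIES = {
    "dcat:Resource": ["dct:title", "dct:description"],
    "dcat:Dataset": ["dcat:distribution", "dcat:inSeries"],
    "dcat:DatasetSeries": [],
}


def class_chain(cls):
    """Return the class followed by all of its ancestors, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def all_properties(cls):
    """Collect own and inherited properties, most general class first."""
    props = []
    for ancestor in reversed(class_chain(cls)):
        props.extend(OWN_PROPERTIES.get(ancestor, []))
    return props


if __name__ == "__main__":
    print(class_chain("dcat:DatasetSeries"))
    print(all_properties("dcat:DatasetSeries"))
```

Walking the chain from the most general class down mirrors how the diagram is read: a class at the tail of a closed arrow carries every property of the class at its head.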

...

...

Mandatory and Recommended

...

Following the DCAT-AP specification, we categorize components (classes and properties) as 'mandatory' or 'recommended'. A third category, 'optional', may be introduced in the future.

In the context of data exchange, the following definitions apply:

  • Mandatory class: Senders MUST provide information about instances of the class; Receivers MUST process information about instances of the class.

  • Recommended class: Senders SHOULD provide information about instances of the class if available; Receivers MUST process information about instances of the class.

  • Optional class: Senders MAY provide the information but are not obliged to do so; Receivers MUST process information about instances of the class.

  • Mandatory property: Senders MUST provide the information for that property; Receivers MUST process the information for that property.

  • Recommended property: Senders SHOULD provide the information if available; Receivers MUST process the information for that property.

  • Optional property: Senders MAY provide the information for that property but are not obliged to do so; Receivers MUST process the information for that property.
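On the sender side, the definitions above amount to a simple check: a missing mandatory property is an error, a missing recommended property is worth a warning, and a missing optional property is fine. The sketch below is a minimal illustration, not a Health-RI tool. The `LEVELS` mapping and the `check_sender` function are hypothetical, and the level assigned to each property is an assumption made for the example.

```python
# Hypothetical requirement levels for a handful of properties (assumption).
LEVELS = {
    "dct:title": "mandatory",
    "dcat:accessURL": "mandatory",
    "dcat:downloadURL": "recommended",
}


def check_sender(record, levels=LEVELS):
    """Return (errors, warnings) for a record that maps properties to value lists.

    errors   -> mandatory properties with no value (MUST be provided)
    warnings -> recommended properties with no value (SHOULD be provided)
    """
    errors, warnings = [], []
    for prop, level in levels.items():
        if record.get(prop):
            continue  # at least one value is present
        if level == "mandatory":
            errors.append(prop)
        elif level == "recommended":
            warnings.append(prop)
        # optional properties may be omitted without any message
    return errors, warnings


if __name__ == "__main__":
    errors, warnings = check_sender({"dct:title": ["Example distribution"]})
    print("missing mandatory:", errors)
    print("missing recommended:", warnings)
```

A receiver, by contrast, has to be able to process every property regardless of level, so no comparable filter applies on that side.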

Prefixes

| Prefix | Namespace IRI | Source |
| --- | --- | --- |
| dcat | http://www.w3.org/ns/dcat# | [VOCAB-DCAT] |
| dct | http://purl.org/dc/terms/ | [DCT] |
| foaf | http://xmlns.com/foaf/0.1/ | [FOAF] |
| owl | http://www.w3.org/2002/07/owl# | [OWL2-SYNTAX] |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | [RDF-SYNTAX-GRAMMAR] |
| rdfs | http://www.w3.org/2000/01/rdf-schema# | [RDF-SCHEMA] |
| skos | http://www.w3.org/2004/02/skos/core# | [SKOS-REFERENCE] |
| time | http://www.w3.org/2006/time# | [OWL-TIME] |
| xsd | http://www.w3.org/2001/XMLSchema# | [XMLSCHEMA11-2] |
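The prefixes are shorthand: a compact name such as dcat:accessURL stands for the namespace IRI followed by the local name. A minimal Python sketch of that expansion follows. The `PREFIXES` dictionary copies the table, using the standard OWL namespace IRI for owl, and the `expand` function is a hypothetical helper, not part of any library.

```python
# Prefix table from this section (owl uses the standard OWL namespace IRI).
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "time": "http://www.w3.org/2006/time#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a compact name like 'dct:title' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("dcat:accessURL"))
    print(expand("dct:title"))
```

Raising on an unknown prefix, rather than returning the input unchanged, makes typos in property names visible early.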

...

| Property name | Definition | URI | rdfs:range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| title | A name given to the distribution. | dct:title | xsd:string | The name of the dataset in combination with the format of the distribution can be used. | 1..* |
| access URL | A URL of the resource that gives access to a distribution of the dataset, e.g., a landing page, feed, or SPARQL endpoint. | dcat:accessURL | rdfs:Resource | This property contains a URL that gives access to a Distribution of the Dataset. The resource at the access URL may contain information about how to get the Dataset. | 1..* |
| media type | The media type of the distribution as defined by IANA [IANA-MEDIA-TYPES]. | dcat:mediaType | xsd:string | This property SHOULD be used when the media type of the distribution is defined in IANA [IANA-MEDIA-TYPES]; otherwise dct:format MAY be used with different values. | 1..* |
| description | A free-text account of the distribution. | dct:description | xsd:string | NA | 1..* |

...

| Property name | Definition | URI | rdfs:range | Usage Note | Cardinality |
| --- | --- | --- | --- | --- | --- |
| access service | A data service that gives access to the distribution of the dataset. | dcat:accessService | dcat:DataService | dcat:accessService SHOULD be used to link to a description of a dcat:DataService that can provide access to this distribution. | 0..* |
| download URL | The URL of the downloadable file in a given format, e.g., a CSV file or RDF file. The format is indicated by the distribution's dct:format and/or dcat:mediaType. | dcat:downloadURL | rdfs:Resource | NA | 0..* |
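The cardinality column can be read mechanically: "1..*" means at least one value, "0..*" means any number of values, including none. The following Python sketch checks a record against the cardinalities listed in the two property tables above. The `CARDINALITIES` mapping, both function names, and the example record are hypothetical and serve only as an illustration.

```python
# Cardinalities as listed in the two property tables above.
CARDINALITIES = {
    "dct:title": "1..*",
    "dcat:accessURL": "1..*",
    "dcat:mediaType": "1..*",
    "dct:description": "1..*",
    "dcat:accessService": "0..*",
    "dcat:downloadURL": "0..*",
}


def parse_cardinality(text):
    """Turn 'min..max' into (min, max); max is None when written as '*'."""
    low, _, high = text.partition("..")
    return int(low), (None if high == "*" else int(high))


def violations(record, cardinalities=CARDINALITIES):
    """List (property, value count, expected cardinality) for each breach."""
    problems = []
    for prop, card in cardinalities.items():
        low, high = parse_cardinality(card)
        count = len(record.get(prop, []))
        if count < low or (high is not None and count > high):
            problems.append((prop, count, card))
    return problems


if __name__ == "__main__":
    # Hypothetical record with only two of the four "1..*" properties filled in.
    record = {
        "dct:title": ["Example dataset (CSV)"],
        "dcat:accessURL": ["https://example.org/data"],
    }
    for prop, count, card in violations(record):
        print(f"{prop}: found {count} value(s), expected {card}")
```

In practice this kind of validation is usually expressed as SHACL shapes, as in the repository linked above; the sketch only shows what the cardinality notation requires.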

Agent

An entity that is associated with Catalogs and/or Datasets. Agents can be individuals or organisations. If the Agent is an organisation, the use of the Organization Ontology is recommended.

...

All things described by RDF are called resources, and they are instances of the class rdfs:Resource. This is the class of everything. All other classes are subclasses of this class. To read more, go to https://www.w3.org/TR/rdf12-schema/#ch_resource

Feedback and model extension - Git Issues

Feedback

If you wish to extend the model, for example by extending Resource, or to create a new concept, please open an issue in Health-RI’s GitHub repository (https://github.com/Health-RI/health-ri-metadata/tree/master) and provide a clear explanation for the extension. Assign the issue to either ‘brunasv’ or ‘xiaofengleo’, and we will work with you to implement the addition in the next release.

...

When creating custom classes, provide detailed metadata for each type of resource so that users and systems can distinguish between them, for instance between a collection and a dataset. The information must be specific and unambiguous for these subtle differences to be understood.
