4A Metadata mapping

status: in development

1 Introduction
2 What is metadata?
3 Health-RI Core Metadata Schema
4 What are the elements of the Health-RI core metadata schema?
- 4.1 Overview of all core Health-RI classes and relations between classes
- 4.2 UML diagram
5 Next steps
6 Additional resources
7 Questions?

Introduction

Before you can add your resource’s metadata to the National Health Data Catalogue, you need to know what metadata is, where your metadata is located and which metadata the Catalogue requires. Regardless of how you add your metadata to a FAIR Data Point (manually or automatically), you will need to map your metadata values to the Health-RI core metadata schema.

In this section, we describe the basics of metadata and explain how to map your metadata to the Health-RI core metadata schema.

The information on this page is meant for professionals, with or without experience in metadata, metadata mapping or semantic modeling, who want to add their metadata to the National Health Data Catalogue. Additional background information is presented to give you the choice to learn on a “need to know” basis or dive deeper into the background of the metadata mapping process.

If you already understand the Health-RI core metadata schema and want to go to metadata mapping immediately, you can follow this tutorial: Mapping tutorial.

In the future, we hope to support you with tools that can automatically transform metadata entered in a template (for example CSV) into RDF, ready to be added to the FDP.

Fig. 1: General overview of the Implementation (step 4) of the onboarding. Note that some implementation steps (FDP implementation, automation, etc.) might be done by your institute/node if you are using their infrastructure.

What is metadata?

Metadata is data about data. It provides context for your data, such as a description of its content or purpose, the owner of the data, or the format of the data. In other words, metadata helps you understand and manage data effectively by providing additional information about it.

Specifically for the National Health Data Catalogue, users of the catalogue will be able to discover datasets and assess their usability based on the provided metadata. Therefore, as a data holder onboarding data, it is essential to provide detailed and complete metadata about your dataset(s). That way, you also adhere to principle F2 of the FAIR principles: Data are described with rich metadata. If the metadata contains the right information, e.g. about the type of cancer that is relevant in a dataset, a data user will be able to find relevant and interesting datasets in the catalogue.

A metadata standard is a set of rules, guidelines and conventions that define which metadata should be described and how this metadata should be structured and formatted within a particular domain or context. Adhering to such standards ensures consistency, interoperability and effective management of metadata across different systems, organizations and disciplines.

To find out more about where you can find metadata for your resource, go to the Metroline Step: Assess availability of your metadata.

Health-RI Core Metadata Schema

The National Health Data Catalogue currently uses a core metadata schema: a set of minimal elements for describing each resource (e.g. dataset) with common metadata. It defines the requirements to access and reuse information across Health-RI nodes via the National Catalogue.

The Health-RI core metadata schema is based on widely used metadata standards such as DCAT-AP, DCAT-AP-NL and HealthDCAT-AP.

The first version (v1) of the Health-RI core metadata schema is based on DCAT-AP v3. Version 2 (v2) of the Health-RI core metadata schema also incorporates the (draft) HealthDCAT-AP and applies restrictions as defined in DCAT-AP-NL (Dutch DCAT-AP specification). You can find more information on the relation of the Health-RI core metadata schema to other application profiles in Relation of the Health-RI core metadata schema to other DCAT application profiles.

Where do I find detailed descriptions of the Health-RI core metadata schema?

For specific details on the Health-RI core metadata schema, please visit the GitHub specifications, which are intended for data experts and data stewards. We are currently transitioning to a new version of the Health-RI core metadata schema, v2, available on GitHub here. Specifications from the official v1 release are available here.

What are the elements of the Health-RI core metadata schema?

The Health-RI core metadata schema consists of classes, properties and relationships. Classes are the main entities describing the resource, such as “Dataset”. Each class has a number of properties (related metadata fields) that specify the class further and each property has specific attributes such as “range”.
These classes, properties and their attributes are visualized in a UML (Unified Modeling Language) diagram. Below you will find an example (Fig. 2) and more detailed explanations of these elements.

Fig. 2: Visual example of two classes, their respective properties and the relationship between these classes.

Classes

The Health-RI core metadata schema is split into several classes. Classes are the main entities describing the data, which can be used to represent the overall structure/context of the metadata describing datasets. Each class is identified by a URI (uniform resource identifier) consisting of the vocabulary (e.g. dcat) and the class name in that vocabulary (e.g. Dataset).
For example, all instances of dcat:Dataset of an institute could be grouped under an instance of dcat:Catalog, which contains information about the institute (e.g. a Radboudumc catalog being the umbrella of all Radboudumc datasets), where individual instances of dcat:Dataset describe the individual datasets published by the institute.
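
As a minimal sketch of how such a prefixed class name relates to a full URI, the snippet below expands names like dcat:Dataset using the public namespace IRIs of the vocabularies mentioned on this page. The function name expand and the PREFIXES table are illustrative assumptions, not part of any Health-RI tooling.

```python
# Namespace IRIs of the vocabularies mentioned on this page.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'dcat:Dataset' into its full URI.

    'expand' is a hypothetical helper written for this illustration only.
    """
    prefix, local_name = prefixed_name.split(":", 1)
    if prefix not in PREFIXES:
        raise KeyError(f"Unknown prefix: {prefix}")
    return PREFIXES[prefix] + local_name


print(expand("dcat:Dataset"))
print(expand("dcat:Catalog"))
```

The part before the colon selects the vocabulary, and the part after it is the class name within that vocabulary.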

We also distinguish between main classes, like dcat:Dataset and dcat:Catalog, and supporting classes, like foaf:Agent and vcard:Kind. Supporting classes describe details that the main classes refer to (e.g. contact details in the case of vcard:Kind), and each has its own set of properties (see also Fig. 2 above).

At the moment, four classes (dcat:Dataset, dcat:Catalog, vcard:Kind and foaf:Agent) are mandatory in the Health-RI metadata model. The other classes (such as dcat:DatasetSeries) are not strictly necessary to onboard data to the National Health Data Catalogue, but using them can help to provide meaningful context for a dataset. For example, datasets that are published yearly can be listed on their own, but they can also be associated with each other by connecting them all to the same dcat:DatasetSeries to indicate that the datasets are related.
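
A small sketch of how you might check a planned mapping against the four mandatory classes named above. The helper name and the example mapping are assumptions made for illustration.

```python
# The four classes this page lists as mandatory in the Health-RI metadata model.
MANDATORY_CLASSES = {"dcat:Dataset", "dcat:Catalog", "vcard:Kind", "foaf:Agent"}


def missing_mandatory_classes(used_classes):
    """Return the mandatory classes that do not appear in used_classes, sorted by name."""
    return sorted(MANDATORY_CLASSES - set(used_classes))


# Hypothetical mapping that uses an optional class but forgets two mandatory ones.
my_mapping = ["dcat:Catalog", "dcat:Dataset", "dcat:DatasetSeries"]
print(missing_mandatory_classes(my_mapping))
```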

For an overview of the classes in metadata core v2 and their relations, see the figure below (Fig. 3). Additionally, we provide some considerations and guidelines on mapping to the different classes here: Recommendations on mapping to classes in the v2 core metadata

Overview of all core Health-RI classes and relations between classes

Fig. 3: Overview of the classes of the Health-RI core metadata schema v2, including all properties that establish the relations between these classes. See the section below, which describes all these connections/relations. Note that you will most likely not need or make use of all the classes.

How are the different classes related to each other?

Below we describe all possible relations between the main classes of the v2 Health-RI core metadata schema.
For each connection, we describe from which class, via which property it is connected to another class.

For example, dcat:Catalog → dcat:dataset → dcat:Dataset means: the dcat:Catalog class has the property dcat:dataset with range dcat:Dataset. In other words, an instance of the catalog class points to an instance of the dataset class via the dcat:dataset property.
It is very likely that you will not make use of all of these connections, but for the sake of completeness, we describe them all below.

Main connections
- dcat:Catalog → dcat:dataset → dcat:Dataset
  Establishes the connection between a catalog and a dataset in that catalog.
- dcat:Dataset → dcat:distribution → dcat:Distribution
  Connection between dataset and its distribution.
In case your dataset is part of a series:
- dcat:Dataset → dcat:inSeries → dcat:DatasetSeries
  Connection between dataset and a dataset series it belongs to. Different datasets from the same series will point to the same instance of dcat:DatasetSeries.
Dataset to another Dataset
- dcat:Dataset → dct:source → dcat:Dataset
  If a dataset is based on another dataset, this is used to reference the source dataset.
- dcat:Dataset → dct:hasVersion → dcat:Dataset
  Reference to another version of the same dataset.
Data Service to other classes
- dcat:Catalog → dcat:service → dcat:DataService
  Connection between a catalog and data service.
- dcat:DataService → dcat:servesDataset → dcat:Dataset
  Reference between a data service and the dataset it serves.
Catalog to another Catalog
- dcat:Catalog → dcat:catalog → dcat:Catalog
  Connection between related catalogs.
- dcat:Catalog → dct:hasPart → dcat:Catalog
  Establishing nested catalogs.
Special Distributions of a Dataset (both are introduced by HealthDCAT-AP).
- dcat:Dataset → healthdcatap:analytics → dcat:Distribution
  Relation to analytics distribution of a dataset. More information available here.
- dcat:Dataset → adms:sample → dcat:Distribution
  Relation to sample distribution of a dataset. More information available here.
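
The connections listed above can also be written down as data. The sketch below stores each connection as a (source class, property, target class) tuple and answers two simple questions about them. The function names are illustrative assumptions and the structure is not an official machine-readable form of the schema.

```python
# Each tuple is (source class, connecting property, target class),
# copied from the list of connections above.
RELATIONS = [
    ("dcat:Catalog", "dcat:dataset", "dcat:Dataset"),
    ("dcat:Dataset", "dcat:distribution", "dcat:Distribution"),
    ("dcat:Dataset", "dcat:inSeries", "dcat:DatasetSeries"),
    ("dcat:Dataset", "dct:source", "dcat:Dataset"),
    ("dcat:Dataset", "dct:hasVersion", "dcat:Dataset"),
    ("dcat:Catalog", "dcat:service", "dcat:DataService"),
    ("dcat:DataService", "dcat:servesDataset", "dcat:Dataset"),
    ("dcat:Catalog", "dcat:catalog", "dcat:Catalog"),
    ("dcat:Catalog", "dct:hasPart", "dcat:Catalog"),
    ("dcat:Dataset", "healthdcatap:analytics", "dcat:Distribution"),
    ("dcat:Dataset", "adms:sample", "dcat:Distribution"),
]


def outgoing(source_class):
    """List (property, target class) pairs that start at source_class."""
    return [(prop, target) for source, prop, target in RELATIONS if source == source_class]


def self_references():
    """List properties that connect a class to another instance of the same class."""
    return [prop for source, prop, target in RELATIONS if source == target]


for prop, target in outgoing("dcat:Catalog"):
    print(f"dcat:Catalog -> {prop} -> {target}")
print(self_references())
```

Reading the tuples left to right gives the same "class, via property, to class" statement as the arrow notation.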

Note that you will most likely NOT need or make use of all available classes in the Health-RI core metadata schema (v2). Some classes are not applicable to all cases, e.g. in a case where an institute wants to describe only the available datasets, they might only use the dcat:Catalog and dcat:Dataset classes.
More information and considerations/guidelines for different use cases are described here: Recommendations on mapping to classes in the v2 core metadata

Properties

Each class has its own set of related metadata fields, so-called properties, that describe the entity (class) in more detail. For example, each dcat:Dataset contains the properties dct:title and dct:description, which are free text fields that provide a title and a detailed description of the contents of the dataset. In another example, the class vcard:Kind (which is used to provide contact details of a resource) contains the property vcard:hasEmail to provide an email address in the metadata.
Each property has a number of attributes (i.e. requirement level, cardinality, range, property URI):

Requirement level: Each property has a requirement level, indicating whether it is mandatory, recommended or optional to fill this property in the respective class. Mandatory properties must always be filled. Recommended properties should be filled if the information is available. Optional properties can be filled, but are not always available or applicable.
Cardinality: Each property has a cardinality that further specifies the requirement level. Cardinalities are expressed as a pair of bounds (e.g. 0..1). The first bound indicates how many times the property has to be filled at minimum, the second indicates the maximum. The most commonly occurring cardinalities are:
- 0..n (also written as 0..*): The property is not mandatory, but can be filled many times.
- 0..1: The property is not mandatory, but may only be filled once at most.
- 1..n (also written as 1..*): The property must be filled (is mandatory), and can be filled many times.
- 1..1: The property must be filled (is mandatory), but only once.
Range/format: For each property, it is specified how it should be filled, specifying its range. This determines the format of the filled value per property, for example whether the property is to be filled with free text (rdfs:Literal), a date in a specific format (xsd:dateTime), or point to another class (for example, the range of dcat:service property in dcat:Catalog is dcat:DataService, establishing the connection between instances of the two classes via its IRI (Internationalized Resource Identifier)).
Controlled vocabularies: A number of properties have to be filled with values from so-called controlled vocabularies, specific lists of pre-defined values that can be linked to. For example, the property access rights in the Dataset class restricts the range to three specific values from an EU controlled vocabulary for access rights. In the Health-RI model, we have added the relevant link to the controlled vocabulary to the description of the respective properties.
Properties connecting classes (relationships): Classes also contain a specific set of properties that connect one class to another (establishing a relationship between instances of different classes). For example, in the dcat:Catalog class of the Health-RI core, the dcat:dataset property establishes the connection between a catalogue and a dataset it contains.
Like other properties, these connecting properties have a requirement level (in our example, mandatory), cardinality (in our example, 1..n, meaning that each dcat:Catalog has to contain at least one dcat:Dataset), and a specified range (in this example, the property dcat:dataset has the range dcat:Dataset, indicating that this property in the dcat:Catalog class points to an instance of a dataset, via the IRI of the dcat:Dataset).
An overview of all classes of the v2 Health-RI core metadata schema with all possible relations between classes is given above, in the section Overview of all core Health-RI classes and relations between classes (Fig. 3).
Property URI: Each property has a URI (uniform resource identifier) that clearly identifies the element and the parent ontology from which the property is derived. For example, the property 'Contact point' in the class Catalogue has the property URI dcat:contactPoint, indicating that it is the contact point property derived from the DCAT vocabulary.
Note that the local name of a property URI always starts with a lowercase letter, like dcat:contactPoint, while that of a class URI starts with a capital letter, like dcat:Dataset.
Definitions and usage notes: Each property has a definition that further specifies the property, as well as a usage note, which describes in more detail how the property should be used. Definitions and usage notes of the v2 Health-RI core metadata schema are available on GitHub and in the associated Excel sheet.
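
The cardinality notation described above (0..1, 1..n, and so on) can be read mechanically. The following sketch, with hypothetical helper names, parses a cardinality string and checks whether a given number of filled values fits within it. It only illustrates the notation and is not a validator for the Health-RI schema.

```python
from typing import Optional, Tuple


def parse_cardinality(text: str) -> Tuple[int, Optional[int]]:
    """Parse '0..1', '1..n', '1..*' or '[1..n]' into (minimum, maximum).

    A maximum of None means there is no upper limit.
    """
    lower, upper = text.strip("[]").split("..")
    maximum = None if upper in ("n", "*") else int(upper)
    return int(lower), maximum


def satisfies(cardinality: str, value_count: int) -> bool:
    """Check whether value_count filled values fit within the cardinality."""
    minimum, maximum = parse_cardinality(cardinality)
    if value_count < minimum:
        return False
    return maximum is None or value_count <= maximum


for cardinality in ("0..n", "0..1", "1..n", "1..1"):
    fits = [count for count in range(4) if satisfies(cardinality, count)]
    print(cardinality, "accepts value counts (tested 0 to 3):", fits)
```

A minimum of 1 corresponds to a mandatory property, while a minimum of 0 corresponds to a recommended or optional one.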

By providing the metadata for all mandatory (and ideally also recommended) properties of the required classes in the Health-RI core metadata schema in the correct format, a data holder makes sure that the metadata conforms to the schema and is machine- and human-readable.

UML diagram

A UML diagram is a visual representation of a metadata schema. The UML of the v2 Health-RI core metadata schema is depicted below in Fig. 4.
The UML diagram is organised by class: each box in the diagram below represents one class of the schema. Within each class, the relevant properties are listed with the property URI, the range, requirement level and cardinality.
For example, in the UML below you see the box for dcat:Dataset (class), containing the mandatory property dct:title with range rdfs:Literal and cardinality [1..n]. The dcat:Dataset class also contains the property dcat:distribution with range dcat:Distribution (cardinality [0..n]). As you can see from the capital letter in the range of the property, this property is pointing to another class (dcat:Distribution) also present in the UML. The connection between these classes is also indicated by the open arrow from the dcat:Dataset class to the dcat:Distribution class.
While open arrows indicate connections between classes, closed arrows indicate that a class inherits all properties from another class. For example, dcat:Dataset inherits from dcat:Resource, meaning that all properties of dcat:Resource can also be used in dcat:Dataset. Note that only the ('empty') properties are inherited, not the values filled in for an instance of the other class.
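
To illustrate what "inheriting properties" means, the sketch below uses a heavily trimmed, hypothetical schema in which dcat:Dataset has dcat:Resource as its parent. Which properties are placed on which class here is an assumption for illustration; consult the UML diagram for the actual assignment.

```python
# Hypothetical, trimmed schema: class name -> (parent class, properties defined on the class).
SCHEMA = {
    "dcat:Resource": (None, ["dct:title", "dct:description"]),
    "dcat:Dataset": ("dcat:Resource", ["dcat:distribution"]),
}


def all_properties(class_name):
    """Return inherited properties followed by the class's own properties."""
    parent, own = SCHEMA[class_name]
    inherited = all_properties(parent) if parent else []
    return inherited + own


# Only the property names are inherited; every new instance starts without values.
new_dataset = {prop: [] for prop in all_properties("dcat:Dataset")}
print(new_dataset)
```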

Nested classes

It is possible for an instance of a class to refer to another instance of the same class, e.g. a dcat:Catalog pointing to another dcat:Catalog via the property dct:hasPart. These nested structures can be used to describe the structure of an institution or infrastructure in more detail, for example if an institute (described by dcat:Catalog) is divided into several independent departments (each described with its own instance of dcat:Catalog) that produce and publish their own sets of dcat:Dataset.
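
A nested structure like this can be pictured as a small tree. In the sketch below, an institute catalog has two department catalogs as parts, and a recursive function collects every dataset reachable from a given catalog. All identifiers (institute, dept-a, dataset-1, and so on) are made up for illustration.

```python
# Hypothetical catalogs: each entry lists its nested catalogs (dct:hasPart)
# and the datasets it contains directly (dcat:dataset).
CATALOGS = {
    "institute": {"dct:hasPart": ["dept-a", "dept-b"], "dcat:dataset": ["dataset-0"]},
    "dept-a": {"dct:hasPart": [], "dcat:dataset": ["dataset-1", "dataset-2"]},
    "dept-b": {"dct:hasPart": [], "dcat:dataset": ["dataset-3"]},
}


def datasets_in(catalog_id, seen=None):
    """Collect datasets of a catalog and of all catalogs nested below it."""
    seen = set() if seen is None else seen
    if catalog_id in seen:  # guard against catalogs that reference each other
        return []
    seen.add(catalog_id)
    catalog = CATALOGS[catalog_id]
    found = list(catalog["dcat:dataset"])
    for child_id in catalog["dct:hasPart"]:
        found.extend(datasets_in(child_id, seen))
    return found


print(datasets_in("institute"))
print(datasets_in("dept-b"))
```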

Please note that the current technical implementation of the Health-RI core metadata schema in the front end of the National Health Data Catalogue limits the theoretically unlimited flexibility that DCAT offers. In particular, the National Health Data Catalogue cannot currently display these layers of nested structures. Read more about it here: Mapping tutorial | 🚧 Current limitations in model flexibility.

UML diagram of the Health-RI core metadata schema, v2

A high-resolution version of this image is available on GitHub.

Fig. 4: UML diagram of the Health-RI core metadata schema, v2.
In the UML, we have separated the main classes from the supporting classes. Relationships between main classes are indicated with arrows as described above, where mandatory relationships are marked with dark labels and recommended relationships with lighter ones. Relationships with supporting classes are not shown with arrows, to keep the diagram readable, but can still be deduced from the pink coloured ranges of the listed properties per class.

Next steps

To map your metadata, you can follow the general tutorial Mapping tutorial. Then the metadata can be transformed into RDF format and exposed using a FAIR Data Point. More information about this step can be found here: 4B Exposing metadata
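
As a rough illustration of what transforming mapped metadata into RDF can look like, the sketch below turns one row of a made-up CSV template into Turtle. The column names, the example.org identifier and the helper functions are all assumptions. The output is not a complete or conformant Health-RI record (only dct:title and dct:description are written), and this is not the tooling Health-RI provides or plans to provide.

```python
import csv
import io
import json

# Hypothetical CSV template with one made-up dataset.
TEMPLATE = """identifier,title,description
https://example.org/dataset/1,Example dataset,A made-up dataset used for illustration
"""

PREFIX_LINES = [
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
    "@prefix dct: <http://purl.org/dc/terms/> .",
]


def literal(text):
    """Quote and escape text as a double-quoted string literal.

    Assumption: the escapes json.dumps produces for ordinary text
    are also accepted in Turtle string literals.
    """
    return json.dumps(text, ensure_ascii=False)


def row_to_turtle(row):
    """Describe one CSV row as a dcat:Dataset with a title and a description."""
    return (
        f"<{row['identifier']}> a dcat:Dataset ;\n"
        f"    dct:title {literal(row['title'])} ;\n"
        f"    dct:description {literal(row['description'])} ."
    )


def csv_to_turtle(csv_text):
    """Convert every row of the CSV text into a Turtle block below the prefixes."""
    rows = csv.DictReader(io.StringIO(csv_text))
    blocks = [row_to_turtle(row) for row in rows]
    return "\n".join(PREFIX_LINES) + "\n\n" + "\n\n".join(blocks) + "\n"


print(csv_to_turtle(TEMPLATE))
```

In practice, an RDF library and the full schema specification would be used to produce and validate the output.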

Additional resources

Technical details on DCAT AP and FAIR Datapoints - YouTube video, Health-RI

Health-RI core metadata schema v1: GitHub - Health-RI/health-ri-metadata at v1.0.1

Health-RI core metadata schema v2: GitHub - Health-RI/health-ri-metadata at v2.0.0

Resources from the EU Open Data Explained, including a general training on metadata and basic and advanced level resources on DCAT and DCAT-AP.

HealthDCAT-AP literacy portal

FAIR Metrolines (note: some pages are under development):

Metroline Step: Assess availability of your metadata

Metroline Step: Register resource level metadata

Metroline Step: Analyse data semantics

Metroline Step: Apply (meta)data model

https://health-ri.atlassian.net/wiki/spaces/FSD/pages/277839878

Questions?

If you have questions about the onboarding process or would like to learn more, reach out to our service desk: https://www.health-ri.nl/health-ri-servicedesk

servicedesk@health-ri.nl