Mapping tutorial

1 Introduction
2 What is metadata mapping?
3 Current limitations in model flexibility
4 Step-by-step guide for metadata mapping
5 Next steps
6 Additional resources
7 Questions?

Introduction

In this section, we describe the process of metadata mapping and the steps you should take. This page is intended for data stewards, data experts, or equivalent roles. For a general overview and background information, please refer to our general Metadata mapping overview: 4A Metadata mapping

What is metadata mapping?

Metadata mapping and creation of a metadata schema will likely require involvement of a semantic expert, data steward or equivalent.

For the National Health Data Catalogue, metadata mapping is the process of establishing links between your metadata values and the classes and properties of the Health-RI core metadata schema. In other words, it ensures that the metadata for your data conforms to the Health-RI metadata schema.

Mapping involves identifying and linking pieces of metadata information from one system (e.g. your data repository) to the relevant content or data elements in another system (in this case, the Health-RI metadata schema).
For the National Health Data Catalogue, it means that, when mapped correctly to the Health-RI core metadata schema, metadata from your system can seamlessly be harvested from that system and displayed in the catalogue in the correct way.
In the future, we hope to support you with scripts that can automatically transform metadata entered in a CSV template into RDF, ready to be added to the FDP.
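
As an illustration of what "linking" means here, a mapping can be thought of as a lookup table from your local field names to properties of the target schema. The sketch below is a minimal example under stated assumptions: the local field names (name, summary, tags, internal_id) and the helper map_record are hypothetical, and the chosen target properties are common DCAT / Dublin Core terms, not a statement of what the Health-RI schema requires.

```python
# Hypothetical local field names mapped to DCAT / Dublin Core property names.
FIELD_TO_PROPERTY = {
    "name": "dct:title",
    "summary": "dct:description",
    "tags": "dcat:keyword",
}


def map_record(record, field_map):
    """Return (mapped, unmapped): values keyed by schema property, plus leftover fields."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        prop = field_map.get(field)
        if prop is None:
            # No target property known for this local field yet.
            unmapped.append(field)
        else:
            mapped[prop] = value
    return mapped, unmapped


local_record = {"name": "Example cohort", "summary": "Illustrative only.", "internal_id": 17}
mapped, unmapped = map_record(local_record, FIELD_TO_PROPERTY)
print(mapped)
print(unmapped)
```

Fields that end up in the unmapped list are the ones to discuss with a data steward: they either need a target property or can be left out of the catalogue metadata.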

After mapping, conformance checking (validation) of the metadata will be performed, to ensure that the metadata is mapped correctly and can be harvested and added to the catalogue. More information on harvesting can be found here: 5. Metadata harvesting

Current limitations in model flexibility

The DCAT-based Health-RI model is theoretically very flexible and allows nested structures. However, we strongly advise against deeply nested structures, for example many layers of nested catalogues. While mapping, always keep the representation of your metadata in the National Health Data Catalogue in mind: currently, only datasets are harvested and displayed, so the Dataset class should be the main class for your most detailed metadata descriptions. When using the additional classes, we recommend adhering to the general, overall structure of the v2 model as represented here: 4A Metadata mapping | Overview of all core Health RI classes and relations between classes.
We are currently working on updating the representation of metadata in the National Health Data Catalogue to reflect more of the classes present in the model, such as projects, studies, dataset series and catalogues. Keep it as complex as necessary, but as simple as possible!

If you find that your metadata cannot be mapped to the current metadata model, the model may lack certain domain-specific metadata elements. If that is the case, please contact us to inquire about the latest state of domain-specific extensions, or to let us know that you would like to contribute to their development.

Step-by-step guide for metadata mapping

Whether you plan to provide your metadata via manual entry into an FDP or by automated export, the information (=metadata) on your side has to be mapped correctly to the Health-RI metadata schema for the National Health Data Catalogue in order to be harvested and displayed correctly in the catalogue. To successfully map your metadata to the catalogue, follow the steps below.

Gather the metadata on your resource

The first step is to gather the metadata of the resource(s) that you want to publish in the National Health Data Catalogue. You have to:

Identify where information about your resource is stored.
Extract and evaluate your metadata.
Make any necessary corrections.

For a detailed description of these steps, please refer to the FAIR Metroline Step: Assess availability of your metadata.
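
The "extract and evaluate" step can be supported with a small completeness check on the extracted records. The following is a minimal sketch, assuming the metadata has already been extracted into a list of Python dictionaries; the field names and the helper evaluate_records are hypothetical and only serve as an illustration.

```python
def evaluate_records(records, expected_fields):
    """Count, per expected field, how many records lack a usable value."""
    missing = {field: 0 for field in expected_fields}
    for record in records:
        for field in expected_fields:
            value = record.get(field)
            # Treat absent keys, None and blank strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    return missing


# Hypothetical extracted records; the second one has gaps to correct.
records = [
    {"name": "Example cohort", "summary": "Illustrative only.", "contact": "example@email.com"},
    {"name": "Second resource", "summary": "   ", "contact": None},
]
report = evaluate_records(records, ["name", "summary", "contact"])
print(report)
```

A non-zero count for a field indicates where corrections are needed before the mapping starts.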

Understand the model

After successfully gathering your metadata, but before starting the mapping process, it is crucial to understand the structure of the metadata model that you are mapping to, in this case the Health-RI core metadata model.

Study the following pages to familiarise yourself with the Health-RI metadata schema:

4A Metadata mapping | 🧩 Of which elements does the metadata schema consist?
4A Metadata mapping | Overview of all core Health RI classes and relations between classes: this part gives a general overview of the main entities of the model and how they relate to each other. On this page, we describe each class and the intended use separately.
The UML diagram (containing more details on all classes and properties).

You can find a more detailed description of the v2 metadata schema, including definitions and usage notes of all classes and properties on Github and in the associated Excel sheet.

Once you have familiarised yourself with the metadata schema, proceed to the actual mapping (steps 3 and 4: mapping main concepts to classes, and mapping details to properties). If things are not clear or if you have questions, please contact us via the Health-RI servicedesk or sign up for one of the weekly onboarding walk-in hours.

Map main concepts to the classes of the metadata schema

In this step you decide how your resource relates to the entities (classes) in the Health-RI core metadata schema. What is the overall structure of your resource and how do the elements of it relate to the classes of the Health-RI core metadata schema?
For the mapping to the classes of the v2 core metadata schema specifically, we have collected a number of considerations, guidelines and recommendations on this page: Recommendations on mapping to classes in the v2 core metadata. The page also collects guidelines for specific use cases, such as Cohort and Biobanks data, and outlines a number of examples.

Currently, you will always need to describe at least a dcat:Catalog and a dcat:Dataset from the main classes. Additionally, you will need to use foaf:Agent and vcard:Kind from the supporting classes, as these are used by properties in several classes, including Catalog and Dataset.
Potentially, you might want to describe more, like dcat:Distribution, dcat:DatasetSeries or foaf:Project and disco:Study. Please refer to the recommendations page for more details.
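
To make the minimal set of classes concrete, the sketch below models one catalogue, one dataset, one agent and one contact point as plain Python dictionaries and checks that all four classes are described and that every link points to a described resource. This is a minimal sketch under stated assumptions: the ex: identifiers and both helper functions are hypothetical, and the linking properties shown (dcat:dataset, dct:publisher, dcat:contactPoint) are standard DCAT terms chosen for illustration. Consult the schema for the properties that are actually required.

```python
# The four classes named in the text as the current minimum.
REQUIRED_CLASSES = {"dcat:Catalog", "dcat:Dataset", "foaf:Agent", "vcard:Kind"}

# Hypothetical resources; every key other than "id" and "type" is a link.
resources = [
    {"id": "ex:catalogue", "type": "dcat:Catalog",
     "dcat:dataset": ["ex:dataset-1"], "dct:publisher": "ex:institute"},
    {"id": "ex:dataset-1", "type": "dcat:Dataset",
     "dct:publisher": "ex:institute", "dcat:contactPoint": "ex:helpdesk"},
    {"id": "ex:institute", "type": "foaf:Agent"},
    {"id": "ex:helpdesk", "type": "vcard:Kind"},
]


def missing_classes(resources, required=REQUIRED_CLASSES):
    """Return the required classes that no resource is typed as."""
    present = {r["type"] for r in resources}
    return sorted(required - present)


def dangling_references(resources):
    """List (source id, property, target id) for links to undescribed resources."""
    known = {r["id"] for r in resources}
    dangling = []
    for r in resources:
        for prop, value in r.items():
            if prop in ("id", "type"):
                continue
            targets = value if isinstance(value, list) else [value]
            dangling += [(r["id"], prop, t) for t in targets if t not in known]
    return dangling


print(missing_classes(resources))
print(dangling_references(resources))
```

Both checks returning empty lists means that the structure is complete in the sense of this sketch; it says nothing yet about the properties within each class.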

You will most likely not need all available classes in the Health-RI core metadata schema (v2). Some classes may not be applicable: for example, an institute that wants to describe only its available datasets might use only the dcat:Catalog and dcat:Dataset main classes. More information and considerations/guidelines for different use cases are described here: Recommendations on mapping to classes in the v2 core metadata.

Map details to properties of selected classes

After establishing which parts of your metadata fit which class of the schema, it is time to fill the details of the selected classes by mapping to their properties.

The current Health-RI model has mandatory and recommended properties. You might want to focus only on the mandatory properties. Keep in mind, though, that for your data to be findable it is beneficial to provide as much metadata as possible, by filling in the recommended properties as well. For example, the dcat:Dataset class in v2 has a large number (~40) of recommended properties which, when filled, are very informative for data users.
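
The distinction between mandatory and recommended properties can be tracked with a simple coverage check. The sketch below is illustrative only: the property lists are hypothetical placeholders, not the actual mandatory and recommended sets of the Health-RI schema, and the helper coverage is an assumption made for this example.

```python
# Hypothetical property lists; take the real ones from the schema documentation.
MANDATORY = ["dct:title", "dct:description", "dct:identifier"]
RECOMMENDED = ["dcat:keyword", "dct:language", "dct:temporal"]


def coverage(mapped, mandatory, recommended):
    """Return (missing mandatory properties, recommended properties that are filled)."""
    # A property counts as filled only if its value is not empty.
    filled = {p for p, v in mapped.items() if v not in (None, "", [])}
    missing_mandatory = [p for p in mandatory if p not in filled]
    recommended_filled = [p for p in recommended if p in filled]
    return missing_mandatory, recommended_filled


dataset = {"dct:title": "Example cohort", "dct:description": "", "dcat:keyword": ["cohort"]}
missing, filled_recommended = coverage(dataset, MANDATORY, RECOMMENDED)
print("Missing mandatory:", missing)
print("Recommended filled:", filled_recommended)
```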

On Github, we provide an Excel sheet with all classes and properties which you can use as a template to collect your metadata. The same information is also available in the tables on the main Github page of the metadata schema. In the future, we hope to support you with scripts that can automatically transform metadata entered in a CSV template into RDF, ready to be added to the FDP.

Whether mapping to mandatory or recommended properties, you have to take the following into account for each property (see the schema documentation for a detailed explanation of these terms):

Cardinality: is the property mandatory, and how many times can or must it be filled?
Usage notes: These can help to find out how a specific property should be filled or with which level of detail.
Controlled vocabularies: if applicable for the property, choose the correct term from the controlled vocabulary.
Range: Does the value have to be provided in a specific format (e.g. free text or IRI)? For example, email-addresses have to be provided starting with mailto: (e.g. mailto:example@email.com).
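
The cardinality and range considerations above can be sketched as a small check per property. This is a minimal sketch, assuming values are collected as a list of strings; the function check_values and its parameters are hypothetical, and the IRI test is deliberately loose (it only looks for a scheme such as https: or mailto:).

```python
from urllib.parse import urlparse


def check_values(values, min_count, max_count=None, value_kind="literal"):
    """Return a list of problems found for one property's values."""
    problems = []
    # Cardinality: how many times must or may the property be filled?
    if len(values) < min_count:
        problems.append(f"expected at least {min_count} value(s), found {len(values)}")
    if max_count is not None and len(values) > max_count:
        problems.append(f"expected at most {max_count} value(s), found {len(values)}")
    # Range: does each value have the expected form?
    for value in values:
        if value_kind == "iri" and not urlparse(value).scheme:
            problems.append(f"not an IRI: {value!r}")
        elif value_kind == "mailto" and not value.startswith("mailto:"):
            problems.append(f"email must start with 'mailto:': {value!r}")
    return problems


print(check_values(["example@email.com"], 1, 1, "mailto"))
print(check_values(["mailto:example@email.com"], 1, 1, "mailto"))
print(check_values([], 1, None, "iri"))
```

Checking against a controlled vocabulary could be added in the same way, by testing whether each value is a member of the set of allowed terms.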

Check your model

At this point, we recommend that you validate your model. You can do this by sending the mapped classes and properties to servicedesk@health-ri.nl, where one of our metadata specialists will check whether the model adheres to the expected standard.

After mapping your metadata to the properties of the selected classes, you are ready to proceed to the next steps.

Next steps

If you have successfully mapped your metadata to the necessary classes and properties of the metadata schema, it is time to convert this metadata mapping into RDF (Resource Description Framework). Please refer to the next sections for more details: 4B Exposing metadata. The exact steps for this depend on whether you are planning to provide your metadata manually via an FDP (FAIR Data Point) or via automatic upload, and on your local system.
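
To give an impression of what the mapped metadata looks like as RDF, the sketch below writes one resource as N-Triples using only the Python standard library. It is a simplified illustration, not the conversion procedure described in 4B Exposing metadata: the example IRI and the helper functions are hypothetical, all values are written as plain string literals without language tags or datatypes, and in practice a dedicated RDF library would normally be used instead.

```python
# Namespace IRIs for the prefixes used in this sketch.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(curie):
    """Expand a prefixed name such as 'dct:title' to a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(subject_iri, class_curie, properties):
    """Serialise one resource: a type statement plus one triple per property."""
    lines = [f"<{subject_iri}> <{expand('rdf:type')}> <{expand(class_curie)}> ."]
    for curie, value in properties.items():
        lines.append(f"<{subject_iri}> <{expand(curie)}> {literal(value)} .")
    return "\n".join(lines)


nt = to_ntriples(
    "https://example.org/dataset/1",  # hypothetical dataset IRI
    "dcat:Dataset",
    {"dct:title": 'An "example" dataset'},
)
print(nt)
```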

In addition, you will need to validate your model. This step ensures that the new model both accurately represents the original data and adheres to the Health-RI metadata structure.

Once your RDF data is ready and successfully validated, you can publish it to an FDP where it can be harvested by the Catalogue. More information about this step can be found here: 4B Exposing metadata.