Metadata onboarding on the National Catalogue
- 1 Introduction
- 2 What is the National Health Data Catalogue?
- 3 Where does the National Health Data Catalogue get metadata?
- 4 How to onboard your metadata to the Catalogue?
- 4.1 Onboarding navigator
- 4.2 General onboarding steps
- 4.3 1. Request
- 4.4 2. Intake
- 4.5 3. Planning
- 4.6 4. Implementation
- 4.7 5. Harvesting
- 4.8 6. Onboarded
- 5 What is the future of the National Health Data Catalogue?
- 6 Questions?
Introduction
This page describes the background of the National Health Data Catalogue and explains the metadata onboarding process. Onboarding is the process of publishing information (metadata) about resources (e.g. datasets, biomaterials) in the National Health Data Catalogue, which will make them findable for reuse.
The description of each step links to detailed guidelines on how to perform it. By following those guidelines you ensure that your data is effectively and correctly onboarded and is findable for data users.
It is essential to note that this document is dynamic and subject to regular updates to reflect the current state of the catalogue. This documentation is intended for researchers and data holders.
What is the National Health Data Catalogue?
The National Health Data Catalogue is an overview of health & life sciences research data in the Netherlands. It contains metadata about the available datasets, meaning it contains the description of the datasets and other resources. This description includes, for example, a date when the dataset was created, the authors, or a URL where the data can be found. The metadata hosted within the National Health Data Catalogue are sourced from a diverse range of origins and domains. These sources can span from electronic records to images, biomaterials, omics data, collections and many more.
The goal of the National Health Data Catalogue is to create an infrastructure for secondary use of data where researchers and other interested parties can find and access cross-domain data relevant to their research. The intent is to harvest currently available data from any health-care and life science domain in the Netherlands.
The Catalogue aims to foster the FAIR data principles: making data Findable, Accessible, Interoperable, and Reusable for its users. You can find more information about FAIR here.
Where does the National Health Data Catalogue get metadata?
The Catalogue can retrieve (harvest) information from other catalogues, and it can also be harvested by other catalogues. This means that, once metadata is entered in one catalogue, it automatically becomes available in other catalogues, preventing a data holder from having to enter the metadata manually in every individual catalogue. Ideally the two catalogues can be connected via a FAIR data point that holds information about the data and shares it with the Catalogue (Figure 1).
There are several ways a data holder can onboard their data to the Catalogue. However, firstly the data needs to be properly prepared and described. Resources in the catalogue are described based on the Health-RI core metadata schema, which is based on DCAT-AP v3. The latest version is also based on DCAT-AP NL and HealthDCAT-AP (draft version). A separate page discusses the Relation of the Health-RI core metadata schema to other DCAT application profiles. The specifications of the latest version (version 2) can be found under https://health-ri.github.io/metadata-documentation/.
This core metadata schema will be further expanded in the future (see What is the future of the National Health Data Catalogue?).
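To give an impression of what a DCAT-style description looks like, the sketch below serialises a few dataset properties as Turtle using only the Python standard library. The property names (dct:title, dct:description, dct:issued, dcat:landingPage) come from the W3C DCAT and Dublin Core vocabularies; the function name, the example URIs and the choice of properties are assumptions made for illustration. It does not cover the mandatory properties of the Health-RI core metadata schema, so check the specification before relying on it.

```python
from datetime import date

# Namespace prefixes of the W3C DCAT, Dublin Core and XML Schema vocabularies.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def dataset_to_turtle(uri, title, description, issued, landing_page):
    """Serialise a handful of dataset properties as a Turtle document.

    Hypothetical helper: it illustrates the shape of the metadata only and
    is not a complete Health-RI core metadata record.
    """
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{uri}> a dcat:Dataset ;")
    lines.append(f'    dct:title "{escape_literal(title)}"@en ;')
    lines.append(f'    dct:description "{escape_literal(description)}"@en ;')
    lines.append(f'    dct:issued "{issued.isoformat()}"^^xsd:date ;')
    lines.append(f"    dcat:landingPage <{landing_page}> .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders.
    print(
        dataset_to_turtle(
            uri="https://example.org/dataset/1",
            title="Example cohort",
            description="Questionnaire data from an example cohort.",
            issued=date(2024, 1, 15),
            landing_page="https://example.org/cohort",
        )
    )
```

In practice an RDF library is likely a better fit than string formatting; the standard-library version is used here only to keep the sketch self-contained.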
How to onboard your metadata to the Catalogue?
Several steps are needed to publish your metadata on the National Catalogue. Here we show the basic steps. You can find some examples of onboarding and scenarios here. For the technical specification, please refer to https://health-ri.github.io/metadata-documentation/ for the latest version of the Health-RI core metadata schema.
Onboarding navigator
To help you find your way through the documentation, we have developed an Onboarding navigator that assists you with decisions and orientation. Go through the decisions or use the overview and checklist to analyse your onboarding situation.
General onboarding steps
1. Request
In this step a data holder/provider reaches out to Health-RI to request onboarding of the metadata, via our service desk: servicedesk@health-ri.nl. A Health-RI contact person is assigned to the data holder/provider and the request is internally registered.
2. Intake
The Health-RI contact person asks the data holder/provider for information about the data and resources available. If needed, a meeting can be initiated in this stage to provide more detailed information. The Health-RI contact person also facilitates contact or alignment with other onboarding projects in the same institute or node if possible.
In this step it is also crucial to check whether the FAIR prerequisites and ELSI guidelines are followed. More on the ELSI considerations can be found here: Make sure you can publish your metadata.
The first two steps are not required if assistance in the process from Health-RI is not needed.
3. Planning
In this stage the data holder/provider explores the onboarding process and plans a strategy. The resulting strategy should be scalable and, where possible, usable for multiple data holders/providers (i.e. institute-level onboarding).
4. Implementation
Upon deciding on a strategy for the onboarding of the (meta)data, the data holder/provider needs to implement the plan. There are two main tasks in this stage that can be done in parallel.
a) Mapping metadata to the HRI metadata schema
In order to onboard metadata, the data holder/provider needs to map their local metadata to the metadata schema. You can find general information about the metadata standards and the mapping process in the sections below.
You can learn more about the metadata and metadata standards here: 4A Metadata mapping
The general mapping pipeline can be found here: Mapping tutorial
For the latest version of the Health-RI core metadata schema please refer to https://health-ri.github.io/metadata-documentation/
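The mapping step can be sketched as a lookup table from local field names to target properties. In the sketch below the local field names, the FIELD_MAP table, both helper functions and the list of required properties are hypothetical; only the target property names are real DCAT and Dublin Core terms. The actual mandatory properties are defined in the Health-RI core metadata schema.

```python
# Hypothetical mapping from local field names to DCAT / Dublin Core terms.
FIELD_MAP = {
    "name": "dct:title",
    "summary": "dct:description",
    "created": "dct:issued",
    "owner": "dct:publisher",
    "website": "dcat:landingPage",
    "tags": "dcat:keyword",
}


def map_record(local_record, field_map=FIELD_MAP):
    """Translate one local record to target property names.

    Returns (mapped, unmapped): `mapped` holds the non-empty values keyed by
    target property, `unmapped` lists local fields without a mapping so that
    they can be reviewed by hand.
    """
    mapped = {}
    unmapped = []
    for field, value in local_record.items():
        target = field_map.get(field)
        if target is None:
            unmapped.append(field)
        elif value not in (None, "", []):
            mapped[target] = value
    return mapped, sorted(unmapped)


def missing_properties(mapped, required):
    """List the required target properties that the mapped record lacks."""
    return sorted(set(required) - set(mapped))


if __name__ == "__main__":
    record = {
        "name": "Cohort A",
        "summary": "",
        "tags": ["oncology"],
        "internal_id": 7,
    }
    mapped, unmapped = map_record(record)
    print("mapped:", mapped)
    print("no mapping for:", unmapped)
    # The required list here is an example, not the schema's actual list.
    print("missing:", missing_properties(mapped, ["dct:title", "dct:description"]))
```

Reporting unmapped and missing fields separately makes it easier to see whether a gap is in the mapping table or in the source data.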
b) Exposing metadata
To expose metadata to Health-RI, an intermediate system needs to be in place. The National Health Data Catalogue uses FAIR Data Points to harvest information. Basic information on FAIR Data Points can be found here: 4B Exposing metadata.
The FAIR Data Point should be implemented by the data holder/provider, ideally accompanied by an automated export pipeline (4B_2a Automate export from your local system, 4B_2b Example python code to upload metadata to FDP).
There are several approaches for implementation of a FAIR Data Point:
Exposing your local system: 4B_1a Expose your local system
Implementing a FAIR Data Point using the FDP reference implementation: 4B_1b FDP reference implementation
Manually add the information about your data to the National Catalogue via a Central FDP: 4B_1c Central FDP
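As an illustration of what an automated export to a FAIR Data Point might involve, the sketch below prepares two HTTP requests with the Python standard library: one to obtain an access token and one to submit a dataset description in Turtle. The endpoint paths (/tokens, /dataset), the JSON login body and the bearer-token header are assumptions about how an FDP is typically addressed; verify them against the documentation of your own FDP. The function names and URLs are hypothetical.

```python
import json
import urllib.request


def build_token_request(fdp_url, email, password):
    """Prepare a login request (assumed endpoint: <fdp_url>/tokens)."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{fdp_url.rstrip('/')}/tokens",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_dataset_request(fdp_url, token, turtle):
    """Prepare a request that submits Turtle metadata for one dataset
    (assumed endpoint: <fdp_url>/dataset)."""
    return urllib.request.Request(
        url=f"{fdp_url.rstrip('/')}/dataset",
        data=turtle.encode("utf-8"),
        headers={
            "Content-Type": "text/turtle",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    request = build_dataset_request(
        "https://fdp.example.org",
        token="<token>",
        turtle="<https://example.org/dataset/1> a <http://www.w3.org/ns/dcat#Dataset> .",
    )
    print(request.get_method(), request.full_url)
```

Building the requests separately from sending them keeps the export logic easy to inspect and test before it touches a live FAIR Data Point.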
5. Harvesting
To have the exposed metadata harvested, the data holder/provider emails the Health-RI service desk (servicedesk@health-ri.nl) with an onboarding request that includes the details of the FAIR Data Point to harvest. Health-RI then performs the harvesting.
The metadata is first harvested into a testing environment, where HRI and the data holder/provider check it before it reaches the Catalogue. The Catalogue currently checks the available FDPs for changes once a day, so changes in the metadata can take up to 24 hours to appear.
6. Onboarded
If the metadata is approved by the data holder/provider, it is then onboarded to the National Health Data Catalogue. If possible, the data holder/provider is asked to share any issues and feedback with the Health-RI contact.
Need help with onboarding?
Feel free to join our weekly Walk-in hour, where one of our colleagues is ready to help you with any issues. To register, please fill in this sign-up sheet. For information about the time, as well as the link to join, please see the Health-RI agenda at https://www.health-ri.nl/nieuws/agenda or contact us via onboarding@health-ri.nl.
We also collect workarounds for common issues in Known issues.
What is the future of the National Health Data Catalogue?
The current version of the Catalogue and the metadata schema allow for general descriptions of health data. However, we realise that the model has shortcomings: certain attributes of the data cannot currently be described. Based on incoming requests, we are working on careful extensions of the model to allow researchers to better describe and find health data in the Catalogue.
In the future, a request tool will be connected to the Catalogue. This tool will allow researchers and other users to request access to datasets they find relevant. The request will be processed and reviewed centrally in a secure environment, and users will be able to receive answers to their queries in case of federated analysis.
You can find more information about the intended structure and availabilities here. You can also follow the latest updates and developments here: Current developments (archived).
Questions?
If you have questions about the onboarding process or would like to learn more, reach out to