Metadata onboarding on the National Catalogue

status: in development

 

Introduction

This document guides you through the data onboarding process and explains how to publish information about your datasets on the National Health Data Catalogue. It is a living document and is updated regularly to reflect the current state of the catalogue. It is intended for researchers and data holders.

Data onboarding means making datasets accessible through the National Health Data Catalogue. Following the steps in this guide helps ensure that your data is onboarded correctly and is readily available to data users.

What is the National Health Data Catalogue?

The National Health Data Catalogue is an overview of health & life sciences research data in the Netherlands. It contains metadata, that is, descriptions of the available datasets and other resources. Such a description includes, for example, the date the dataset was created, its authors, or a URL where the data can be found. The metadata in the National Health Data Catalogue come from a wide range of sources and domains, from electronic records to images, biomaterials, omics data, collections and many more.

The goal of the National Health Data Catalogue is to create an infrastructure for secondary use of data where researchers and other interested parties can find and access cross-domain data relevant to their research. The intent is to harvest currently available data from any health-care and life science domain in the Netherlands.

 

The catalogue aims to foster the FAIR data principles: making data Findable, Accessible, Interoperable, and Reusable for its users. You can find more information about FAIR here.

Where does the National Health Data Catalogue get metadata?

The Catalogue can harvest information from other catalogues, and can itself be harvested by other catalogues. This means that once metadata is entered in one catalogue, it automatically becomes available in the others, so a data holder does not have to enter the metadata manually in every individual catalogue. Ideally, two catalogues are connected via a FAIR Data Point (FDP) that holds information about the data and shares it with the Catalogue (Figure 1).

There are several ways a data holder can onboard their data to the Catalogue. First, however, the data needs to be properly prepared and described. The Catalogue uses a Health-RI metadata schema based on DCAT v3 and DCAT-AP. Currently, this schema uses relatively general, overarching classes and definitions, forming the so-called Core metadata schema. Learn more about the schema here. This core metadata schema will be expanded in the future (see What is the future of the National Health Data Catalogue?).
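As an illustration of what a DCAT-based description can look like, the sketch below builds a minimal dataset record as JSON-LD using only the Python standard library. The property names (dct:title, dct:creator, dcat:distribution, dcat:accessURL) come from the DCAT, Dublin Core Terms and FOAF vocabularies. The function name, the example values and the example.org URLs are hypothetical, and the properties that the Health-RI core schema actually requires should be taken from the schema documentation rather than from this sketch.

```python
import json

# Namespace prefixes for the vocabularies used in the record.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def describe_dataset(uri, title, description, creators, issued, access_url):
    """Return a minimal DCAT dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; it is not part of any Health-RI tool.
    """
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        # Creators are modelled as agents with a name.
        "dct:creator": [
            {"@type": "foaf:Agent", "foaf:name": name} for name in creators
        ],
        "dct:issued": {"@value": issued, "@type": "xsd:date"},
        # A distribution describes where and how the data can be obtained.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": access_url},
            }
        ],
    }


record = describe_dataset(
    uri="https://example.org/dataset/cohort-2020",  # placeholder URL
    title="Example cohort study",
    description="Made-up record used only to illustrate the structure.",
    creators=["A. Researcher"],
    issued="2020-05-01",
    access_url="https://example.org/data/cohort-2020",  # placeholder URL
)

print(json.dumps(record, indent=2))
```

The record covers the kinds of fields mentioned earlier (a creation date, the authors, a URL where the data can be found). A real description would follow the classes and properties defined in the Core metadata schema.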

 

Figure 1. Connection of data (source) to the National Catalogue via an FDP
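The idea of harvesting can be sketched with a small in-memory model. This is a simplified illustration, not how the Catalogue is implemented: catalogues are represented as plain Python dictionaries, the function name and the example records are hypothetical, and a real harvester would retrieve metadata over the network (for example from an FDP) and would also handle updates and deletions.

```python
def harvest(target, sources):
    """Copy records from each source catalogue into the target catalogue.

    Catalogues are modelled as dicts mapping a dataset identifier to its
    metadata record. Records already present in the target are left
    untouched, so harvesting the same source twice has no further effect.
    Returns the identifiers that were newly added.
    """
    added = []
    for source in sources:
        for identifier, record in source.items():
            if identifier not in target:
                target[identifier] = dict(record)  # copy, do not share
                added.append(identifier)
    return added


# Hypothetical local catalogue of a data holder.
local_catalogue = {
    "https://example.org/dataset/1": {"title": "Imaging collection"},
    "https://example.org/dataset/2": {"title": "Biobank samples"},
}
national_catalogue = {}

first_run = harvest(national_catalogue, [local_catalogue])
second_run = harvest(national_catalogue, [local_catalogue])

print(first_run)   # both identifiers are added on the first run
print(second_run)  # nothing new is added on the second run
```

The data holder enters each record once, in the local catalogue. The harvesting catalogue picks it up without any manual re-entry.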

 

How to onboard your metadata to the Catalogue?

Publishing your metadata on the National Catalogue takes several steps; the basic ones are shown here. You can find some examples of onboarding and scenarios here. For technical documentation, please refer to the HRI GitHub: https://github.com/Health-RI/health-ri-metadata/.

  1. Make sure you can share your metadata

    • Check that you comply with our Code of Conduct and have all the necessary rights and permissions

  2. Create a metadata schema of your metadata and map it to the current HRI metadata schema

  3. Expose your metadata to the catalogue by one of the following

    1. Exposing your local system

    2. Implementing a FAIR Data Point using FDP in a box

    3. Manually adding the information about your data to the National Catalogue via a Central FDP

 

Figure. An overview of the onboarding process steps and responsibilities
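The mapping step (creating a schema of your own metadata and mapping it to the HRI metadata schema) can be sketched as a simple lookup table. Everything in this example is an assumption made for illustration: the local field names, the choice of target terms, and in particular the set of required properties, which must be checked against the actual HRI schema.

```python
# Hypothetical mapping from local field names to DCAT / Dublin Core terms.
FIELD_MAP = {
    "name": "dct:title",
    "summary": "dct:description",
    "created": "dct:issued",
    "authors": "dct:creator",
    "url": "dcat:landingPage",
}

# Assumed for illustration only; check the HRI schema for the real list.
REQUIRED = {"dct:title", "dct:description"}


def map_record(local_record):
    """Translate a local record and report what could not be mapped.

    Returns a tuple (mapped, unmapped_fields, missing_terms).
    """
    mapped = {}
    unmapped = []
    for field, value in local_record.items():
        term = FIELD_MAP.get(field)
        if term is None:
            unmapped.append(field)  # no counterpart in the mapping table
        else:
            mapped[term] = value
    missing = sorted(REQUIRED - set(mapped))
    return mapped, unmapped, missing


local = {"name": "Example cohort", "created": "2020-05-01", "ward": "B2"}
mapped, unmapped, missing = map_record(local)

print(mapped)    # fields that were translated to schema terms
print(unmapped)  # local fields without a counterpart
print(missing)   # required terms that the local record does not provide
```

Reporting unmapped fields and missing terms separately shows two different kinds of gaps: local information that the core schema does not yet capture, and information the catalogue needs that still has to be collected.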

 

 

What is the future of the National Health Data Catalogue?

The current version of the Catalogue allows for a general description of the data and metadata. To allow more domain-specific searchability, the metadata descriptions will be expanded in the future. We can picture the metadata as a sunflower, where the core represents the values common to all domains and each domain has its own petal describing the specific metadata needs of its researchers (Figure 2). This expansion will help researchers find data relevant to their research.

 

In the future, a request tool will be connected to the Catalogue. This tool will allow researchers and other users to request access to datasets they find relevant. Requests will be processed and reviewed centrally in a secure environment, and in the case of federated analysis users will be able to receive answers to their queries.

You can find more information about the intended structure and availability here.

 
