...
For a dataset to be findable in the National Health Data Catalogue, it needs to be described well. The information that describes the dataset is called ‘metadata’; the way you structure the metadata and the terms you use is called the ‘metadata schema’. While the HRI core metadata schema, based on DCAT-AP 3.0, addresses the fundamental elements, it might fall short of fully and correctly describing a dataset. Even after this metadata schema is expanded with health-related terms through integration with HealthDCAT-AP and DCAT-AP NL, it may not fully meet the needs of specific domains. To improve disciplinary dataset discovery, we may need domain-specific metadata (see also recommendation 4.4 of the Research Data Alliance document on dataset discoverability).
This document serves as a guide for working groups from different domains to develop their own domain-specific metadata schemas for the National Health Data Catalogue. The guide is organized as a series of process steps (see figure below): first building the team, then collecting requirements from the domain, and finally turning these requirements into domain-specific metadata schemas that are in line with, and extend, the HRI core metadata schema. Each process step is described in more detail in the subpages, with deliverables and examples.
...
The development of a domain-specific metadata schema focuses on the catalogue metadata of datasets: the discoverability of datasets (or, more generally, resources) in the National Health Data Catalogue, and the information (metadata) on how to access or reuse these datasets. To a large extent, these discoverability aspects may be covered by DCAT-AP and HealthDCAT-AP as implemented by the HRI core metadata schema. However, domains may have additional requirements for the discoverability of their datasets, and addressing those requirements is the scope of a domain-specific metadata schema.
Currently, the semantic modeling of the data (points) itself, i.e. metadata of data or so-called data modeling (e.g. modeling descriptions and relations between variables, values and records in a dataset), is out of scope. This kind of data modeling will be picked up in plateau 3 (2025).
The use of RDF (Resource Description Framework) provides a way to represent metadata in a machine-readable format and promotes the reuse of existing vocabularies and ontologies, which supports interoperability across different domains. By using well-established standards like DCAT-AP, FOAF and SKOS, you can describe datasets in a way that is consistent with our core HRI metadata schema as well as with other domains.
While it is highly encouraged to reuse as many of the already existing terms as possible, RDF also allows the creation of custom properties that meet specific needs of your domain, if no suitable terms exist.
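As an illustration of reusing existing terms alongside a custom property, the minimal sketch below builds a handful of RDF triples for one dataset and serializes them as N-Triples using only the Python standard library. The DCAT, Dublin Core Terms and FOAF namespace IRIs are the published ones; the `https://example.org/imaging-schema#` namespace, the `imagingModality` property, the function names and all example IRIs are hypothetical and are not part of the HRI core metadata schema. In practice an RDF library would likely be used instead of hand-built strings.

```python
# Published vocabulary namespaces (reused terms).
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical domain namespace, used only for this illustration.
EX = "https://example.org/imaging-schema#"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang=None):
    """Format a plain or language-tagged literal (escapes backslash and quote only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def describe_dataset(dataset_iri, title, publisher_iri, publisher_name, modality):
    """Return (subject, predicate, object) triples describing one dataset."""
    dataset = iri(dataset_iri)
    publisher = iri(publisher_iri)
    return [
        # Reused terms from existing vocabularies.
        (dataset, iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (dataset, iri(DCT + "title"), literal(title, "en")),
        (dataset, iri(DCT + "publisher"), publisher),
        (publisher, iri(RDF_TYPE), iri(FOAF + "Agent")),
        (publisher, iri(FOAF + "name"), literal(publisher_name)),
        # Custom, domain-specific property (hypothetical).
        (dataset, iri(EX + "imagingModality"), literal(modality)),
    ]


def to_ntriples(triples):
    """Serialize triples, one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    example = describe_dataset(
        "https://example.org/dataset/1",
        "Cardiac MRI cohort",
        "https://example.org/org/umc",
        "Example UMC",
        "MRI",
    )
    print(to_ntriples(example))
```

Only the last triple uses a custom term; everything else comes from existing vocabularies, which keeps the description readable by tools that know DCAT but not the domain extension.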
The semantic or conceptual modeling of the data itself (i.e. data modeling: descriptions of, and relations between, variables and values in a dataset) is out of scope. See the example below for the difference between metadata and data modeling.
Metadata vs. data modeling example: Beacons
...
Prerequisites
The domain is defined and organized in such a way that a metadata taskforce of that domain has the possibility and mandate to speak for the domain and to make decisions with it about a domain-specific metadata schema. In general, we consider data source domains (e.g., omics, imaging, clinical data) and disease domains (e.g., oncology, cardiovascular, rare diseases), each of which may or may not be further subdivided into subdomains.
Developing the domain-specific metadata schema is primarily the responsibility of the domain, with consultation and support from the Health-RI hub. Implementation of the schemas is a shared responsibility between the domain and the hub.
...