Status: in development


📌 Introduction

In this section, we describe the basics of metadata and explain what metadata mapping is. We also look at the Health-RI Core Metadata Schema and the metadata standards it builds upon. This page is intended for a general audience. For details on the standards and the schema, please visit the GitHub specifications intended for data experts and data stewards: https://github.com/Health-RI/health-ri-metadata/tree/master

🧠 What is metadata

Metadata is essentially data about data. It describes various aspects of your data, such as what the data contains, who owns it and what type of data it is. In other words, metadata helps you understand and manage data effectively by providing additional information about it.

Metadata serves as the backbone of effective data management and analysis in the life sciences and healthcare domains. Standardized, interoperable and machine-readable metadata enables researchers, clinicians and policymakers to derive meaningful insights from vast amounts of data while ensuring its integrity, reliability and confidentiality.

🔎 Metadata standards

A metadata standard is a set of rules, guidelines and conventions that define how metadata should be structured, formatted and described within a particular domain or context. Adhering to such standards ensures consistency, interoperability and effective management of metadata across different systems, organizations and disciplines.

...

DCAT-AP: DCAT Application Profile for Data Portals in Europe is a metadata standard developed by the European Commission to facilitate the interoperability of data catalogs and portals across European countries. It builds upon the DCAT (Data Catalog Vocabulary) standard and extends it with additional requirements and recommendations tailored to the European context.

🎯 HRI Metadata Schema

The National Health Data Catalogue currently works with a Core Metadata Schema. This Core Metadata Schema is a formal shared conceptualisation of the requirements to find and reuse information across Health-RI nodes via the National Catalogue. It represents a minimal set of elements for describing each resource (including datasets) with common metadata. The current version of the Core Metadata Schema includes DCAT v3, HealthDCAT-AP and some selected DCAT-AP mandatory classes and their definitions. You can find the relation of the Health-RI metadata schema to other application profiles here: Relation of the Health-RI metadata schema to other DCAT application profiles

The set is split into several classes describing the data. At the moment four classes (Dataset, Catalog, Resource, and Agent) are mandatory. Each class is populated by a set of mandatory and recommended variables. You can find all of the descriptions of variables and classes here: Core Metadata Schema Specification
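As a rough illustration of how mandatory and recommended variables per class could be checked, here is a minimal Python sketch. The property lists in `REQUIREMENTS` are hypothetical and heavily shortened; they are assumptions made for this example, and the authoritative lists are in the Core Metadata Schema Specification.

```python
# Hypothetical, shortened requirement lists for illustration only.
# These are NOT the official Health-RI lists.
REQUIREMENTS = {
    "dcat:Dataset": {
        "mandatory": {"dct:title", "dct:description", "dct:identifier"},
        "recommended": {"dcat:theme", "dct:issued"},
    },
    "dcat:Catalog": {
        "mandatory": {"dct:title", "dct:description", "dct:publisher"},
        "recommended": set(),
    },
}


def check_record(class_name, record):
    """Return the mandatory and recommended properties missing from a record."""
    rules = REQUIREMENTS[class_name]
    # A property counts as present only if it has a non-empty value.
    present = {prop for prop, value in record.items() if value}
    return {
        "missing_mandatory": sorted(rules["mandatory"] - present),
        "missing_recommended": sorted(rules["recommended"] - present),
    }


dataset = {
    "dct:title": "PRISMA Questionnaire data",
    "dcat:theme": "http://publications.europa.eu/resource/authority/data-theme/HEAL",
}
report = check_record("dcat:Dataset", dataset)
print(report)
```

A record with missing mandatory properties would need to be completed before publication, while missing recommended properties only reduce how findable the resource is.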

Figure: HRICoreMetadataSchemaReleasePlateau1.jpg

📋 What is metadata mapping

Info

Metadata mapping and creation of a metadata schema will likely require involvement of a semantic expert, data steward or equivalent.

...

Below is an example of metadata from the PRISMA study. It contains information about the data available:

| Class | Property | Property Label | Example |
|---|---|---|---|
| dcat:Catalog | dcat:dataset | dataset | Personalised RISk-based MAmmascreening Study (PRISMA) |
| | dct:description | Description | The primary aim of the PRISMA study is to investigate the potential value of risk-tailored versus traditional breast cancer screening protocols in the Netherlands. Data collection took place between 2014-2019, resulting in ∼67,000 mammograms, ∼38,000 surveys, ∼10,000 blood samples and ∼600 saliva samples. |
| | dct:publisher | Publisher | foaf:Agent |
| | dct:title | Title | Personalised RISk-based MAmmascreening Study (PRISMA) |
| dcat:Dataset | dcat:contactPoint | Contact Point | vcard:Kind |
| | dct:creator | Creator | foaf:Agent |
| | dct:description | Description | The extensive questionnaire covers a number of potential breast cancer risk predictors such as demographics, personal characteristics, reproductive characteristics, medication, lifestyle, health status, family history, psychosocial characteristics. |
| | dct:issued | Issued (release date) | 2024-07-02T10:49:07 |
| | dct:identifier | Identifier | https://fdp.radboudumc.nl/dataset/37d6ad17-aa35-425c-946e-855838d3f9cc |
| | dct:modified | Modified | 2024-09-09T08:54:32 |
| | dct:publisher | Publisher | foaf:Agent |
| | dcat:theme | Theme | http://publications.europa.eu/resource/authority/data-theme/HEAL |
| | dct:title | Title | PRISMA Questionnaire data |
| | dct:license | License | https://data.ru.nl/doc/dua/RUMC-RA-DUA-1.0.html |
| dcat:Distribution | dcat:accessURL | Access URL | DOI (not yet available) |
| | dcat:mediaType | Format | https://www.iana.org/assignments/media-types/text/csv |
| | dct:title | Title | PRISMA Questionnaire data - CSV format |
| | dct:description | Description | The questionnaire data in CSV format. |
| foaf:Agent | foaf:name | name | Radboudumc (Publisher) |
| | dct:identifier | identifier | https://ror.org/05wg1m734 (Publisher) |
| vcard:Kind | vcard:hasEmail | has email | firstname.lastname@radboudumc.nl |
| | vcard:hasName | has name | J. Doe |
| foaf:Agent | foaf:name | name | J. Doe (Creator) |
| | dct:identifier | identifier | https://orcid.org/0000-0000-0000-0000 (Creator) |

Here is the same data mapped to the Health-RI core metadata schema. It contains the same information, but it is now machine readable, so it can be easily processed by a computer, and it is in a format that is widely used on the web.

Code Block
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Catalog description
<https://fdp.radboudumc.nl/catalogue/prisma>
    a dcat:Catalog ;
    dct:title "Personalised RISk-based MAmmascreening Study (PRISMA)" ;
    dct:description "The primary aim of the PRISMA study is to investigate the potential value of risk-tailored versus traditional breast cancer screening protocols in the Netherlands. Data collection took place between 2014-2019, resulting in ∼67,000 mammograms, ∼38,000 surveys, ∼10,000 blood samples and ∼600 saliva samples." ;
    dct:publisher [ a foaf:Agent ; foaf:name "Radboudumc (Publisher)" ; dct:identifier <https://ror.org/05wg1m734> ] ;
    dcat:dataset <https://fdp.radboudumc.nl/dataset/37d6ad17-aa35-425c-946e-855838d3f9cc> .

# Dataset description
<https://fdp.radboudumc.nl/dataset/37d6ad17-aa35-425c-946e-855838d3f9cc>
    a dcat:Dataset ;
    dct:title "PRISMA Questionnaire data" ;
    dct:description "The extensive questionnaire covers a number of potential breast cancer risk predictors such as demographics, personal characteristics, reproductive characteristics, medication, lifestyle, health status, family history, psychosocial characteristics." ;
    dct:issued "2024-07-02T10:49:07"^^xsd:dateTime ;
    dct:modified "2024-09-09T08:54:32"^^xsd:dateTime ;
    dct:identifier <https://fdp.radboudumc.nl/dataset/37d6ad17-aa35-425c-946e-855838d3f9cc> ;
    dct:creator [ a foaf:Agent ; foaf:name "J. Doe (Creator)" ; dct:identifier <https://orcid.org/0000-0000-0000-0000> ] ;
    dct:publisher [ a foaf:Agent ; foaf:name "Radboudumc (Publisher)" ; dct:identifier <https://ror.org/05wg1m734> ] ;
    dcat:theme <http://publications.europa.eu/resource/authority/data-theme/HEAL> ;
    dct:license <https://data.ru.nl/doc/dua/RUMC-RA-DUA-1.0.html> ;
    dcat:distribution <https://fdp.radboudumc.nl/distribution/csv> ;
    dcat:contactPoint [
        a vcard:Kind ;
        vcard:hasEmail <mailto:firstname.lastname@radboudumc.nl> ;
        vcard:fn "J. Doe"
    ] .

# Distribution details (CSV)
<https://fdp.radboudumc.nl/distribution/csv>
    a dcat:Distribution ;
    dcat:accessURL <doi:not_yet_available> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv> ;
    dct:title "PRISMA Questionnaire data - CSV format" ;
    dct:description "The questionnaire data in CSV format." .

# Agent description (Publisher)
<https://ror.org/05wg1m734>
    a foaf:Agent ;
    foaf:name "Radboudumc (Publisher)" ;
    dct:identifier <https://ror.org/05wg1m734> .

# Agent description (Creator)
<https://orcid.org/0000-0000-0000-0000>
    a foaf:Agent ;
    foaf:name "J. Doe (Creator)" ;
    dct:identifier <https://orcid.org/0000-0000-0000-0000> .

To map your metadata, you first need to understand its structure and semantic meaning, as well as the ontology (vocabulary) used to describe your data in Resource Description Framework (RDF) format, in our case DCAT v3. The general outline of the mapping pipeline can be found here: https://health-ri.atlassian.net/wiki/spaces/FSD/pages/edit-v2/290291734?draftShareId=ff45a2e2-80ee-49aa-b6d6-c04dedb6f9f8
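To give a feel for what such a mapping involves, here is a minimal Python sketch that turns a flat source record into a Turtle description. The source field names (`study_name`, `release_date`, `topic`), the `FIELD_MAP` table, the source date format and the `example.org` subject IRI are all assumptions made for this illustration; they are not part of the Health-RI pipeline, and a real mapping would normally use an RDF library and the full schema.

```python
from datetime import datetime

# Hypothetical source record; the field names are assumptions for illustration.
source = {
    "study_name": "PRISMA Questionnaire data",
    "release_date": "02/07/2024 10:49:07",
    "topic": "http://publications.europa.eu/resource/authority/data-theme/HEAL",
}

# Hypothetical mapping: source field -> (target property, kind of RDF object).
FIELD_MAP = {
    "study_name": ("dct:title", "literal"),
    "release_date": ("dct:issued", "datetime"),
    "topic": ("dcat:theme", "iri"),
}


def to_turtle_object(value, kind):
    """Serialise one source value as a Turtle object term."""
    if kind == "iri":
        return f"<{value}>"
    if kind == "datetime":
        # Assumes the source uses day/month/year; converted to ISO 8601.
        parsed = datetime.strptime(value, "%d/%m/%Y %H:%M:%S")
        return f'"{parsed.isoformat()}"^^xsd:dateTime'
    # Plain string literal: escape backslashes and double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def map_record(subject_iri, record):
    """Build a Turtle description of one dataset from a flat source record."""
    statements = ["a dcat:Dataset"]
    for field, (prop, kind) in FIELD_MAP.items():
        if field in record:
            statements.append(f"{prop} {to_turtle_object(record[field], kind)}")
    body = " ;\n    ".join(statements)
    return f"<{subject_iri}>\n    {body} ."


turtle = map_record("https://example.org/dataset/1", source)
print(turtle)
```

The output omits the `@prefix` declarations, so it is only valid Turtle once the declarations for `dcat`, `dct` and `xsd` are placed in front of it.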

✅ Next steps

After mapping your data properties to the classes and variables of the HRI model, you need to validate the result. This step ensures that the new model both accurately represents the original data and adheres to the HRI metadata structure.
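The first half of that check, whether the mapped metadata still represents the original data, can be sketched in a few lines of Python. The field names and the `field_map` below are hypothetical and chosen only for this example. Conformance to the HRI structure is a separate check that is typically done with dedicated RDF validation tooling such as SHACL shapes, which this sketch does not cover.

```python
def validate_mapping(source, mapped, field_map):
    """Compare a source record with its mapped version.

    Returns a list of problems; an empty list means every source field
    was carried over unchanged and nothing unexpected was added.
    """
    problems = []
    for field, prop in field_map.items():
        if field not in source:
            continue
        if prop not in mapped:
            problems.append(f"{field}: not mapped to {prop}")
        elif mapped[prop] != source[field]:
            problems.append(f"{field}: value changed in {prop}")
    # Flag mapped properties that have no counterpart in the mapping table.
    known_props = set(field_map.values())
    for prop in mapped:
        if prop not in known_props:
            problems.append(f"{prop}: no source field")
    return problems


# Hypothetical source fields and mapping, for illustration only.
field_map = {"study_name": "dct:title", "license_url": "dct:license"}
source = {
    "study_name": "PRISMA Questionnaire data",
    "license_url": "https://data.ru.nl/doc/dua/RUMC-RA-DUA-1.0.html",
}
mapped = {"dct:title": "PRISMA Questionnaire data"}

problems = validate_mapping(source, mapped, field_map)
print(problems)
```

Here the license was dropped during mapping, so the check reports it; once `dct:license` is added to the mapped record, the list of problems is empty.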

Once your RDF data is ready, you can publish it to a FAIR Data Point, where it can be harvested by the Catalogue. More information about this step can be found here: 4B Exposing metadata

Additional resources

Technical details on DCAT-AP and FAIR Data Points - YouTube video, Health-RI

HRI GitHub - You can find resources and examples on the Health-RI metadata GitHub.

Resources from the EU Open Data Explained, including a general training on metadata and basic and advanced level resources on DCAT and DCAT-AP.

FAIR Metrolines (note: some pages under development):

Metroline Step: Register resource level metadata

Metroline Step: Analyse data semantics

Metroline Step: Apply (meta)data model

Metroline Step: Create or reuse a semantic (meta)data model


Questions?

If you have questions about the onboarding process or would like to learn more, reach out to our service desk: https://www.health-ri.nl/health-ri-servicedesk

📧 servicedesk@health-ri.nl

...