...
In this section, we describe the basics of metadata and explain what metadata mapping is. We also look at the Health-RI Core Metadata Schema and the metadata standards it builds upon. This page is intended for a general audience. Data experts and data stewards who need details on the standards and the schema can consult the specifications on GitHub: https://github.com/Health-RI/health-ri-metadata/ .
...
Below is an example of simple metadata for a dataset. It lists the important information about the dataset, including its title, description, identifier, and publisher, together with the catalog it belongs to and one of its distributions:
...
Property | Property Label | Description/Example |
dcat:dataset | dataset | Personalised RISk-based MAmmascreening Study (PRISMA) |
dct:description | Description | This catalog describes the core metadata of Radboudumc datasets |
dct:publisher | Publisher | |
dct:title | Title | Radboudumc Core Metadata |
dcat:contactPoint | Contact Point | foaf:Agent |
dct:creator | Creator name | foaf:Agent |
dct:description | Description | The primary aim of the PRISMA study was to investigate the potential value of risk-tailored versus traditional breast cancer screening protocols in the Netherlands. Data collection took place between 2014-2019, resulting in ∼67,000 mammograms, ∼38,000 surveys, ∼10,000 blood samples and ∼600 saliva samples. |
dct:issued | Issued | 15/01/2024 |
dct:identifier | Identifier | https://fdp.radboudumc.nl/dataset/8793226e-9a7c-4e8c-9cef-fce41ef0b865 |
dct:modified | Modified | 15/01/2024 |
dct:publisher | Publisher | foaf:Agent |
dcat:theme | Theme | |
dct:title | Title | Personalised RISk-based MAmmascreening Study (PRISMA) |
dct:type | Type | |
dct:license | License | Not yet available |
dcat:accessURL | Access URL | DOI (not yet available) |
dcat:mediaType | Format | text/csv |
dct:title | Title | PRISMA Questionnaire data |
dct:description | Description | The extensive questionnaire covers different topics such as demographics, personal characteristics, reproductive characteristics, medication, lifestyle, health status, family history, psychosocial characteristics. |
Here is the same metadata mapped to the DCAT-AP standard, following the Health-RI core metadata schema. It contains the same information and adds some mandatory properties, such as description. The metadata can now be processed easily by a computer and is in a format that is widely used on the web.
```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The dataset itself, identified by its FAIR Data Point URL.
<https://fdp.radboudumc.nl/dataset/8793226e-9a7c-4e8c-9cef-fce41ef0b865> a dcat:Dataset ;
    dct:title "Personalised RISk-based MAmmascreening Study (PRISMA)" ;
    dct:description "The primary aim of the PRISMA study was to investigate the potential value of risk-tailored versus traditional breast cancer screening protocols in the Netherlands. Data collection took place between 2014-2019, resulting in ∼67,000 mammograms, ∼38,000 surveys, ∼10,000 blood samples and ∼600 saliva samples." ;
    dct:issued "2024-01-15"^^xsd:date ;
    dct:identifier <https://fdp.radboudumc.nl/dataset/8793226e-9a7c-4e8c-9cef-fce41ef0b865> ;
    dct:modified "2024-01-15"^^xsd:date ;
    dct:publisher <https://ror.org/05wg1m734> ;
    dcat:theme <http://purl.obolibrary.org/obo/MONDO_0007254>,
        <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C20116> ;
    dct:type <http://purl.org/dc/dcmitype/Dataset> ;
    dct:license "Not yet available" ;
    dcat:contactPoint [
        a foaf:Agent ;
        foaf:name "Radboudumc" ;
        vcard:hasEmail <mailto:contact@radboudumc.nl>
    ] ;
    # Access URL and media type are properties of the distribution, not of the dataset.
    dcat:distribution [
        a dcat:Distribution ;
        dct:title "PRISMA Questionnaire data" ;
        dct:description "The extensive questionnaire covers different topics such as demographics, personal characteristics, reproductive characteristics, medication, lifestyle, health status, family history, psychosocial characteristics." ;
        dcat:mediaType "text/csv" ;
        dcat:accessURL <doi:not_yet_available>
    ] .
```
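Part of mapping is converting values into the form the standard expects. The table above gives the issue date as 15/01/2024, while the mapped version uses "2024-01-15"^^xsd:date. The following is a minimal sketch of that conversion in Python. It assumes the source dates are in day/month/year order, which the table does not state explicitly, and the function name `to_xsd_date` is hypothetical:

```python
from datetime import datetime


def to_xsd_date(value: str) -> str:
    """Convert a DD/MM/YYYY string into a Turtle literal typed as xsd:date."""
    # Assumption: the input uses day/month/year order, as in "15/01/2024".
    parsed = datetime.strptime(value, "%d/%m/%Y").date()
    # isoformat() yields YYYY-MM-DD, the lexical form used by xsd:date.
    return f'"{parsed.isoformat()}"^^xsd:date'


print(to_xsd_date("15/01/2024"))  # prints "2024-01-15"^^xsd:date
```

Input that does not match the assumed pattern raises a `ValueError`, so ambiguous or malformed dates are flagged rather than silently converted.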
To map your metadata, you first need to understand its structure and semantic meaning, as well as the ontology (vocabulary) used to describe your data in Resource Description Framework (RDF) format. In our case, that vocabulary is DCAT version 3. The general outline of the mapping pipeline can be found here: https://health-ri.atlassian.net/wiki/spaces/FSD/pages/edit-v2/290291734?draftShareId=ff45a2e2-80ee-49aa-b6d6-c04dedb6f9f8
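The core of the mapping step can be sketched in a few lines of Python: each human-readable label is looked up in a table of vocabulary terms, and the result is written out as Turtle. This is only an illustration, not the Health-RI pipeline. The lookup table `LABEL_TO_PROPERTY`, the set `IRI_PROPERTIES`, and the function names are assumptions made for this example, and only a handful of properties are covered:

```python
# Prefix declarations for the vocabularies used in this sketch.
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)

# Hypothetical lookup from human-readable labels to vocabulary properties.
LABEL_TO_PROPERTY = {
    "Title": "dct:title",
    "Description": "dct:description",
    "Identifier": "dct:identifier",
    "Publisher": "dct:publisher",
}

# Properties whose values are written as IRIs instead of string literals.
IRI_PROPERTIES = {"dct:identifier", "dct:publisher"}


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def map_to_turtle(subject_iri: str, record: dict[str, str]) -> str:
    """Render a label -> value record as a dcat:Dataset description in Turtle."""
    lines = [f"<{subject_iri}> a dcat:Dataset"]
    for label, value in record.items():
        prop = LABEL_TO_PROPERTY.get(label)
        if prop is None or not value:
            continue  # skip labels without a known mapping and empty values
        if prop in IRI_PROPERTIES:
            obj = f"<{value}>"
        else:
            obj = f'"{escape_literal(value)}"'
        lines.append(f"    {prop} {obj}")
    return PREFIXES + "\n" + " ;\n".join(lines) + " ."


record = {
    "Title": "Personalised RISk-based MAmmascreening Study (PRISMA)",
    "Publisher": "https://ror.org/05wg1m734",
    "Theme": "",
}
print(map_to_turtle(
    "https://fdp.radboudumc.nl/dataset/8793226e-9a7c-4e8c-9cef-fce41ef0b865",
    record,
))
```

In practice an RDF library would normally handle serialization and validation. The sketch uses plain string formatting only to keep the link between the table and the Turtle output visible.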
...
Once your RDF data is ready, you can publish it to a FAIR Data Point, where it can be harvested by the Catalogue. More information about this step can be found here: 3. Exposing metadata
Additional resources
Technical details on DCAT-AP and FAIR Data Points - YouTube video, Health-RI
HRI GitHub - You can find resources and examples on the Health-RI metadata GitHub.
Resources from EU Open Data Explained, including a general training on metadata and basic and advanced level resources on DCAT and DCAT-AP.
FAIR Metrolines (note: some pages are under development):
Metroline Step: Register resource level metadata
Metroline Step: Analyse data semantics
...