Mapping pipeline

    status: in development

     

    Introduction

    This section describes the metadata mapping process and the steps you should take. It is intended for data stewards, data experts, or equivalent roles. For a general overview, see the Metadata mapping overview: https://health-ri.atlassian.net/wiki/pages/createpage.action?spaceKey=FSD&title=2.%20Metadata%20mapping&linkCreation=true&fromPageId=290291734

     

    General metadata mapping pattern

     

     

    model.png

     

    1. Understand your data and metadata to be onboarded

    Before starting the mapping process, it is crucial to understand the structure of your metadata and the semantic meaning of each column.  

    Next, you need to extract and curate the metadata from the dedicated databases at source. The output of this step is metadata that is sourced, cleaned, wrangled, and ready to go through the transformation pipeline. 

    Also, in this step you decide how each piece of data relates to RDF concepts like classes, properties, and entities. 

    Each file (e.g., CSV or JSON) describes a dataset, resource, image or sample.  

    Each row in the CSV can be mapped to the target properties and target classes in the Core Metadata Schema (https://github.com/Health-RI/health-ri-metadata/tree/master).

     

    2. Understand the ontology (DCAT)

    An ontology defines the vocabulary (classes, properties, etc.) used to describe your data in RDF.  

    In our case, we use DCAT v3 for transformation and DCAT-AP for evaluation. DCAT-AP is an application profile of DCAT: it adds constraints on top of the vocabulary, such as which fields are mandatory.

    This step is vital for ensuring interoperability and making your data understandable and reusable by others. 

     

    3. Define URIs for each row

    Determine what each row in your CSV represents (for example, a dataset or a sample), then assign each row a URI.

    The URI acts as a unique identifier for resources in the RDF world. 
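
    As an illustration of this step, the sketch below builds a URI from a row's local identifier. The namespace https://example.org/dataset/ and the function name mint_uri are assumptions made for the example; in practice you would use a namespace your organisation controls and an identifier column that is stable over time.

```python
from urllib.parse import quote

# Hypothetical base namespace; replace it with a namespace you control.
BASE_URI = "https://example.org/dataset/"


def mint_uri(local_id: str, base: str = BASE_URI) -> str:
    """Build a URI from a row's local identifier."""
    cleaned = local_id.strip()
    if not cleaned:
        raise ValueError("a row needs a non-empty identifier to get a URI")
    # Percent-encode every character that is not safe in a URI path segment.
    return base + quote(cleaned, safe="")


print(mint_uri("cohort 42"))
```

    The same identifier always yields the same URI, so re-running the pipeline does not create new resources for rows that were already mapped.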

     

    4. Map Columns to Properties

    Each column in the CSV usually corresponds to a property of your primary resources. Map each column to an RDF property defined in your ontology.

    For instance, a column named "title" might map to the property dct:title (from Dublin Core Terms), used on a resource of class dcat:Resource.
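
    One way to record such a mapping is a small lookup table from column names to prefixed property names. The sketch below is only an illustration: the column names are hypothetical, and the four properties shown (dct:title, dct:description, dcat:keyword, dcat:theme) are examples from DCAT and Dublin Core Terms, not the full Core Metadata Schema.

```python
# Namespace URIs for the prefixes used in the mapping.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Hypothetical CSV column names mapped to prefixed property names.
COLUMN_TO_PROPERTY = {
    "title": "dct:title",
    "description": "dct:description",
    "keyword": "dcat:keyword",
    "theme": "dcat:theme",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dct:title' into a full property URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def map_columns(header):
    """Return (mapped, unmapped): property URIs per column, and leftover columns."""
    mapped = {
        column: expand(COLUMN_TO_PROPERTY[column])
        for column in header
        if column in COLUMN_TO_PROPERTY
    }
    unmapped = [column for column in header if column not in COLUMN_TO_PROPERTY]
    return mapped, unmapped


mapped, unmapped = map_columns(["title", "keyword", "internal_code"])
print(mapped)
print("columns without a mapping:", unmapped)
```

    Listing the unmapped columns explicitly makes it easier to see which parts of the source metadata still need a decision.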

     

    5. Convert Values

    Transform the values in each cell into RDF literals or resources, depending on their nature. For literal values (e.g., names, descriptions), use the cell's content directly. For values representing relationships or references to other entities, you will need to create or use existing URIs, linking to controlled vocabularies.
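
    The distinction between literals and references can be sketched as follows. The output uses N-Triples term syntax. The vocabulary table and its example.org URI are placeholders invented for the example; a real pipeline would look terms up in the controlled vocabulary it has chosen.

```python
# Hypothetical controlled vocabulary: cell text (lower-cased) -> term URI.
THEME_VOCABULARY = {
    "health": "https://example.org/vocab/theme/health",
}


def escape_literal(text: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_object(value: str, vocabulary=None, lang=None) -> str:
    """Return the N-Triples object term for one cell value.

    With a vocabulary, the value is treated as a reference and replaced by the
    term's URI (a KeyError signals a value that is not in the vocabulary).
    Without one, the value becomes a literal, optionally language-tagged.
    """
    cleaned = value.strip()
    if vocabulary is not None:
        return f"<{vocabulary[cleaned.lower()]}>"
    literal = f'"{escape_literal(cleaned)}"'
    return f"{literal}@{lang}" if lang else literal


print(to_object("Heart study", lang="en"))
print(to_object("Health", vocabulary=THEME_VOCABULARY))
```

    Failing loudly on an unknown vocabulary value is a deliberate choice here: a silent fallback to a literal would hide cells that still need curation.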

     

    6. Use a Mapping Language or Tool

    Several languages and tools can automate the mapping process from CSV to RDF.

     

    7. Create RDF Triples

    Using the mappings you have defined, generate RDF triples for each row in your file. Each triple consists of a subject (the resource URI), a predicate (the property URI), and an object (the value or another resource URI). 
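
    The sketch below shows the subject, predicate, object pattern on a tiny in-memory CSV, writing the result as N-Triples lines. It rests on several assumptions made only for illustration: every row describes a dcat:Dataset, the CSV has an "id" column, the namespace is https://example.org/dataset/, and only two columns are mapped. A mapping tool or RDF library would normally do this work.

```python
import csv
import io
from urllib.parse import quote

BASE_URI = "https://example.org/dataset/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"

# Hypothetical column names mapped to full property URIs.
COLUMN_TO_PROPERTY = {
    "title": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}


def literal(text: str) -> str:
    """Format a plain string as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_ntriples(csv_text: str, id_column: str = "id"):
    """Return one N-Triples line per (subject, predicate, object) triple."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        local_id = quote(row[id_column].strip(), safe="")
        subject = f"<{BASE_URI}{local_id}>"
        # Assumption: every row is a dataset.
        lines.append(f"{subject} <{RDF_TYPE}> <{DCAT_DATASET}> .")
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of emitting empty literals
                lines.append(f"{subject} <{prop}> {literal(value)} .")
    return lines


SAMPLE = "id,title,description\nds1,Heart study,\nds2,Lung study,CT scans\n"
for line in rows_to_ntriples(SAMPLE):
    print(line)
```

    For the sample above this yields five triples: a type and a title for ds1 (its description cell is empty), and a type, a title, and a description for ds2.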

    8. Validate and Refine

    After converting your data, validate the RDF output to ensure it accurately represents your original data and adheres to the ontology's structure. You may need to refine your mappings or data to correct any issues.  

    The Health-RI RDF Validator, which uses SHACL shapes, can be found here. The GitHub repository is available here. (Note: this repository and all SHACL shapes are still under active development.)
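
    Before submitting a graph to a validator, a simple pre-check can catch obvious gaps. The sketch below is not SHACL validation and does not replace it: it only reports subjects that lack a property from a required list. The list itself is hypothetical; the authoritative constraints are the ones expressed in the SHACL shapes.

```python
from collections import defaultdict

# Hypothetical required properties; the real constraints live in the SHACL shapes.
REQUIRED = frozenset({
    "http://purl.org/dc/terms/title",
    "http://purl.org/dc/terms/description",
})


def missing_properties(triples, required=REQUIRED):
    """Map each subject to the sorted list of required properties it lacks."""
    seen = defaultdict(set)
    for subject, predicate, _obj in triples:
        seen[subject].add(predicate)
    return {
        subject: sorted(required - predicates)
        for subject, predicates in seen.items()
        if required - predicates
    }


triples = [
    ("https://example.org/dataset/ds1", "http://purl.org/dc/terms/title", "Heart study"),
    ("https://example.org/dataset/ds2", "http://purl.org/dc/terms/title", "Lung study"),
    ("https://example.org/dataset/ds2", "http://purl.org/dc/terms/description", "CT scans"),
]
print(missing_properties(triples))
```

    In this sample only ds1 is reported, because it has a title but no description.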

    9. Share and publish your validated metadata graph through an FDP

    Once your RDF data is ready, consider how you will share or publish it to make it accessible to your community, for instance through a FAIR Data Point (FDP). This might involve hosting it on a SPARQL endpoint, within a triple store, or through other data publishing platforms.

     

     

    Additional resources

    HRI SHACL shapes: https://github.com/Health-RI/health-ri-metadata/tree/master/Formalisation(shacl)/Core

    Core Metadata Schema Specification

    Example of a metadata graph: https://github.com/Health-RI/health-ri-metadata/tree/master/MapToDCAT-AP/Metadata%20graphs%20-%20Examples

    Example of mapping: https://github.com/Health-RI/health-ri-metadata/tree/master/MapToDCAT-AP/Example

    Image2Catalog (GitHub repository Health-RI/img2catalog): a tool to help make XNAT into a FAIR Data Point. It queries an XNAT instance and generates DCAT-AP 3.0 metadata.

    Questions?

    If you have questions about the onboarding process or would like to learn more, reach out to our Health-RI Servicedesk.

    servicedesk@health-ri.nl