Metroline Step: Apply (meta)data model

Status: In development

Renamed: “Metroline Step: Apply core metadata model” to “Apply (meta)data model”

Short description 

This step explains how to apply a (meta)data model to research resources, e.g. datasets, making data more findable, accessible, interoperable and reusable (FAIR). The page outlines different tools and protocols (including step-by-step guides) that can be used for applying a (meta)data model, thereby increasing consistency and interoperability. It also provides links to more detailed external resources and examples of projects that successfully applied a metadata schema and/or data model to their data.

Why is this step important 

Applying the data model to your data and the metadata model to your metadata is crucial for the next step: https://health-ri.atlassian.net/wiki/spaces/FSD/pages/277479473. It is a central step in the FAIRification process, in which your (meta)data are connected to elements of your (semantic) (meta)data model, so that they become machine-readable and interoperable.

Metadata and data that are structured with ontologies and follow standard schemas make it easier for other resources to find your resource’s metadata and understand its data.

How to

Below we outline three ways to apply your data model, and one guide to applying a metadata model in the FAIR Data Point (FDP).

Implement a data model (in development)

A. FAIR-in-a-box

FAIR-in-a-box (adopted from CDE-in-a-box) is an automated tool that helps make your data FAIR. You provide a CSV file containing your data, structured in accordance with the embedded CARE-SM model. The tool then transforms the CSV into RDF and places it in a triple store connected to a FAIR Data Point.

The tool is customizable: if you want to use another semantic model, you can in principle edit the scripts and the YARRRML that transform the CSV into RDF so that they follow that model.

The FAIR-in-a-box or CDE-in-a-box consists of several components (see also Fig. 1 of this paper):

  1. A template-compliant CSV (generated by the data custodian together with a FAIR data steward, according to this paper)

    1. A simple "preCARE" CSV file is created by the data owner

    2. The preCARE.csv is automatically transformed into the final CARE.csv by the caresm toolkit (part of the docker-compose)

  2. RML: RML stands for RDF Mapping Language. RML provides templates that enable the transformation of CSV (or other formats) to RDF.
    The RML component in FAIR-in-a-box transforms a CSV template containing data structured according to the CARE-SM model into RDF.

    1. In case you wish to adjust the FAIR-in-a-box tool to a customized data model, you can use the YARRRML tool to generate a custom RML template

  3. Transformation of non-RDF data into RDF with RML, for which two tools are available. SDM-RDFizer uses the RML template and the non-RDF data to transform your data into RDF.

    1. An alternative tool according to this paper: RMLMapper

  4. The transformed data is fed into GraphDB (a triple store). Whenever the data is updated, the corresponding metadata is automatically updated as well.

  5. Once finished, the metadata is automatically exposed to an FDP (see also the next Metroline step: https://health-ri.atlassian.net/wiki/spaces/FSD/pages/277479473).
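The CSV-to-RDF transformation in components 2 and 3 can be illustrated with a minimal sketch. This is not the FAIR-in-a-box implementation, which relies on RML/YARRRML templates and tools such as SDM-RDFizer. The `example.org` namespace and the column names `pid`, `sex` and `diagnosis` are assumptions made up for illustration and do not reflect the CARE-SM template. The sketch only shows the idea behind a mapping: each row becomes a subject, and each column becomes a predicate.

```python
import csv
import io

# Hypothetical namespace and column names, used only for illustration;
# they are not the CARE-SM template or the FAIR-in-a-box mapping.
BASE = "https://example.org/"


def literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Turn each CSV row into triples, one per non-empty column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The row identifier becomes the subject IRI.
        subject = f"<{BASE}patient/{row['pid']}>"
        for column, value in row.items():
            if column == "pid" or not value:
                continue  # skip the identifier itself and empty cells
            predicate = f"<{BASE}vocab/{column}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


if __name__ == "__main__":
    sample = "pid,sex,diagnosis\n001,female,example diagnosis\n002,male,\n"
    for triple in csv_to_ntriples(sample):
        print(triple)
```

In a real setup the mapping rules live in the RML template rather than in code, so the same rules can be reused when the CSV is updated.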

Sources: https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-022-00264-6

https://direct.mit.edu/dint/article/5/1/184/113181/The-FAIR-Data-Point-Interfaces-and-Tooling

https://faircookbook.elixir-europe.org/content/recipes/applied-examples/approach-cdisc.html

B. Castor EDC

  • Build an eCRF for data collection (previous step?)

  • Map fields from the eCRF to the semantic model (step 7 of de novo) in the 'data transformation application' (automated or manual??)

  • eCRF values are linked to the ontology concepts that serve as the machine-readable representation of the value in the rendered RDF

  • Entered data are automatically converted into RDF in real time (next step) = de novo FAIRification
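The idea of linking eCRF values to ontology concepts can be sketched as a lookup table. This is an illustrative sketch only and does not use or represent the Castor EDC data transformation application. The field name `sex`, the coded values and all `example.org` IRIs are hypothetical placeholders that you would replace with terms from your own semantic model.

```python
# Hypothetical eCRF fields and placeholder concept IRIs (example.org);
# replace them with the terms from your own semantic model.
FIELD_MAPPING = {
    "sex": {
        "predicate": "https://example.org/vocab/hasSex",
        "values": {
            "1": "https://example.org/concept/Female",
            "2": "https://example.org/concept/Male",
        },
    },
}


def record_to_triples(
    record_id: str, record: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) IRIs for every mapped field value."""
    subject = f"https://example.org/record/{record_id}"
    triples = []
    for field, value in record.items():
        mapping = FIELD_MAPPING.get(field)
        if mapping is None:
            continue  # field is not part of the semantic model
        concept = mapping["values"].get(value)
        if concept is None:
            continue  # value has no ontology concept assigned
        triples.append((subject, mapping["predicate"], concept))
    return triples


if __name__ == "__main__":
    print(record_to_triples("r1", {"sex": "1", "note": "free text"}))
```

Unmapped fields and values are skipped silently here. In practice you may prefer to report them, so that gaps in the mapping are noticed.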

[+ metadata model for Castor?]

C. Ontotext Refine

With this tool, you can manually map structured data to a locally stored RDF schema in GraphDB. To do so, you choose the right predicates and types, define datatypes and implement transformations. The tool is integrated in the GraphDB Workbench.

In short, the workflow consists of the following steps:

  1. Load your ontology in GraphDB.

    1. You can use Protégé to edit your ontology

  2. Connect Ontotext Refine to GraphDB (where your ontology is stored).

  3. Load your data.

  4. Transform the data to your needs (e.g. convert dates to a specific format).

  5. Manually connect the variables in your data to the ontology. (This is the actual step where you apply the data model to your data.)

  6. You can export your linked data.
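The date conversion mentioned in step 4 can be illustrated with a minimal sketch. Within the tool itself you would use its own transformation features. The Python below only illustrates the logic, and the list of input formats is an assumption that you would adapt to your own data.

```python
from datetime import datetime

# Input formats are assumptions for illustration; list the ones in your data.
# All formats here are day-first, so 03/04/2021 is read as 3 April 2021.
KNOWN_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%Y%m%d")


def to_iso_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognised date format: {raw!r}")


if __name__ == "__main__":
    for value in ("31-12-2020", "01/02/2021", "20210315"):
        print(value, "->", to_iso_date(value))
```

Raising an error for unknown formats, rather than passing the value through, keeps malformed dates out of the exported linked data.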

 

Implement a metadata schema (in development)

FAIR Data Point reference implementation (implements DCAT)

Health-RI FAIR Data Point (implements the Health-RI metadata schema)

  • With the SHACL shapes of the Health-RI (HRI) schema, you can deliver metadata according to the HRI schema

  • The metadata are automatically transformed to RDF
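To give an impression of what DCAT-based metadata looks like as RDF, here is a minimal sketch that renders a dataset description in Turtle. It uses only the generic terms `dcat:Dataset`, `dct:title` and `dct:description`. The Health-RI metadata schema defines more properties than shown here, and an FDP normally generates this RDF for you from its forms. The dataset IRI and the helper function names are hypothetical.

```python
# Standard DCAT and Dublin Core Terms namespaces.
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def turtle_literal(value: str, lang: str = "en") -> str:
    """Escape a string as a language-tagged Turtle literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def dataset_to_turtle(iri: str, title: str, description: str) -> str:
    """Describe a dataset with dcat:Dataset, dct:title and dct:description."""
    return (
        f"{PREFIXES}\n"
        f"<{iri}> a dcat:Dataset ;\n"
        f"    dct:title {turtle_literal(title)} ;\n"
        f"    dct:description {turtle_literal(description)} .\n"
    )


if __name__ == "__main__":
    # Hypothetical dataset IRI, for illustration only.
    print(dataset_to_turtle(
        "https://example.org/dataset/1",
        "Example dataset",
        "A dataset used to illustrate DCAT metadata.",
    ))
```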

Step-by-step for Health-RI

Importing SHACL files in the FDP

  1. Log in as an admin in the FDP and go to “Metadata schemas” (top right corner)

  2. Click on the metadata schema you want to update

  3. Go to the GitHub page that provides the SHACL files

  4. Click on the class you want to update and copy the contents of its ttl file

  5. Go back to the FDP and paste the copied content in “Form definition” (bottom of the page)

  6. Click on “Save and release”

  7. Update the version number

  8. Click on “Release”

Metadata implementation (add link)

-mapping metadata schema page (add link)

 

The How to section should:

  • be split into easy to follow steps;

    • Step 1

    • Step 2

    • etc.

  • help the reader to complete the step;

  • aspire to be readable for everyone, but, depending on the topic, may require specialised knowledge;

  • be a general, widely applicable approach;

  • if possible / applicable, add (links to) the solution necessary for onboarding in the Health-RI National Catalogue;

  • aim to be practical and simple, while keeping in mind: if I would come to this page looking for a solution to this problem, would this How-to actually help me solve this problem;

  • contain references to solutions such as those provided by FAIR Cookbook, RDMkit, The Turing Way and FAIRsharing;

  • contain custom recipes/best-practices written by/together with experts from the field if necessary. 

To check:

According to FAIRopoly, this should be step 8 in de novo (set up registry structure in FDP) and step 12 (??) in generic. What is the content of these steps?

Expertise requirements for this step 

Describes the expertise that may be necessary for this step. Should be based on the expertise described in the Metroline: Build the team step.

FAIR expert/data steward: help with the tools.

Tip: EJPRD developed a metadata model; implementing it in your registry source code may require a developer.

Practical examples from the community 

Examples of how this step is applied in a project (link to demonstrator projects).  

 

VASCA registry - implemented the CDE semantic data model, the DCAT metadata schema and the EJPRD metadata schema.
PRISMA - implemented the Health-RI metadata schema.

Training

Add links to training resources relevant for this step. Since the training aspect is still under development, currently many steps have “Relevant training will be added in the future if available.”

Suggestions

Visit our How to contribute page for information on how to get in touch if you have any suggestions about this page.