Metroline Step: Apply (meta)data model

Status: In development

Renamed: “Metroline Step: Apply core metadata model” to “Apply (meta)data model”

Short description 

This step explains how to apply a (meta)data model to research resources, e.g. datasets. It emphasizes the importance of metadata in making data more findable, accessible, and reusable. The page outlines the standards and protocols to follow in order to increase consistency and interoperability. It includes a step-by-step implementation guide that details the tools and resources needed to apply the metadata model, and it links to external resources with step-by-step approaches and to examples of projects that successfully applied a metadata schema and/or data model to their data.

 

Community examples:
VASCA registry - implemented the CDE semantic data model, the DCAT metadata schema and the EJP RD metadata schema.
PRISMA - implemented the Health-RI metadata schema.

 

Implement a data model (in development)

-FAIR in a box

-Castor EDC

-openRefine (manually)

 

Implement a metadata schema

-FAIR data point reference implementation (implements DCAT)

-Health-RI FAIR data point (implements Health-RI metadata schema)

Step-by-step for Health-RI

Metadata implementation (add link)

-mapping metadata schema page (add link)


Implement a data model (in development)

A. FAIR in a box (derived from CDE-in-a-box)

FAIR in a box comes with the CARE-SM model. If you want to use your own custom model, you have to adjust the YARRRML mapping (for example via Matey).

Components/steps:

  1. RML (RDF Mapping Language): reusable templates that support not only CSV to RDF transformations, but also transformations from other formats. RML templates specify the individual triple patterns that should be created during a transformation. For example, the subject Uniform Resource Identifier (URI), predicate URI, and object URI are represented as strings that may contain variables, where the variables are references to locations within the source document (e.g., the appropriate column header within a CSV file). During a transformation, every variable in an RML template is replaced by the value at that location within a single source record (e.g., a single row of a CSV file), and the source is then iterated over all records to complete the transformation. RML templates themselves are represented in RDF and are therefore not always easily human-readable. To simplify the RML syntax, so that EJP RD FAIRification stewards, or potentially the registry data custodians themselves, could edit the template if required, the authors of the paper listed under Sources identified a second, related technology: YARRRML.

  • YARRRML: linked data generation rules written in a human-readable way. YARRRML documents can be converted into RML templates, which are then applied to a CSV file to automate the transformation. In short: rules that specify how to transform data into linked data (e.g. triples) + CSV → automated mapping of CSV to RDF according to the rules (specified in RML).

  • This requires a template-compliant CSV (in the paper, generated by the data custodian together with the FAIR data steward).

  2. Transforming a non-RDF data format into RDF with RML. Two tools are available: SDM-RDFizer (alternative in the paper: RMLMapper). This is covered in the next Metroline step.

  3. The RDF is fed into GraphDB (a triple store). The metadata is automatically updated when the data is updated.

  4. The result is exposed through a FAIR Data Point (FDP). This is the default; templates can be used for controlled input into the FDP. This is covered in the next Metroline step.
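
The variable substitution described in step 1 can be illustrated with a minimal Python sketch. This is not RML or YARRRML itself, and it does not use SDM-RDFizer or RMLMapper; it only mimics the idea of filling triple templates from CSV rows. The `$(column)` reference style follows YARRRML, while the column names, the `example.org` URIs and the function names are assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical triple templates: $(column) refers to a CSV column header.
TEMPLATES = [
    ("<https://example.org/patient/$(patient_id)>",
     "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
     "<https://example.org/model/Patient>"),
    ("<https://example.org/patient/$(patient_id)>",
     "<https://example.org/model/hasDiagnosis>",
     "<https://example.org/diagnosis/$(diagnosis_code)>"),
]

REFERENCE = re.compile(r"\$\(([^)]+)\)")


def fill(template: str, record: dict) -> str:
    """Replace every $(column) reference with that column's value in one record."""
    return REFERENCE.sub(lambda match: record[match.group(1)], template)


def transform(csv_text: str) -> list[str]:
    """Apply every template to every CSV row and return N-Triples lines."""
    triples = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for subject, predicate, obj in TEMPLATES:
            triples.append(
                f"{fill(subject, record)} {fill(predicate, record)} {fill(obj, record)} ."
            )
    return triples


# Made-up example data, standing in for a template-compliant CSV.
EXAMPLE_CSV = "patient_id,diagnosis_code\nP001,D42\nP002,D07\n"

if __name__ == "__main__":
    for line in transform(EXAMPLE_CSV):
        print(line)
```

The sketch skips things a real RML processor handles, such as URI encoding of values, literals with datatypes, and sources other than CSV.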

From the FAIR in a box GitHub repository:

The EJP-RD CARE-SM Transformation process has three steps:

  1. A simple "preCARE" CSV file is created by the data owner (you must do this!)

  2. The preCARE.csv is transformed into the final CARE.csv (this is automated) by the caresm toolkit (part of the docker-compose)

  3. The final CARE.csv is processed by the YARRRML transformer, and RDF is output into the ./data/triples folder

Sources: Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data - Journal of Biomedical Semantics

The FAIR Data Point: Interfaces and Tooling

Mapping of clinical trial data to CDISC-SDTM: a practical example based on APPROACH and ABIRISK

B. Castor EDC

  • Build the eCRF for data collection (previous step?)

  • Map fields from the eCRF to the semantic model (step 7 of de novo) in the 'data transformation application'

  • eCRF values are linked to the ontology concepts that serve as the machine-readable representation of each value in the rendered RDF

  • Entered data is automatically converted into RDF in real time (next step) = de novo FAIRification
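
The idea of linking eCRF values to ontology concepts can be sketched in a few lines of Python. This is not Castor EDC's data transformation application or its API; the field name, the option codes, the `example.org` IRIs and the function name are all hypothetical and only show what such a mapping amounts to.

```python
# Hypothetical mapping from an eCRF field to the predicate used in the semantic model.
FIELD_TO_PREDICATE = {
    "sex": "https://example.org/model/hasSex",
}

# Hypothetical mapping from (field, coded option) to an ontology concept IRI.
VALUE_TO_CONCEPT = {
    ("sex", "1"): "https://example.org/ontology/Female",
    ("sex", "2"): "https://example.org/ontology/Male",
}


def render_triple(record_id: str, field_name: str, value: str) -> str:
    """Render one entered eCRF value as an N-Triples line pointing at its concept."""
    subject = f"https://example.org/record/{record_id}"
    predicate = FIELD_TO_PREDICATE[field_name]
    concept = VALUE_TO_CONCEPT[(field_name, value)]
    return f"<{subject}> <{predicate}> <{concept}> ."


if __name__ == "__main__":
    print(render_triple("R1", "sex", "1"))
```

Because the mapping is defined once per field and option, every value entered later can be rendered as RDF without further manual work, which is the essence of de novo FAIRification.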

C. Ontotext Refine: map structured data to a locally stored RDF schema in GraphDB. Choose the right predicates and types, define datatypes and implement transformations. Ontotext Refine is integrated in the GraphDB Workbench.

  1. Load the ontology in GraphDB

  2. Connect Ontotext Refine to GraphDB (which holds the ontology)

  3. Load your data

  4. Transform the data as needed

  5. Manually connect the data variables to the ontology. The result is RDF.
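
Step 1 is normally done through the GraphDB Workbench import screen. As a scripted alternative, the sketch below prepares an upload request for the RDF4J-style REST endpoint (`/repositories/{id}/statements`) that GraphDB exposes. The base URL, the repository name `my-repo` and the tiny ontology are assumptions for illustration, and the request is only built, not sent.

```python
import urllib.request


def build_upload_request(base_url: str, repository: str, turtle: str) -> urllib.request.Request:
    """Build a POST request that would add Turtle statements to a repository."""
    url = f"{base_url.rstrip('/')}/repositories/{repository}/statements"
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


# Made-up one-class ontology, standing in for the real ontology file.
ONTOLOGY = """@prefix ex: <https://example.org/model/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Patient a rdfs:Class .
"""

# Assumed local GraphDB instance and repository name.
request = build_upload_request("http://localhost:7200", "my-repo", ONTOLOGY)

# To actually send it against a running instance:
# urllib.request.urlopen(request)
```

A secured GraphDB instance would additionally need authentication headers, which are left out here.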

Protégé: ontology editor

 

Implement a metadata schema (in development)

-FAIR data point reference implementation (implements DCAT)

-Health-RI FAIR data point (implements Health-RI metadata schema)

  • With the Health-RI (HRI) SHACL shapes you can deliver metadata according to the HRI schema

  • The metadata is automatically transformed to RDF

Step-by-step for Health-RI

Importing SHACL files into the FDP

  1. Log in as an admin in the FDP and go to “Metadata schemas” (top right corner)

  2. Click on the metadata schema you want to update

  3. Go to the GitHub page providing the SHACL files

  4. Click on the class you want to update and copy the contents of its Turtle (.ttl) file

  5. Go back to the FDP and paste the copied Turtle into “Form definition” (bottom of the page)

  6. Click on “Save and release”

  7. Update the version number

  8. Click on “Release”

Metadata implementation (add link)

-mapping metadata schema page (add link)

 

 

FAIRopoly 

This step aims at implementing the semantic model for data through an automatic tool, and the metadata model for metadata. The metadata and data that are structured with ontologies and follow standard schemas make it easier for other resources such as the EJP RD Virtual Platform to find your resource’s metadata and understand its data. 

Tip: EJP RD developed a metadata model. Implementing it in your registry source code may require a developer.

 

To check:

According to FAIRopoly this should be step 8 in de novo (set up registry structure in FDP) and step 12 (??) in generic. What is the content of these steps?

 

 

De novo supplementary

Step 8 - Set up registry structure in the FAIR Data Point

The available semantic metadata model of the FAIR Data Point specification was used to describe the VASCA registry [4]. This model is based on the DCAT standard. The VASCA registry FAIR Data Point metadata is described in three layers: 1) catalog - a collection of datasets, 2) dataset - a representation of an individual dataset in the collection, and 3) distribution - a representation of an accessible form of a dataset, e.g. a downloadable file or a web service that gives access to the data for authorised users (Figure S2). A catalog may have multiple datasets, and a dataset may have multiple distributions. The VASCA registry described in this project (Registry of Vascular Anomalies - Radboud university medical center) is one of the datasets in the catalog (Registry of Vascular Anomalies). Other VASCA registries, from this or one of the other centers, can also be described in this catalog. The semantic metadata model of the FAIR Data Point metadata specification was implemented in the Castor EDC’s FAIR Data Point. The metadata that describe the catalog, dataset, and distributions of the VASCA registry described in this project are publicly available and licensed under the CC0 license.
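
The three-layer structure can be sketched in Python as nested records serialised to Turtle with DCAT terms. This is only an illustration, not the FAIR Data Point implementation: the `example.org` IRIs, the distribution title and the class and function names are assumptions, while the catalog and dataset titles are taken from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    iri: str
    title: str


@dataclass
class Dataset:
    iri: str
    title: str
    distributions: list[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    iri: str
    title: str
    datasets: list[Dataset] = field(default_factory=list)


PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def to_turtle(catalog: Catalog) -> str:
    """Serialise catalog -> dataset -> distribution as Turtle (no string escaping)."""
    lines = [PREFIXES]
    lines.append(f'<{catalog.iri}> a dcat:Catalog ;\n    dcterms:title "{catalog.title}" .')
    for dataset in catalog.datasets:
        lines.append(f"<{catalog.iri}> dcat:dataset <{dataset.iri}> .")
        lines.append(f'<{dataset.iri}> a dcat:Dataset ;\n    dcterms:title "{dataset.title}" .')
        for dist in dataset.distributions:
            lines.append(f"<{dataset.iri}> dcat:distribution <{dist.iri}> .")
            lines.append(f'<{dist.iri}> a dcat:Distribution ;\n    dcterms:title "{dist.title}" .')
    return "\n".join(lines) + "\n"


# Hypothetical IRIs; the distribution title is made up for illustration.
catalog = Catalog(
    iri="https://example.org/catalog/vasca",
    title="Registry of Vascular Anomalies",
    datasets=[
        Dataset(
            iri="https://example.org/dataset/vasca-radboud",
            title="Registry of Vascular Anomalies - Radboud university medical center",
            distributions=[
                Distribution(
                    iri="https://example.org/distribution/vasca-radboud-rdf",
                    title="Example RDF distribution",
                )
            ],
        )
    ],
)

if __name__ == "__main__":
    print(to_turtle(catalog))
```

Further registries from other centers would be added as extra `Dataset` entries in the same catalog, which mirrors the one-to-many relationships described above.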

 

 

Why is this step important 

Applying the data model to your data and the metadata model to your metadata is crucial for the next step: Metroline Step: Transform and expose FAIR (meta)data.

How to

The How to section should:

  • be split into easy to follow steps;

    • Step 1

    • Step 2

    • etc.

  • help the reader to complete the step;

  • aspire to be readable for everyone, but, depending on the topic, may require specialised knowledge;

  • be a general, widely applicable approach;

  • if possible / applicable, add (links to) the solution necessary for onboarding in the Health-RI National Catalogue;

  • aim to be practical and simple, while keeping in mind: if I would come to this page looking for a solution to this problem, would this How-to actually help me solve this problem;

  • contain references to solutions such as those provided by FAIR Cookbook, RDMkit, The Turing Way and FAIRsharing;

  • contain custom recipes/best-practices written by/together with experts from the field if necessary. 

Expertise requirements for this step 

Describes the expertise that may be necessary for this step. Should be based on the expertise described in the Metroline: Build the team step.

FAIR expert/data steward: help with the tools.

Practical examples from the community 

Examples of how this step is applied in a project (link to demonstrator projects).  

Training

Add links to training resources relevant for this step. Since the training aspect is still under development, currently many steps have “Relevant training will be added in the future if available.”

Suggestions

Visit our How to contribute page for information on how to get in touch if you have any suggestions about this page.