Metroline Step: Apply (meta)data model

Status: In development

Renamed from “Metroline Step: Apply core metadata model” to “Metroline Step: Apply (meta)data model”

A (meta)data model is intended to ‘answer questions about a domain, improve understanding, and promote knowledge sharing; expose […] assumptions about a domain; promote communication among people developing a conceptual model, or among people who (later) use a conceptual model' (On the Philosophical Foundations of Conceptual Models).

Think of your data like a book in a library. A metadata model is like the card in the catalogue that tells people what the book is about and who wrote it. A data model is like the book’s table of contents—it helps everyone understand what’s inside and how to read it. Using both makes it easier for people and machines to find, understand, and reuse your data.

Short description 

This step provides guidance on how to apply a metadata model to describe research resources (e.g. a model describing the topic, provenance, or type of datasets), and a data model to describe the information contained in those resources (e.g. a model capturing the structure and semantics of cohort data). Usually, both metadata and data models are annotated with ontologies (e.g. DCAT for metadata, ORDO for data). This page and its subpages outline the differences between metadata and data models, their benefits, and the main methods and tools available to apply them effectively.

Metadata models describe information about a resource: for instance, who created it, when it was collected, and what it is generally about (its theme). These models help structure information such as catalogues of datasets and their distributions, as well as properties like authorship, licensing, and contact points. Using a metadata model to describe your resource improves its findability, allowing others to assess its potential for reuse.

Data models describe the structure and meaning of the actual content within the resource: for instance, how patient age, diagnosis, or lab results are represented and interrelated in a dataset. These models ensure that data values are understandable, interoperable, and reusable across different systems and domains. Applying a data model to your resource increases its clarity (data models provide clear definitions of variables, units, and relationships), consistency (well-defined structures reduce ambiguity and misinterpretation) and interoperability (ontologised datasets can easily be integrated with other resources).
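To make the two layers concrete, the sketch below shows what each could look like in Turtle. The dataset, the ex: namespace and the ORDO code are hypothetical placeholders: the DCAT and Dublin Core terms describe the resource itself (metadata), while the ORDO term annotates a value inside the data.

    # Metadata layer: describes the dataset as a resource
    # (the ex: namespace and the ORDO code below are illustrative only)
    @prefix dcat:    <http://www.w3.org/ns/dcat#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix ordo:    <http://www.orpha.net/ORDO/> .
    @prefix ex:      <https://example.org/> .

    ex:cohort-dataset a dcat:Dataset ;
        dcterms:title   "Example patient cohort" ;
        dcterms:creator ex:some-institute .

    # Data layer: describes a value inside the dataset
    ex:patient-001 ex:hasDiagnosis ordo:Orphanet_2123 .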

While some methods and tools focus specifically on either metadata or data, others are general-purpose and can be used for both layers. Typically, a tool for applying a model works by transforming or restructuring original data based on the model’s schema, and then annotating the restructured data with ontological terms.

For example, this may involve reading data from a CSV file and transforming it into RDF that is compliant with a domain-specific model like CARE-SM. In such cases, each element in the file—whether a column header or data value—is linked to a formal concept from the model. This enables the resource to be understood not just by humans but also by machines.
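As a minimal sketch of such a transformation, the Python snippet below reads rows from a CSV file and emits RDF with rdflib. The file name, namespace and column-to-property mapping are simplified placeholders, not the actual CARE-SM model.

    # Minimal CSV-to-RDF sketch with rdflib; the mapping is a simplified
    # placeholder, not the actual CARE-SM model.
    import csv
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("https://example.org/")  # hypothetical namespace

    # Hypothetical mapping from CSV column names to property IRIs
    COLUMN_TO_PROPERTY = {
        "age": EX.hasAge,
        "diagnosis": EX.hasDiagnosis,
    }

    g = Graph()
    with open("patients.csv", newline="") as f:  # hypothetical input file
        for row in csv.DictReader(f):
            subject = EX[f"patient-{row['patient_id']}"]
            g.add((subject, RDF.type, EX.Patient))
            for column, prop in COLUMN_TO_PROPERTY.items():
                g.add((subject, prop, Literal(row[column])))

    print(g.serialize(format="turtle"))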

Many metadata-focused tools also provide mechanisms to expose your resource online under clear access conditions, increasing its findability. This may include generating metadata records in RDF and publishing the metadata via catalogues or registries that support standard protocols such as SPARQL.
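Once exposed this way, the metadata can be discovered programmatically. The sketch below, assuming a hypothetical catalogue endpoint, uses Python's SPARQLWrapper to list datasets and their titles.

    # Query a catalogue's SPARQL endpoint for dataset titles;
    # the endpoint URL is hypothetical.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://catalogue.example.org/sparql")
    sparql.setQuery("""
        PREFIX dcat:    <http://www.w3.org/ns/dcat#>
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?dataset ?title WHERE {
          ?dataset a dcat:Dataset ;
                   dcterms:title ?title .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    for binding in sparql.query().convert()["results"]["bindings"]:
        print(binding["dataset"]["value"], "-", binding["title"]["value"])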

Why is this step important 

Applying the data model to your data and the metadata model to your metadata is crucial for the next step: Metroline Step: Transform and expose FAIR (meta)data. It is a central step in the FAIRification process, in which your (meta)data is connected to elements of your (semantic) (meta)data model so that it becomes machine-readable and interoperable.

Metadata and data that are structured with ontologies and follow standard schemas make it easier for other systems and researchers to find your resource’s metadata and understand its data.

How to

The following tools provide support in applying a (meta)data model to your resource(s).


FAIR Data Point

A FAIR Data Point (FDP) exposes metadata according to the FAIR principles. The FDP reference implementation ships with the DCAT metadata schema, so a standard deployment already has a metadata schema in place.

This default schema can be customised to a specific metadata schema by updating its SHACL shapes (see the sketch after this entry). For example, by making the FDP compliant with the Health-RI metadata schema, your metadata can be exposed to the National Health Data Catalogue in the correct format.

  • Expose metadata in a FAIR manner

  • Customizable to your own metadata model

  • Machine-readable export formats (e.g., ttl, XML, JSON-LD)
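To illustrate the shape-based customisation, the sketch below validates a metadata record against a minimal SHACL shape using the pyshacl library (standing in for the FDP’s own validation); the ex: IRIs are hypothetical, and the shape simply requires every dcat:Dataset to have a title.

    # Minimal SHACL validation sketch with pyshacl; ex: IRIs are hypothetical.
    from pyshacl import validate
    from rdflib import Graph

    shapes = Graph().parse(format="turtle", data="""
        @prefix sh:      <http://www.w3.org/ns/shacl#> .
        @prefix dcat:    <http://www.w3.org/ns/dcat#> .
        @prefix dcterms: <http://purl.org/dc/terms/> .
        @prefix ex:      <https://example.org/> .

        ex:DatasetShape a sh:NodeShape ;
            sh:targetClass dcat:Dataset ;
            sh:property [ sh:path dcterms:title ; sh:minCount 1 ] .
    """)

    metadata = Graph().parse(format="turtle", data="""
        @prefix dcat: <http://www.w3.org/ns/dcat#> .
        @prefix ex:   <https://example.org/> .

        ex:my-dataset a dcat:Dataset .  # no title, so validation fails
    """)

    conforms, _, report = validate(metadata, shacl_graph=shapes)
    print(conforms)  # False: dcterms:title is missing
    print(report)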

FAIR-in-a-box

FAIR-in-a-box (adapted from CDE-in-a-box) is an automated tool that helps you make your data FAIR: you supply a CSV containing your data structured according to the embedded CARE-SM model, and the tool transforms the CSV into RDF and places it in a triple store connected to a FAIR Data Point.

The tool is customisable: if you want to use another semantic model, you can edit the scripts and YARRRML mappings that transform the CSV into RDF (a YARRRML sketch follows this entry).

  • Connects different software applications to automate the flow from creation to storage and publication of common data elements

  • Automatic update of the metadata whenever something changes in the data

  • Customizable to your own (meta)data model
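For orientation, a YARRRML mapping is a YAML document declaring how source columns become triples. The fragment below is a simplified, hypothetical sketch, not the actual CARE-SM mappings shipped with FAIR-in-a-box: it maps each row of a CSV to a subject IRI and a diagnosis property.

    # Simplified, hypothetical YARRRML sketch (not the shipped CARE-SM mappings)
    prefixes:
      ex: https://example.org/
    mappings:
      patient:
        sources:
          - ['patients.csv~csv']
        s: ex:patient-$(patient_id)
        po:
          - [a, ex:Patient]
          - [ex:hasDiagnosis, $(diagnosis)]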

CastorEDC

Castor EDC is a cloud-based electronic data capture platform designed to support clinical research by enabling structured and standardised data collection. It facilitates FAIR data practices by supporting metadata annotation, interoperability standards (like CDISC and HL7 FHIR), and easy export to machine-readable formats. Researchers can design studies using a visual interface while embedding semantic annotations to enhance reusability.

  • Structured data capture with form builder and controlled vocabularies

  • Metadata annotation to support data findability and understanding

  • Standards support, including CDISC, HL7 FHIR, and ODM for interoperability

  • Machine-readable export formats (e.g. CSV, JSON, CDISC ODM XML) for accessibility and reusability

Ontotext Refine

With the Ontotext Refine tool, structured data can be manually mapped to a locally stored RDF schema in GraphDB. The visual user interface provides guidance in choosing the right predicates and types, defining datatypes and implementing transformations.

  • Provides an intuitive visual interface for mapping data to RDF schemas

  • Allows for transformation of data from various formats into structured RDF data

  • Customizable to your own (meta)data model

Molgenis

MOLGENIS is a data platform for researchers to accelerate scientific collaborations and for bioinformaticians who want to make researchers happy. Its latest version, the MOLGENIS EMX2 FAIR scientific data platform, is the world's most customizable FAIR platform to find, capture, exchange, manage, analyse and share data. MOLGENIS is free, open source and simple to use.

  • Precisely model your data as a schema of tables, columns and relationships

  • Automatically generates a complete database application with advanced data entry forms, powerful data up/download options and flexible query tools

  • Fully customizable data structure, user interface and layout

  • Coders can plug in scripts; use PostgreSQL, GraphQL, batch web services or the RDF interface to query/update the data; and use VueJS to create their own 'apps' (see the sketch below)
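As an illustration of this programmatic access, the snippet below posts a GraphQL query with Python's requests library; the server URL, endpoint path, and the table and column names are hypothetical placeholders for whatever schema you model.

    # Hypothetical GraphQL query against a MOLGENIS EMX2 server;
    # URL, path and table/column names are placeholders.
    import requests

    query = """
    {
      Patients {
        patient_id
        diagnosis
      }
    }
    """

    resp = requests.post(
        "https://molgenis.example.org/MySchema/graphql",  # hypothetical
        json={"query": query},
    )
    resp.raise_for_status()
    print(resp.json())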

Expertise requirements for this step 

Applying metadata and data models effectively requires a mix of domain knowledge and technical skills. Ideally, teams should involve:

  • Domain experts, who understand the research content and can guide the correct interpretation of terms and relationships.

  • (Meta)Data modellers and ontologists, who can help select or customise the right models and ontologies to fit the data.

  • Data engineers or FAIR data stewards, who are familiar with tools, knowledge representation formats, and transformation workflows.

While basic metadata tasks (e.g. filling in a dataset description using a web form) may not require deep technical knowledge, more advanced modelling, such as mapping raw data to ontological terms or generating knowledge graphs, typically benefits from the expertise described above.

Practical examples from the community 

Examples of how this step is applied in demonstrator projects:

VASCA registry

Implemented the CDE semantic data model, together with the DCAT and EJPRD metadata schemas.
For more information, see these two publications:
- The de novo FAIRification process of a registry for vascular anomalies
- De-novo FAIRification via an Electronic Data Capture system by automated transformation of filled electronic Case Report Forms into machine-readable data


PRISMA

Implemented the Health-RI core metadata schema in the FDP to describe metadata from the PRISMA study. You can view the PRISMA metadata in the Radboudumc FAIR Data Point.

Training

Relevant training will be added soon. If you have suggestions for training material, please share links via our How to contribute page.

Suggestions

This page is under construction. Learn more about the contributors here and explore the development process here. If you have any suggestions, visit our How to contribute page to get in touch.

Related content