Metroline Step: Apply (meta)data model

Status: ready for review

A meta(data) model is intended to ‘answer questions about a domain, improve understanding, and promote knowledge sharing; expose […] assumptions about a domain; promote communication among people developing a conceptual model, or among people who (later) use a conceptual model' (On the Philosophical Foundations of Conceptual Models)

Think of your data like a book in a library. A metadata model is like the card in the catalogue that tells people what the book is about and who wrote it. A data model is like the book’s table of contents—it helps everyone understand what’s inside and how to read it. Using both makes it easier for people and machines to find, understand, and reuse your data.

Short description

This step provides guidance on how to apply a metadata model to describe research resources (e.g. a model describing the topic, provenance, or type of datasets), and how to apply a data model to describe the information contained in those resources (e.g. a model capturing the structure and semantics of cohort data). Usually, both metadata and data models are annotated with ontologies (e.g. DCAT for metadata, ORDO for data). This page outlines the differences between metadata and data models, their benefits, and the main methods and tools available to apply them effectively.

Metadata models describe information about a resource. For instance, who created it, when it was collected, and what it is generally about (theme). These models help structure information like catalogues of datasets and their distributions, and also properties like authorship, licensing, and contact points. Using a metadata model to describe your resource improves its findability, allowing others to assess its potential for reuse.

Data models describe the structure and meaning of the actual content within the resource—for instance, how patient age, diagnosis, or lab results are represented and interrelated in a dataset. These models ensure that data values are understandable, interoperable, and reusable across different systems and domains. Applying a data model to your resource increase its clarity (i.e. based on the clear definitions of variables, units, and relationships provided by data models), consistency (i.e. as well-defined structures, data models reduce ambiguity and misinterpretation) and interoperability (i.e. ontologised datasets can be easily integrated with other resources).

While some methods and tools focus specifically on either metadata or data, others have a general purpose and can be used for both layers. Typically, a tool for applying a model works by transforming or restructuring original data based on the model’s schema, and then annotating the restructured data with ontological terms.

For example, for a tool like FAIR-in-a-box (see also table below), this may involve reading data from a CSV file and transforming it into RDF that is compliant with a domain-specific model like the CARE-SM (Clinical And Registries Entries Semantic Model) data model. In such cases, each element in the file—whether a column header or data value—is linked to a formal concept from the model. This enables the resource to be understood not just by humans but also by machines. A similar example involving a different approach is the use of an electronic data capture platform such as Castor. Here, users can configure mappings between data elements in the electronic case report form and a semantic data model. This is further described in this paper.

Many metadata-focused tools also provide mechanisms to expose your resource online under clear access conditions, increasing its findability. This may include generating metadata records in RDF and publishing the metadata via catalogues or registries that support standard protocols such as SPARQL.

Why is this step important

Applying the data model to your data and metadata model to your metadata is crucial for the next step: Metroline Step: Transform and expose FAIR (meta)data. It is a central step in the FAIRification process, in which you connect your (meta)data to elements of your (semantic) (meta)data model. By doing this, your (meta)data becomes machine-readable and interoperable.

The metadata and data that are structured with ontologies and follow standard schemas make it easier for other resources to find your resource’s metadata and understand its data.

How to

The following tools provide support in applying a (meta)data model to your resource(s).

Tool	Description	Metadata	Data	Key features	Ref.

Tool	Description	Metadata	Data	Key features	Ref.
FAIR Data Point	A FAIR Data Point (FDP) exposes metadata according to the FAIR principles. The FDP reference implementation automatically includes the DCAT metadata schema, so when deploying this standard FDP, there will be a metadata schema in place. This original metadata schema can be customised to a specific metadata schema by updating the shacls. For example, by making the FDP compliant to the Health-RI metadata schema, your metadata can be exposed to the National Health Data Catalogue in the correct format.	✔		Expose metadata in a FAIR manner Customisable to your own metadata model Machine-readable export formats (e.g., ttl, XML, JSON-LD)	FAIR Data Point Reference Implementation Health-RI FDP Schema Tool GitHub Importing shacls in the FDP
FAIR-in-a-box	FAIR-in-a-box (adopted from CDE-in-a-box) is an automated tool to help make your data FAIR by enabling you to provide a CSV, containing your data in accordance with the embedded CARE-SM model, a model to represent healthcare patient information. The tool will transform your CSV into RDF and place it in a triple store connected to a FAIR data point. It is possible the CARE-SM model does not fit your data. Luckily, FAIR-in-a-box is customisable: if you want to use another semantic model, you can edit the scripts and YARRRML that transform the CSV into RDF with that model.	✔	✔	Connects different software applications to automate the flow from creation, storing and publishing common data elements Automatic update of the metadata whenever something changes in the data Customisable to your own (meta)data model	FAIR-in-a-box Github CDE-in-a-box
Castor EDC	Castor EDC is a cloud-based electronic data capture platform designed to support clinical research by enabling structured and standardised data collection. It facilitates FAIR data practices by supporting metadata annotation, interoperability standards (like CDISC and HL7 FHIR), and easy export to machine-readable formats. Researchers can design studies using a visual interface while embedding semantic annotations to enhance reusability. Castor contains a built-in FAIR layer that includes a pre-configured FAIR Data Point model. Using Castor, you can configure a semantic model to automatically convert eCRF data into FAIR-compliant RDF.	✔	✔	Structured data capture with form builder and controlled vocabularies Metadata annotation to support data findability and understanding Standards support, including CDISC, HL7 FHIR, and ODM for interoperability Machine-readable export formats (e.g. CSV, JSON, CDISC ODM XML) for accessibility and reusability	CastorEDC Paper
Ontotext Refine	With the Ontotext Refine tool, structured data can be manually mapped to a locally stored RDF schema in GraphDB. The visual user interface provides guidance in choosing the right predicates and types, defining datatypes and implementing transformation.	✔	✔	Provides an intuitive visual interface for mapping existing data to RDF schemas Allows for transformation of data from various formats into structured RDF data Customisable to your own (meta)data model	Ontotext Refine
Molgenis	MOLGENIS is a data platform for researchers to accelerate scientific collaborations and for bioinformaticians who want to make researchers happy. Its latest version, the MOLGENIS EMX2 FAIR scientific data platform, is the world's most customisable FAIR platform to find, capture, exchange, manage, analyse, and share data. MOLGENIS is free, open source, and simple to use.	✔	✔	Precisely model your data as a schema of tables, columns, and relationships Automatically generates a complete database application with advanced data entry forms, powerful data up/download options, and flexible query tools Fully customisable data structure, user interface, and layout Coders can plug in scripts, use PostgreSQL, GraphQL, batch web services or RDF interface to query/update the data, and use VueJS to create your own 'apps'	Website Codebase Publication

Expertise requirements for this step

Applying metadata and data models effectively requires a mix of domain knowledge and technical skills. Ideally, teams should involve:

Domain experts, who understand the research content and can guide the correct interpretation of terms and relationships.
(Meta)Data modellers and ontologists, who can help select or customise the right models and ontologies to fit the data.
Data engineers or FAIR data stewards, who are familiar with tools, knowledge representation formats, and transformation workflows.

While basic metadata tasks (e.g. filling in a dataset description using a web form) may not require a deep technical knowledge, more advanced modelling—such as mapping raw data to ontological terms or generating knowledge graphs—typically benefits from the experience described above.

Practical examples from the community

VASCA registry

Implemented the CDE semantic data model and implemented the DCAT metadata schema and EJPRD metadata schema.
See for more information two publications:
- The de novo FAIRification process of a registry for vascular anomalies
- De-novo FAIRification via an Electronic Data Capture system by automated transformation of filled electronic Case Report Forms into machine-readable data

PRISMA

Implemented the Health-RI metadata schema in the FDP to apply the Health-RI core metadata schema to metadata from the PRISMA study. You can view PRISMA metadata in the Radboudumc FAIR Data Point.

Training

If you have great suggestions for training material, add links to these resources here. Since the training aspect is still under development, currently many steps have “Relevant training will be added soon.”

Suggestions

This page is under construction. Learn more about the contributors here and explore the development process here. If you have any suggestions, visit our How to contribute page to get in touch.