Metroline Step: Transform and expose FAIR (meta)data
Status: in development
‘Start with a great quote from, for example, a paper, between single quotes, in italic.’ (source as a hyperlink between parentheses)
In layman’s terms (Jip en Janneke), add an easy-to-follow summary of around three sentences.
Short description
This step focuses on the processes required to transform existing (meta)data that has already been mapped to a (meta)data schema (see Metroline Step: Apply (meta)data model). It provides detailed guidance on how to expose (meta)data in line with standards such as RDF and DCAT, ensuring that the data can be easily discovered and used by others. This makes the research resource (e.g. a dataset) more findable and accessible to a wider audience through appropriate channels and platforms, such as the National Health Data Catalog.
This step also emphasizes the impact of adhering to the FAIR principles and the standards that implement them. It offers practical advice and tools to help users achieve compliance, thereby enhancing the reusability and interoperability of their data. This step is essential for those looking to improve the visibility and accessibility of their data, ensuring that it can be effectively found and reused within the scientific community and beyond.
Why is this step important
By completing this step, you will have transformed your data into a FAIR-enabling format that incorporates the model applied in the previous step, and exposed it to the world according to your access conditions.
How to
Step 1 - Transform Data
Before exposing data in a FAIR manner, it must be transformed into a machine-readable format using semantic models. This typically means converting tabular or structured data into RDF using appropriate tools. The choice of tool depends on the source format and technical skills of the user.
Examples of data transformation tools:
FAIR-in-a-Box - CSV to RDF: a toolkit to convert CSV files into RDF using a step-by-step interface. It can be used to implement CARE-SM.
OpenRefine - RDF Transform extension: uses a graphical user interface (GUI) for transforming OpenRefine project data to RDF-based formats. OpenRefine can import a variety of file types, including tab-separated (tsv), comma-separated (csv), Excel (xls, xlsx), JSON, XML, RDF as XML, and Google Spreadsheets.
Castor EDC - SWAG template
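Independently of the tools listed above, the core idea of this step can be sketched in a few lines of code: each row of a tabular file becomes a set of RDF statements, using the URIs defined by your semantic model. The namespaces, column names, and values below are purely illustrative placeholders, not part of any real model.

```python
import csv
import io

# Hypothetical namespaces -- in a real project these come from the
# semantic model chosen in the "Apply (meta)data model" step.
EX = "https://example.org/patient/"
PROP = "https://example.org/vocab/"

def csv_to_ntriples(csv_text):
    """Convert each CSV row into simple N-Triples statements."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue  # the id column identifies the subject itself
            triples.append(f'{subject} <{PROP}{column}> "{value}" .')
    return "\n".join(triples)

print(csv_to_ntriples("id,diagnosis\np1,vascular anomaly\n"))
# <https://example.org/patient/p1> <https://example.org/vocab/diagnosis> "vascular anomaly" .
```

Real transformation tools add what this sketch omits: datatype and language tags, mappings to controlled vocabularies, and provenance tracking of the intermediate steps.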
Step 2 - Store and expose data
After data is transformed into RDF, it must be stored in a way that ensures it can be efficiently queried, accessed, and reused. This is typically achieved using RDF triplestores, such as GraphDB and Blazegraph, which are databases specifically designed to store and retrieve semantic data using the SPARQL query language. These platforms allow data to be queried directly via SPARQL endpoints and can serve as the backend for web applications or other services that use semantic data.
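To give a feel for what querying a triplestore looks like, the following is a minimal, hypothetical SPARQL query that lists datasets described with DCAT terms. The prefixes are the standard DCAT and Dublin Core namespaces; the data it assumes is illustrative.

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>

# List every dcat:Dataset in the store with its title,
# and its licence where one has been declared.
SELECT ?dataset ?title ?license
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
  OPTIONAL { ?dataset dct:license ?license }
}
```

A query like this can be run against any SPARQL endpoint, which is exactly what makes RDF-exposed data machine-actionable for third parties.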
To make both the data and its metadata discoverable and accessible, the RDF data can be made available in a FAIR Data Point (FDP), under agreed data reuse conditions (e.g. authentication required). An FDP provides a standardized way to publish metadata about digital resources, organized into three layers: catalog, dataset, and distribution.
Using an FDP in combination with a triplestore enables both human-friendly interfaces and machine-actionable metadata publication. This setup allows users to search for datasets, understand their structure and licensing, and access the underlying data directly.
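The three-layer structure can be illustrated with a minimal, hypothetical metadata record in Turtle. All URIs below are placeholders; a real FDP record carries many more required properties (publisher, licence, issued/modified dates, and so on).

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Catalog layer: a collection of datasets
<https://example.org/fdp/catalog/1> a dcat:Catalog ;
    dct:title "Example catalog" ;
    dcat:dataset <https://example.org/fdp/dataset/1> .

# Dataset layer: one dataset in the collection
<https://example.org/fdp/dataset/1> a dcat:Dataset ;
    dct:title "Example dataset" ;
    dcat:distribution <https://example.org/fdp/distribution/1> .

# Distribution layer: an accessible form of the dataset,
# here a SPARQL endpoint backed by a triplestore
<https://example.org/fdp/distribution/1> a dcat:Distribution ;
    dcat:accessURL <https://example.org/sparql> .
```

Note how the distribution layer is where the triplestore's endpoint is linked in, connecting the published metadata to the actual data.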
Examples:
GraphDB + FDP: An institution may use GraphDB to store RDF data and an FDP to publish metadata that references the dataset, providing both open and controlled access through the FDP interface.
Standalone Triplestore with Embedded Metadata: In some cases, RDF data can be hosted in a triplestore that also serves limited metadata. While this supports technical interoperability, it may be less visible in broader FAIR ecosystems without an FDP.
Web APIs with RDF outputs: For simpler use cases or when full triplestore infrastructure is not feasible, RDF data can be exposed through lightweight web APIs that return Linked Data representations.
Choosing the right storage and exposure method depends on the scale of data, intended users, and institutional infrastructure. For most research projects aiming to achieve FAIR compliance, combining a triplestore with a FAIR Data Point offers a robust, standards-aligned solution for sharing data and metadata effectively.
Step 3 - Transform, store and expose metadata
To make metadata FAIR, it must be semantically enriched, machine-readable, and exposed through standardized interfaces. This allows both humans and machines to discover and reuse the metadata across different platforms and domains.
The first step is to transform the metadata into a structured format using common semantic web standards such as RDF, and vocabularies like DCAT, Dublin Core, or domain-specific ontologies. This semantic transformation ensures interoperability and aligns metadata with established community practices.
Metadata transformation tools:
FAIR-in-a-Box – covers both data and metadata (CSV to RDF, metadata templates)
CEDAR Workbench – creates semantically rich metadata, especially templates
FAIR Data Station – supports ISA metadata and outputs RDF
Castor EDC SWAG
FAIR Data Point reference implementation
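A defining property of exposed FAIR metadata is that the same record is served to humans as a web page and to machines as RDF, selected via HTTP content negotiation. The sketch below, using only the Python standard library, shows how a client would ask for the machine-readable form; the catalog URL is a placeholder, not a real FDP.

```python
import urllib.request

def metadata_request(url, mime="text/turtle"):
    """Build a request asking a FAIR Data Point for machine-readable metadata.

    The Accept header selects the representation: a browser gets HTML,
    while a harvester asking for text/turtle gets RDF.
    """
    return urllib.request.Request(url, headers={"Accept": mime})

# Hypothetical FDP catalog URL -- replace with a real FAIR Data Point.
req = metadata_request("https://example.org/fdp/catalog/1")
print(req.get_header("Accept"))  # text/turtle
# urllib.request.urlopen(req).read() would then return the Turtle serialization.
```

This is the mechanism that lets catalogues and search services harvest FDP metadata automatically, without scraping the human-readable pages.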
Expertise requirements for this step
Describes the expertise that may be necessary for this step. Should be based on the expertise described in the Metroline: Build the team step.
Data steward
Infrastructure professional
Practical examples from the community
Examples of how this step is applied in a project (link to demonstrator projects).
Training
If you have great suggestions for training material, add links to these resources here. Since the training aspect is still under development, currently many steps have “Relevant training will be added soon.”
Suggestions
This page is under construction. Learn more about the contributors here and explore the development process here. If you have any suggestions, visit our How to contribute page to get in touch.
The next step in the process is to make your data and metadata linkable, i.e. to transform them into a machine-readable knowledge graph representation [Generic]. Currently, this is done using Semantic Web and Linked Data technologies [GOFAIR_process]. An example of a linkable, machine-readable global framework is the Resource Description Framework (RDF). It provides a common and straightforward underlying model and creates a powerful global virtual knowledge graph [Generic]. Transforming the metadata and data into this linkable representation requires the semantic models defined in step X and step Y, respectively. See the How to section for practical information.
Once the data and metadata have been transformed, they can be made available for further use by humans and machines via, e.g., APIs, RDF triplestores, or web applications [Generic]. One method for owners/publishers to expose the semantically rich metadata and provide/manage access to it is the FAIR Data Point (FDP) [FDP, FDP_spec]. FDPs are generally used to expose the metadata of datasets, but metadata for other types of digital resources, such as ontologies and algorithms, can also be exposed, thereby allowing consumers to discover information about the resources offered. An FDP has access control functionality, providing the possibility to restrict access to (parts of) the metadata.
The RDF-transformed data can be stored in a triplestore [De Novo], such as GraphDB [GraphDB] and Blazegraph [Blazegraph]. The URL providing access to the machine-readable data in the triplestore is made available in the FAIR Data Point.
It is essential to properly define your access conditions – see Step X.
[Blazegraph] https://blazegraph.com/
[De Novo] https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-02004-y
[FDP] https://direct.mit.edu/dint/article/5/1/184/113181/The-FAIR-Data-Point-Interfaces-and-Tooling
[FDP_spec] https://fairdatapoint.readthedocs.io/_/downloads/en/latest/pdf/
[Generic] https://direct.mit.edu/dint/article/2/1-2/56/9988/A-Generic-Workflow-for-the-Data-FAIRification
[GOFAIR_process] https://www.go-fair.org/fair-principles/f2-data-described-rich-metadata/
[GraphDB] http://graphdb.ontotext.com/
How to
[Generic]
To transform the data into a machine-readable form (Step 5a), the semantic data model defined (or chosen) in Step 4a is required. Specialized tools are available for this process, such as the FAIRifier, which provides insight into the transformation process and makes it reproducible by tracking intermediate steps [6]. Other similar tools are Karma [16], RightField [17], and OntoMaton [18].
For the transformation of the metadata into a machine-readable form (Step 5b), the semantic metadata model defined (or chosen) in Step 4b is required. For some generic metadata items, several tools support this transformation process, such as the FAIR Metadata Editor [6], CEDAR [19], and the BioschemasGenerator. The FAIR Metadata Editor is a free online tool that demonstrates the concept of structuring metadata in a FAIR-supporting way.
Good metadata increases the potential to make a resource more findable. We mention two additional mechanisms to increase the findability of a resource. First, we recommend registering a resource in a domain-relevant registry or index, preferably one that strives for FAIR compliance. Second, to enable indexing of the dataset by general-purpose web search engines such as Google, we recommend including Schema.org markup (or a domain-specific variant like Bioschemas), for example using the DataCatalog and Dataset profiles.
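As a hypothetical illustration of the Schema.org recommendation above, a dataset description embedded in a web page as JSON-LD could look like the fragment below. All values are placeholders; the Dataset and DataCatalog types are the Schema.org profiles mentioned in the text.

```json
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Example dataset",
  "description": "Illustrative Schema.org markup for search-engine indexing.",
  "license": "https://creativecommons.org/publicdomain/zero/1.0/",
  "includedInDataCatalog": {
    "@type": "DataCatalog",
    "name": "Example catalog"
  }
}
```

Embedding a block like this in a landing page's HTML allows general-purpose search engines to index the dataset alongside its descriptive metadata.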
[De Novo] --> Maybe a recipe for a Castor-based approach?
Step 8 - Set up registry structure in the FAIR Data Point
The available semantic metadata model of the FAIR Data Point specification was used to describe the VASCA registry [4]. This model is based on the DCAT standard. The VASCA registry FAIR Data Point metadata is described in three layers: 1) catalog - a collection of datasets; 2) dataset - a representation of an individual dataset in the collection; and 3) distribution - a representation of an accessible form of a dataset, e.g. a downloadable file or a web service that gives access to the data for authorised users (Figure S2). A catalog may have multiple datasets, and a dataset may have multiple distributions. The VASCA registry described in this project (Registry of Vascular Anomalies - Radboud university medical center) is one of the datasets in the catalog (Registry of Vascular Anomalies). Other VASCA registries, from this or one of the other centers, can also be described in this catalog. The semantic metadata model of the FAIR Data Point metadata specification was implemented in Castor EDC’s FAIR Data Point. The metadata that describe the catalog, dataset, and distributions of the VASCA registry described in this project are publicly available and licensed under the CC0 license.
Step 11 - Entered data is automatically transformed to RDF
When the data is entered in the eCRF, it is automatically converted, in real time, into a machine-readable RDF representation by the data transformation application. Thus, the data is made machine-readable from the moment it is collected: de novo FAIRification. This way, a periodic, manual conversion of the data into a machine-readable language is not required, so all data collected are available for reuse at any time. Also, updates in the semantic data model lead to automatic updates in the machine-readable RDF representations of data already collected. An additional benefit of this approach is that the people tasked with clinical care and data entry do not need knowledge of RDF or semantic modelling to generate FAIR data.
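The de novo approach described above can be pictured as a transformation hook that fires on every save in the data-entry system. The sketch below is a deliberately simplified illustration of that idea; the names, namespaces, and in-memory "triplestore" are hypothetical and do not reflect Castor's actual API.

```python
# Minimal sketch of de novo FAIRification: each eCRF entry is turned
# into RDF the moment it is saved, so no periodic batch conversion is needed.
EX = "https://example.org/record/"
MODEL = {"diagnosis": "https://example.org/vocab/diagnosis"}  # semantic model

triplestore = []  # stands in for a real triplestore

def on_data_entry(record_id, field, value):
    """Hook fired by the data-entry system on every save."""
    predicate = MODEL[field]  # look up the field's URI in the semantic model
    triplestore.append(f'<{EX}{record_id}> <{predicate}> "{value}" .')

on_data_entry("p1", "diagnosis", "vascular anomaly")
print(triplestore[0])
```

Because the mapping goes through the semantic model (MODEL above), updating the model changes how subsequent entries are represented, which is the mechanism behind the automatic updates mentioned in the text.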
Step 12 - Entered metadata is automatically transformed to RDF in the FAIR Data Point
When the metadata is entered in the FAIR Data Point of the EDC system, it is represented in a human-readable format (a website, e.g. https://fdp.castoredc.com/fdp/catalog/vasca) and, at the same time, automatically converted into a machine-readable RDF representation (e.g. the Turtle (ttl) format: https://fdp.castoredc.com/fdp/catalog/vasca?format=ttl).
Step 13 - Store RDF data and make it available in the FAIR data point
After transforming the eCRF data into a machine-readable RDF representation (step 11), it is stored in a triplestore. This is done via the data transformation application upon data entry (collected or updated) in the EDC system (step 10). The URL providing access to the machine-readable data in the triplestore is made available in the FAIR Data Point as an access URL in the Distribution layer (Figure S2).
The How to section should:
be split into easy-to-follow steps;
Step 1 - Title of the step
Step 2 - Title of the step
etc.
help the reader to complete the step;
aspire to be readable for everyone, but, depending on the topic, may require specialised knowledge;
be a general, widely applicable approach;
if possible / applicable, add (links to) the solution necessary for onboarding in the Health-RI National Catalogue;
aim to be practical and simple, while keeping in mind: if I came to this page looking for a solution to this problem, would this How-to actually help me solve it;
contain references to solutions such as those provided by FAIR Cookbook, RMDkit, Turing way and FAIR Sharing;
contain custom recipes/best-practices written by/together with experts from the field if necessary.