Metroline Step: Transform and expose FAIR (meta)data

Status: In development

‘Start with a great quote from, for example, a paper, between single quotes, in italic.’ (source as a hyperlink in parentheses)

In layman’s terms (Jip en Janneke), add an easy to follow summary, using around three sentences.

Short description

This step focuses on the processes required to transform and expose existing metadata that has already been mapped to a metadata schema (see previous step). It provides detailed guidance on how to expose metadata in line with FAIR standards, ensuring that data can be easily discovered and used by others. By exposing metadata in a FAIR manner, the research resource (e.g. a dataset) is made findable and accessible to a wider audience through appropriate channels and platforms such as the Health-RI Catalog.

This step also emphasizes the impact of adhering to the FAIR principles and the standards that implement them. It offers practical advice and tools to help users achieve compliance, thereby enhancing the reusability and interoperability of their data. This step is essential for those looking to improve the visibility and accessibility of their metadata, ensuring that it can be effectively matched and reused within the scientific community and beyond.

Transform data:
FAIR in a box - CSV to RDF part (add link)
OpenRefine - transformation part
Castor EDC - swag template

Store and expose data:
Triple store in a FAIR Data Point

Transform, store and expose metadata:
FAIR Data Point reference implementation

The next step in the process is to make your data and metadata linkable, i.e. to transform them into a machine-readable knowledge graph representation [Generic]. Currently, this is done using Semantic Web and Linked Data technologies [GOFAIR_process]. An example of a linkable, machine-readable global framework is the Resource Description Framework (RDF). It provides a common and straightforward underlying model and creates a powerful global virtual knowledge graph [Generic]. Transforming the metadata and data into this linkable representation requires the semantic models defined in step X and step Y respectively. See the How to section for practical information.

Once the data and metadata have been transformed, they can be made available for further use by humans and machines via, for example, APIs, RDF triple stores, or web applications [Generic]. One method for owners/publishers to expose the semantically rich metadata and provide/manage access to it is the FAIR Data Point (FDP) [FDP, FDP_spec]. FDPs are generally used to expose the metadata of datasets, but metadata for other types of digital resources, such as ontologies and algorithms, can also be exposed, thereby allowing consumers to discover information about the resources offered. An FDP has access control functionality, which makes it possible to restrict access to (parts of) the metadata.

The RDF-transformed data can be stored in a triple store [De Novo], such as GraphDB [GraphDB] or Blazegraph [Blazegraph]. The URL providing access to the machine-readable data in the triple store is made available in the FAIR Data Point.

It is essential to properly define your access conditions – see Step X.

[Blazegraph] https://blazegraph.com/

[De Novo] https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-02004-y

[FDP] https://direct.mit.edu/dint/article/5/1/184/113181/The-FAIR-Data-Point-Interfaces-and-Tooling

[FDP_spec] https://fairdatapoint.readthedocs.io/_/downloads/en/latest/pdf/

[Generic] https://direct.mit.edu/dint/article/2/1-2/56/9988/A-Generic-Workflow-for-the-Data-FAIRification

[GOFAIR_process] https://www.go-fair.org/fair-principles/f2-data-described-rich-metadata/

[GraphDB] http://graphdb.ontotext.com/

Why is this step important

By completing this step, you will have FAIRified your (meta)data and exposed it to the world.

How to

[Generic]

To transform the data into a machine-readable form (Step 5a), the semantic data model defined (or chosen) in Step 4a is required. Specialized tools are available for this process, such as the FAIRifier, which provides insight into the transformation process and makes the process reproducible by tracking intermediate steps [6]. Other similar tools are Karma [16], RightField [17], and OntoMaton [18].

For the transformation of the metadata into a machine-readable form (Step 5b), the semantic metadata model defined (or chosen) in Step 4b is required. For some generic metadata items, several tools are available that support this transformation process, such as the FAIR Metadata Editor [6], CEDAR [19], and BioschemasGenerator. The FAIR Metadata Editor is a free online tool that demonstrates the concept of structuring metadata in a FAIR-supporting way. Good metadata increases the potential to make a resource more findable. We mention two additional mechanisms to increase the findability of a resource. First, we recommend registering a resource in a domain-relevant registry or index, preferably one that strives for FAIR compliance. Second, to enable indexing of the dataset by general-purpose Web search engines such as Google, we recommend including Schema.org markup (or a domain-specific variant like Bioschemas), for example using the DataCatalog and Dataset profiles.
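As an illustration of the Schema.org recommendation, the following is a minimal sketch that builds a Dataset description as JSON-LD and wraps it in a script tag for embedding in a landing page. The function names, the example dataset, and the example.org URL are hypothetical and only serve the illustration. A Bioschemas profile may require more properties than shown here.

```python
import json


def dataset_jsonld(name, description, url, catalog_name, keywords=()):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Link the dataset to the catalog it is listed in.
        "includedInDataCatalog": {"@type": "DataCatalog", "name": catalog_name},
    }
    if keywords:
        doc["keywords"] = list(keywords)
    return doc


def as_script_tag(doc):
    """Wrap the JSON-LD in the script element that web crawlers look for."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical example values, not taken from a real catalog.
example = dataset_jsonld(
    name="Example registry dataset",
    description="Illustrative dataset description for search engine indexing.",
    url="https://example.org/datasets/example-registry",
    catalog_name="Example catalog",
    keywords=["registry", "FAIR"],
)

print(as_script_tag(example))
```

The resulting script element can be placed in the HTML of the dataset landing page, where general-purpose search engines can pick it up.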

[De Novo] --> Maybe a recipe for a Castor-based approach?

Step 8 - Set up registry structure in the FAIR Data Point

The available semantic metadata model of the FAIR Data Point specification was used to describe the VASCA registry [4]. This model is based on the DCAT standard. The VASCA registry FAIR Data Point metadata is described in three layers: 1) catalog - a collection of datasets, 2) dataset - a representation of an individual dataset in the collection, and 3) distribution - a representation of an accessible form of a dataset, e.g. a downloadable file or a web service that gives access to the data for authorised users (Figure S2). A catalog may have multiple datasets, and a dataset may have multiple distributions. The VASCA registry described in this project (Registry of Vascular Anomalies - Radboud university medical center) is one of the datasets in the catalog (Registry of Vascular Anomalies). Other VASCA registries, from this or one of the other centers, can also be described in this catalog. The semantic metadata model of the FAIR Data Point metadata specification was implemented in the Castor EDC’s FAIR Data Point. The metadata that describe the catalog, dataset, and distributions of the VASCA registry described in this project are publicly available and licensed under the CC0 license.
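The three layers can be sketched in Turtle using DCAT and Dublin Core terms. The snippet below is a minimal illustration that uses only the Python standard library. All IRIs under example.org are hypothetical, attaching the CC0 license to the distribution is an assumption made for the example, and a record that conforms to the FAIR Data Point specification needs more properties than shown here.

```python
def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def describe_layers(catalog_iri, catalog_title, dataset_iri, dataset_title,
                    distribution_iri, access_url, license_iri):
    """Return Turtle describing one catalog, one dataset and one distribution."""
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        # Layer 1: the catalog points to its datasets.
        f"<{catalog_iri}> a dcat:Catalog ;",
        f"    dct:title {turtle_literal(catalog_title)} ;",
        f"    dcat:dataset <{dataset_iri}> .",
        "",
        # Layer 2: the dataset points to its distributions.
        f"<{dataset_iri}> a dcat:Dataset ;",
        f"    dct:title {turtle_literal(dataset_title)} ;",
        f"    dcat:distribution <{distribution_iri}> .",
        "",
        # Layer 3: the distribution carries the access URL.
        f"<{distribution_iri}> a dcat:Distribution ;",
        f"    dcat:accessURL <{access_url}> ;",
        f"    dct:license <{license_iri}> .",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical IRIs; only the titles are taken from the text above.
turtle = describe_layers(
    catalog_iri="https://example.org/fdp/catalog/vasca",
    catalog_title="Registry of Vascular Anomalies",
    dataset_iri="https://example.org/fdp/dataset/vasca-radboudumc",
    dataset_title="Registry of Vascular Anomalies - Radboud university medical center",
    distribution_iri="https://example.org/fdp/distribution/vasca-radboudumc-rdf",
    access_url="https://example.org/sparql/vasca-radboudumc",
    license_iri="https://creativecommons.org/publicdomain/zero/1.0/",
)

print(turtle)
```

Because a catalog may have multiple datasets and a dataset multiple distributions, a fuller version would repeat the dcat:dataset and dcat:distribution statements once per linked resource.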

Step 11 - Entered data is automatically transformed to RDF

When the data is entered in the eCRF, it is automatically converted in real time into a machine-readable RDF representation by the data transformation application. Thus, the data is made machine-readable from the moment it is collected: de novo FAIRification. This way, a periodic, manual conversion of the data into a machine-readable representation is not required, and all collected data is available for reuse at any time. Also, updates in the semantic data model lead to automatic updates in the machine-readable RDF representations of data already collected. An additional benefit of this approach is that the people tasked with clinical care and data entry do not need expertise in FAIR or semantic technologies to generate FAIR data.
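The idea of converting records to RDF through a semantic data model can be sketched as follows. This is not the data transformation application described above, only a minimal standard-library illustration: the field names, the mapping from fields to predicate IRIs, and the example.org base IRI are all hypothetical stand-ins for a real semantic data model.

```python
import csv
import io

# Hypothetical mapping from eCRF field names to predicate IRIs.
# In practice this comes from the semantic data model.
FIELD_TO_PREDICATE = {
    "diagnosis": "https://example.org/model/diagnosis",
    "age_at_onset": "https://example.org/model/ageAtOnset",
}

# Hypothetical export of two entered records.
SAMPLE_CSV = (
    "record_id,diagnosis,age_at_onset\n"
    "001,example diagnosis,12\n"
    "002,,7\n"
)


def ntriples_literal(value):
    """Quote a value as a plain N-Triples string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_ntriples(csv_text, mapping, base="https://example.org/record/"):
    """Convert each CSV row into N-Triples lines, one per mapped, non-empty field."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}{row['record_id']}>"
        for field, value in row.items():
            predicate = mapping.get(field)
            # Skip unmapped fields (such as the identifier) and empty values.
            if predicate is None or not value:
                continue
            triples.append(f"{subject} <{predicate}> {ntriples_literal(value)} .")
    return triples


for line in rows_to_ntriples(SAMPLE_CSV, FIELD_TO_PREDICATE):
    print(line)
```

Keeping the mapping separate from the records reflects the point made above: when the semantic data model changes, rerunning the conversion with the updated mapping regenerates the RDF for data that was already collected.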

Step 12 - Entered metadata is automatically transformed to RDF in the FAIR Data Point

When the metadata is entered in the FAIR Data Point of the EDC system, it is represented in a human-readable format (a website, e.g. https://fdp.castoredc.com/fdp/catalog/vasca) and at the same time automatically converted into a machine-readable RDF representation (e.g. in the ttl format: https://fdp.castoredc.com/fdp/catalog/vasca?format=ttl).
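A client can request the machine-readable representation by adding the format parameter shown in the example URL. The sketch below derives such a URL and, optionally, downloads the result. The helper names and the User-Agent value are hypothetical, and whether the example URL is still reachable, or whether other format values are accepted, is not guaranteed here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def with_format(url, fmt="ttl"):
    """Return the URL with its 'format' query parameter set to fmt."""
    parts = urlsplit(url)
    # Keep other query parameters, replace any existing 'format' value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    query.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_metadata(url, fmt="ttl", timeout=10):
    """Download the metadata in the requested format (performs a network call)."""
    request = Request(
        with_format(url, fmt),
        headers={"User-Agent": "metadata-example/0.1"},  # hypothetical client name
    )
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


print(with_format("https://fdp.castoredc.com/fdp/catalog/vasca"))
```

Calling fetch_metadata with the catalog URL would return the Turtle text if the server is available. It is left uncalled here so that the sketch runs without network access.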

Step 13 - Store RDF data and make it available in the FAIR data point

After transforming the eCRF data into a machine-readable RDF representation (step 11), it is stored in a triple store. This is done via the data transformation application upon data entry (collected or updated) in the EDC system (step 10). The URL providing access to the machine-readable data in the triple store is made available in the FAIR Data Point as an access URL in the Distribution layer (Figure S2).

The How to section should:

- be split into easy-to-follow steps:
  - Step 1 - Title of the step
  - Step 2 - Title of the step
  - etc.
- help the reader to complete the step;
- aspire to be readable for everyone, but, depending on the topic, may require specialised knowledge;
- be a general, widely applicable approach;
- if possible / applicable, add (links to) the solution necessary for onboarding in the Health-RI National Catalogue;
- aim to be practical and simple, while keeping in mind: if I came to this page looking for a solution to this problem, would this How to actually help me solve it;
- contain references to solutions such as those provided by FAIR Cookbook, RDMkit, The Turing Way and FAIRsharing;
- contain custom recipes/best practices written by/together with experts from the field if necessary.

Expertise requirements for this step

Describes the expertise that may be necessary for this step. Should be based on the expertise described in the Metroline: Build the team step.

Practical examples from the community

Examples of how this step is applied in a project (link to demonstrator projects).

Training

If you have great suggestions for training material, add links to these resources here. Since the training aspect is still under development, currently many steps have “Relevant training will be added soon.”

Suggestions

This page is under construction. Learn more about the contributors here and explore the development process here. If you have any suggestions, visit our How to contribute page to get in touch.

Health-RI data