Short description
The next step in the process is to make your data and metadata linkable, i.e. to transform them into a machine-readable knowledge graph representation [Generic]. Currently, this is done using Semantic Web and Linked Data technologies [GOFAIR_process]. An example of a linkable, machine-readable global framework is the Resource Description Framework (RDF). It provides a common and straightforward underlying model and creates a powerful global virtual knowledge graph [Generic]. Transforming the metadata and data into this linkable representation requires the semantic models defined in step X and step Y respectively. See the How to section for practical information.
Once the data and metadata have been transformed, they can be made available for further use by humans and machines via e.g. APIs, RDF triplestores, or Web applications [Generic]. One method for owners/publishers to expose semantically rich metadata and to provide and manage access to it is the FAIR Data Point (FDP) [FDP, FDP_spec]. FDPs are generally used to expose the metadata of datasets, but metadata for other types of digital resources, such as ontologies and algorithms, can also be exposed, allowing consumers to discover information about the resources offered. An FDP has access control functionality, which makes it possible to restrict access to (parts of) the metadata.
The RDF-transformed data can be stored in a triplestore [De Novo], such as GraphDB [GraphDB] or Blazegraph [Blazegraph]. The URL providing access to the machine-readable data in the triplestore is made available in the FAIR Data Point.
It is essential to properly define your access conditions – see Step X.
[Blazegraph] https://blazegraph.com/
[De Novo] https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-02004-y
[FDP] https://direct.mit.edu/dint/article/5/1/184/113181/The-FAIR-Data-Point-Interfaces-and-Tooling
[FDP_spec] https://fairdatapoint.readthedocs.io/_/downloads/en/latest/pdf/
[Generic] https://direct.mit.edu/dint/article/2/1-2/56/9988/A-Generic-Workflow-for-the-Data-FAIRification
[GOFAIR_process] https://www.go-fair.org/fair-principles/f2-data-described-rich-metadata/
[GraphDB] http://graphdb.ontotext.com/
Why is this step important
By completing this step, you will have FAIRified your (meta)data and exposed it to the world.
How to
[Generic]
To transform the data into a machine-readable form (Step 5a), the semantic data model defined (or chosen) in Step 4a is required. Specialized tools are available for this process, such as the FAIRifier, which provides insight into the transformation process and makes it reproducible by tracking intermediate steps [6]. Similar tools include Karma [16], Rightfield [17], and OntoMaton [18].
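As a rough illustration of what such a transformation produces, the Python sketch below turns one tabular record into N-Triples, a line-based RDF syntax. It does not reflect how the FAIRifier or the other tools work internally. The `https://example.org/` IRIs, the field names, and the helpers `record_to_ntriples` and `escape_literal` are all hypothetical; in practice the predicates would come from the semantic data model of Step 4a. All values are written as plain string literals, and the record identifier is assumed to be safe to use in an IRI.

```python
# Hypothetical namespace and field-to-predicate mapping. A real project would
# take the predicates from its semantic data model (Step 4a).
EX = "https://example.org/"

FIELD_TO_PREDICATE = {
    "age": EX + "vocab/age",
    "diagnosis": EX + "vocab/diagnosis",
}


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record_id: str, record: dict) -> list[str]:
    """Return one N-Triples line per mapped, non-empty field of the record."""
    subject = f"<{EX}record/{record_id}>"
    triples = []
    for field_name, value in record.items():
        predicate = FIELD_TO_PREDICATE.get(field_name)
        if predicate is None or value in ("", None):
            continue  # skip fields without a mapping and empty values
        literal = escape_literal(str(value))
        triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples


lines = record_to_ntriples(
    "001", {"age": 34, "diagnosis": "venous malformation", "note": ""}
)
print("\n".join(lines))
```

Keeping the mapping in a separate table, rather than hard-coding predicates in the conversion function, mirrors the separation between the semantic model and the data that the step describes.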
For the transformation of the metadata into a machine-readable form (Step 5b), the semantic metadata model defined (or chosen) in Step 4b is required. For some generic metadata items, several tools are available that support this transformation process, such as the FAIR Metadata Editor [6], CEDAR [19], and BioschemasGenerator. The FAIR Metadata Editor is a free online tool that demonstrates the concept of structuring metadata in a FAIR-supporting way.
Good metadata increases the potential to make a resource more findable. We mention two additional mechanisms to increase the findability of a resource. First, we recommend registering a resource in a domain-relevant registry or index, preferably one that strives for FAIR compliance. Second, to enable indexing of the dataset by general-purpose Web search engines such as Google, we recommend including Schema.org markup (or a domain-specific variant like Bioschemas), for example using the DataCatalog and Dataset profiles.
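To make the Schema.org recommendation more concrete, the following Python sketch builds minimal Dataset markup as JSON-LD and wraps it in the script tag that is commonly embedded in a landing page. The property selection is a small assumed subset and does not represent a complete Bioschemas profile. The function names and all example values are hypothetical.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, catalog_name: str) -> str:
    """Build minimal Schema.org Dataset markup as a JSON-LD string."""
    markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Links the dataset to the catalog that lists it.
        "includedInDataCatalog": {"@type": "DataCatalog", "name": catalog_name},
    }
    return json.dumps(markup, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Placeholder values for illustration only.
jsonld = dataset_jsonld(
    name="Example registry dataset",
    description="Placeholder description of the dataset.",
    url="https://example.org/dataset/1",
    catalog_name="Example catalog",
)
print(as_script_tag(jsonld))
```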
[De Novo] --> Maybe a recipe for a Castor-based approach?
Step 8 - Set up registry structure in the FAIR Data Point
The semantic metadata model provided by the FAIR Data Point specification was used to describe the VASCA registry [4]. This model is based on the DCAT standard. The VASCA registry FAIR Data Point metadata is described in three layers: 1) catalog, a collection of datasets; 2) dataset, a representation of an individual dataset in the collection; and 3) distribution, a representation of an accessible form of a dataset, e.g. a downloadable file or a web service that gives access to the data for authorised users (Figure S2). A catalog may have multiple datasets, and a dataset may have multiple distributions. The VASCA registry described in this project (Registry of Vascular Anomalies - Radboud university medical center) is one of the datasets in the catalog (Registry of Vascular Anomalies). Other VASCA registries, from this or one of the other centers, can also be described in this catalog. The semantic metadata model of the FAIR Data Point metadata specification was implemented in Castor EDC's FAIR Data Point. The metadata that describe the catalog, dataset, and distributions of the VASCA registry described in this project are publicly available and licensed under the CC0 license.
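The three-layer structure can be sketched in Python as nested records that are serialised to Turtle with DCAT terms. This is a simplified approximation of the layering only; it omits the additional properties that the FAIR Data Point specification defines. The `https://example.org/` IRIs, the distribution title, and the class and function names are hypothetical, and titles are assumed to contain no double quotes.

```python
from dataclasses import dataclass, field

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


@dataclass
class Distribution:
    iri: str
    title: str
    access_url: str


@dataclass
class Dataset:
    iri: str
    title: str
    distributions: list[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    iri: str
    title: str
    datasets: list[Dataset] = field(default_factory=list)


def to_turtle(catalog: Catalog) -> str:
    """Serialise the catalog, its datasets, and their distributions to Turtle."""
    lines = [PREFIXES]
    lines.append(f"<{catalog.iri}> a dcat:Catalog .")
    lines.append(f'<{catalog.iri}> dct:title "{catalog.title}" .')
    for ds in catalog.datasets:
        lines.append(f"<{catalog.iri}> dcat:dataset <{ds.iri}> .")
        lines.append(f"<{ds.iri}> a dcat:Dataset .")
        lines.append(f'<{ds.iri}> dct:title "{ds.title}" .')
        for dist in ds.distributions:
            lines.append(f"<{ds.iri}> dcat:distribution <{dist.iri}> .")
            lines.append(f"<{dist.iri}> a dcat:Distribution .")
            lines.append(f'<{dist.iri}> dct:title "{dist.title}" .')
            lines.append(f"<{dist.iri}> dcat:accessURL <{dist.access_url}> .")
    return "\n".join(lines)


# Titles follow the registry names in the text; every IRI is a placeholder.
rdf_dist = Distribution(
    iri="https://example.org/distribution/vasca-radboud-rdf",
    title="RDF distribution",
    access_url="https://example.org/triplestore/vasca-radboud",
)
dataset = Dataset(
    iri="https://example.org/dataset/vasca-radboud",
    title="Registry of Vascular Anomalies - Radboud university medical center",
    distributions=[rdf_dist],
)
catalog = Catalog(
    iri="https://example.org/catalog/vasca",
    title="Registry of Vascular Anomalies",
    datasets=[dataset],
)
turtle = to_turtle(catalog)
print(turtle)
```

Because datasets and distributions are lists, further registries or further access forms can be added to the same catalog without changing the serialisation code.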
Step 11 - Entered data is automatically transformed to RDF
When data is entered in the eCRF, the data transformation application converts it automatically and in real time into a machine-readable RDF representation. The data is thus machine-readable from the moment it is collected: de novo FAIRification. As a result, no periodic, manual conversion of the data into a machine-readable form is required, and all collected data is available for reuse at any time. In addition, updates to the semantic data model lead to automatic updates of the machine-readable RDF representations of data that has already been collected. A further benefit of this approach is that the people tasked with clinical care and data entry do not need knowledge of RDF or semantic data modelling to generate FAIR data.
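One way such behaviour could be achieved is to keep the entered values as they are and derive the RDF from whichever model is current. The Python sketch below illustrates that idea only and is not a description of the Castor implementation, which the source does not detail. The class `LiveRdfView`, the predicate IRIs, and the record IRI are hypothetical, and literal escaping is omitted for brevity.

```python
class LiveRdfView:
    """Stores raw entries and derives N-Triples from the current model on demand."""

    def __init__(self, model: dict[str, str]):
        self.model = dict(model)  # field name -> predicate IRI
        self.entries: dict[str, dict[str, str]] = {}

    def enter(self, record_iri: str, values: dict[str, str]) -> None:
        """Record newly entered or updated values for one record."""
        self.entries[record_iri] = dict(values)

    def update_model(self, model: dict[str, str]) -> None:
        """Replace the model; stored entries are left untouched."""
        self.model = dict(model)

    def triples(self) -> list[str]:
        """Render every stored entry with the model that is current right now."""
        result = []
        for iri, values in self.entries.items():
            for field_name, value in values.items():
                if field_name in self.model:
                    predicate = self.model[field_name]
                    result.append(f'<{iri}> <{predicate}> "{value}" .')
        return result


view = LiveRdfView({"diagnosis": "https://example.org/vocab/diagnosis"})
view.enter("https://example.org/record/001", {"diagnosis": "venous malformation"})
before = view.triples()

# A model update changes the RDF of data that was entered earlier.
view.update_model({"diagnosis": "https://example.org/vocab/v2/hasDiagnosis"})
after = view.triples()
```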
Step 12 - Entered metadata is automatically transformed to RDF in the FAIR Data Point
When the metadata is entered in the FAIR Data Point of the EDC system, it is represented in a human-readable format (a website, e.g. https://fdp.castoredc.com/fdp/catalog/vasca) and, at the same time, automatically converted into a machine-readable RDF representation (e.g. in Turtle (ttl) format: https://fdp.castoredc.com/fdp/catalog/vasca?format=ttl).
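A client could request the machine-readable form by adding the `format` query parameter shown above. The Python sketch below builds such a URL and, optionally, downloads the result. The helper names are hypothetical, the download needs network access, and it is an assumption that the example address is still reachable and that other FAIR Data Points accept the same parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def with_format(url: str, fmt: str = "ttl") -> str:
    """Return the URL with its `format` query parameter set to fmt."""
    parts = urlsplit(url)
    # Drop any existing format parameter, keep all other parameters.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "format"]
    query.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_metadata(url: str, fmt: str = "ttl", timeout: float = 10.0) -> str:
    """Download the machine-readable representation (requires network access)."""
    request = Request(with_format(url, fmt))
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


ttl_url = with_format("https://fdp.castoredc.com/fdp/catalog/vasca")
print(ttl_url)
```

`fetch_metadata` is deliberately not called here, so the example runs without a network connection.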
Step 13 - Store RDF data and make it available in the FAIR Data Point
After the eCRF data has been transformed into a machine-readable RDF representation (Step 11), it is stored in a triplestore. This is done by the data transformation application whenever data is entered (collected or updated) in the EDC system (Step 10). The URL providing access to the machine-readable data in the triplestore is made available in the FAIR Data Point as an access URL in the Distribution layer (Figure S2).
Expertise requirements for this step
Describes the expertise that may be necessary for this step. Should be based on the expertise described in the Metroline: Build the team step.
Practical examples from the community
Examples of how this step is applied in a project (link to demonstrator projects).
Training
Add links to training resources relevant for this step. Since the training aspect is still under development, currently many steps have “Relevant training will be added in the future if available.”