Metroline Step: Register record level metadata

Status: In development

'Structural metadata are data about how a dataset or resource came about, but also how it is internally structured.' (How to FAIR)

This step focuses on making the descriptors of your resource's content, known as structural metadata, publicly available. Numerous platforms and methods exist for this; this guide helps you select which repositories to use.

Short description

Record level metadata, also introduced in previous pages as structural metadata (Metroline Step: Assess availability of your metadata), includes information about what a resource looks like, its format, and, if applicable, the code systems and values applied. Similar to how an index at the beginning of a book refers to all its chapters and pages, it gives the reader an idea about the content. For data resources, this is typically referred to as the codebook or data dictionary. You can find more information about how to generate one in Metroline Step: Analyse data semantics.

Once you have created this codebook, you should consider making it available to ensure clarity, usability, and trust in your dataset. This step helps you select data repositories that fit your research domain. It also points to tools that you can use to further increase the FAIRness of your structural metadata.

Why is this step important 

Registering structural metadata and their definitions is crucial for the effective reuse and harmonization of data structures across research projects. This step enhances the contextual understanding of individual records, making data more meaningful and accessible, and improves precision in data retrieval.

In other words, this step:

  • Reduces ambiguity: Harmonizes data elements by providing a definition for each variable, which makes it easier to combine existing and new datasets. Data alone might be meaningless or misinterpreted.

  • Provides structural context: Explains how variables are related to one another and describes the format of the data file(s).

  • Increases findability: When variables are made publicly available, they allow more specific queries to be made about the resource.

  • Enhances reusability: Describes the resource with multiple metadata properties, which allows data to be reused in the future by yourself or by other projects with different research objectives.

  • Improves reproducibility: Metadata about the structure of your data provides insight into how the original data was structured, even if the original data is no longer available.

Keep in mind your FAIRification objectives: registering record level metadata can facilitate the subsequent steps in the FAIRification process. For example, proper metadata is essential for applying Common Data Elements and creating a semantic model (Metroline Step: Apply common data elements, Metroline Step: Analyse data semantics).

How to 

Step 1. Make sure your codebook is ready

For more information on how to create a codebook, see the following pages: Metroline Step: Analyse data semantics and https://ddialliance.org/create-a-codebook

The information you need to provide in your metadata is largely dictated by the community you are part of and can vary significantly between research domains.

  • For instance, in microscopy research, an image is linked to details about the individual from whom a tissue sample was obtained. Here, characteristics such as age, sex, and diagnosis of that person serve as metadata about the sample, offering context about the microscopy image. 

  • However, in patient registries or clinical studies, this same information is considered the primary data. Their structural metadata includes only information about which variables are collected and their value ranges, without specifying the individual record (e.g., Age, captured as an xs:integer).
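
As a minimal sketch of what such structural metadata can look like in a structured form, the Python snippet below describes two variables and checks a record against them. The variable names, fields, and allowed ranges are illustrative assumptions and are not taken from a real codebook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variable:
    """One codebook entry: describes a variable, not any individual record."""
    name: str
    datatype: type                      # Python stand-in for e.g. xs:integer
    definition: str
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    allowed_values: Optional[tuple] = None


# Hypothetical codebook for a small patient registry.
CODEBOOK = [
    Variable("age", int, "Age in years at inclusion", minimum=0, maximum=120),
    Variable("sex", str, "Sex as registered at inclusion",
             allowed_values=("female", "male", "unknown")),
]


def validate(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for var in CODEBOOK:
        if var.name not in record:
            problems.append(f"{var.name}: missing")
            continue
        value = record[var.name]
        if not isinstance(value, var.datatype):
            problems.append(f"{var.name}: expected {var.datatype.__name__}")
            continue
        if var.minimum is not None and value < var.minimum:
            problems.append(f"{var.name}: below minimum {var.minimum}")
        if var.maximum is not None and value > var.maximum:
            problems.append(f"{var.name}: above maximum {var.maximum}")
        if var.allowed_values is not None and value not in var.allowed_values:
            problems.append(f"{var.name}: not in {var.allowed_values}")
    return problems
```

Note that the codebook itself contains no patient data: it only states which variables exist and which values they may take, which is why it can usually be shared openly even when the records cannot.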

The figure below shows an example of a codebook in the clinical research domain (adapted from https://faircookbook.elixir-europe.org/content/recipes/interoperability/c4c-clinical-trials.html).

Example codebook from FAIR Cookbook

Output: the researcher created a codebook that is ready to be published in a data repository. 

 

Step 2. Select the most appropriate data repository

Once the codebook is ready, you can look for places where it can be stored/shared. Some selection criteria might be:

  • Use domain-specific repositories where possible (a repository that is commonly used in your research domain). If you don’t know which repositories are used in your domain, you can browse through repositories on https://www.re3data.org/ or https://fairsharing.org/. Many filters can be applied (e.g., generation of a persistent identifier, country, data access level, etc.), making it possible to select a data repository that fits your needs.

    • If there is no domain-specific repository, you can make use of generic or institutional repositories (e.g., DANS Data Stations Life Sciences, Zenodo, Radboud Data Repository for Radboudumc researchers, or DataverseNL for Amsterdam UMC, UMC Utrecht, and Erasmus MC).

  • Select a repository that makes it possible to publish documentation open access, even if the data is under restricted access. This way researchers can get an idea of what your data looks like, before actually needing to access the data.

  • If possible, choose a repository that supports publishing metadata (or the codebook) in machine-readable formats, such as JSON, XML, or RDF.

  • Keep in mind the funder requirements – do they require you to publish in a specific repository?

  • Some journals recommend specific repositories for researchers to use (e.g., Recommended Repositories | PLOS One or Data Repository Guidance | NATURE Scientific Data)

Output: The researcher has selected the data repository where the codebook can be published (alongside the data). 

 

Step 3. Publish the structural metadata (codebook) in the selected data repository

This step involves publishing the structural metadata in the selected data repository. Make sure that the codebook will be published open access, even when the data is under restricted access. Some repositories that support this are DANS Data Stations Life Sciences (generic repository), Radboud Data Repository, and DataverseNL (institutional repositories).

Output: The codebook is published open access in a data repository.

 

Step 4. Make your structural metadata machine-actionable

Tip: This particular step might require onboarding of a data steward to assist with the process.

To fully comply with the FAIR Data Principles, it’s not enough to just generate a codebook in Excel or Word, because machines still cannot understand the content and act on it. Specific formats and languages accomplish that, and you can convert your initial file from Step 1 into one of them, for instance RDF, JSON-LD, OWL, or XML. Apart from structuring your metadata in accordance with those formats, it’s also important to populate as many metadata elements as possible with terms from vocabularies that follow the FAIR data principles themselves. Here’s an example of a term from a vocabulary that meets those requirements:

 

FAIR term: http://purl.obolibrary.org/obo/HP_0001324

Denotes the concept: Muscle Weakness

 

Using “FAIR terms” in your metadata is beneficial because it reduces ambiguity, since most of these terms have detailed descriptions. They are also identified by persistent identifiers (i.e., the element is represented by a URL), resolved via a standardized communication protocol (in this case, the normal HTTP internet protocol that opens the website when you click the link above), and expressed in a knowledge representation that establishes the hierarchy of this element. This ensures that the term complies with the principles of Findability, Accessibility, and Interoperability, respectively.
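
OBO Foundry identifiers such as the one above are often also written in a compact form (a CURIE, e.g. HP:0001324). The sketch below converts between the two forms with plain string handling. The function names are hypothetical, and the code assumes the usual OBO pattern of a prefix and a local ID joined by a single underscore.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def purl_to_curie(purl: str) -> str:
    """Turn an OBO PURL into a compact identifier (prefix:local_id)."""
    if not purl.startswith(OBO_BASE):
        raise ValueError(f"not an OBO PURL: {purl}")
    # Split on the first underscore: "HP_0001324" -> ("HP", "_", "0001324").
    prefix, sep, local_id = purl[len(OBO_BASE):].partition("_")
    if not (prefix and sep and local_id):
        raise ValueError(f"unexpected OBO identifier: {purl}")
    return f"{prefix}:{local_id}"


def curie_to_purl(curie: str) -> str:
    """Turn a compact identifier back into the resolvable OBO PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not (prefix and sep and local_id):
        raise ValueError(f"not a CURIE: {curie}")
    return f"{OBO_BASE}{prefix}_{local_id}"
```

Storing the full URL in your metadata keeps the term resolvable; the compact form is mainly convenient for display or for searching in ontology browsers.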

Many tools are available to support transforming your data dictionary. Depending on your research, you may want to opt for different techniques. Here’s a non-exhaustive list of tools to keep in mind:

  • Clinical studies and patient registries: ART-DECOR could be a solution to make structural metadata available. With this tool you can publish metadata about the structure of your dataset. This information can be extracted and is represented in XML (see example here).

  • Survey and/or any tabular datasets (rows and columns): DDI has implemented tools that can convert your codebook into RDF or XML. Ideally, you should make this file available in an appropriate repository that can host and interpret these structures. Otherwise, it can be made available alongside your data following the instructions above.

  • Biology and Omics Studies: FAIR Data Station (Link) is used in research across both of these fields. With this tool you can generate a metadata template by selecting which variables apply to your resource. The tool then validates your codebook against the elements you selected and produces an RDF file.

If you have experience using other tools, please leave a comment!

Output: Following this step and using any of the provided tools, the output should be a .ttl, .json, or .xml file, structured with the information previously contained in your codebook.
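
To illustrate what such an output file could contain, the sketch below serializes one codebook variable as JSON-LD using only the Python standard library. The `ex:` namespace, the variable itself, and the choice of properties (`rdfs:label`, `skos:definition`, `dcterms:subject`) are assumptions made for illustration; your community or repository may prescribe a different metadata schema, and the tools listed above produce their own structures.

```python
import json

# Prefixes for well-known vocabularies; "ex" is a placeholder namespace.
CONTEXT = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "https://example.org/codebook/",
}


def variable_to_jsonld(name, label, definition, concept_iri):
    """Describe one codebook variable and link it to an ontology term."""
    return {
        "@context": CONTEXT,
        "@id": f"ex:{name}",
        "rdfs:label": label,
        "skos:definition": definition,
        # Points to the "FAIR term" that captures the variable's meaning.
        "dcterms:subject": {"@id": concept_iri},
    }


doc = variable_to_jsonld(
    name="muscle_weakness",
    label="Muscle weakness",
    definition="Whether muscle weakness was observed (hypothetical variable).",
    concept_iri="http://purl.obolibrary.org/obo/HP_0001324",
)

# The resulting string can be saved as a .json file next to the codebook.
serialized = json.dumps(doc, indent=2)
```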

 

Expertise requirements for this step 

This step requires the following experts, as described in Metroline Step: Build the team.

  • Researcher: Provide the metadata/codebook.

  • Data Steward: Find solutions, and implement them or support the Researcher in the implementation.

Practical examples from the community 

ERDRI.mdr - Central Metadata Repository

In the Rare Disease community, central metadata repositories like https://eu-rd-platform.jrc.ec.europa.eu/mdr/search/ are available for registering record level (or structural) metadata about patient registries. ERDRI.mdr contains a collection of metadata entries, where data elements are described with definitions, units of measurement, and values. The idea is that the more comprehensively registries define their data elements on this platform, the easier it will be to reuse data for broader studies and research questions.

Other examples: Demonstrator portfolio | Health-RI

Training

If you have great suggestions for training material, add links to these resources here. Since the training aspect is still under development, currently many steps have “Relevant training will be added soon.”

Suggestions

This page will be developed in the future. Learn more about the contributors here and explore the development process here. If you have any suggestions, visit our How to contribute page to get in touch.

 

 
