Disclaimer: This Metroline Step focuses solely on the registration of metadata. It does not cover the technical details of metadata schemas or FAIR Data Point, both of which will be detailed in subsequent Metroline Steps.

Short description 

‘Perfectly good data resources may go unused simply because no one knows they exist. There are many ways in which digital resources can be made discoverable, including indexing.’ (GO FAIR)

Metadata describes your resource, whether it is a dataset, article, software, report or other project output. In this chapter, we explain how to make metadata about your resources available online so that others can find them. As explained in A Generic Workflow for the Data FAIRification Process, this step helps make your data resources more Findable by registering them in a searchable resource, such as a metadata catalogue.

Metadata catalogues are platforms that store information about various resources and help you find it. They allow you to search for existing data relevant to your research, saving time in data collection, and they enable others to find your work, thereby increasing collaboration opportunities. Examples of metadata catalogues include the Health-RI Data Catalogue for healthcare and life sciences data and the BBMRI-NL catalogue for biosample information.

Unlike data repositories such as Zenodo or DANS, metadata catalogues do not store the actual resource, only information about it. A metadata catalogue can link directly to the resource’s location, for example by linking your catalogue entry to your DANS entry through its URL, or it can let others request access via a contact point or data access form. Many data repositories also act as metadata catalogues. For example, when you publish data in DANS, you provide metadata (Title, Description, Keywords) that helps to catalogue and find entries within the DANS portal. This blurs the line between metadata catalogues and data repositories (see figure below). Google Scholar illustrates the distinction: it works as a metadata catalogue by indexing information about publications and linking each entry to external repositories, such as Elsevier or PLOS, where the actual publications can be accessed.

For more information about Data Repositories, see Archiving data | Health-RI and Open Science | ERC (europa.eu).

[Figure: Venn diagram showing the overlap between metadata catalogues and data repositories]

The key advantage of using metadata catalogues is that you don’t need to publish your resource, such as data, beforehand. This can be very useful if your project has just started data collection or if you have very restrictive data access conditions but still want others to be able to find you. For example, registries holding data about rare disease patients may want to be contacted for the purposes of diagnostic and therapy discovery, without making their actual data available in a repository. If you later decide to publish your resource in a (data) repository for long-term preservation and archiving, you can update the metadata catalogue entry with this new information.

There are other advantages to using metadata catalogues, which we’ll explore in the next section. We’ll also explain why this step is important and how to choose the right metadata catalogue for your resources.

Why is this step important 

Metadata catalogues are critical for making research resources, such as data, more visible and accessible. They offer a range of benefits to data holders, users and the broader scientific community.

Benefits for data holders.

Benefits for data users.

Benefits for the scientific community.

How to 

How you register resource-level metadata depends on the context of your project and your expertise in metadata and the FAIR principles. Here, we present a generic workflow applicable to most scenarios, but it is advisable to customise this workflow to fit your context. The workflow emphasises selecting appropriate metadata catalogues for your resources, rather than the technical aspects of metadata schemas.

Step 1 - Inventorise resource types

The first step is to identify and categorise the specific types of resources you are managing. While there is no universally accepted standard list, typical examples of resource types include datasets, code and articles. Within the category of datasets, there are further distinctions such as sociodemographic data, clinical data, imaging data, omics data, and biobank data. The type of resource impacts your choice of metadata catalogue.

Outcome: a list of relevant resources along with their respective types.
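As an illustration only, the inventory from this step could be kept as a small structured list rather than free text. The sketch below assumes a hypothetical `Resource` record and made-up resource names; it is not a prescribed format, and a spreadsheet serves the same purpose.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """One project output to be registered (hypothetical structure)."""
    name: str
    resource_type: str             # e.g. "dataset", "code", "article"
    subtype: Optional[str] = None  # e.g. "imaging data"; None if not applicable


def group_by_type(resources):
    """Return {resource_type: [resource names]} as the Step 1 outcome."""
    grouped = defaultdict(list)
    for resource in resources:
        grouped[resource.resource_type].append(resource.name)
    return dict(grouped)


# Made-up example resources, for illustration only.
inventory = [
    Resource("Example cohort questionnaire", "dataset", "questionnaire data"),
    Resource("Example imaging set", "dataset", "imaging data"),
    Resource("Example analysis scripts", "code"),
]

print(group_by_type(inventory))
```

Grouping by type makes it easier to see, in the following steps, which resources can share a set of metadata elements and a metadata catalogue.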

Example

Researcher Eva wants to document metadata for her resource, the PRISMA study, and decides to follow the steps on this page. After reviewing the first step, she identified and categorised her resource types as follows:

Step 2 - Determine metadata elements for each resource type

In this step, you need to define the conceptual units of information, known as metadata elements, and collect those elements in a spreadsheet per resource type. Below is an example spreadsheet to capture the resource (sub)type and metadata element with description.

| Resource type | Resource subtype | Metadata | Description |
| --- | --- | --- | --- |
| Dataset | Lab data | Collection methods | Description of the method or instruments used to collect the data. |
| | | Data sources | Information about where or from whom the data was collected. |
| Python code | | Contributors | Names or IDs of other individuals who contributed to the code. |

Questions to consider:

Outcome: a list of metadata elements tailored for each resource type.
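The spreadsheet described in this step can also be produced programmatically. The following is a minimal sketch using Python's standard `csv` module. The column names (`resource_type`, `resource_subtype`, `metadata`, `description`) are assumptions chosen to mirror the example spreadsheet, and treating "Python code" as a resource type with no subtype is likewise an assumption.

```python
import csv
import io

FIELDS = ["resource_type", "resource_subtype", "metadata", "description"]

# Rows mirror the example spreadsheet; column names are assumed.
ROWS = [
    {
        "resource_type": "Dataset",
        "resource_subtype": "Lab data",
        "metadata": "Collection methods",
        "description": "Description of the method or instruments used to collect the data.",
    },
    {
        "resource_type": "Dataset",
        "resource_subtype": "Lab data",
        "metadata": "Data sources",
        "description": "Information about where or from whom the data was collected.",
    },
    {
        "resource_type": "Python code",
        "resource_subtype": "",  # assumption: no subtype for this row
        "metadata": "Contributors",
        "description": "Names or IDs of other individuals who contributed to the code.",
    },
]


def to_csv(rows):
    """Serialise the metadata element rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def elements_for(rows, resource_type):
    """List the metadata elements collected for one resource type."""
    return [row["metadata"] for row in rows if row["resource_type"] == resource_type]


print(to_csv(ROWS))
print(elements_for(ROWS, "Dataset"))
```

The CSV text can be saved to a file and opened in any spreadsheet application, while `elements_for` gives the per-type list named as the outcome of this step.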

Example

Having categorised the resource as a dataset, Eva sought to determine which metadata elements would benefit each of her resource types and subtypes. She created a table to organise this information:

| Resource type | Resource subtype | Metadata |
| --- | --- | --- |
| Dataset | | Title; Description; Keywords about datasets; Associated project; Contact point to grant access to datasets |
| Dataset | Biosample | Type of study; Disease studied; Material collected; Number of donors; Associated publications; Associated biobanks; Access policy of samples |
| Dataset | Questionnaire data | Information about tools or questionnaires for data collection; Collection mode (face-to-face, telephone, or online); Sample design (random, stratified, or cluster); Time period; Population studied; Survey questions |

To ensure comprehensive metadata, she referred to existing standards:

Step 3 - Search for metadata catalogues per resource type 

The next step is to identify metadata catalogues. Platforms like FAIRsharing help researchers locate appropriate metadata catalogues.

Considerations for Metadata Catalogue selection:

More criteria can be found on:

Outcome: A list of Metadata Catalogue candidates for each resource type.

Example

After determining the necessary metadata elements for PRISMA data in step 2, Eva consulted a data steward in her department. Together, they compiled a list of available metadata catalogues. Because many repositories also offer cataloguing functionality, existing repositories were considered as well.

Step 4 - Determine metadata catalogues 

In this step, you will evaluate the pros and cons of the Metadata Catalogue candidates and make the decisions most appropriate for your context. The general suggestion is to prioritise community standards (FAIR principle R1.3): is there a metadata catalogue that is widely used in your community?

Outcome: A finalised list of metadata catalogues for each resource type.
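One possible way to make the weighing of pros and cons explicit is a simple weighted score per candidate. The sketch below is an assumption-laden illustration: the criteria names, weights, ratings and catalogue names are all hypothetical placeholders, and the outcome should inform the decision rather than replace expert judgement.

```python
def rank_catalogues(candidates, weights):
    """Rank candidates by the weighted sum of their ratings, highest first.

    candidates: {catalogue name: {criterion: rating}}
    weights:    {criterion: weight}; criteria missing from a rating count as 0.
    """
    scored = []
    for name, ratings in candidates.items():
        total = sum(weight * ratings.get(criterion, 0)
                    for criterion, weight in weights.items())
        scored.append((name, total))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical criteria and weights; community adoption is weighted highest,
# in line with the suggestion to prioritise community standards.
weights = {"community_adoption": 3, "element_coverage": 2, "ease_of_entry": 1}

# Hypothetical candidates with ratings from 1 (poor) to 3 (good).
candidates = {
    "Catalogue A": {"community_adoption": 3, "element_coverage": 1, "ease_of_entry": 2},
    "Catalogue B": {"community_adoption": 1, "element_coverage": 3, "ease_of_entry": 3},
}

for name, score in rank_catalogues(candidates, weights):
    print(name, score)
```

Writing the weights down makes the reasoning behind the final choice easier to discuss with a data steward and to record in project documentation.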

Example

Eva selected the appropriate metadata catalogues for each type of dataset.

Step 5 - Enter resource metadata required in the selected metadata catalogues

The final step involves entering the metadata for each resource into the chosen metadata catalogues, following the specific instructions provided by each metadata catalogue. If a resource is registered in multiple metadata catalogues, ensure that the metadata is consistent across all platforms and that the metadata sets are interlinked where possible. Automated updates of metadata are recommended when available.

Outcome: Successfully registered resource-level metadata in a FAIR manner, ensuring it is Findable, Accessible, Interoperable, and Reusable.
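When a resource is registered in more than one metadata catalogue, a small script can help spot fields that have drifted apart. The sketch below assumes you have exported or copied each catalogue entry into a plain dictionary. The field names, catalogue names and values are hypothetical, and real catalogues may use different field names that need to be mapped first.

```python
def find_inconsistencies(records, fields):
    """Return {field: {catalogue: value}} for fields whose values differ.

    records: {catalogue name: {field: value}}
    String values are compared after trimming whitespace and ignoring case.
    """
    conflicts = {}
    for field in fields:
        values = {catalogue: record.get(field)
                  for catalogue, record in records.items()}
        normalised = {
            value.strip().casefold() if isinstance(value, str) else value
            for value in values.values()
        }
        if len(normalised) > 1:
            conflicts[field] = values
    return conflicts


# Hypothetical entries for the same resource in two catalogues.
records = {
    "Catalogue A": {"title": "Example study", "contact": "data@example.org"},
    "Catalogue B": {"title": "example study ", "contact": "info@example.org"},
}

print(find_inconsistencies(records, ["title", "contact"]))
```

In this example only the contact point is reported, because the two titles differ only in case and trailing whitespace.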

Example

Eva followed the instructions for onboarding data in the Health-RI data catalogue to register the metadata, which is now available.

Eva downloaded the BBMRI Catalogue Form, entered the biosample metadata, and submitted the form to the Health-RI Service Desk. The metadata for the biosample data were successfully registered; see https://directory.bbmri-eric.eu/ERIC/directory/#/collection/bbmri-eric:ID:NL_RB:collection:155?search=PRISMA.

Expertise requirements for this step 

The level of expertise required for this step will depend on several factors:

Depending on these variables, selecting the appropriate metadata catalogue may be a straightforward process or may require input from multiple experts. The experts who may need to be involved, as outlined in Metroline Step: Build the Team, are described below.

Practical examples from the community 

The Netherlands ME/CFS Cohort and Biobank Consortium

The Netherlands ME/CFS Cohort and Biobank (NMCB) consortium, in partnership with patient organisations, is leading the development of a national research infrastructure for Myalgic Encephalomyelitis and Chronic Fatigue Syndrome (ME/CFS).

The current choices of metadata catalogues are:

These decisions are described in the first version of the FAIR Implementation Profile (NMCB FIP); the next version is scheduled for release in October 2024.

Training

Relevant training will be added in the future if available.

Suggestions

Visit our How to contribute page for information on how to get in touch if you have any suggestions about this page.