...

The outcome of this step should be a set of data elements (variables) with clear and unambiguous semantics, which reflect the information you want to collect or share. Be aware that finding machine-actionable items from ontologies for the data elements is not yet part of this step, but is described in Create or reuse a semantic (meta)data model.

Why is this step important

Several of the steps that follow rely on being familiar with your data. For example, in order to create or reuse your semantic (meta)data model, it is crucial to understand the meaning and relationships of variables.

While performing this step, keep your FAIRification goals in mind. Selecting a relevant subset of the data and defining the driving user question(s) both depend on a thorough understanding of the data. In other words, if you have a clear idea of your FAIRification goals, it might be easier to define what elements should be present in your (meta)data and how these elements should be represented.

For example, a variable in a dataset that collects sex-related data might be called ‘sex’. If the semantics of such a variable are not provided or not analyzed, it would be unclear whether it means ‘biological sex at birth’, ‘phenotypic sex’, or ‘gender’.

These issues have to be solved before you start with the semantic (meta)data model.
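To make the ‘sex’ example concrete, the following minimal Python sketch shows one way to record the meaning of each variable in a small data dictionary and to flag variables whose semantics are still missing. The `VariableDefinition` class, the `undefined_variables` helper, the `heart_rate` entry, and the allowed values are all hypothetical and chosen only for illustration; they are not part of any prescribed tooling.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDefinition:
    """One entry of a hypothetical data dictionary."""
    name: str
    definition: str = ""  # human-readable meaning of the variable
    allowed_values: list[str] = field(default_factory=list)


def undefined_variables(dictionary):
    """Return the names of variables whose meaning has not been written down."""
    return [v.name for v in dictionary if not v.definition.strip()]


data_dictionary = [
    VariableDefinition(name="sex"),  # ambiguous: no definition given
    VariableDefinition(
        name="heart_rate",
        definition="Heart rate in beats per minute",
    ),
]

print(undefined_variables(data_dictionary))  # ['sex']

# Resolving the ambiguity means choosing one meaning and recording it.
# The definition and allowed values below are illustrative assumptions.
data_dictionary[0] = VariableDefinition(
    name="sex",
    definition="Biological sex at birth",
    allowed_values=["female", "male", "unknown"],
)
print(undefined_variables(data_dictionary))  # []
```

A check like this only finds missing definitions. Deciding which meaning is the right one still requires talking to the people who collected the data.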

How to

Let’s say we have a dataset containing patient information with the following metadata:

...

The table below lists the issues with our current metadata, together with suggested improvements that make the meaning of each field clearer.

| Metadata Field | Value | Issue | Suggested Value | Suggested Description |
| --- | --- | --- | --- | --- |
| Dataset Name | Health Data | Generic and not descriptive. | Patient Health Records 2023 | Comprehensive dataset containing health records of patients from Hospital A in the year 2023. |
| Date of Upload | 01/02/2023 | Ambiguous format (MM/DD/YYYY or DD/MM/YYYY). | 2023-01-02 | Date when the dataset was uploaded, in ISO 8601 format (YYYY-MM-DD). |
| Keywords | BP, HR, Conditions | Abbreviations used without context. | Blood Pressure, Heart Rate, Hypertension | Keywords describing the main topics covered by the dataset. |
| Creator | Dr. Smith | Generic name without additional identifying information. | Dr. John Smith, Hospital A; ORCID: 0001-0002-3456-7890 | Full name and affiliation of the dataset creator, as well as ORCID. |
| Description | Patient health data including BP and HR | Lacks detail. | Detailed patient health records including measurements of blood pressure (BP) and heart rate (HR), along with diagnosed medical conditions and prescribed medications. | Extended description providing context and details about the dataset. |
| Format | CSV | Broad category, can be more detailed. | CSV, version 1.0 | Data format and version. |
| Source | Hospital A | Lacks detail, too generic. | Hospital A, Department of Cardiology | Specific department and institution where the data was sourced. |
| Rights | Open | Too broad. | CC BY 4.0 | Licensing terms specifying the rights for data usage. |
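Two of the fixes in this example, the ambiguous upload date and the abbreviated keywords, can be illustrated with a short Python sketch. It assumes you have already found out from the data provider which date format was used; the code cannot determine that by itself. The function names `to_iso_date` and `expand_keywords` and the `ABBREVIATIONS` mapping are hypothetical.

```python
from datetime import datetime

# Hypothetical mapping, agreed on with the people who know the dataset.
ABBREVIATIONS = {"BP": "Blood Pressure", "HR": "Heart Rate"}


def to_iso_date(value: str, source_format: str) -> str:
    """Convert a date string in a known source format to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, source_format).date().isoformat()


def expand_keywords(keywords: str) -> list[str]:
    """Split a comma-separated keyword string and expand known abbreviations."""
    cleaned = [k.strip() for k in keywords.split(",")]
    return [ABBREVIATIONS.get(k, k) for k in cleaned]


# The same string yields two different dates depending on the assumed format.
print(to_iso_date("01/02/2023", "%m/%d/%Y"))  # 2023-01-02
print(to_iso_date("01/02/2023", "%d/%m/%Y"))  # 2023-02-01

# Unknown terms are left unchanged so that a person can review them.
print(expand_keywords("BP, HR, Conditions"))
# ['Blood Pressure', 'Heart Rate', 'Conditions']
```

The two date results show why the original value is ambiguous: only knowledge of how the data was recorded tells you which one is correct. Likewise, replacing the generic keyword ‘Conditions’ with a specific term such as ‘Hypertension’ is a decision for someone who knows the content of the dataset.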

...

In addition, check whether the data already contains FAIR features, such as persistent unique identifiers for data elements (for more information, see pre-FAIR assessment).
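As a rough first check of existing identifiers, the sketch below looks for duplicates and for identifiers that are not written as HTTP(S) IRIs. This is only a heuristic under stated assumptions: an identifier that looks like a web address is not necessarily persistent, and persistence itself cannot be verified from the data alone. The helper names and the `example.org` identifiers are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def duplicate_identifiers(identifiers):
    """Return identifiers that occur more than once, in sorted order."""
    counts = Counter(identifiers)
    return sorted(i for i, n in counts.items() if n > 1)


def looks_like_http_iri(identifier):
    """Heuristic: the identifier has an http(s) scheme and a host part."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical identifiers, for illustration only.
ids = [
    "https://example.org/patient/001",
    "https://example.org/patient/002",
    "patient-003",
    "https://example.org/patient/001",
]

print(duplicate_identifiers(ids))
# ['https://example.org/patient/001']
print([i for i in ids if not looks_like_http_iri(i)])
# ['patient-003']
```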

...