...

In this step, the aim is to gain more insight into the existing data, or the data that you aim to collect. Clearly defining the meaning (semantics) of the data is an important step for creating the semantic model, as well as for data collection via, for example, electronic case report forms (eCRFs). 

To understand 'the semantics', different aspects of the data elements/variables should be analysed.

  • The definition/description of data elements. For example, a variable called “sex” could refer to “Biological sex” or “Administrative gender”.

  • Values for choices. For example, in system A, sex allows for male and female, while in system B, sex also allows for intersex. This difference in permitted values reflects a gap between the semantics of the two variables.

  • Relationship between data elements. For example, the “sex” variable is one attribute of the “patient” profile, which may imply that the semantics of this “sex” variable is “sex of patient”.

The outcome of this step should be a set of data elements (variables) with clear and unambiguous semantics (a codebook), which reflect the information you want to collect or share. Be aware that finding machine-actionable items from ontologies for the data elements is not yet part of this step, but is described in Create or reuse a semantic (meta)data model.

[SdR, how about:]

Understanding and clearly defining the meaning (semantics) of (meta)data is an important step for creating the semantic model, as well as for data collection via, for example, electronic case report forms (eCRFs). In this step, the aim is to ensure you gain a clear and unambiguous understanding of the (meta)data. The step provides guidance for both existing data and data that must still be collected.

To illustrate the issue, consider the example where you receive a dataset with a variable called “sex”. Without clearly defined semantics, it is unclear whether this means “biological sex at birth”, “phenotypic sex”, or “administrative gender”. This must be resolved before you can start with the semantic (meta)data model.

Thus, the outcome of this step is a set of data elements (variables) with clear and unambiguous semantics, known as a codebook. Note that finding machine-actionable items from ontologies for the data elements is not yet part of this step, but is described in Create or reuse a semantic (meta)data model.
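As an illustration, a codebook entry can be sketched as a small record that holds a variable's definition, its permitted values, and the element it describes. The sketch below is a minimal example, not a prescribed format: the class `CodebookEntry`, the function `semantic_gaps`, and the assignment of definitions to "system A" and "system B" are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    """One variable in a codebook (hypothetical structure)."""
    name: str                              # variable name as it appears in the data
    definition: str                        # human-readable meaning of the variable
    allowed_values: tuple[str, ...] = ()   # permitted choices, if any
    describes: str = ""                    # element this variable is an attribute of


# Two hypothetical systems that both call the variable "sex".
system_a = CodebookEntry(
    name="sex",
    definition="Biological sex at birth",   # assumed definition
    allowed_values=("male", "female"),
    describes="patient",
)
system_b = CodebookEntry(
    name="sex",
    definition="Administrative gender",     # assumed definition
    allowed_values=("male", "female", "intersex"),
    describes="patient",
)


def semantic_gaps(a: CodebookEntry, b: CodebookEntry) -> dict:
    """Report where two entries with the same name differ in meaning."""
    return {
        "definition_differs": a.definition != b.definition,
        "values_only_in_a": sorted(set(a.allowed_values) - set(b.allowed_values)),
        "values_only_in_b": sorted(set(b.allowed_values) - set(a.allowed_values)),
        "describes_differs": a.describes != b.describes,
    }


gaps = semantic_gaps(system_a, system_b)
print(gaps)
```

Comparing entries this way makes gaps explicit before any modelling starts: here both variables are named "sex" and describe a patient, yet they differ in definition and in permitted values.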

...

While performing this step, keep your FAIRification goals in mind. If you have a clear idea of your FAIRification goals, it might be easier to define what elements should be present in your (meta)data and how these elements should be represented.

...

For analysing data semantics:

[Figure: AnalyseDataSemantic-Data.png]

For analysing metadata semantics:

[Figure: AnalyseDataSemantic-Metadata_.png]

For easier understanding, we will follow the example dataset containing patient information with the following metadata:

...

| Metadata field / Variable | Description of the field | Value range |
| --- | --- | --- |
| Dataset Name | The name of the dataset. | Text |
| Date of Upload | The date on which the dataset was uploaded. | Date values in the format MM/DD/YYYY |
| Keywords | Terms that describe the main topics of the dataset. | Text |
| Creator | The person or organisation that created the dataset. | Text, in our example title and last name |
| Description | A brief summary of the dataset. | Text |
| Format | The file format of the dataset. | Text, in our example a short string indicating the file format |
| Source | The origin of the dataset. | Text, in our case the name of the institution |
| Rights | The usage rights or licence of the dataset. | Text |
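To make the value ranges above concrete, the following minimal sketch checks a metadata record against them. The function `check_metadata` and both example records are hypothetical. Treating every field as expected is an assumption, since the example does not state which fields are mandatory.

```python
from datetime import datetime

# Field names taken from the example metadata; treating all of them as
# expected is an assumption made for this sketch.
EXPECTED_FIELDS = [
    "Dataset Name", "Date of Upload", "Keywords", "Creator",
    "Description", "Format", "Source", "Rights",
]


def check_metadata(record: dict) -> list:
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    for field_name in EXPECTED_FIELDS:
        value = record.get(field_name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field_name}: missing or empty")

    # 'Date of Upload' must parse as a month/day/year date.
    date_text = record.get("Date of Upload")
    if isinstance(date_text, str) and date_text.strip():
        try:
            datetime.strptime(date_text, "%m/%d/%Y")
        except ValueError:
            problems.append("Date of Upload: not in MM/DD/YYYY format")
    return problems


# Hypothetical record; all values are invented for illustration.
good_record = {
    "Dataset Name": "Example patient registry",
    "Date of Upload": "03/05/2024",
    "Keywords": "patients, registry",
    "Creator": "Dr. Example",
    "Description": "An invented dataset used to illustrate the check.",
    "Format": "CSV",
    "Source": "Example Institute",
    "Rights": "CC BY 4.0",
}
bad_record = dict(good_record, **{"Date of Upload": "2024-03-05", "Rights": ""})

print(check_metadata(good_record))
print(check_metadata(bad_record))
```

A check like this only confirms that values match the documented form. It does not settle what a field means, for example whether "Creator" refers to a person or an organisation in a given record.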

b) In case you are aiming to collect FAIR (meta)data from the start:

  • Which data elements/variables are you planning to collect? For this, the competency questions (CQs) might provide some guidance.

  • If possible, determine the value range for each data element (e.g. for ‘biological sex at birth’, values could be ‘male’ and ‘female’, while for ‘age’, the value range might be 0-110).
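Once value ranges are written down, they can be expressed in a form that a data-collection tool could check automatically. The snippet below is a minimal sketch using the two example variables from the list above. The names `VALUE_RANGES` and `in_value_range` are hypothetical, and treating age as whole years is an assumption.

```python
# Planned variables and their value ranges, taken from the examples above.
VALUE_RANGES = {
    "biological sex at birth": {"male", "female"},  # closed set of choices
    "age": range(0, 111),                           # whole years, 0 to 110 inclusive
}


def in_value_range(variable: str, value) -> bool:
    """Return True if the value is permitted for the given planned variable."""
    return value in VALUE_RANGES[variable]


print(in_value_range("age", 42))
print(in_value_range("age", 111))
print(in_value_range("biological sex at birth", "intersex"))
```

A value that falls outside the range, such as "intersex" here, is a prompt to revisit the definition of the variable, not necessarily an error in the data.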

Step 2 - Check for an existing standard/codebook

...