...

The outcome of this step should be a set of data elements (variables) with clear and unambiguous semantics (a codebook), which reflect the information you want to collect or share. Be aware that finding machine-actionable items from ontologies for the data elements is not yet part of this step, but is described in Create or reuse a semantic (meta)data model.

...

To see which steps are relevant for your (meta)data, please follow the diagrams below.

For analysing data semantics:

[Image: AnalyseDataSemantic-Data.png]

For analysing metadata semantics:

[Image: AnalyseDataSemantic-Metadata_.png]

For easier understanding, we will follow the example dataset containing patient information with the following metadata:

| Metadata Field | Value |
| --- | --- |
| Dataset Name | Health Data |
| Date of Upload | 01/02/2023 |
| Keywords | BP, HR, Conditions |
| Creator | Dr. Smith |
| Description | Patient health data including BP and HR |
| Format | CSV |
| Source | Hospital A |
| Rights | Open |

...

In the example, we are working with existing metadata, which is why we start with Step 1 - Compile information.

Step 1 - Compile information

Compile all the information of data elements, data values, and data structure. Examine the data in whatever format and structure it is available. This step helps to identify inconsistencies, ambiguities, and errors in the data.

...

a) For existing (meta)data: Locate all relevant sources in which the (meta)data is stored. Compile information about the following:

  • Which variables are present in the (meta)data (e.g. in the eCRFs)?

  • What are the value ranges for each variable?

...

| Metadata field / Variable | Description of the field | Value range |
| --- | --- | --- |
| Dataset Name | The name of the dataset. | Text |
| Date of Upload | The date on which the dataset was uploaded | Date values in the format MM/DD/YYYY |
| Keywords | Terms that describe the main topics of the dataset | Text |
| Creator | The person or organisation that created the dataset | Text, in our example title and last name |
| Description | A brief summary of the dataset | Text |
| Format | A file format of the dataset | Text, in our example a short string indicating the file format |
| Source | The origin of the dataset | Text, in our case the name of the institution |
| Rights | The usage rights or licence of the dataset | Text |
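As a rough illustration of compiling this information programmatically, the sketch below collects the distinct values observed for each variable across a set of metadata records. The function name `profile_variables` and the second record are hypothetical and only serve the illustration; the first record reuses values from the example above.

```python
from collections import defaultdict


def profile_variables(records):
    """Collect, per variable, the set of distinct values observed."""
    observed = defaultdict(set)
    for record in records:
        for name, value in record.items():
            observed[name].add(value)
    return dict(observed)


# The first record mirrors the example metadata; the second is invented
# purely to show how differing values accumulate.
records = [
    {"Dataset Name": "Health Data", "Date of Upload": "01/02/2023", "Format": "CSV"},
    {"Dataset Name": "Lab Results", "Date of Upload": "03/15/2023", "Format": "CSV"},
]

profile = profile_variables(records)
for name in sorted(profile):
    print(name, "->", sorted(profile[name]))
```

Reviewing such a profile by hand is one way to spot inconsistent spellings or unexpected values before writing down the value range of each variable.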

b) In case you are aiming at collecting FAIR (meta)data from the start:

  • Which data elements/variables are you planning to collect? For this, the competency questions (CQs) might provide some guidance.

  • If possible, determine the value range for each data element (e.g. for ‘biological sex at birth’, values could be ‘male’ or ‘female’, while for ‘age’ the value range might be 0-110).
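Once value ranges are written down, they can be checked automatically during data collection. The following is a minimal sketch under assumptions: the `CODEBOOK` structure, the variable names, and the `check_value` helper are hypothetical, and the ranges are the ones used as examples in the text.

```python
# Hypothetical codebook: an element has either a set of allowed values
# or a numeric minimum and maximum.
CODEBOOK = {
    "biological_sex_at_birth": {"allowed": {"male", "female"}},
    "age": {"min": 0, "max": 110},
}


def check_value(variable, value, codebook=CODEBOOK):
    """Return True if the value lies within the range defined for the variable."""
    rule = codebook[variable]
    if "allowed" in rule:
        return value in rule["allowed"]
    return rule["min"] <= value <= rule["max"]


print(check_value("age", 42))
print(check_value("age", 130))
print(check_value("biological_sex_at_birth", "female"))
```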

Step 2 - Check for an existing standard/code book

a) For existing (meta)data: check if it comes with a code book or metadata standard. If it does and it is clear, you can use it for your (meta)data and are done with this step. If the code book is not helpful, contact the owner of the data to have the semantics clarified, so that you don’t misinterpret the data. If you still need to do additional work to make the data clearer, follow the steps below.

b) For new data: check if there is an existing data standard or code book that you can reuse. If there is, use it; otherwise follow the steps below.

If you find a codebook or metadata standard that fits partially, use it for the elements that are included and follow the steps below for the others.
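A partial fit can be handled by splitting your data elements into those the codebook covers and those it does not. The sketch below assumes a hypothetical codebook that happens to define only three of the example fields; `split_by_codebook` is an illustrative helper, not part of any standard.

```python
def split_by_codebook(elements, codebook_elements):
    """Split elements into those covered by the codebook and those that are not."""
    covered = [e for e in elements if e in codebook_elements]
    uncovered = [e for e in elements if e not in codebook_elements]
    return covered, uncovered


# Fields from the example metadata.
elements = ["Dataset Name", "Date of Upload", "Keywords", "Creator",
            "Description", "Format", "Source", "Rights"]

# Assumption for illustration: the codebook defines only these three fields.
codebook_elements = {"Creator", "Format", "Rights"}

covered, uncovered = split_by_codebook(elements, codebook_elements)
print("Reuse codebook definitions for:", covered)
print("Follow the remaining steps for:", uncovered)
```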

Health-RI, together with domain representatives, aims to develop domain-specific national data standards in the future.

You can find more about metadata standards and ontologies at the following link: https://howtofair.dk/links-additional-reading/#more-on-metadata-standards-and-ontologies-

Step 3 - Check data semantics

...

In the example of collecting data on a patient’s ‘sex’, it might be unclear whether this means ‘biological sex at birth’ or ‘gender’. In another example, the ‘age’ of a subject can be expressed in years, but in some cases (e.g. studies with small children) it could also be expressed in months. It should therefore be clearly stated whether age is captured in years or months.

The table below shows the issues with our current metadata and suggested improvements that make the meaning of each field clearer.

This recipe in the FAIR cookbook gives some additional guidance on specifying the semantics of elements of your data.

| Metadata Field | Value | Issue | Suggested Value | Suggested description |
| --- | --- | --- | --- | --- |
| Dataset Name | Health Data | Generic and not descriptive. | Patient Health Records 2023 | The name of the dataset. |
| Date of Upload | 01/02/2023 | Ambiguous format (MM/DD/YYYY or DD/MM/YYYY). | 2023-01-02 | Date when the dataset was uploaded, in ISO 8601 format (YYYY-MM-DD). |
| Keywords | BP, HR, Conditions | Abbreviations used without context. | Blood Pressure, Heart Rate, Hypertension | Keywords describing the main topics covered by the dataset. |
| Creator | Dr. Smith | Generic name without additional identifying information. | Dr. John Smith, Hospital A; ORCID: 0001-0002-3456-7890 | Full name and affiliation of the dataset creator, as well as ORCID. |
| Description | Patient health data including BP and HR | Lacks detail. | Detailed patient health records including measurements of blood pressure (BP) and heart rate (HR), along with diagnosed medical conditions and prescribed medications. | Extended description providing context and details about the dataset. |
| Format | CSV | Broad category, can be more detailed. | CSV, version 1.0 | Data format and version. |
| Source | Hospital A | Lacks detail, too generic. | Hospital A, Department of Cardiology | Specific department and institution where the data was sourced. |
| Rights | Open | Too broad. | CC BY 4.0 | Licensing terms specifying the rights for data usage. |
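The date ambiguity can be made concrete with a short sketch: the same string parses to two different dates depending on which format is assumed, which is why an unambiguous notation such as ISO 8601 is suggested. The variable names are illustrative.

```python
from datetime import datetime

raw = "01/02/2023"

# The same string under the two plausible interpretations.
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()

print(as_month_first.isoformat())  # 2023-01-02 (2 January 2023)
print(as_day_first.isoformat())    # 2023-02-01 (1 February 2023)
```

Only the data owner or the codebook can tell which interpretation is intended; the code cannot decide this on its own.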

Step 4 - Check relationships

Compile information about the relationships between data elements. For example, if the dataset is in a relational database, the relational schema provides information about the dataset structure, the types involved (the field names), cardinality, etc.

...

In our metadata example above, the Creator (Dr. John Smith) is an employee of the Source (Hospital A, Department of Cardiology) of the dataset.
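One lightweight way to note down such relationships before any formal modelling is as subject, predicate, object triples. In the sketch below the predicate names (`is_creator_of`, `is_employee_of`, `is_source_of`) are hypothetical labels chosen for readability, not terms from an ontology, and `related_to` is an illustrative helper.

```python
# Each relationship is a (subject, predicate, object) triple.
# Predicate names are hypothetical labels, not ontology terms.
relationships = [
    ("Dr. John Smith", "is_creator_of", "Patient Health Records 2023"),
    ("Dr. John Smith", "is_employee_of", "Hospital A, Department of Cardiology"),
    ("Hospital A, Department of Cardiology", "is_source_of", "Patient Health Records 2023"),
]


def related_to(entity, triples):
    """Return the (predicate, object) pairs for which the entity is the subject."""
    return [(p, o) for s, p, o in triples if s == entity]


for predicate, obj in related_to("Dr. John Smith", relationships):
    print("Dr. John Smith", predicate, obj)
```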

Step 5 - Check for FAIR features

...

In our example above, the ORCID of the creator is a unique persistent identifier (F1) for a person.
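When identifiers such as ORCIDs are recorded, a basic sanity check can catch typing errors. The sketch below checks the four-block layout and the final check character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The function names are illustrative, and the check says nothing about whether the identifier is actually registered.

```python
import re

# Four blocks of four characters; the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string has the ORCID layout and a matching check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

The ORCID shown in the example table appears to be an illustrative placeholder, so it is not expected to pass this check.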

Step 5 - Define/align common data elements [removed?]

Define or align common data elements (CDEs).

NOTE: we don’t define CDEs in this step, but we do need to include checking for already existing ones in other steps.

Common Data Elements (CDEs) are standardised, precisely defined questions paired with specific allowable responses. These CDEs can be used systematically across different sites, studies, or clinical trials to ensure consistent data collection. CDEs give us a way to standardize and share precise and unambiguous definitions of the meaning of data independent of any data model or data set. [NIH]

For a new data collection: check if there is a set of CDEs in your domain that you can reuse (for example Set of common data elements for Rare Diseases Registration), otherwise newly define CDEs whose semantics are clear and unambiguous; for an existing data set, existing data elements can be aligned to CDEs.

For more information, please refer to the Metroline Step: Apply common data elements.

Expertise requirements for this step 

...