...

For example, a variable in a dataset that collects sex-related data might be called ‘sex’. If the semantics of such a variable are not provided or not analyzed, it is unclear whether it means ‘biological sex at birth’, ‘phenotypic sex’, or ‘gender’. These issues have to be resolved before you start with the semantic (meta)data model.

...

| Metadata Field | Value |
| --- | --- |
| Dataset Name | Health Data |
| Date of Upload | 01/02/2023 |
| Keywords | BP, HR, Conditions |
| Creator | Dr. Smith |
| Description | Patient health data including BP and HR |
| Format | CSV |
| Source | Hospital A |
| Rights | Open |

In this example, we are working with existing metadata. According to the diagram, we should therefore start with Step 1 - Compile information.

Step 1 - Compile information

Compile all the information of data elements, data values, and data structure. Examine the data in whatever format and structure it is available. This step helps to identify inconsistencies, ambiguities, and errors in the data.
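As a minimal sketch of what compiling this information could look like for tabular data, the snippet below lists every data element together with its distinct values and its number of empty cells. The sample content, the column names, and the helper `profile_columns` are hypothetical and only serve as an illustration.

```python
import csv
import io

# Hypothetical sample; in practice this would be read from the actual data file.
SAMPLE_CSV = """patient_id,sex,age,bp
1,male,34,120/80
2,F,29,110/70
3,female,,130/85
"""

def profile_columns(csv_text):
    """Return, per column, the set of distinct values and the number of empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    profile = {name: {"values": set(), "missing": 0} for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value is None or value.strip() == "":
                profile[name]["missing"] += 1
            else:
                profile[name]["values"].add(value.strip())
    return profile

profile = profile_columns(SAMPLE_CSV)
for name, info in profile.items():
    print(name, sorted(info["values"]), "missing:", info["missing"])
```

In this made-up sample, the overview shows an inconsistency (‘F’ next to ‘female’) and a missing age, which are the kinds of issues this step is meant to surface.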

...

  • Which data elements/variables are you planning to collect? For this, the competency questions (CQs) might provide some guidance.

  • If possible, determine the value range for each data element (e.g. for ‘biological sex at birth’, values could be ‘male’ and ‘female’, while for ‘age’ the value range might be 0-110).
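Once value ranges are written down, they can be checked against the data. The following is a sketch only: the field names, the two range dictionaries, and the function `out_of_range` are assumptions for illustration, with the ranges taken from the examples in the list above.

```python
# Hypothetical value ranges, based on the examples in the text.
ALLOWED_VALUES = {"biological_sex_at_birth": {"male", "female"}}
NUMERIC_RANGES = {"age": (0, 110)}

def out_of_range(records):
    """Return (row_index, field, value) for every value outside its declared range."""
    problems = []
    for i, record in enumerate(records):
        for field, allowed in ALLOWED_VALUES.items():
            if field in record and record[field] not in allowed:
                problems.append((i, field, record[field]))
        for field, (low, high) in NUMERIC_RANGES.items():
            if field in record and not (low <= record[field] <= high):
                problems.append((i, field, record[field]))
    return problems

# Made-up records: the second one violates both declared ranges.
records = [
    {"biological_sex_at_birth": "female", "age": 42},
    {"biological_sex_at_birth": "F", "age": 130},
]
problems = out_of_range(records)
print(problems)
```

Values reported here are not necessarily errors. They may also indicate that the declared range does not yet match what the data element really means.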

Step 2 - Check for an existing standard/code book

a) For existing (meta)data: check whether it comes with a code book or metadata standard. If it does and the code book is clear, you can use it for your (meta)data and are done with this step. If the code book is not helpful, contact the owner of the data to clarify the semantics, so that you don’t misinterpret the data. If additional work is still needed to make the data clearer, follow the steps below.
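A quick way to judge whether a code book is sufficient is to check which variables it does not define. The sketch below assumes a code book that has already been transcribed into a dictionary. The structure of `CODEBOOK`, its entries, and both helper functions are hypothetical.

```python
# Hypothetical code book: variable name -> definition and coded values.
CODEBOOK = {
    "sex": {"definition": "Biological sex at birth", "codes": {"1": "male", "2": "female"}},
    "hr": {"definition": "Heart rate in beats per minute", "codes": None},
}

def codebook_gaps(variables, codebook):
    """Return the variables that have no code book entry or an empty definition."""
    return [v for v in variables if not codebook.get(v, {}).get("definition")]

def decode(variable, raw_value, codebook):
    """Translate a coded value; return it unchanged if no code list applies."""
    codes = codebook.get(variable, {}).get("codes") or {}
    return codes.get(raw_value, raw_value)

gaps = codebook_gaps(["sex", "hr", "bp"], CODEBOOK)
print(gaps)
print(decode("sex", "2", CODEBOOK))
```

Any variable listed in `gaps` is a candidate for a question to the data owner.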

...

You can find more about metadata standards and ontologies at the following link: https://howtofair.dk/links-additional-reading/#more-on-metadata-standards-and-ontologies-

Step 3 - Check data semantics

Check the data semantics. Is the meaning of the data elements clear and unambiguous? For data elements with ambiguous meaning, try to improve their definition. For this, it might help to examine the value range of a variable to find out whether values outside the intended range could also be entered.

...

| Metadata Field | Value | Issue | Suggested Value | Suggested description |
| --- | --- | --- | --- | --- |
| Dataset Name | Health Data | Generic and not descriptive. | Patient Health Records 2023 | The name of the dataset. |
| Date of Upload | 01/02/2023 | Ambiguous format (MM/DD/YYYY or DD/MM/YYYY). | 2023-01-02 | Date when the dataset was uploaded, in ISO 8601 format (YYYY-MM-DD). |
| Keywords | BP, HR, Conditions | Abbreviations used without context. | Blood Pressure, Heart Rate, Hypertension | Keywords describing the main topics covered by the dataset. |
| Creator | Dr. Smith | Generic name without additional identifying information. | Dr. John Smith, Hospital A; ORCID: 0001-0002-3456-7890 | Full name and affiliation of the dataset creator, as well as ORCID. |
| Description | Patient health data including BP and HR | Lacks detail. | Detailed patient health records including measurements of blood pressure (BP) and heart rate (HR), along with diagnosed medical conditions and prescribed medications. | Extended description providing context and details about the dataset. |
| Format | CSV | Broad category, can be more detailed. | CSV, version 1.0 | Data format and version. |
| Source | Hospital A | Lacks detail, too generic. | Hospital A, Department of Cardiology | Specific department and institution where the data was sourced. |
| Rights | Open | Too broad. | CC BY 4.0 | Licensing terms specifying the rights for data usage. |
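The date ambiguity noted for ‘Date of Upload’ can be made explicit with a small check. The sketch below uses Python's standard `datetime` module. The function name `interpret_date` is an assumption for illustration. It returns every ISO 8601 date a slash-separated value could stand for.

```python
from datetime import datetime

def interpret_date(text):
    """Return the ISO 8601 dates `text` could mean as DD/MM/YYYY or MM/DD/YYYY."""
    candidates = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            candidates.add(datetime.strptime(text, fmt).date().isoformat())
        except ValueError:
            pass  # not a valid date under this format
    return sorted(candidates)

print(interpret_date("01/02/2023"))  # ['2023-01-02', '2023-02-01']
print(interpret_date("25/02/2023"))  # ['2023-02-25']
```

A value with two possible readings cannot be converted safely by code alone. Which reading is correct has to come from the code book or the data owner.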

Step 4 - Check relationships

Compile information about the relationships between data elements. For example, if the dataset is in a relational database, the relational schema provides information about the dataset structure, the types involved (the field names), cardinality, etc.

...

In our metadata example above, the Creator (Dr. John Smith) is an employee of the Source (Hospital A, Department of Cardiology) of the dataset.
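One lightweight way to record such relationships before building a semantic model is a list of subject, predicate, object triples. The sketch below is illustrative only: the predicate names (`has_creator`, `has_source`, `is_employee_of`) are hypothetical labels and are not taken from any standard or ontology.

```python
# Each relationship is a (subject, predicate, object) triple.
# Predicate names are hypothetical labels chosen for this illustration.
DATASET = "Patient Health Records 2023"
CREATOR = "Dr. John Smith"
SOURCE = "Hospital A, Department of Cardiology"

relationships = [
    (DATASET, "has_creator", CREATOR),
    (DATASET, "has_source", SOURCE),
    (CREATOR, "is_employee_of", SOURCE),
]

def outgoing(subject, triples):
    """Return the (predicate, object) pairs of all triples with the given subject."""
    return [(p, o) for s, p, o in triples if s == subject]

print(outgoing(CREATOR, relationships))
```

Writing the relationships down in this form makes it easier to look for matching terms when you later create or reuse a semantic (meta)data model.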

Step 5 - Check for FAIR features

In addition, check whether the data already contains FAIR features, such as persistent unique identifiers for data elements (for more information, see pre-FAIR assessment).

In our example above, the ORCID of the creator is a unique persistent identifier (F1) for a person.
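Identifiers found in existing metadata can be sanity-checked before they are relied upon. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below verifies the layout and that check character only. It does not confirm that the iD is registered or belongs to the named person, and the function names are assumptions for illustration.

```python
import re

def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Check the dddd-dddd-dddd-dddX layout and the final check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0001-0002-3456-7890"))  # False
```

The first iD is the sample iD used in ORCID's own documentation. The second is the placeholder from the example table, which fails the checksum, as an invented example usually will.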

After having performed the relevant parts of this Metroline step, proceed to the next: Metroline Step: Create or reuse a semantic (meta)data model

Expertise requirements for this step 

...