...
In this step, the aim is to gain more insight into the existing data, or the data that you aim to collect. Clearly defining the meaning (semantics) of the data is an important step for creating the semantic model, as well as for data collection via, for example, electronic case report forms (eCRFs).
To understand 'the semantics', different aspects of the data elements/variables should be analysed.
The definition/description of data elements. For example, a variable called “sex” could refer to “Biological sex” or “Administrative gender”.
Values for choices. For example, in system A, “sex” allows for male and female, while in system B, “sex” also allows for intersex. Such a difference reflects a gap in their semantics.
Relationship between data elements. For example, the “sex” variable is one attribute of the “patient” profile, which may imply that the semantics of this “sex” variable is “sex of patient”.
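As a rough illustration of the "values for choices" aspect, the permitted values of the same variable in two systems can be compared as sets. This is only a sketch: the value sets below mirror the system A / system B example, and the function name `value_set_gap` is a hypothetical helper, not part of any existing tool.

```python
# Permitted values for the "sex" variable, taken from the system A / system B example.
system_a_sex = {"male", "female"}
system_b_sex = {"male", "female", "intersex"}


def value_set_gap(a, b):
    """Return the values that are permitted in only one of the two systems."""
    return {"only_in_a": sorted(a - b), "only_in_b": sorted(b - a)}


gap = value_set_gap(system_a_sex, system_b_sex)
print(gap)
```

A non-empty result signals that the two variables do not share the same semantics and that the difference should be resolved or documented in the codebook.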
The outcome of this step should be a set of data elements (variables) with clear and unambiguous semantics (a codebook), which reflect the information you want to collect or share. Be aware that finding machine-actionable items from ontologies for the data elements is not yet part of this step, but is described in Create or reuse a semantic (meta)data model.
[SdR, how about:]
Understanding and clearly defining the meaning (semantics) of (meta)data is an important step for creating the semantic model, as well as for data collection via, for example, electronic case report forms (eCRFs). In this step, the aim is to ensure you gain a clear and unambiguous understanding of the (meta)data. The step provides guidance for both existing data and data that must still be collected.
To illustrate the issue, consider the example where you receive a dataset with a variable called “sex”. Without clearly defined semantics, it is unclear whether this means “biological sex at birth”, “phenotypic sex”, or “administrative gender”. This must be resolved before you can start with the semantic (meta)data model.
Thus, the outcome of this step is a set of data elements (variables) with clear and unambiguous semantics, known as a codebook. Note that finding machine-actionable items from ontologies for the data elements is not yet part of this step, but is described in Create or reuse a semantic (meta)data model.
...
While performing this step, keep your
...
For analysing data semantics:
For analysing metadata semantics:
For easier understanding, we will follow the example dataset containing patient information with the following metadata:
...
Metadata field / Variable | Description of the field | Value range |
---|---|---|
Dataset Name | The name of the dataset. | Text |
Date of Upload | The date on which the dataset was uploaded | Date values in the format MM/DD/YYYY |
Keywords | Terms that describe the main topics of the dataset | Text |
Creator | The person or organisation that created the dataset | Text, in our example title and last name |
Description | A brief summary of the dataset | Text |
Format | The file format of the dataset | Text, in our example a short string indicating the file format |
Source | The origin of the dataset | Text, in our case the name of the institution |
Rights | The usage rights or licence of the dataset | Text |
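The table above can be turned into a simple check of a metadata record. The following is a minimal sketch, assuming the record is available as a Python dictionary keyed by the field names in the table; the function name `check_metadata` and all values in the example record are invented for illustration.

```python
from datetime import datetime

# Field names copied from the metadata table above.
EXPECTED_FIELDS = [
    "Dataset Name", "Date of Upload", "Keywords", "Creator",
    "Description", "Format", "Source", "Rights",
]


def check_metadata(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    for field in EXPECTED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing or empty: {field}")
    date_value = record.get("Date of Upload")
    if date_value:
        try:
            # The table specifies MM/DD/YYYY for the upload date.
            datetime.strptime(date_value, "%m/%d/%Y")
        except ValueError:
            problems.append("Date of Upload is not in MM/DD/YYYY format")
    return problems


# Hypothetical record; every value here is made up for the example.
example_record = {
    "Dataset Name": "Example patient dataset",
    "Date of Upload": "2024-03-15",
    "Keywords": "patients, demographics",
    "Creator": "Dr. Example",
    "Description": "Illustrative dataset description",
    "Format": "CSV",
    "Source": "Example Institute",
    "Rights": "",
}

problems = check_metadata(example_record)
print(problems)
```

Note that `strptime` also accepts months and days without a leading zero, so a stricter check would be needed if zero-padding matters.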
b) In case you are aiming to collect FAIR (meta)data from the start:
Which data elements/variables are you planning to collect? For this, the competency questions (CQs) might provide some guidance.
If possible, determine the value range for each data element (e.g. for ‘biological sex at birth’, values could be ‘male’, ‘female’; while for ‘age’, the value range might be 0-110).
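Value ranges like these can be recorded in a small, machine-readable codebook and used to check collected values. The sketch below assumes a plain Python dictionary as the codebook; its structure, the keys `type`, `allowed`, `min` and `max`, and the function `is_valid` are illustrative choices, not a prescribed format.

```python
# Hypothetical codebook using the two data elements from the example above.
codebook = {
    "biological sex at birth": {"type": "categorical", "allowed": {"male", "female"}},
    "age": {"type": "numeric", "min": 0, "max": 110},
}


def is_valid(element, value):
    """Check a single value against the value range defined in the codebook."""
    spec = codebook[element]
    if spec["type"] == "categorical":
        return value in spec["allowed"]
    # Numeric elements: the range is inclusive on both ends.
    return spec["min"] <= value <= spec["max"]


print(is_valid("age", 42))
print(is_valid("age", 130))
print(is_valid("biological sex at birth", "female"))
```

Keeping the value ranges next to the definitions makes it easier to spot where two sources disagree before the semantic model is built.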
Step 2 - Check for an existing standard/codebook
...