...
While performing this step, keep your
Let’s say we have a dataset containing patient information with the following metadata:
Metadata Field | Value |
---|---|
Dataset Name | Health Data |
Date of Upload | 01/02/2023 |
Keywords | BP, HR, Conditions |
Creator | Dr. Smith |
Description | Patient health data including BP and HR |
Format | CSV |
Source | Hospital A |
Rights | Open |
Step 1
Compile all information about the data elements, data values, and data structure. Examine the data in whatever format and structure it is available in. This step helps to identify inconsistencies, ambiguities, and errors in the data.
If you are FAIRifying existing (already collected) data, locate all relevant sources in which the data are stored. Compile information about the following:
Which variables are present in the data (e.g. in the eCRFs)?
What are the value ranges for each variable?
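For tabular data, answering these two questions can be partly automated. The sketch below is a minimal illustration using only Python's standard `csv` module. It assumes the data can be exported to CSV, and the column names and values in `SAMPLE_CSV` are hypothetical stand-ins for the real patient data. For every column it reports either the numeric minimum and maximum or the distinct text values, together with a count of empty cells.

```python
import csv
import io

# Hypothetical CSV export; in practice, read the real source file(s) instead.
SAMPLE_CSV = """patient_id,bp_systolic,hr
P001,132,71
P002,145,
P003,118,64
"""


def profile_columns(csv_text):
    """Summarise each column: numeric min/max or distinct text values, plus missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # DictReader yields None for cells missing from short rows.
            columns[name].append((row[name] or "").strip())

    profile = {}
    for name, values in columns.items():
        present = [v for v in values if v]
        summary = {"missing": len(values) - len(present)}
        if not present:
            summary["kind"] = "empty"
        else:
            try:
                numbers = [float(v) for v in present]
                summary.update(kind="numeric", min=min(numbers), max=max(numbers))
            except ValueError:
                # At least one value is not a number: treat the column as text.
                summary.update(kind="text", values=sorted(set(present)))
        profile[name] = summary
    return profile


for column, summary in profile_columns(SAMPLE_CSV).items():
    print(column, summary)
```

The observed ranges are only what the data happens to contain. They still need to be compared with the intended ranges, for example from the eCRF definitions.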
In our example, we are FAIRifying the existing metadata of a dataset. Let’s compile and examine the information we have.
The variables in our metadata and their value ranges are as follows:
Dataset Name: The name of the dataset. Range: Text.
Date of Upload: The date on which the dataset was uploaded. Range: Date values in the format mm/dd/yyyy.
Keywords: Terms that describe the main topics of the dataset. Range: Text.
Creator: The person or organisation that created the dataset. Range: Text; in our example, a title and last name.
Description: A brief summary of the dataset. Range: Text.
Format: The file format of the dataset. Range: Text; in our example, a short string indicating the file format.
Source: The origin of the dataset. Range: Text; in our case, the name of the institution.
Rights: The usage rights or licence of the dataset. Range: Text.
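Once the variables and ranges are written down, the metadata record can be checked against them. This makes missing fields, misspelled field names, and badly formatted values visible. The following is a minimal Python sketch, assuming the record is held as a dictionary. The `RANGE_CHECKS` mapping and the helper functions are hypothetical and simply mirror the list above: any non-empty string counts as valid "Text", and dates must parse as mm/dd/yyyy.

```python
from datetime import datetime

# The example metadata record from the table above.
metadata = {
    "Dataset Name": "Health Data",
    "Date of Upload": "01/02/2023",
    "Keywords": "BP, HR, Conditions",
    "Creator": "Dr. Smith",
    "Description": "Patient health data including BP and HR",
    "Format": "CSV",
    "Source": "Hospital A",
    "Rights": "Open",
}


def check_text(value):
    """Range 'Text': accept any non-empty string."""
    return None if value.strip() else "empty text"


def check_date(value):
    """Range 'Date values in the format mm/dd/yyyy'."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return "not a valid mm/dd/yyyy date"
    return None


# Hypothetical field-to-check mapping mirroring the variables and ranges listed above.
RANGE_CHECKS = {
    "Dataset Name": check_text,
    "Date of Upload": check_date,
    "Keywords": check_text,
    "Creator": check_text,
    "Description": check_text,
    "Format": check_text,
    "Source": check_text,
    "Rights": check_text,
}


def find_problems(record):
    """Return (field, problem) pairs for missing, out-of-range, or unexpected fields."""
    problems = []
    for field, check in RANGE_CHECKS.items():
        if field not in record:
            problems.append((field, "missing"))
            continue
        message = check(record[field])
        if message is not None:
            problems.append((field, message))
    for field in record:
        if field not in RANGE_CHECKS:
            problems.append((field, "unexpected field"))
    return problems


print(find_problems(metadata))  # → []
```

For instance, `find_problems({**metadata, "Date of Upload": "2023-01-02"})` reports the date as invalid. Note that a passing check only shows that "01/02/2023" parses as mm/dd/yyyy. A reader who assumes dd/mm/yyyy would still read it as 1 February, which is exactly the kind of ambiguity this step is meant to surface.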
If you aim to collect FAIR data from the start:
Which data elements/variables are you planning to collect? The driving user question can provide guidance here.
If possible, determine the value range for each data element (e.g. for ‘biological sex at birth’, the values could be ‘male’ and ‘female’; for ‘age’, the value range might be 0-110).
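When planning new data collection, the chosen elements and their value ranges can be recorded in a small, machine-readable data dictionary and used to validate records as they come in. The Python sketch below is one possible way to do this and is not a prescribed format. The `DataElement` class and `invalid_elements` helper are hypothetical, and the two entries reuse the example ranges from the text: ‘male’/‘female’ for biological sex at birth and 0-110 (inclusive) for age.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataElement:
    """One planned data element with either a categorical or a numeric value range."""
    name: str
    allowed_values: Optional[Tuple[str, ...]] = None  # categorical range
    min_value: Optional[float] = None  # numeric range, inclusive
    max_value: Optional[float] = None

    def is_valid(self, value) -> bool:
        if self.allowed_values is not None:
            return value in self.allowed_values
        # Numeric element: reject non-numbers (and booleans, which subclass int).
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            return False
        return self.min_value <= value <= self.max_value


# Example ranges taken from the text above.
DATA_DICTIONARY = (
    DataElement("biological sex at birth", allowed_values=("male", "female")),
    DataElement("age", min_value=0, max_value=110),
)


def invalid_elements(record):
    """Names of planned elements that are missing from a record or outside their range."""
    return [
        element.name
        for element in DATA_DICTIONARY
        if element.name not in record or not element.is_valid(record[element.name])
    ]


print(invalid_elements({"biological sex at birth": "female", "age": 42}))  # → []
print(invalid_elements({"biological sex at birth": "unknown", "age": 130}))
# → ['biological sex at birth', 'age']
```

Keeping the ranges in one structure means the same definitions can later be reused when documenting or publishing the data, instead of being restated by hand.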
...