...
Step 2 - Check for an existing standard/code book
a) For existing (meta)data: check if it comes with a code book or metadata standard. If it does and it is clear, you can use it for your (meta)data and this step is done.
If the codebook is not helpful, you should contact the owner of the data and get the semantics cleared up, so you don’t misinterpret the data. If you see you still need to do additional work in order to make the data clearer, follow the steps below.
b) For new (meta)data: check if there is a code book or metadata standard you can use. A domain expert (for data) or FAIR data steward/semantic expert (for metadata) can help you find out if and where a codebook or standard might be available.
In case there is a codebook or standard, you can use it. If there is no codebook or standard available, proceed to step 3.
If you find a codebook or metadata standard that fits partially, use it for the elements that are included and follow the steps below for the remaining elements.
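The partial-fit case can be sketched in a few lines of Python. This is only an illustration: the codebook entries, the element names and the helper `split_by_codebook` are hypothetical and not taken from any real standard.

```python
# Hypothetical codebook: element name -> definition (illustrative only).
codebook = {
    "sex": "Biological sex at birth",
    "age": "Age of the subject in years",
}

# Hypothetical data elements found in the dataset.
data_elements = ["sex", "age", "bp_systolic", "hr"]


def split_by_codebook(elements, codebook):
    """Split elements into those the codebook defines and those it does not.

    The input order is preserved in both returned lists.
    """
    covered = [e for e in elements if e in codebook]
    remaining = [e for e in elements if e not in codebook]
    return covered, remaining


covered, remaining = split_by_codebook(data_elements, codebook)
print("Use the codebook for:", covered)
print("Follow the steps below for:", remaining)
```

Elements in `remaining` are the ones that still need a definition in the following steps.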
Info |
---|
Health-RI, together with domain representatives, aims to develop domain-specific national data standards in the future. |
You can find more about metadata standards and ontologies at the following link: https://howtofair.dk/links-additional-reading/#more-on-metadata-standards-and-ontologies-
...
Check the data semantics. Is the meaning of the data elements clear and unambiguous? For data elements with ambiguous meaning, try to improve their definition. For this, it might help to examine the intended value range of a variable: is the exact range known and is it clear enough?
In the example of collecting data on a patient’s 'sex', it might be unclear if it means 'biological sex at birth' or 'gender'. In another example, the 'age' of a subject can be expressed in years, but in some cases (e.g. studies with small children) it could also be expressed in months. It should therefore be clearly stated whether the value range for age is expressed in years or months.
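The age example can be made concrete with a small Python sketch. The class `VariableDefinition` and the numeric bounds (0 to 120 years, 0 to 36 months) are assumptions chosen for illustration, not values prescribed by any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableDefinition:
    """Hypothetical record of a variable's meaning, unit and value range."""
    name: str
    description: str
    unit: str
    minimum: float
    maximum: float

    def accepts(self, value: float) -> bool:
        """Return True if the value lies within the stated range (inclusive)."""
        return self.minimum <= value <= self.maximum


# Two possible definitions of 'age'; the bounds are illustrative assumptions.
age_years = VariableDefinition("age", "Age of the subject", "years", 0, 120)
age_months = VariableDefinition("age", "Age of the child", "months", 0, 36)

# The value 30 is valid under both definitions, so the number alone
# does not reveal whether it means 30 years or 30 months.
print(age_years.accepts(30))   # True
print(age_months.accepts(30))  # True
print(age_months.accepts(48))  # False
```

Because the same value passes both checks, the unit has to be part of the variable's definition; it cannot be inferred from the data.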
The spreadsheet below shows the issues with our current metadata and the suggested improvements that make their meaning clearer.
...
Metadata Field | Value | Issue | Suggested variable description | Suggested Value Range | Suggested Value |
---|---|---|---|---|---|
Dataset Name | Health Data | Generic and not descriptive. | The name of the dataset. | Text | Patient Health Records 2023 |
Date of Upload | 01/02/2023 | Ambiguous format. | Date when the dataset was uploaded, in ISO 8601 format (YYYY-MM-DD). | Date in ISO 8601 format (YYYY-MM-DD) | 2023-01-02 |
Keywords | BP, HR, Conditions | Abbreviations used without context. | Keywords describing the main topics covered by the dataset. | Text | Blood Pressure, Heart Rate, Hypertension |
Creator | Dr. Smith | Generic name without additional identifying information. | Full name and affiliation of the dataset creator, as well as ORCID. | Text and ORCID identifier | Dr. John Smith, Hospital A, ORCID: 0001-0002-3456-7890 |
Description | Patient health data including BP and HR | Lacks detail. | Extended description providing context and details about the dataset. | Text | Detailed patient health records including measurements of blood pressure (BP) and heart rate (HR), along with diagnosed medical conditions and prescribed medications. |
Format | CSV | Broad category, can be more detailed. | Data format and version. | Text | CSV, version 1.0 |
Source | Hospital A | Lacks detail, too generic. | Specific department and institution where the data was sourced. | Text and ROR identifier | Hospital A, Department of Cardiology, https://ror.org/example |
Rights | Open | Too broad. | Licensing terms specifying the rights for data usage. | URL to CC License | CC BY 4.0 |
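To see why the 'Date of Upload' value is flagged as ambiguous, the following Python sketch parses the same string under two common conventions using the standard `datetime` module. Which convention the original data owner intended is an assumption that has to be confirmed with them; the code only shows that both readings are valid.

```python
from datetime import datetime

raw = "01/02/2023"  # value from the 'Date of Upload' row

# Reading 1: day/month/year
day_first = datetime.strptime(raw, "%d/%m/%Y").date()

# Reading 2: month/day/year
month_first = datetime.strptime(raw, "%m/%d/%Y").date()

print(day_first.isoformat())    # 2023-02-01
print(month_first.isoformat())  # 2023-01-02
```

Both readings parse without error but give different dates, which is why the table suggests storing the date in ISO 8601 format (YYYY-MM-DD).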
Step 4 - Check relationships
...