Short Description
The aim of this step is to gain insight into the existing data, or into the data you intend to collect. Clearly defining the meaning (semantics) of the data is essential both for creating the semantic model (see Metroline Step: Create or reuse a semantic (meta)data model, https://health-ri.atlassian.net/wiki/spaces/FSD/pages/277839878/Metroline+Step+Create+or+reuse+a+semantic+meta+data+model) and for data collection via, e.g., eCRFs (see Metroline Step: Design eCRF (data collection)) [De Novo].
To understand the semantics of the data, three aspects should be analysed: the data values (i.e., the meaning of the data elements), the data representation (i.e., the format), and the structure (i.e., the relationships between data elements).
The goal is a set of data elements with clear and unambiguous semantics that reflect the information you want to collect or share [FAIRopoly].
Why is this step important
Although this step has no clearly defined deliverable, several subsequent steps rely on familiarity with your data. For example, to create or reuse a semantic (meta)data model, you need to understand the elements and structure of your existing data, or of the data to be collected. A good understanding of your data also informs the FAIRification goals, since these can depend on which data elements are available.
Expertise requirements for this step
Experts that may need to be involved, as described in Metroline Step: Build the Team, include:
Data specialist: can help with understanding the data structure;
Domain specialist: can help with understanding the meaning of the data elements.
How to
When analysing the data semantics of an existing data set or setting up a new data collection, you can consider the following:
What is the format in which the data is available? What is the structure of the data?
What are the relations between the data elements? For example, if the dataset is stored in a relational database, the relational schema describes its structure, including the field names, their data types, and the cardinality of the relationships between tables.
Is the data representation (format) clear and unambiguous? What are the data types?
Is the meaning of the data elements (data semantics) clear and unambiguous?
For a new data collection, define common data elements (CDEs) with clear and unambiguous semantics; for an existing dataset, align the existing data elements to CDEs [CDE].
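For data held in a relational database, the schema questions above (structure, relations, field names, data types) can be partly answered programmatically. The sketch below assumes an SQLite database and uses Python's built-in sqlite3 module; the example tables (patient, visit) and the helper describe_schema are hypothetical illustrations, not part of any Metroline tooling. It only reports what the schema declares: whether visit_date holds ISO 8601 dates or some other format, or what the diagnosis_code values mean, still has to be established together with the data and domain specialists.

```python
import sqlite3

# Hypothetical example schema: patients and their visits (one patient, many visits).
EXAMPLE_DDL = """
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    birth_year INTEGER,
    sex TEXT
);
CREATE TABLE visit (
    visit_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    visit_date TEXT,
    diagnosis_code TEXT
);
"""


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Collect, per table, its columns and foreign keys from an SQLite database."""
    schema = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
        columns = [
            {
                "name": name,
                "type": col_type,  # declared type only; SQLite does not enforce it strictly
                "not_null": bool(notnull),
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'
            )
        ]
        # Each row of PRAGMA foreign_key_list is
        # (id, seq, table, from, to, on_update, on_delete, match).
        foreign_keys = [
            {"column": row[3], "references": f"{row[2]}.{row[4]}"}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(EXAMPLE_DDL)
    for table, info in describe_schema(conn).items():
        print(table)
        for col in info["columns"]:
            print(f"  {col['name']}: {col['type'] or '(no declared type)'}")
        for fk in info["foreign_keys"]:
            # A foreign key usually marks a many-to-one relation (e.g. many visits per patient).
            print(f"  {fk['column']} -> {fk['references']}")
```

The output is a starting point for discussion, not a description of semantics: a column declared as TEXT tells you nothing about its format or meaning, which is exactly what this step is meant to clarify.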
In addition, check whether the data already contain FAIR features, such as persistent unique identifiers for data elements (for more information, see pre-FAIR assessment).
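Aligning data elements to CDEs and checking for identifiers, as described above, can be supported by keeping a data dictionary of your data elements. The sketch below is a minimal, hypothetical Python version: the DataElement fields, the review_elements function and the example.org IRIs are assumptions for illustration, not a Health-RI or CDE Repository format. The identifier test is only a heuristic (does the element point to an absolute http(s) IRI?); it does not verify that the identifier resolves or is persistent, which belongs to the pre-FAIR assessment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class DataElement:
    """One entry of a hypothetical data dictionary describing a dataset."""

    local_name: str  # column or eCRF field name as used in the data
    description: str  # human-readable meaning of the element
    datatype: str  # e.g. "integer", "date (ISO 8601)", "coded value"
    cde_iri: Optional[str] = None  # identifier of the aligned CDE, if any


def review_elements(elements: List[DataElement]) -> Dict[str, List[str]]:
    """Return, per element name, the issues found; elements without issues are omitted."""
    issues: Dict[str, List[str]] = {}
    for element in elements:
        problems = []
        if not element.description.strip():
            problems.append("no description")
        if element.cde_iri is None:
            problems.append("not aligned to a CDE")
        else:
            # Heuristic only: an absolute http(s) IRI is a prerequisite for a
            # resolvable identifier, but says nothing about its persistence.
            parsed = urlparse(element.cde_iri)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append("identifier is not an absolute http(s) IRI")
        if problems:
            issues[element.local_name] = problems
    return issues


# Hypothetical entries; example.org IRIs are placeholders, not real CDEs.
EXAMPLE_ELEMENTS = [
    DataElement("birth_year", "Year of birth of the patient", "integer",
                "https://example.org/cde/year-of-birth"),
    DataElement("sex", "Sex of the patient", "coded value", "local:sex"),
    DataElement("dx", "", "coded value"),
]

if __name__ == "__main__":
    for name, problems in review_elements(EXAMPLE_ELEMENTS).items():
        print(f"{name}: {', '.join(problems)}")
```

With these example entries, birth_year passes, sex is flagged because it uses a local, non-IRI identifier, and dx is flagged for having neither a description nor a CDE alignment.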
When performing this step, also keep your FAIRification goals in mind: for example, selecting a relevant subset of the data and defining the driving user question(s) both rely heavily on familiarity with the data.
Practical Examples from the Community
This section should show the step applied in a real project, with links to demonstrator projects.
References & Further reading
[De Novo] https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-02004-y
[FAIRopoly] https://www.ejprarediseases.org/fairopoly/
[Generic] https://direct.mit.edu/dint/article/2/1-2/56/9988/A-Generic-Workflow-for-the-Data-FAIRification
[GOFAIR_Process] https://www.go-fair.org/fair-principles/fairification-process/
[CDE] https://cde.nlm.nih.gov/home