
Short Description 

‘… selecting a relevant subset of the data and defining driving user questions(s) are highly relying on being familiar with the data’ (Generic)

In this step, the aim is to gain more insight into the existing data, or the data that you aim to collect. Clearly defining the meaning (semantics) of the data is an important step for creating the semantic model, as well as for data collection, e.g. via electronic case report forms (eCRFs).

To understand the semantics of the data, analyse the data values (i.e., the meaning of the data elements), the data representation (format), and the structure (i.e., the relationships between data elements).

The goal is a set of data elements with clear and unambiguous semantics, which reflect the information you want to collect or share.

Why is this step important

Even though this step has no clearly defined deliverable, several of the steps that follow rely on being familiar with your data. For example, in order to create or reuse your semantic (meta)data model, it is important to understand the elements and structure of your existing data, or data to be collected. Furthermore, a good understanding of your data is closely connected to the FAIRification goals, since these can depend on the data elements.

Expertise requirements for this step 

Experts that may need to be involved, as described in Metroline Step: Build the Team, include:

  • Data specialist: can help with understanding the data structure.

  • Domain expert: can help with understanding the data elements.

How to

While performing this step, keep your FAIRification goals in mind: selecting a relevant subset of the data and defining driving user question(s), for example, both depend on a thorough understanding of the data.

Step 1

Check the data in whatever format and structure it is available.

Step 2

Check which data elements are present and how they relate to each other. For example, if the dataset is in a relational database, the relational schema provides information about the dataset structure: the tables, the fields involved (field names and data types), the relationships between tables and their cardinality, etc.
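As an illustration of reading structure from a relational schema, the sketch below lists the columns and foreign keys of every table in a SQLite database. It assumes the data happens to be in SQLite; the `patient` and `visit` tables and the `describe_schema` helper are hypothetical and only serve as an example. Other database systems expose the same information through their own catalogue views.

```python
import sqlite3


def describe_schema(conn):
    """Return columns and foreign keys for each user table (hypothetical helper)."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            (row[1], row[2], bool(row[3]))
            for row in conn.execute(f"PRAGMA table_info('{table}')")
        ]
        # PRAGMA foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            (row[3], row[2], row[4])
            for row in conn.execute(f"PRAGMA foreign_key_list('{table}')")
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


# Hypothetical example database, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient (
        patient_id TEXT PRIMARY KEY,
        birth_year INTEGER NOT NULL
    );
    CREATE TABLE visit (
        visit_id INTEGER PRIMARY KEY,
        patient_id TEXT NOT NULL REFERENCES patient(patient_id),
        visit_date TEXT
    );
    """
)

schema = describe_schema(conn)
for table, info in schema.items():
    print(table)
    for name, declared_type, not_null in info["columns"]:
        print(f"  column {name}: {declared_type}, required={not_null}")
    for from_col, ref_table, ref_col in info["foreign_keys"]:
        print(f"  {from_col} -> {ref_table}.{ref_col}")
```

The foreign key from `visit.patient_id` to `patient.patient_id` shows that one patient can have many visits. This is the kind of relationship the semantic model will need to capture.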

Step 3

Check the data semantics. Is the meaning of the data elements clear and unambiguous?
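One possible way to make this check systematic is to run a few simple rules over a data dictionary. The sketch below assumes a dictionary with `definition`, `type`, `unit` and `allowed_values` entries per element; this layout, the element names and the rules are assumptions for illustration, not a prescribed format. It can only flag obvious gaps. Whether a definition is truly unambiguous remains a judgement for the domain expert.

```python
def find_ambiguous_elements(data_dictionary):
    """Flag elements whose description leaves room for interpretation."""
    issues = {}
    for name, entry in data_dictionary.items():
        problems = []
        if not (entry.get("definition") or "").strip():
            problems.append("missing definition")
        if entry.get("type") == "number" and not entry.get("unit"):
            problems.append("numeric element without unit")
        if entry.get("type") == "code" and not entry.get("allowed_values"):
            problems.append("coded element without value list")
        if problems:
            issues[name] = problems
    return issues


# Hypothetical data dictionary for the example.
data_dictionary = {
    "weight": {"definition": "Body weight at visit", "type": "number"},
    "sex": {
        "definition": "Sex assigned at birth",
        "type": "code",
        "allowed_values": ["female", "male", "unknown"],
    },
    "dx": {"definition": "", "type": "text"},
}

issues = find_ambiguous_elements(data_dictionary)
for name, problems in issues.items():
    print(name, "->", ", ".join(problems))
```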

Step 4

Check whether the data representation is clear and unambiguous. Investigate which types of data are present.
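A quick way to investigate representation is to profile the kinds of values that occur in each column. The sketch below uses only the Python standard library. The sample data, the column names and the five categories are assumptions chosen for illustration, and the classification rules are deliberately simple.

```python
import csv
import io
from collections import Counter
from datetime import date


def classify(value):
    """Assign a raw string value to a coarse representation category."""
    text = value.strip()
    if text == "":
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        pass
    try:
        date.fromisoformat(text)
        return "iso_date"
    except ValueError:
        pass
    return "text"


def profile_columns(rows):
    """Count the representation categories per column."""
    profile = {}
    for row in rows:
        for column, value in row.items():
            profile.setdefault(column, Counter())[classify(value)] += 1
    return profile


# Hypothetical sample: one weight uses a decimal comma, one date is not ISO 8601.
sample = """\
patient_id,weight,visit_date
P1,70,2021-03-04
P2,"68,5",04/03/2021
P3,,2021-03-05
"""

profile = profile_columns(csv.DictReader(io.StringIO(sample)))
for column, counts in profile.items():
    print(column, dict(counts))
```

A column that falls into more than one category, such as `weight` and `visit_date` here, is a candidate for closer inspection: the mixed categories may point to inconsistent formats or to missing values.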

Step 5

In addition, check whether the data already contains FAIR features, such as persistent unique identifiers for data elements (for more information, see pre-FAIR assessment).
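As a first, rough check on identifiers, the sketch below tests two things: whether identifiers are unique, and whether they have the shape of an HTTP(S) IRI. The identifiers and the `check_identifiers` helper are hypothetical. Note that an IRI-shaped identifier is not necessarily persistent; persistence depends on how the identifiers are managed and cannot be read from the string alone.

```python
from collections import Counter
from urllib.parse import urlparse


def looks_like_iri(identifier):
    """True if the identifier has an http(s) scheme and a host part."""
    parsed = urlparse(identifier)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_identifiers(identifiers):
    """Report duplicated identifiers and identifiers that are not IRI-shaped."""
    counts = Counter(identifiers)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    not_iri = sorted(i for i in counts if not looks_like_iri(i))
    return {"duplicates": duplicates, "not_iri": not_iri}


# Hypothetical identifiers for the example.
identifiers = [
    "https://example.org/patient/1",
    "https://example.org/patient/2",
    "patient-3",
    "https://example.org/patient/2",
]

report = check_identifiers(identifiers)
print(report)
```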

Step 6

Define or align common data elements (CDEs). For a new data collection, define CDEs whose semantics are clear and unambiguous; for an existing dataset, align the existing data elements to CDEs.
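For an existing dataset, the alignment can be recorded as a simple mapping from local element names to CDEs, which also makes visible what is still unmapped. In the sketch below, the local element names and the CDE identifiers (`EXAMPLE-CDE-...`) are placeholders, not real CDE identifiers; the mapping itself would be drawn up together with the domain expert.

```python
def normalise(name):
    """Normalise a local element name so that trivial spelling variants match."""
    return name.strip().lower().replace(" ", "_")


def align_to_cdes(local_elements, mapping):
    """Split local elements into those aligned to a CDE and those still unmapped."""
    aligned = {}
    unmapped = []
    for name in local_elements:
        key = normalise(name)
        if key in mapping:
            aligned[name] = mapping[key]
        else:
            unmapped.append(name)
    return aligned, unmapped


# Placeholder mapping: the identifiers below are hypothetical, not real CDE IDs.
cde_mapping = {
    "date_of_birth": "EXAMPLE-CDE-0001",
    "sex": "EXAMPLE-CDE-0002",
}

local_elements = ["Date of birth", "sex", "shoe_size"]

aligned, unmapped = align_to_cdes(local_elements, cde_mapping)
print("aligned:", aligned)
print("unmapped:", unmapped)
```

Elements that remain unmapped either need a new, clearly defined CDE or may fall outside the subset of data that is relevant to the FAIRification goals.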

References & Further reading

[De Novo] https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-02004-y  

[FAIRopoly] https://www.ejprarediseases.org/fairopoly/  

[Generic] https://direct.mit.edu/dint/article/2/1-2/56/9988/A-Generic-Workflow-for-the-Data-FAIRification   

[GOFAIR_Process] https://www.go-fair.org/fair-principles/fairification-process/

[CDE] https://cde.nlm.nih.gov/home   

Authors / Contributors 

Ana Konrad; Hannah Neikes; Sander de Ridder
