Short Description 

The aim of this step is to gain more insight into your existing data, or the data that you plan to collect. Clearly defining the meaning (semantics) of the data is important for creating the semantic model, as well as for data collection, for example via electronic case report forms (eCRFs).

To understand the semantics of a dataset, analyse the meaning of the data elements and their values, the data representation (format), and the structure (i.e., the relationships between data elements).

The goal is a set of data elements with clear and unambiguous semantics, which reflect the information you want to collect or share.

Why is this step important

Even though this step has no clearly defined deliverable, several of the steps that follow rely on being familiar with your data. For example, in order to create or reuse your semantic (meta)data model, it is important to understand the elements and structure of your existing data, or data to be collected. Furthermore, a good understanding of your data is closely connected to the FAIRification goals, since these can depend on the data elements.

Expertise requirements for this step 

Experts that may need to be involved, as described in Metroline Step: Build the Team, include:

  • Data specialist: can help with understanding the data structure.

  • Domain expert: can help with understanding the data elements.

How to 

When analysing the data semantics of an existing data set, or when setting up a new data collection, consider the following:

  • Format and Structure: What is the format in which the data is available? What is the structure of the data?

  • Relations: What are the relations between the data elements? For example, if the dataset is in a relational database, the relational schema provides information about the dataset structure, the types involved (the field names), cardinality, etc.

  • Data Representation: Is the data format clear and unambiguous? What are the data types?

  • Data Semantics: Is the meaning of the data elements clear and unambiguous?
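The questions above about format, representation, and data types can be explored with a small profiling script. The following is a minimal sketch, assuming the data is available as CSV text with a header row. The function names `infer_type` and `profile_columns`, the column names, and the sample values are hypothetical and only serve as illustration.

```python
import csv
import io


def infer_type(value: str) -> str:
    """Classify a raw string as 'missing', 'integer', 'float', or 'text'."""
    if value.strip() == "":
        return "missing"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def profile_columns(csv_text: str) -> dict:
    """Return, per column, the observed value types and the distinct raw values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    profile = {name: {"types": set(), "distinct": set()} for name in reader.fieldnames}
    for row in reader:
        for name in profile:
            value = row.get(name) or ""  # treat absent cells as empty
            profile[name]["types"].add(infer_type(value))
            profile[name]["distinct"].add(value)
    return profile


# Hypothetical example data, not taken from any real dataset.
SAMPLE = """patient_id,sex,age,diagnosis_date
P001,F,34,2020-01-15
P002,female,41,15/02/2020
P003,M,,2020-03-01
"""

profile = profile_columns(SAMPLE)
for name, info in profile.items():
    print(name, sorted(info["types"]), sorted(info["distinct"]))
```

In this sample, the column `sex` mixes the codings "F" and "female", `age` contains a missing value, and `diagnosis_date` uses two date formats. Findings like these indicate that the representation is not yet unambiguous and are worth discussing with the data specialist and the domain expert.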

For a new data collection, define common data elements (CDEs) whose semantics are clear and unambiguous. For an existing data set, align the existing data elements to CDEs.
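Aligning existing data elements to CDEs can be recorded as an explicit mapping, which also makes visible which elements still lack a counterpart. The following is a minimal sketch. The field names, the CDE labels in `CDE_MAPPING`, and the helper `align_to_cdes` are hypothetical and do not come from any real CDE catalogue.

```python
# Hypothetical mapping from local field names to CDE labels (illustration only).
CDE_MAPPING = {
    "sex": "Sex",
    "dob": "Date of birth",
    "diag_date": "Date of diagnosis",
}


def align_to_cdes(local_fields, mapping):
    """Split local fields into those with a CDE counterpart and those without."""
    aligned = {field: mapping[field] for field in local_fields if field in mapping}
    unmapped = [field for field in local_fields if field not in mapping]
    return aligned, unmapped


aligned, unmapped = align_to_cdes(["sex", "dob", "hosp_code"], CDE_MAPPING)
print("aligned:", aligned)
print("needs review:", unmapped)
```

Fields that end up in the unmapped list either need a new, clearly defined data element or may fall outside the subset of data you want to share.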

In addition, check whether the data already contains FAIR features, such as persistent unique identifiers for data elements (for more information, see pre-FAIR assessment).
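A first, rough check for identifier-like FAIR features can be automated. The following is a minimal sketch, assuming the candidate identifiers are available as a list of strings. The helper `check_identifiers` and the example values are hypothetical. The check only tests completeness, uniqueness, and whether values look like HTTP(S) IRIs. Whether an identifier is actually persistent cannot be determined from the data alone.

```python
from urllib.parse import urlparse


def check_identifiers(values):
    """Report whether identifier values are complete, unique, and IRI-like."""
    non_empty = [v for v in values if v and v.strip()]

    def looks_like_iri(value):
        parts = urlparse(value)
        return parts.scheme in ("http", "https") and bool(parts.netloc)

    return {
        "complete": len(non_empty) == len(values),
        "unique": len(set(non_empty)) == len(non_empty),
        "iri_like": bool(non_empty) and all(looks_like_iri(v) for v in non_empty),
    }


# Hypothetical identifier columns (illustration only).
local_ids = ["P001", "P002", "P002"]
iri_ids = ["https://example.org/patient/1", "https://example.org/patient/2"]

local_report = check_identifiers(local_ids)
iri_report = check_identifiers(iri_ids)
print(local_report)
print(iri_report)
```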

While performing this step, keep your FAIRification goals in mind, since, for example, selecting a relevant subset of the data and defining driving user question(s) both depend on a thorough understanding of the data.

References & Further reading

[De Novo] https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-02004-y  

[FAIRopoly] https://www.ejprarediseases.org/fairopoly/  

[Generic] https://direct.mit.edu/dint/article/2/1-2/56/9988/A-Generic-Workflow-for-the-Data-FAIRification   

[GOFAIR_Process] https://www.go-fair.org/fair-principles/fairification-process/

[CDE] https://cde.nlm.nih.gov/home   

Authors / Contributors 

Ana Konrad; Hannah Neikes; Sander de Ridder
