...

In this step, the aim is to gain more insight into the existing data, or the data that you aim to collect. Clearly defining the meaning (semantics) of the data is an important step for creating the semantic model (see Metroline Step: Create or reuse a semantic (meta)data model, https://health-ri.atlassian.net/wiki/spaces/FSD/pages/277839878/Metroline+Step+Create+or+reuse+a+semantic+meta+data+model), as well as for data collection via, e.g., eCRFs (see Metroline Step: Design eCRF (data collection)) [De Novo].

To understand the semantics of the data, analyse the data values (i.e., the meaning of the data elements), the data representation (format), and the structure (i.e., the relationships between data elements in the underlying data model).

The goal is a set of data elements with clear and unambiguous semantics that reflect the information you want to collect or share [FAIRopoly].

Why is this step important

When you complete this step, the existing data and/or the data to be collected should be unambiguously defined. Furthermore, for existing data, you should understand the dataset's structure and know whether it already has FAIR features.

...

  • Data specialist: can help with understanding the data structure.

  • Domain specialist: can help with understanding the data elements.

How to 

[Generic] This step consists of 1) investigating the data in whatever form(s) it is available (specified in Step 1) and checking whether both the data representation (format) and the meaning of the data elements (the data semantics) are clear and unambiguous, and 2) checking whether the data already contain FAIR features, such as persistent unique identifiers for data elements [14] (FAIR principle F1 [1]), e.g. by using FAIRness assessment tooling [2, 3, 4]. This step is tightly connected with Step 1, since, e.g., selecting a relevant subset of the data and defining driving user question(s) rely heavily on being familiar with the data.
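
As a first, rough check related to FAIR principle F1, you could scan an identifier column for empty values, duplicates, and values that are not in a globally resolvable form. The Python sketch below assumes the identifiers are available as a list of strings. The function name `check_identifiers` and the rule that a "persistent-looking" identifier is an HTTP(S) IRI or a prefix:local CURIE are assumptions made for illustration. The sketch does not replace the FAIRness assessment tooling mentioned above.

```python
import re

# Heuristic only (an assumption for this sketch): an identifier "looks persistent"
# if it is an HTTP(S) IRI or a CURIE such as "ORPHA:558" (prefix:local part).
IRI_PATTERN = re.compile(r"^https?://\S+$")
CURIE_PATTERN = re.compile(r"^[A-Za-z][\w.-]*:\S+$")


def check_identifiers(values):
    """Count empty identifiers and list duplicates and non-IRI/CURIE values."""
    report = {"empty": 0, "duplicates": [], "not_persistent_form": []}
    seen = set()
    for value in values:
        value = (value or "").strip()
        if not value:
            report["empty"] += 1
            continue
        if value in seen:
            report["duplicates"].append(value)
            continue  # the form of this value was already checked
        seen.add(value)
        if not (IRI_PATTERN.match(value) or CURIE_PATTERN.match(value)):
            report["not_persistent_form"].append(value)
    return report


# Hypothetical identifier column.
ids = ["https://example.org/patient/1", "P002", "P002", "", "ORPHA:558"]
print(check_identifiers(ids))
```

In this example, `P002` shows up in two lists. It is duplicated, and as a bare local code it is not globally unique. Treat such findings as input for discussion with the data specialist rather than as a verdict.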

[GOFAIR] When analysing the data semantics of an existing data set or setting up a new data collection, inspect the content of the data and consider the following:

  • Which concepts are represented?

  • What is the format in which the data is available? What is the structure of the data?

  • What are the relations between the data elements?

...

  • For example, if the dataset is in a relational database, the relational schema provides information about the dataset structure, the types involved (the field names), cardinality, etc.
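
For a dataset stored in a relational database, much of this information can be read from the schema itself. The following is a minimal sketch that assumes an SQLite database. Other database systems expose similar catalog views, such as `information_schema`. The sketch uses Python's built-in `sqlite3` module to list the columns, declared types, constraints and foreign keys of each table. The example tables `patient` and `diagnosis` are hypothetical.

```python
import sqlite3


def describe_schema(conn):
    """Summarise tables, columns and foreign keys of an SQLite database."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(name, col_type, bool(notnull), bool(pk))
                   for _cid, name, col_type, notnull, _default, pk
                   in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [(row[3], row[2], row[4])
                        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


# Hypothetical example schema in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, sex TEXT NOT NULL, birth_date TEXT);
CREATE TABLE diagnosis (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    code TEXT
);
""")

for table, info in describe_schema(conn).items():
    print(table)
    for name, col_type, not_null, primary_key in info["columns"]:
        print(f"  {name}: {col_type}, not null={not_null}, primary key={primary_key}")
    for column, ref_table, ref_column in info["foreign_keys"]:
        print(f"  {column} -> {ref_table}.{ref_column}")
```

The foreign key from `diagnosis.patient_id` to `patient.id`, combined with its `NOT NULL` constraint, indicates a one-to-many cardinality. Each diagnosis belongs to exactly one patient, and a patient can have several diagnoses. The schema describes structure and representation. It does not say what a column such as `code` means, so that still has to be documented.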

In the EJP RD, an initial set of such minimal information has been determined by the Joint Research Centre and is described as Common Data Elements (CDEs). A new registry is advised to collect these CDEs, while an existing registry should identify local data elements that semantically align with them.
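
The alignment of local data elements with CDEs can be tracked with a simple mapping. The sketch below is a minimal illustration in Python. The CDE labels in `CDES` and the local field names are hypothetical placeholders, not the official EJP RD CDE set. Deciding whether two elements really mean the same thing remains the job of the data and domain specialists.

```python
# Illustrative CDE labels only (hypothetical); use the official EJP RD CDE
# specification for real work.
CDES = ["pseudonym", "sex", "date_of_birth", "diagnosis"]

# Hypothetical mapping of local registry fields to CDEs. None means the local
# field has no CDE counterpart and stays a local data element.
LOCAL_TO_CDE = {
    "pat_id": "pseudonym",
    "gender": "sex",  # verify with a domain specialist that this field records sex
    "dob": "date_of_birth",
    "visit_weight_kg": None,
}


def alignment_report(cdes, mapping):
    """List CDEs without a local counterpart, mapping targets that are not CDEs,
    and local fields that remain local."""
    targets = {cde for cde in mapping.values() if cde is not None}
    return {
        "missing_cdes": [cde for cde in cdes if cde not in targets],
        "unknown_targets": sorted(targets - set(cdes)),
        "local_only": sorted(f for f, cde in mapping.items() if cde is None),
    }


print(alignment_report(CDES, LOCAL_TO_CDE))
```

For an existing registry, `missing_cdes` shows which CDEs are not yet captured, and `unknown_targets` catches typos in the mapping. The `gender` to `sex` entry is a reminder that similar names do not guarantee the same meaning.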

...

  • Is the data representation (format) clear and unambiguous? What are the data types?

  • Is the meaning of the data elements (data semantics) clear and unambiguous?

  • For a new data collection: define common data elements (CDEs) whose semantics are clear and unambiguous. For an existing data set: align the existing data elements to CDEs.
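
For tabular data, a quick profile of the values in each column can help answer the question about data types. It can also reveal representation problems, such as dates written in different formats. This Python sketch assumes the data are available as CSV text. The type categories (`integer`, `decimal`, `date` in ISO 8601 form, `text`) and the function names are illustrative choices, not a standard.

```python
import csv
import io
from datetime import date


def infer_type(value):
    """Classify one raw value as empty, integer, decimal, date (ISO 8601) or text."""
    value = value.strip()
    if not value:
        return "empty"
    for kind, parse in (("integer", int), ("decimal", float), ("date", date.fromisoformat)):
        try:
            parse(value)
            return kind
        except ValueError:
            pass
    return "text"


def profile_columns(csv_text):
    """Map each column name to the sorted list of value kinds found in it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            kind = infer_type(row.get(name) or "")
            if kind != "empty":
                kinds[name].add(kind)
    return {name: sorted(found) for name, found in kinds.items()}


# Hypothetical sample data with an inconsistent date format and a text code.
SAMPLE = """\
id,birth_date,weight
1,2001-03-15,70.5
2,15/03/2001,72
3,,unknown
"""

for column, found in profile_columns(SAMPLE).items():
    flag = "  <- check" if len(found) > 1 else ""
    print(f"{column}: {', '.join(found)}{flag}")
```

A column with more than one kind of value, such as `birth_date` here, deserves a closer look because its representation is inconsistent. Mixed `integer` and `decimal` values, as in `weight`, are usually harmless. The `text` value `unknown`, however, suggests a missing-value code whose meaning should be documented.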

In addition, check whether the data already contain FAIR features, such as persistent unique identifiers for data elements [Generic] (for more information, see step W: Pre-FAIR assessment).

When performing this step, also keep your FAIRification goals in mind. If you are setting up a new data collection, e.g. a registry, the deliverable should be a set of data elements whose semantics clearly and unambiguously reflect the information you want to exchange [FAIRopoly]. This can be based on existing initiatives, such as Common Data Elements [CDE]; also see step X.
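
One way to make the semantics of each data element explicit is to record a few fields for every element: a definition, its representation, and, where possible, an ontology concept that fixes its meaning. The Python sketch below models such a record as a small data class. The field names, the use of XML Schema datatype names such as `xsd:decimal`, and the `semantic_gaps` check are assumptions for illustration, not a prescribed model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElement:
    """Hypothetical minimal description of one data element."""
    name: str
    definition: str                      # meaning, in plain language
    datatype: str                        # representation, e.g. "xsd:date"
    unit: Optional[str] = None           # e.g. "kg" for a weight
    permissible_values: List[str] = field(default_factory=list)
    concept_iri: Optional[str] = None    # ontology term that pins down the meaning


def semantic_gaps(element: DataElement) -> List[str]:
    """Return what is still missing before the element is unambiguous."""
    gaps = []
    if not element.definition.strip():
        gaps.append("missing definition")
    if not element.datatype.strip():
        gaps.append("missing datatype")
    if element.concept_iri is None:
        gaps.append("no ontology concept")
    return gaps


weight = DataElement(
    name="body_weight",
    definition="Body weight of the patient, measured at the visit",
    datatype="xsd:decimal",
    unit="kg",
)
print(semantic_gaps(weight))
```

In this example, the weight element has a definition and a datatype, but it is not yet linked to an ontology concept. Finding a suitable term is part of creating or reusing the semantic model.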

[GO FAIR] Different data distributions require different methods for identifying and analysing the concepts, structure and relations in the data; a relational database, for instance, can be analysed through its relational schema.

Practical Examples from the Community 

...

[CDE] https://cde.nlm.nih.gov/home   

Authors / Contributors 

Experts whom you can contact for further information: Ana Konrad; Hannah Neikes; Sander de Ridder