Status: On Hold
On 17-9-2024 it was decided to put this page on hold and focus on describing the petal process first. When that part is finished, parts of the information (see e.g. step 4) will be generalised for this page.

Short description 

`Generating a semantic model is often the most time-consuming step of data FAIRification. However, we expect the modelling effort to diminish as more and more models are made available for reuse over time, especially if such models are treated as FAIR digital objects themselves. Thus, it is important to first check whether a semantic model already exists for the data and the metadata that may be reused. For cases where no semantic model is available a new one needs to be generated.` (Generic)

...

Semantic modelling makes your data and metadata machine-actionable, which enables secondary use of your data. After performing this step, your data is represented as FAIR digital objects (FDOs). FDOs are digital objects identified by a Globally Unique, Persistent and Resolvable IDentifier (GUPRID) and described by metadata. This enables the transformed FAIR data set to be efficiently incorporated into other systems, analysis workflows, and unforeseen future applications.

Expertise requirements for this step 

Experts that may need to be involved, as described in Metroline Step: Build the Team, include:

  • Semantic data modelling specialist: creates a new (meta)data model or applies an existing one, ensures that the semantic representation correctly represents the domain knowledge.

  • Domain expert: makes sure that the exact meaning of the data is understood by the modeller.

In the BEAT-COVID project, ontological models for data records were developed in collaboration with data collectors, data managers, data analysts and medical doctors [BEAT-COVID paper].

How to 

(I) Reusing a semantic (meta)data model

...

  • list the main concepts (classes) of the data elements to be FAIRified;

  • identify the relationships between the data elements.

It is important that both the data representation (format) and the meaning of the data elements (the data semantics) are clear and unambiguous (see Analyse data semantics).

To help you decide what to include in your model, you can start by creating a list of questions (competency questions) that the model should be able to answer. These can serve as a guide to identify the most relevant (meta)data elements to model.

...

...

...

https://github.com/LUMC-BioSemantics/beat-covid/blob/master/fair-data-model/brainstorming/docs/brainstorming_models.pptx

Step 2: Search for ontology terms

...

Search engine

Short description

BioPortal

BioPortal is a repository of biomedical ontologies.

Ontology Lookup Service (OLS)

The OLS (by EMBL-EBI) is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. You can browse the ontologies through the website as well as programmatically via the OLS API. More info here.

The Open Biological and Biomedical Ontology (OBO) Foundry

Develops interoperable ontologies for biomedical sciences. Participants follow and contribute to the development of a set of principles to ensure that ontologies are logically well-formed and scientifically accurate. More info here.

BARTOC

The Basic Register of Thesauri, Ontologies & Classifications (BARTOC) is a database of Knowledge Organization Systems and KOS related registries with the goal to list as many Knowledge Organization Systems as possible at one place. More info here.

Ontobee

Ontobee is a web-based linked data server and browser specifically designed for ontology terms. It supports ontology visualization, query, and development, provides a web interface for displaying the details and its hierarchy of a specific ontology term. More info here.

...

...

The Open Biological and Biomedical Ontology (OBO) Foundry

...

Evidence and Conclusion Ontology (ECO)

Browser for ontologies for agricultural science based on NCBO BioPortal.

Ontologies for different purposes can also be found in the FAIR cookbook, as well as on this page.
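As a rough illustration of programmatic term lookup, the sketch below builds a search request for the OLS API and extracts candidate terms from a decoded response. The endpoint path (`/ols4/api/search`), the query parameters (`q`, `rows`, `ontology`) and the response field names (`response.docs`, `label`, `iri`, `ontology_name`) are assumptions and should be checked against the current OLS API documentation before use. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the OLS search endpoint; verify against the current OLS docs.
OLS_SEARCH_URL = "https://www.ebi.ac.uk/ols4/api/search"


def build_search_url(label, ontology=None, rows=5):
    """Build a search URL for a free-text label, optionally limited to one ontology."""
    params = {"q": label, "rows": rows}
    if ontology:
        params["ontology"] = ontology
    return OLS_SEARCH_URL + "?" + urlencode(params)


def extract_candidates(payload):
    """Pull (label, IRI, ontology) tuples out of a decoded search response."""
    docs = payload.get("response", {}).get("docs", [])
    return [(d.get("label"), d.get("iri"), d.get("ontology_name")) for d in docs]


def search_terms(label, ontology=None, rows=5):
    """Query the service over HTTP (requires network access)."""
    with urlopen(build_search_url(label, ontology, rows), timeout=30) as resp:
        return extract_candidates(json.load(resp))
```

A call such as `search_terms("cytokine")` would then return a short list of candidate terms for manual review. The first hit is not necessarily the best fit, so the selection criteria still apply.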

When choosing an ontology, several selection criteria might apply (see also the FAIR Cookbook):

  • Update activity: Is it well maintained, i.e. does it have frequent releases, term request handling, and clear versioning and deprecation policies?

  • Is it well documented? There should be enough metadata for each class in the artefact and enough metadata about the artefact itself.

  • Usability licence: What licence and terms of use does it mandate?

  • Does the ontology contain a good class and property structure? (This generally facilitates data integration.)

  • What format does it come in?

  • Are there stable persistent resolvable identifiers for all terms?

  • Usage statistics: Who uses it and what resources are being annotated with it?

  • Is a general ontological framework used (such as OBO Foundry)?

  • Exclusion criteria:

    • Absent licence or terms of use (indicator of usability)

    • Restrictive licences or terms of use with restrictions on redistribution and reuse

    • Absence of term definitions

    • Absence of sufficient class metadata (indicator of quality)

    • Absence of sustainability indicators (absence of funding records)

  • Inclusion criteria:

    • Scope and coverage meets the requirements of the concept identified

    • Unique URI, textual definition and IDs for each term

    • Resource releases are versioned

    • Size of resource (indicator of coverage)

    • Number of classes and subclasses (indicator of depth)

    • Number of terms having definitions and synonyms (indicator of richness)

    • Presence of a help desk and contact point (indicator of community support)

    • Presence of term submission tracker/issue tracker (indicator of resource agility and capability to grow upon request)

    • Potential integrative nature of the resource (as indicator of translational application potential)

    • Licensing information available (as indicator of freedom to use)

    • Use of a top level ontology (as indicator of a resource built for generic use)

    • Pragmatism (as indicator of actual, current real life practice)

    • Possibility of collaborating: the resource accepts complaints and remarks that aim to fix or improve the terminology, and the resource organisation commits to fixing or improving the terminology within a short time (one month after receipt?)
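The exclusion and inclusion criteria above can be applied as a simple screening pass over candidate ontologies. The sketch below shows one possible way to do this. The record fields, the scoring rule and the example entries are hypothetical, and only a subset of the criteria is covered.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of what was found out about one candidate ontology."""
    name: str
    has_licence: bool            # licence or terms of use are stated
    licence_allows_reuse: bool   # no restrictions on redistribution and reuse
    has_term_definitions: bool
    versioned_releases: bool
    has_issue_tracker: bool
    uses_top_level_ontology: bool


def is_excluded(c):
    """Apply a subset of the exclusion criteria: any hit rules the candidate out."""
    return not (c.has_licence and c.licence_allows_reuse and c.has_term_definitions)


def inclusion_score(c):
    """Count how many of the (illustrative) inclusion indicators are met."""
    return sum([c.versioned_releases, c.has_issue_tracker, c.uses_top_level_ontology])


def shortlist(candidates):
    """Drop excluded candidates, then rank the rest by inclusion score (highest first)."""
    kept = [c for c in candidates if not is_excluded(c)]
    return sorted(kept, key=inclusion_score, reverse=True)


candidates = [
    Candidate("ontology-a", True, True, True, True, True, False),
    Candidate("ontology-b", True, False, True, True, True, True),  # restrictive licence
    Candidate("ontology-c", True, True, True, True, True, True),
]
ranked = [c.name for c in shortlist(candidates)]
```

A simple count like this only gives a first ordering. Whether the scope and coverage match the concept still has to be judged by the modelling specialist and the domain expert.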

Finding the right ontology might be time-consuming and require thorough searching and some practice, since the first ontology provided by a search tool might not always be the best fit. It may be difficult to decide which term from which ontology should be used, i.e., to match the detail in domain specific ontologies with the detail that is needed to describe data elements correctly. Terms used in human narrative do not always match directly with the ontological representation of the term.

...

Finally, combine the conceptual model and the ontology terms to create the detailed semantic data model. This model distinguishes between the data items (instances and their values) and their types (classes), is an exact representation of the data and exposes the meaning of the data in machine-readable terms. 
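To make the distinction between data items and their types concrete, the following sketch writes one measurement as subject-predicate-object triples and separates type statements from value statements. All `example.org` IRIs and the measured value are made up for illustration. In a real model they would be replaced by the ontology terms selected in Step 2.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/"  # hypothetical namespace, not a real ontology

# Each triple is (subject, predicate, object); literals are (value, datatype) pairs.
triples = [
    (EX + "measurement/1", RDF_TYPE, EX + "CytokineMeasurement"),    # instance -> class
    (EX + "measurement/1", EX + "hasValue", ("12.5", XSD_DECIMAL)),  # instance -> value
    (EX + "measurement/1", EX + "isAbout", EX + "patient/42"),
    (EX + "patient/42", RDF_TYPE, EX + "Patient"),
]


def classes_of(subject):
    """Return the classes (types) asserted for an instance."""
    return [o for s, p, o in triples if s == subject and p == RDF_TYPE]


def to_ntriples(triple):
    """Serialise one triple as an N-Triples line."""
    s, p, o = triple
    if isinstance(o, tuple):   # typed literal
        obj = '"%s"^^<%s>' % o
    else:                      # IRI
        obj = "<%s>" % o
    return "<%s> <%s> %s ." % (s, p, obj)


lines = [to_ntriples(t) for t in triples]
```

Keeping the `rdf:type` statements separate from the value statements is what lets a machine tell what kind of thing each data item is, independently of the value it carries.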

[Image: Screenshot 2024-06-04 at 14.14.37.png]

[Image: ontological_model-20240429-143058.png]

https://github.com/LUMC-BioSemantics/beat-covid/blob/master/fair-data-model/cytokine/model-triples/ontological_model.png

...

Repeat this step until no major errors remain when the model is checked against the competency questions.

[Optional] Step 5: Evaluation of semantic (meta)data models

To verify the semantic model, competency questions (CQs) can be used. CQs are an efficient way of testing models, since they are based on real questions. A CQ is evaluated by means of the query used to answer it. In other words, if it is possible to write a query that returns proper answers to the question, then the CQ is validated.
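The idea that a CQ is validated when a query returns proper answers can be sketched with a small in-memory example. The triples, the question and the pattern-matching helper below are hypothetical. In practice the query would typically be written in SPARQL and run against the modelled data in a triple store.

```python
# Hypothetical example data expressed as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:measurement1", "rdf:type", "ex:CytokineMeasurement"),
    ("ex:measurement1", "ex:isAbout", "ex:patient42"),
    ("ex:measurement2", "rdf:type", "ex:CytokineMeasurement"),
    ("ex:measurement2", "ex:isAbout", "ex:patient7"),
]


def match(pattern, triples=TRIPLES):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]


def measurements_for(patient):
    """Query for the CQ 'Which cytokine measurements are about a given patient?'."""
    about = {s for s, _, _ in match((None, "ex:isAbout", patient))}
    typed = {s for s, _, _ in match((None, "rdf:type", "ex:CytokineMeasurement"))}
    return sorted(about & typed)


def cq_is_validated(answers, expected):
    """A CQ counts as validated here if the query returns the expected answers."""
    return sorted(answers) == sorted(expected)


answers = measurements_for("ex:patient42")
validated = cq_is_validated(answers, ["ex:measurement1"])
```

If no query can be written for a question, or the query returns nothing for data that should match, this points to a missing concept or relationship in the model.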

In the BEAT-COVID project, the ontological models were evaluated using competency questions based on realistic questions posed by data model users. These questions serve to verify the scope (e.g., what is relevant to solve the challenges) and the relationships between concepts (e.g., checking for missing or redundant relationships). A preliminary set of CQs from meetings with domain experts is available on GitHub: https://github.com/LUMC-BioSemantics/beat-covid/tree/master/fair-data-model/cytokine/competency-questions

...

Practical examples from the community

This section should show the step applied in a real project. Links to demonstrator projects. 

...

[BEAT-COVID project] https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-022-00263-7

Authors / Contributors 

Experts whom you can contact for further information 

Tools and resources on this page

Add the tools and resources mentioned on this page. This should be a list of usable content and does not include textual resources such as journal references.

Training

Relevant training will be added in the future if available.

Suggestions

Visit our How to contribute page for information on how to get in touch if you have any suggestions about this page.