Short description
`Generating a semantic model is often the most time-consuming step of data FAIRification. However, we expect the modelling effort to diminish as more and more models are made available for reuse over time, especially if such models are treated as FAIR digital objects themselves. Thus, it is important to first check whether a semantic model already exists for the data and the metadata that may be reused. For cases where no semantic model is available a new one needs to be generated.` (Generic)
The semantic model for a dataset describes the meaning of entities (data elements) and their relations in the dataset accurately, unambiguously, and in a computer-actionable way [GOFAIR_Process]. This model can then be applied to the non-FAIR data to transform it into linkable data, which can be queried. Given that generating a semantic model is often the most time-consuming part of the FAIRification process, it is important to first check whether a semantic model is already available for reuse. Creating such a model from scratch requires domain expertise on the dataset and expertise in semantic modelling.
...
Semantic modelling makes your data and metadata machine-actionable, enabling secondary use of your data. After performing this step, your data are represented as FAIR digital objects (FDOs). FDOs are digital objects identified by a Globally Unique, Persistent and Resolvable IDentifier (GUPRID) and described by metadata. This enables the transformed FAIR dataset to be efficiently incorporated into other systems, analysis workflows and unforeseen future applications.
Expertise requirements for this step
Experts that may need to be involved, as described in Metroline Step: Build the Team, include:
Semantic data modelling specialist: creates a new (meta)data model or applies an existing one, ensures that the semantic representation correctly represents the domain knowledge.
Domain expert: makes sure that the exact meaning of the data is understood by the modeller.
[BEAT-COVID paper] We developed ontological models for data records in collaboration with data collectors, data managers, data analysts and medical doctors.
How to
(I) Reusing a semantic (meta)data model
...
Info |
---|
If you would like to include your dataset in the National Health Data Catalogue, your metadata needs to use Health-RI’s Core Metadata Schema. For more information about this and how to apply it, please refer to section 2. Metadata mapping in documentation. |
For metadata, semantic models describing generic items are available for reuse, e.g., DCAT for describing datasets. Domain-specific items should be decided by each individual self-identified domain and thereafter described in a semantic metadata model. [Generic]
...
list the main concepts (classes) of the data elements to be FAIRified;
identify the relationships between the data elements.
It is important that both the data representation (format) and the meaning of the data elements (the data semantics) are clear and unambiguous (see Analyse data semantics).
To help you understand what you would like to include in your model, you can start by creating a list of questions (competency questions). These can serve as a guide to identify the most relevant (meta)data elements to model.
...
...
...
Step 2: Search for ontology terms
...
Search engine | Short description |
---|---|
BioPortal | A repository of biomedical ontologies. |
Ontology Lookup Service (OLS) | The OLS (by EMBL-EBI) is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. You can browse the ontologies through the website as well as programmatically via the OLS API. More info here. |
The Open Biological and Biomedical Ontology (OBO) Foundry | Develops interoperable ontologies for the biomedical sciences. Participants follow and contribute to the development of a set of principles to ensure that ontologies are logically well-formed and scientifically accurate. More info here. |
BARTOC | The Basic Register of Thesauri, Ontologies & Classifications (BARTOC) is a database of Knowledge Organization Systems (KOS) and KOS-related registries, with the goal of listing as many Knowledge Organization Systems as possible in one place. More info here. |
Ontobee | Ontobee is a web-based linked data server and browser specifically designed for ontology terms. It supports ontology visualization, query and development, and provides a web interface displaying the details and hierarchy of a specific ontology term. More info here. |
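Since the OLS can also be queried programmatically, the sketch below builds a search request URL for its public API. The endpoint path and parameter names follow the OLS API as published by EMBL-EBI, but check the current OLS documentation before relying on them:

```python
from typing import Optional
from urllib.parse import urlencode


def ols_search_url(term: str, ontology: Optional[str] = None) -> str:
    """Build a search URL for the EMBL-EBI Ontology Lookup Service (OLS).

    `term` is the free-text query; `ontology` optionally restricts the
    search to one ontology (e.g. "efo").
    """
    params = {"q": term}
    if ontology:
        params["ontology"] = ontology
    return "https://www.ebi.ac.uk/ols4/api/search?" + urlencode(params)


# Example: search for "diabetes mellitus" in the EFO ontology
url = ols_search_url("diabetes mellitus", ontology="efo")
print(url)
```

The JSON response can then be fetched with, e.g., `urllib.request` and inspected for candidate term IRIs.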
...
...
The Open Biological and Biomedical Ontology (OBO) Foundry
...
Evidence and Conclusion Ontology (ECO)
Browser for ontologies for agricultural science based on the NCBO BioPortal.
Ontologies for different purposes can also be found in the FAIR cookbook, as well as on this page.
When choosing an ontology, several selection criteria might apply (see also the FAIR Cookbook):
Update activity: is it well maintained, i.e., frequent releases, handling of term requests, and clarified versioning and deprecation policies?
Documentation: is it well documented? There should be enough metadata for each class in the artefact and enough metadata about the artefact itself.
Usability licence: what licence and terms of use does it mandate?
Does the ontology contain a good class and property structure? This generally facilitates data integration.
What format does it come in?
Are there stable persistent resolvable identifiers for all terms?
Usage statistics: who uses it and what resources are being annotated with it?
Is a general ontological framework used (such as the OBO Foundry)?

Exclusion criteria:
Absent licence or terms of use (indicator of usability)
Restrictive licences or terms of use with restrictions on redistribution and reuse
Absence of term definitions
Absence of sufficient class metadata (indicator of quality)
Absence of sustainability indicators (absence of funding records)
Inclusion criteria:
Scope and coverage meets the requirements of the concept identified
Unique URI, textual definition and IDs for each term
Resource releases are versioned
Size of resource (indicator of coverage)
Number of classes and subclasses (indicator of depth)
Number of terms having definitions and synonyms (indicator of richness)
Presence of a help desk and contact point (indicator of community support)
Presence of term submission tracker/issue tracker (indicator of resource agility and capability to grow upon request)
Potential integrative nature of the resource (as indicator of translational application potential)
Licensing information available (as indicator of freedom to use)
Use of a top level ontology (as indicator of a resource built for generic use)
Pragmatism (as indicator of actual, current real life practice)
Possibility of collaborating: the resource accepts complaints/remarks aimed at fixing or improving the terminology, and the resource organisation commits to fixing or improving the terminology within a short delay (one month after receipt?)
Finding the right ontology might be time-consuming and require thorough searching and some practice, since the first ontology provided by a search tool might not always be the best fit. It may be difficult to decide which term from which ontology should be used, i.e., to match the detail in domain specific ontologies with the detail that is needed to describe data elements correctly. Terms used in human narrative do not always match directly with the ontological representation of the term.
...
Finally, combine the conceptual model and the ontology terms to create the detailed semantic data model. This model distinguishes between the data items (instances and their values) and their types (classes), is an exact representation of the data and exposes the meaning of the data in machine-readable terms.
...
Repeat this step until no major errors remain in light of the competency questions.
[Optional] Step 5: Evaluation of semantic (meta)data models
To verify the semantic model, competency questions (CQs) can be used. CQs are an efficient way of testing models, since they are based on real questions. CQs are evaluated by means of the query used to answer them: if it is possible to write a query that returns a proper answer to the question, the CQ is validated.
In the BEAT-COVID project, the ontological models were evaluated using competency questions based on realistic questions posed by data model users. These are proposed as a means to verify the scope (e.g., what is relevant to solve the challenges) and the relationships between concepts (e.g., checking for missing or redundant relationships). A preliminary set of CQs from meetings with domain experts is available on GitHub: https://github.com/LUMC-BioSemantics/beat-covid/tree/master/fair-data-model/cytokine/competency-questions
...
Practical examples from the community
...
[BEAT-COVID project] https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-022-00263-7
Authors / Contributors
Experts whom you can contact for further information
Tools and resources on this page
Training
Relevant training will be added in the future if available.
Suggestions
Visit our How to contribute page for information on how to get in touch if you have any suggestions about this page.