Step 4. Conceptualization and semantic modeling
status: in development
Short description
With the inventory of terms and definitions at hand, borrowed as much as possible from widely used ontologies, the next task is conceptualization: organizing related metadata fields (properties or attributes) into groups (classes), and defining how these classes relate to each other and to the Health-RI metadata schema. The metadata fields, groups and relations can be visualized in a UML diagram.
The next step is to turn this conceptual model into a semantic model: a more formal representation that leverages ontologies (such as DCAT, PROV-O, FOAF) to unambiguously define the concepts (the metadata fields, their grouping, and the relations between the groupings). This ensures a shared understanding across systems and allows for interoperability, reasoning and automated processing. All concepts are identified by Uniform Resource Identifiers (URIs), and the model is usually expressed in RDF (Resource Description Framework), a W3C recommendation for (meta)data interchange on the web. If a concept has already been modeled and defined within a certain community, adopt that existing definition rather than creating a new one. In certain situations, it may be necessary to expand upon the definition of an existing concept.
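To make the idea of URI-identified concepts concrete, the following is a minimal sketch of how a dataset description might be written down as RDF triples and serialized as N-Triples. It uses only the Python standard library; in practice an RDF library or a modeling tool would normally be used. The `https://example.org/` identifiers, the dataset title and the publisher name are hypothetical, and the choice of properties is an illustration rather than the Health-RI schema.

```python
# Namespaces of the ontologies named above.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical identifiers, for illustration only.
EX = "https://example.org/"
dataset = EX + "dataset/heart-study"
publisher = EX + "org/example-hospital"


class Literal(str):
    """Marks a plain string value, as opposed to a URI."""


# Each triple is (subject, predicate, object).
triples = [
    (dataset, RDF + "type", DCAT + "Dataset"),
    (dataset, DCT + "title", Literal("Heart study")),
    (dataset, DCT + "publisher", publisher),
    (publisher, RDF + "type", FOAF + "Agent"),
    (publisher, FOAF + "name", Literal("Example Hospital")),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            # Escape backslashes and double quotes inside literal values.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


ntriples = to_ntriples(triples)
print(ntriples)
```

Because the class `dcat:Dataset` and the properties `dct:title` and `dct:publisher` are reused from existing ontologies, any system that knows those ontologies can interpret the description without extra agreements.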
This step requires substantial expertise in semantic modeling and usually takes many modeling sessions in which domain experts and modeling experts work intensively together to grasp and model the semantics correctly. It is not within the scope of this document to treat the (ontology-driven) modeling process comprehensively; our intention is to give some general pointers and considerations.
Besides interoperability and shared understanding, semantic models offer another benefit that is relevant to the search functionality of a catalog: reasoning. If the catalog knows that "heart disease" is a type of "cardiovascular condition", it can return datasets about "heart disease" even if you only searched for "cardiovascular conditions". Reasoning can thus enhance search in a catalog by enabling inferences based on the underlying semantic model.
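The kind of inference described here can be sketched in a few lines of Python. This is a simplified illustration, not how any particular catalog implements reasoning: the term hierarchy, the dataset names and their annotations are all hypothetical, terms are plain strings instead of ontology URIs, and each term is assumed to have at most one broader term.

```python
# Hypothetical "is a type of" hierarchy: narrower term -> broader term.
BROADER = {
    "heart disease": "cardiovascular condition",
    "myocardial infarction": "heart disease",
    "stroke": "cardiovascular condition",
}

# Hypothetical catalog: dataset name -> term it is annotated with.
DATASETS = {
    "dataset-a": "myocardial infarction",
    "dataset-b": "heart disease",
    "dataset-c": "asthma",
    "dataset-d": "stroke",
}


def is_a(term, target):
    """Return True if term equals target or is (transitively) narrower."""
    seen = set()
    while term is not None and term not in seen:
        if term == target:
            return True
        seen.add(term)  # guards against accidental cycles in the hierarchy
        term = BROADER.get(term)
    return False


def search(query):
    """Find datasets annotated with the query term or any narrower term."""
    return sorted(name for name, term in DATASETS.items() if is_a(term, query))


print(search("cardiovascular condition"))  # ['dataset-a', 'dataset-b', 'dataset-d']
print(search("heart disease"))             # ['dataset-a', 'dataset-b']
```

A plain keyword match on "cardiovascular condition" would find none of these datasets; following the hierarchy is what surfaces them.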
Deliverables
Deliverable | Description |
---|---|
Semantic model | UML-diagram with defined classes, properties, namespaces, and (type and cardinalities of the) relations. |
List of metadata field definitions used in the model | A detailed list of metadata elements including definitions, attributes, and relationships. |
Modeling decision log | A documented record of decisions made throughout the development of the model. |
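The properties and cardinalities recorded in the semantic model can also be captured in a machine-readable form and checked against an example dataset description. The sketch below shows one possible way to do this in plain Python. The `PropertySpec` structure, the `check_cardinalities` helper, the property selection and the cardinalities are all hypothetical and are not taken from the Health-RI schema. In an RDF setting such constraints are commonly expressed in SHACL; plain Python is used here only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertySpec:
    name: str                  # property as named in the model, e.g. "dct:title"
    min_count: int             # lower cardinality bound
    max_count: Optional[int]   # upper bound; None means unbounded


# Hypothetical excerpt of a model for one class; cardinalities are illustrative.
DATASET_SPEC = [
    PropertySpec("dct:title", 1, None),
    PropertySpec("dct:publisher", 1, 1),
    PropertySpec("dcat:keyword", 0, None),
]


def check_cardinalities(record, spec):
    """Return a list of human-readable cardinality violations."""
    problems = []
    for prop in spec:
        count = len(record.get(prop.name, []))
        if count < prop.min_count:
            problems.append(
                f"{prop.name}: expected at least {prop.min_count}, found {count}"
            )
        if prop.max_count is not None and count > prop.max_count:
            problems.append(
                f"{prop.name}: expected at most {prop.max_count}, found {count}"
            )
    return problems


# Hypothetical example description with a missing publisher.
example_record = {
    "dct:title": ["Heart study"],
    "dct:publisher": [],
    "dcat:keyword": ["cardiology"],
}
print(check_cardinalities(example_record, DATASET_SPEC))
```

Running such a check on the example datasets used during the modeling sessions can reveal early on whether the chosen cardinalities are realistic.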
How
1. Work with one or more example datasets
Take as an example a dataset from the domain for which you are doing the metadata modeling. Together with the scope statement (step 2), this helps you stay focused while modeling. It also helps modeling experts, who might lack domain expertise, to better understand the dataset and its context.
2. Organize modeling sessions
Arriving at a semantic model requires thorough, time-consuming discussions to reach a shared understanding of the meaning (semantics) of the concepts and context in question. Points to consider for such modeling sessions:
Make sure you have both domain experts and modeling experts at the table
Provide everyone with (access to) the relevant prior information (e.g. the results from previous steps like the inventory list, scope statement, example dataset)
Have a whiteboard (either a physical one or a virtual one like draw.io or MS Whiteboard) present for quickly sketching diagrams and relations
Take enough time. Usually one hour is too short. Plan several sessions ahead, ideally without too much time in between
Keep notes, record the meeting (if online/hybrid) and log decisions
Depending on the group size, consider work formats borrowed from design techniques, such as solution sketching; dividing the group into subgroups or individuals who work independently; or, instead of discussing sketches directly with the whole group, having people individually write down questions and ideas per sketch.
3. Create a conceptual model
Create a high-level conceptual model (preferably in the form of a UML diagram) that represents the domain’s key concepts and relationships. Take the Health-RI core/health metadata schema as a basis and
Tooling: draw.io, Visual Paradigm, Miro, Lucidchart, Astah, Excalidraw
4. Create a semantic model
Describe
Tooling: (Web)Protégé, Metaphactory, TopBraid EDG, PoolParty
Considerations
Rephrase this from the Generic FAIRification process article: We have found that making optimal choices, demands good searching skills and experience. For instance, it is generally insufficient to just choose the first ontology in the list provided by ontology search tools by definition. Instead one should also check the usability license, usage statistics, update activity, whether the ontology contains a good class and property structure (which generally facilitates data integration), and whether a general ontological framework is used (such as OBO Foundry [15]). Nevertheless, it may be very difficult to decide which term from which ontology should be used, i.e., to match the detail in domain specific ontologies with the detail that is needed to describe data elements correctly. Terms used in human narrative do not always match directly with the ontological representation of the term. If the search is unsuccessful, new ontology terms could be defined and added to existing ontologies or new ontologies could be developed. This is however a time-consuming process that should be undertaken with a team of experts from both the domain of the study as well as in consultation with ontology experts.
For selection criteria, see https://faircookbook.elixir-europe.org/content/recipes/interoperability/selecting-ontologies.html#selecting-terminologies (use within the domain, DCAT compatibility, license issues, mappings to other ontologies, maintenance, logical consistency and reasoning support, etc.)
HRI hub involvement in this step
In this step Health-RI should be consulted.
Further reading
FAIR Cookbook Selecting terminologies and ontologies
Book The What and How of Modelling Information and Knowledge
University of Twente course Ontology-Driven Conceptual Modeling
University of Twente course Linked Data and Semantic Web
Chapter What is an ontology
Book The Design Sprint