...

Purpose of this document

This document describes the classes identified in the Plateau 1 Core Metadata Schema and their respective properties, and provides usage notes to facilitate implementation. It does not describe the national catalogue or its onboarding process, nor does it include future extensions of the metadata schema (leaves/petals). Other versions and further documentation can be found at https://github.com/Health-RI/health-ri-metadata/.

Introduction

Context

In order to find and reuse information scattered across different sources, the research community in the Netherlands has agreed to index its resources in a national catalogue (link to portal documentation to be added by Marianne Knoop Pathuis - Baarda). The national catalogue, in turn, aims to be indexed in international catalogues. One of the necessary conditions for achieving this goal is a common agreement on the minimum elements needed to search, find and reuse such resources. These elements and their properties are components of the Core Metadata Schema.

...

The Core Metadata Schema is a formal shared conceptualisation of the requirements to find and reuse information across Health-RI nodes via the national catalogue. It represents a set of minimal elements for describing each resource (including datasets) with common metadata. This scope is consistent with the functionality provided in the Plateau 1 National Catalogue release package (link to documentation on Plateau 1 and portal development releases to be added by Marianne Knoop Pathuis - Baarda).

The core model can be further extended and specialised to reflect domain-specific requirements (domains may include omics, imaging, etc.). Therefore, we expect additional versions to be released according to the requirements of Plateau 2 and beyond. All versions will be published via https://github.com/Health-RI/health-ri-metadata/.

This version of the Core Metadata Schema includes DCAT v3 and selected DCAT-AP mandatory classes and their definitions. The most important entities are those that form the core of the DCAT application profile. DCAT-AP is a DCAT application profile for the exchange of information about catalogues of datasets and descriptions of data services in Europe. This version is meant to be reviewed by the community of nodes that make up the Health-RI ecosystem. Its final release will be a prerequisite for resources to be included in the national catalogue (link to be added by Marianne Knoop Pathuis - Baarda) and to benefit from its features. How to implement this model and connect to the national catalogue is described in the onboarding documentation. The features of the national catalogue are defined in the Plateau 1 documentation (link to Plateau 1 documentation to be added by Marianne Knoop Pathuis - Baarda).

Notes on Alignment

To create the current core metadata schema, we examined existing metadata from the COVID-19 national portal, metadata schemas provided by Health-RI nodes (e.g., ABC metadata), and standards used in portals across Europe and beyond (e.g., W3C, DCAT, DCAT-AP). Then, with assistance from metadata specialists from the hub and nodes, we began mapping their classes and properties. The mappings are detailed in the mapping table. Finally, after conceptualisation, we decided to reuse DCAT and DCAT-AP for the implementation, mainly because the DCAT application profile covers the identified requirements.

rdfs:Resource and other Resource Types

...

Within DCAT and DCAT-AP, the term "resource" generally encompasses anything that can be described using RDF. However, there are specific classes and properties that indicate the different types of resources:

  • dcat:Dataset is a type of rdfs:Resource representing a collection of data.

  • dcat:Distribution is a type of rdfs:Resource representing an available form or representation of a dataset.

  • dcat:Catalog is a type of rdfs:Resource representing a collection of datasets.

  • dcat:DataService (introduced in DCAT version 2) is a type of rdfs:Resource representing a service for accessing data.

  • foaf:Project is a type of rdfs:Resource representing project-level information.
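
The resource types listed above can be sketched in Turtle as follows. This is a minimal illustration only, not a prescribed Health-RI instance: the ex: namespace, all identifiers, titles and URLs are hypothetical. The linking properties (dcat:dataset, dcat:service, dcat:distribution, dcat:accessURL, dcat:endpointURL, dcat:servesDataset) come from the DCAT vocabulary.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# A catalogue that lists one dataset and one data service
ex:catalog a dcat:Catalog ;
    dct:title "Example catalogue"@en ;
    dcat:dataset ex:dataset1 ;
    dcat:service ex:service1 .

# A dataset with one available form (distribution)
ex:dataset1 a dcat:Dataset ;
    dct:title "Example dataset"@en ;
    dcat:distribution ex:dataset1-csv .

ex:dataset1-csv a dcat:Distribution ;
    dcat:accessURL <http://example.org/files/dataset1.csv> .

# A service through which the dataset can be accessed
ex:service1 a dcat:DataService ;
    dcat:endpointURL <http://example.org/api> ;
    dcat:servesDataset ex:dataset1 .

# Project-level information; how a project is linked to a dataset
# is not covered here, so it is deliberately left unlinked.
ex:project1 a foaf:Project ;
    foaf:name "Example project" .
```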

The DCAT and DCAT-AP vocabularies are centred on datasets. However, there may be a need to represent a wider variety of domain-specific resources, such as biobanks or patient registries. For such cases, we propose the following options for adapting or extending DCAT to represent your resource type more accurately.

  • Use dcat:Resource directly: If the asset you are dealing with does not fit the definition of a dcat:Dataset, you can use the broader term dcat:Resource. This term allows you to represent almost any type of asset. However, while this approach provides greater flexibility, it may leave users unclear about what kind of asset is being described. The asset type can be defined further with specific vocabularies over time.

  • Expand with personalised classes: If there is a need to represent specific resources, such as biobanks or patient registries, it may be beneficial to supplement the foundational DCAT vocabulary with custom classes. For instance:

...

  • example:

:Collection a rdfs:Class ;
    rdfs:subClassOf dcat:Resource .

...

and

:PatientRegistry a rdfs:Class ;
    rdfs:subClassOf dcat:Dataset .

When creating custom classes, it is essential to provide specific and unambiguous metadata for each type of resource, so that users and systems can distinguish between them and understand their differences. For instance, ask yourself how a collection differs from a dataset.
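
As a sketch of both options, the Turtle below first types an asset directly as dcat:Resource and then declares the two custom classes with the prefix declarations a parser needs, plus a label and a comment so that they can be told apart. The namespaces, the identifiers and the wording of the labels and comments are assumptions made for illustration; they are not definitions agreed by Health-RI.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix :     <http://example.org/schema#> .   # hypothetical schema namespace
@prefix ex:   <http://example.org/data/> .     # hypothetical data namespace

# Option 1: use dcat:Resource directly for an asset that is not a dataset
ex:asset1 a dcat:Resource ;
    dct:title "Example asset"@en .

# Option 2: custom classes, each documented so users can distinguish them
:Collection a rdfs:Class ;
    rdfs:subClassOf dcat:Resource ;
    rdfs:label "Collection"@en ;
    rdfs:comment "Placeholder text: state here how a collection differs from a dataset."@en .

:PatientRegistry a rdfs:Class ;
    rdfs:subClassOf dcat:Dataset ;
    rdfs:label "Patient registry"@en ;
    rdfs:comment "Placeholder text: state here what makes a patient registry a special kind of dataset."@en .

# An instance of a custom class; through rdfs:subClassOf it is also a dcat:Dataset
ex:registry1 a :PatientRegistry ;
    dct:title "Example patient registry"@en .
```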

Feedback via Git Issues

If you wish to expand the model (for example, with a new type of Resource) or create a new concept, please open an issue in Health-RI's GitHub repository https://github.com/Health-RI/health-ri-metadata/tree/master and provide a clear explanation for the extension. Assign the issue to either denatahvildari or brunasv, and we will work with you to implement the addition in the next release.

Overview

An overview of the Core Metadata Schema is presented in the UML diagram below (Fig. 1). The diagram shows the primary classes (entities) and omits detailed definitions such as rdfs:label and rdfs:comment. Each block denotes a class and lists its attributes (properties). If a class is connected to another class by a closed arrow, it inherits all properties from the other class. For example, dcat:DatasetSeries inherits from dcat:Dataset, which inherits from dcat:Resource. The other arrows represent relations. Each is labelled with the type of relation (for example, dcat:Dataset connects to a dcat:DatasetSeries via the predicate dcat:inSeries) and with its cardinality (for example, a dcat:Dataset can be connected via dcat:inSeries to zero or more dcat:DatasetSeries).
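
The dcat:inSeries relation and its zero-or-more cardinality described above can be illustrated with a short Turtle sketch. The ex: namespace, identifiers and titles are hypothetical. Because dcat:DatasetSeries inherits from dcat:Dataset, a series can carry the same descriptive properties as a dataset, such as dct:title.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# A dataset series, described with a property shared with dcat:Dataset
ex:series1 a dcat:DatasetSeries ;
    dct:title "Example yearly series"@en .

# A dataset that belongs to one series
ex:dataset2023 a dcat:Dataset ;
    dct:title "Example dataset, 2023 edition"@en ;
    dcat:inSeries ex:series1 .

# A dataset that belongs to no series (the "zero" in zero or more)
ex:standalone a dcat:Dataset ;
    dct:title "Example standalone dataset"@en .
```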

Recommended Versus Mandatory

...