...
Editors:
Bruna dos Santos Vieira
Dena Tahvidalri (until 29th Feb 2024)
Repository:
Latest published version:
Formal models (shapes, RDF, SKOS): health-ri-metadata/Formalisation(shacl)/Core Metadata Module/coreRules.shapes.ttl at master · Health-RI/health-ri-metadata (github.com)
Purpose of this document
This document outlines the Plateau 1 Core Metadata Schema, detailing the classes and entities involved and offering implementation guidance (usage notes) for developers at regional nodes. It specifically addresses the schema's design and application but excludes discussion of the national catalog, its onboarding process, and future schema expansions. Additional information and versions are available on GitHub: https://github.com/Health-RI/health-ri-metadata/.
...
This document is designed for a technical audience tasked with implementing the metadata schema, and for stakeholders interested in a detailed understanding of the core schema.
Introduction
Context
To find and reuse information scattered across different sources, the research community in the Netherlands has agreed to index its resources in a national catalog (link to portal documentation to be added). The national catalog in turn aims to be indexed in international catalogs. One of the necessary conditions to achieve this goal is a common agreement on the minimum elements needed to search, find and reuse such resources. These elements and their properties are components of the Core Metadata Schema.
...
The Core Metadata Schema is a formal shared conceptualisation of the requirements to find and reuse information across Health-RI nodes via the national catalog. It represents a set of minimal elements for describing each resource (including datasets) with common metadata. This scope is consistent with the functionality provided in the Plateau 1 national catalog release package (link to documentation on Plateau 1 and portal development releases to be added).
The core model (see Diagram) can be further extended and specialised to reflect domain-specific requirements (domains may include omics, imaging, etc.). We therefore expect additional versions to be released according to the requirements of Plateaus 2 and beyond. All versions will be published via https://github.com/Health-RI/health-ri-metadata/.
The core model is being reviewed by the community of nodes that make up the Health-RI ecosystem (see Feedback section). The final release of this version will be a prerequisite for resources to be included in the national catalog (Portal link to be added) and benefit from its offered features. How to implement this model and connect to the national catalog is described in the onboarding documentation. The offered features of the national catalog are defined in the Plateau 1 documentation (link to be added).
...
To create the current core metadata schema, we examined existing metadata from the COVID-19 national portal, metadata schemas provided by Health-RI nodes (e.g., ABC metadata), and standards used in portals across Europe and beyond (e.g., W3C, DCAT, DCAT-AP). Then, with assistance from metadata specialists from the hub and nodes, we began mapping their classes and properties. The mappings are detailed in the mapping table. Finally, after conceptualisation, we decided to reuse DCAT and DCAT-AP for the implementation. This version of the Core Metadata Schema includes DCAT v3 and some selected DCAT-AP mandatory classes and their definitions. The most important entities are those that form the core of the DCAT application profile. DCAT-AP is a DCAT application profile for the exchange of information about catalogs of datasets and descriptions of data services in Europe. Because the DCAT application profile covers the identified requirements, the Core Metadata Schema remains compatible with international catalogs that also use DCAT-AP.
...
- dcat:Dataset is a type of dcat:Resource representing a collection of data.
- dcat:Distribution is a type of dcat:Resource representing an available form or representation of a dataset.
- dcat:Catalog is a type of dcat:Resource representing a collection of datasets.
- dcat:DataService, introduced in DCAT version 2, is a type of dcat:Resource representing a service for accessing data.
- foaf:Project is a type of dcat:Resource representing project-level information.
In DCAT and DCAT-AP, the vocabulary is focused on datasets. Nonetheless, there may arise a requirement to portray a wider variety of resources that are specific to certain domains, like biobanks or patient registries. In such cases, we propose potential scenarios for modifying or augmenting DCAT to accurately depict your resource type.
- Use dcat:Resource directly: If the asset you are dealing with does not fit the dcat:Dataset definition, you can use the broader term dcat:Resource. This term allows you to represent almost any type of asset. However, this approach may not make the nature of the asset clear to users. We can define the asset type further with specific vocabularies over time.
- Expand with Personalised Classes: If there is a need to represent specific resources, such as biobanks or patient registries, it may be beneficial to supplement the foundational DCAT vocabulary with custom classes. For example:
...
If you wish to extend the model, such as with Resource, and/or create a new concept, please open an issue in Health-RI’s GitHub repository https://github.com/Health-RI/health-ri-metadata/tree/master and provide a clear explanation for the extension. Assign the issue to either ‘brunasv’ or ‘xiaofengleo’, and we will work with you to implement the addition in the next release.
...
Recommended Versus Mandatory
Following the DCAT-AP specification, we distinguish between recommended and mandatory components (classes and properties). The subsequent section clarifies these components under ‘mandatory’ and ‘recommended’ headings. A third category named 'Optional' may be introduced in the future.
...
Mandatory class: a receiver of data MUST be able to process information about instances of the class; a sender of data MUST provide information about instances of the class.
Recommended class: a sender of data SHOULD provide information about instances of the class; a sender of data MUST provide information about instances of the class, if such information is available; a receiver of data MUST be able to process information about instances of the class.
Optional class: a receiver MUST be able to process information about instances of the class; a sender MAY provide the information but is not obliged to do so.
Mandatory property: a receiver MUST be able to process the information for that property; a sender MUST provide the information for that property.
Recommended property: a receiver MUST be able to process the information for that property; a sender SHOULD provide the information for that property if it is available.
Optional property: a receiver MUST be able to process the information for that property; a sender MAY provide the information for that property but is not obliged to do so.
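
The obligation levels above can be summarised as a small lookup table. The following is a minimal Python sketch, not part of the schema itself: the dictionary layout and the function name `obligation` are illustrative assumptions, and the sketch simplifies the Recommended class rule (where a sender MUST provide the information if it is available) to its general SHOULD form.

```python
# Requirement keyword per obligation level and role, as listed above.
# OBLIGATIONS and obligation() are hypothetical names used for illustration.
OBLIGATIONS = {
    "mandatory": {"sender": "MUST", "receiver": "MUST"},
    "recommended": {"sender": "SHOULD", "receiver": "MUST"},
    "optional": {"sender": "MAY", "receiver": "MUST"},
}


def obligation(level: str, role: str) -> str:
    """Return the requirement keyword for a role at a given obligation level."""
    return OBLIGATIONS[level.lower()][role.lower()]


print(obligation("Recommended", "sender"))  # SHOULD
```

Note that a receiver MUST be able to process every level; only the sender's obligation varies.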
...
Terminology
According to DCAT-AP:
An Application Profile defines the mandatory, recommended, and optional components for a specific use case by leveraging terminology from foundational standards. Additionally, it suggests standardized vocabularies to maintain consistency in the use of terms and data.
A Dataset is a self-contained set of data produced by a specific organization, which can be accessed or downloaded for various uses.
A Data Portal is an online platform that offers a catalog of datasets and tools to help users locate and utilize these datasets effectively.
Used Prefixes
Prefix | Namespace IRI | Source |
---|---|---|
|
| |
|
| [DCT] |
|
| [FOAF] |
|
| |
|
| |
|
| |
|
| |
|
| [OWL-TIME] |
|
|
...
Class name | Definition | Usage Note | URI |
---|---|---|---|
Dataset | A resource type. | Used to describe one or more datasets, giving details about the dataset(s) themselves. A single dataset can be made available to potential users in different ways; how the data in a dataset can be accessed is defined in the Distribution. | |
Catalog | A catalog that is listed in the national catalog. | Used to describe a bundle of datasets, data services, biobanks, patient registries, or guidelines together under a single title. | |
Agent | An entity that is associated with Catalogs and/or Datasets. | If the Agent is an organisation, the use of the Organization Ontology is recommended. | |
Resource | Resource published or curated by a single agent. | This is an abstract class. We do not use this class directly; instead we use specialisations of it (e.g. Dataset). It serves mainly for high-level grouping and the reuse of properties. | |
...
Class name | Definition | Usage Note | URI |
---|---|---|---|
Distribution | An available distribution of the dataset. | Used to describe the different ways in which a single dataset can be made available, i.e. it can be downloaded or accessed online in one or more distributions (e.g. one as a downloadable .csv file, another through an access or query webpage). | |
Dataset Series | A resource type. Dataset series are defined in [ISO-19115] as a collection of datasets […] sharing common characteristics. However, their use is not limited to geospatial data, although in other domains they can be named differently (e.g., time series, data slices) and defined more or less strictly (see, e.g., the notion of "dataset slice" in [VOCAB-DATA-CUBE]). | With Dataset Series we refer to data, somehow interrelated, that are published separately. An example is budget data split by year and/or country, instead of being made available in a single dataset. | |
Data Service | A Resource type. | The kind of service can be indicated using the DRAFT EXAMPLE: | |
Project | A project (a collective endeavour of some kind). | Used to describe a project that is connected to one or more datasets. A resource type. | |
Abstract Classes that DO NOT instantiate (do not populate)
...
Resource: the class resource, everything. Resource is a generic concept from the DCAT vocabulary that is rarely used directly, but rather indirectly through its extensions. We recommend avoiding dcat:Resource directly for your document unless the type/class you need is not in this schema.

Class name | Definition | Usage Note | URI |
---|---|---|---|
Resource | | This class is used for grouping and for sharing properties through the class hierarchy. | |
Core Metadata Schema Properties per Class
...
Catalog
A curated collection of metadata about resources. A web-based data catalog is typically represented as a single instance of this class.
...
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
title | A name given to the resource. | | | The name of the catalog. This is a required field and needs to be unique. | 1..* |
description | A free-text account of the record. | | | A brief description of the catalog. It can consist of multiple strings. For example: "This catalog describes breast cancer imaging datasets." | 1..* |
publisher | The entity responsible for making the resource available. | | | The organisation or person that has published the catalog. | 1..* |
dataset | Relates every catalog to the datasets it contains. | | | The connection to the one or more datasets that this catalog describes. | 1..* |
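
As an illustration of how a node might check the mandatory Catalog properties before submitting metadata, here is a minimal Python sketch. It is not the official validation (the formal SHACL shapes in the repository serve that purpose). The prefixed terms `dct:title`, `dct:description`, `dct:publisher` and `dcat:dataset` are assumptions based on standard DCAT usage, since the URI column is empty in this copy of the table; the record layout, the function name and the example.org identifier are hypothetical as well.

```python
# Mandatory Catalog properties from the table above, each with cardinality 1..*.
# The prefixed terms are assumed from standard DCAT usage, not taken from the table.
MANDATORY_CATALOG_PROPERTIES = {
    "title": "dct:title",
    "description": "dct:description",
    "publisher": "dct:publisher",
    "dataset": "dcat:dataset",
}


def missing_mandatory(record: dict) -> list:
    """Return the names of mandatory properties that have no value in the record."""
    missing = []
    for name, term in MANDATORY_CATALOG_PROPERTIES.items():
        values = record.get(term, [])
        if len(values) < 1:  # 1..* requires at least one value
            missing.append(name)
    return missing


# Hypothetical catalog record: property term -> list of values.
example_catalog = {
    "dct:title": ["Example imaging catalog"],
    "dct:description": ["This catalog describes breast cancer imaging datasets."],
    "dct:publisher": ["https://example.org/agent/1"],
}

print(missing_mandatory(example_catalog))  # ['dataset']
```

Here the record lacks a link to any dataset, so the check reports `dataset` as missing.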
Recommended Properties
...
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
| | Relevant contact information for the cataloged resource. | | | Contact information that can be used, for example, for sending requests for further information or access to the Dataset. | 1..* |
| | The entity responsible for producing the resource. | | | An agent (person or organisation) responsible for producing the dataset. | 1..* |
| | A free-text account of the record. | | | A free-text description of the Dataset. This property can be repeated for parallel language versions of the description. | 1..* |
issued | Date of formal issuance (e.g., publication) of the resource. | | | NA | 1..* |
| | A unique identifier of the resource being described or cataloged. | | | The main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the catalog. | 1..1 |
modified | Most recent date on which the catalog entry was changed, updated or modified. | | | The most recent date on which the Dataset was changed or modified. | 1..* |
| | The entity responsible for making the resource available. | | | An agent (organisation or person) responsible for making the Dataset available. | 1..* |
| | A main category of the resource. A resource can have multiple themes. | | | It consists of one or more IRIs (links) separated by commas. When set, it specifies relevant ontology concepts that classify the dataset. Typically, these can be looked up using the Ontology Lookup Service (OLS) or Bioportal. | 1..* |
| | A name given to the record. | | | A name given to the Dataset. This property can be repeated for parallel language versions of the name. | 1..* |
| | The nature or genre of the resource. | | | A type of the Dataset. A recommended controlled vocabulary data-type is foreseen. | 1..* |
license | A legal document under which the resource is made available. | | | This should contain a URL that provides details regarding the license that is applicable to this dataset. | 1..* |
relation | Defines a relation. | | foaf:Project | | 1..* |
...
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
title | A name given to the distribution. | | | The name of the dataset in combination with the format of the distribution can be used. | 1..* |
access URL | A URL of the resource that gives access to a distribution of the dataset. E.g., landing page, feed, SPARQL endpoint. | | | This property contains a URL that gives access to a Distribution of the Dataset. The resource at the access URL may contain information about how to get the Dataset. | 1..* |
media type | The media type of the distribution as defined by IANA [IANA-MEDIA-TYPES]. | | | This property SHOULD be used when the media type of the distribution is defined in IANA [IANA-MEDIA-TYPES], otherwise | 1..* |
description | A free-text account of the distribution. | | | NA | 1..* |
...
An entity that is associated with Catalogs and/or Datasets. Agents can be individuals or organisations. If the Agent is an organisation, the use of the Organization Ontology is recommended.
...
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
name | A name for some thing. | | | This property contains a name of the agent. This property can be repeated for different versions of the name (e.g. the name in different languages). | 1..* |
identifier | A unique identifier of the resource being described or cataloged. | | | | 1..1 |
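
The Cardinality column in these tables uses a `min..max` notation, where `*` stands for "no upper limit". The following Python sketch shows one way such values could be interpreted when checking how many values a property has. The function names are hypothetical and the parsing rules are an assumption about the notation, not something defined by the schema.

```python
from typing import Optional, Tuple


def parse_cardinality(text: str) -> Tuple[int, Optional[int]]:
    """Parse 'min..max' notation; '*' means no upper bound (returned as None)."""
    lower, sep, upper = text.strip().partition("..")
    if not sep:
        raise ValueError(f"expected 'min..max', got {text!r}")
    return int(lower), (None if upper == "*" else int(upper))


def satisfies(count: int, cardinality: str) -> bool:
    """Check whether a number of values fits within the given cardinality."""
    minimum, maximum = parse_cardinality(cardinality)
    return count >= minimum and (maximum is None or count <= maximum)


print(parse_cardinality("1..*"))  # (1, None)
print(satisfies(2, "1..1"))       # False: an identifier may occur only once
```

Under this reading, `1..*` means at least one value is required, while `1..1` means exactly one.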
...
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
description | A description of the project. | | | NA | 1..* |
identifier | A unique identifier of the resource being described or cataloged. | | | NA | 1..1 |
title | A name given to the resource. | | | NA | 1..* |
funded by | An organization funding a project or person. | | | NA | 1..* |
relation | A link to the project datasets. | | | NA | 1..* |
...
No recommended properties are identified for this release.
Resource
All things described by RDF are called resources, and they are instances of the class rdfs:Resource. This is the class of everything. All other classes are subclasses of this class. To read more, go to https://www.w3.org/TR/rdf12-schema/#ch_resource.
...