Agent
An entity that is associated with Catalogues and/or Datasets. Agents can be individuals or organisations. If the Agent is an organisation, the use of the Organization Ontology is recommended. [link]
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
name | A name for some thing. | foaf:name | xsd:string | This property contains a name of the agent. This property can be repeated for different versions of the name (e.g. the name in different languages). | 1..* |
identifier | A unique identifier of the resource being described or cataloged. | dct:identifier | xsd:string | | 1..1 |
Recommended Properties
No recommended properties are identified for this release.
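As an illustration of the mandatory Agent properties, the following Turtle fragment is a minimal sketch of one agent description. The `ex:` namespace, the agent IRI and all literal values are hypothetical. The language tags only illustrate how a name can be repeated in different languages.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

# One agent with a repeated name (1..*) and exactly one identifier (1..1).
ex:agent-1 a foaf:Agent ;
    foaf:name "Example University Medical Centre"@en ,
              "Voorbeeld Universitair Medisch Centrum"@nl ;
    dct:identifier "https://example.org/agent-1" .
```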
Project
A project (a collective endeavour of some kind).
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
description | A description of the project. | dct:description | xsd:string | | 1..* |
identifier | A unique identifier of the resource being described or cataloged. | dct:identifier | xsd:string | | 1..1 |
title | A name given to the resource. | dct:title | xsd:string | | 1..* |
funded by | An organization funding a project or person. | foaf:fundedBy | foaf:Agent | | 1..* |
relation | A link to the project datasets. | dct:relation | dcat:Dataset | | 1..* |
Recommended Properties
No recommended properties are identified for this release.
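The mandatory Project properties can be sketched in Turtle as follows. This is an illustrative fragment only: the `ex:` namespace, the IRIs and the literal values are hypothetical, and the linked dataset is assumed to be described in full elsewhere.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:project-1 a foaf:Project ;
    dct:identifier "project-1" ;                       # exactly one (1..1)
    dct:title "Example imaging project" ;
    dct:description "A hypothetical project collecting breast cancer imaging data." ;
    foaf:fundedBy ex:funder-1 ;                        # range foaf:Agent
    dct:relation ex:dataset-1 .                        # link to a project dataset

ex:funder-1 a foaf:Agent ;
    foaf:name "Example Funding Body" ;
    dct:identifier "funder-1" .

# Placeholder: the dataset's own mandatory properties are omitted here.
ex:dataset-1 a dcat:Dataset .
```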
Resource
All things described by RDF are called resources, and are instances of the class rdfs:Resource. This is the class of everything. All other classes are subclasses of this class. For more information, see https://www.w3.org/TR/rdf12-schema/#ch_resource
Purpose of this document
This document describes the classes identified in the Plateau 1 Core Metadata Schema and their respective properties, and provides usage notes to aid implementation. It does not describe the national catalogue or its onboarding process, nor does it cover future extensions of the metadata schema (leaves/petals). For other versions and further documentation, see https://github.com/Health-RI/health-ri-metadata/.
Introduction
Context
To find and reuse information scattered across various sources, the research community in the Netherlands has agreed to index their resources in one national catalogue. The national catalogue, in turn, aims to be indexed in international catalogues. One requirement for achieving this ambition is a common agreement on the bare minimum of items needed to search, find and reuse such resources. These items and their properties are the components of the Core Metadata Schema.
Scope
The Core Metadata Schema is a formal shared conceptualisation of the requirements to find and reuse information across Health-RI nodes via the national catalogue. It depicts a set of minimal elements for describing any resource (including datasets) with generic metadata. This scope matches the functionality offered in the Plateau 1 National Catalogue release package.
The core model can be further extended and specialised to represent domain-specific requirements (domains can include omics, imaging, etc.). Therefore, we expect more versions, which will be released in accordance with the requirements of Plateau 2 and beyond. All versions will be released via https://github.com/Health-RI/health-ri-metadata/.
This version of the Core Metadata Schema incorporates DCAT v3 and selected DCAT-AP mandatory classes and their definitions. The main entities are those that form the core of the DCAT application profile. DCAT-AP is a DCAT application profile for sharing information about Catalogues containing Datasets and Data Services descriptions in Europe. This version is meant to be reviewed by the community of nodes that make up the Health-RI ecosystem. The final release of this version will be a requirement for resources onboarding to the national catalogue and benefiting from the functionality it offers. How to implement this model and connect to the national catalogue will be described in the onboarding documentation; the functionality offered by the national catalogue is defined in the Plateau 1 documentation.
Notes on Alignment
To define the current core metadata schema, we considered existing metadata from the COVID-19 national portal, metadata schemas provided by Health-RI nodes (e.g. ABC metadata) and standards used in portals in Europe and abroad (e.g. W3C, DCAT, DCAT-AP). We then started mapping their classes and properties with the help of metadata specialists from the hub and nodes. The mappings are described and open for comments in the mapping table. Finally, after conceptualisation, we decided to reuse DCAT and DCAT-AP for the implementation, mainly because the DCAT application profile covered the collected requirements.
rdfs:Resource and other Resource Type
Within DCAT and DCAT-AP, the term "resource" usually refers to anything that can be described in RDF, but there are specific classes and properties to denote various types of resources:
dcat:Dataset is a type of rdfs:Resource representing a collection of data.
dcat:Distribution is a type of rdfs:Resource representing an available form or representation of a dataset.
dcat:Catalog is a type of rdfs:Resource representing a collection of datasets.
dcat:DataService (introduced in DCAT version 2) is a type of rdfs:Resource representing a service through which data can be accessed.
foaf:Project is a type of rdfs:Resource representing project-level information.
The DCAT vocabulary is centered around datasets. However, there might be a need to represent a more diverse range of resources specific to certain domains, such as biobanks or patient registries. Here, we suggest some potential scenarios for adapting or expanding DCAT to better represent your resource type.
Utilize dcat:Resource Directly: If the resource you're dealing with doesn't align perfectly with the definition of a dcat:Dataset, you can opt for the more general term dcat:Resource. This allows you to represent virtually any asset. While this approach provides greater flexibility, it may not offer precise clarity for users trying to grasp the resource's essence. Over time, we can further define the resource type with specific vocabularies.
Expand with Personalized Classes: Should there be a requirement to depict particular resources, such as biobanks or patient registries, consider enhancing the foundational DCAT vocabulary with custom classes. For instance:
:Collection a rdfs:Class ;
rdfs:subClassOf dcat:Resource .
or
:PatientRegistry a rdfs:Class ;
rdfs:subClassOf dcat:Dataset .
With such custom classes, you must provide more specific metadata about each type of resource and make it clear to other users or systems how to differentiate them and understand the nuances between them (ask yourself: how is a collection different from a dataset?).
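The custom-class approach can be written out as a complete Turtle fragment. This is a sketch under assumptions: the default namespace `https://example.org/ns#`, the label, the comment and the instance `:registry-1` are hypothetical and only show where the distinguishing documentation for a new class would go.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix :     <https://example.org/ns#> .   # hypothetical namespace for the custom classes

# Custom class, documented so that others can tell it apart from a plain dataset.
:PatientRegistry a rdfs:Class ;
    rdfs:subClassOf dcat:Dataset ;
    rdfs:label "Patient registry"@en ;
    rdfs:comment "A dataset that systematically collects records about patients with a given condition."@en .

# An instance of the custom class; as a subclass of dcat:Dataset it carries the Dataset properties.
:registry-1 a :PatientRegistry ;
    dct:title "Example rare disease registry" .
```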
Feedback via Git Issues
Should you want to extend the model (e.g. Resource) and define a new concept, register an issue in the Health-RI GitHub repository (https://github.com/Health-RI/health-ri-metadata/tree/master), explain the extension, and assign the issue to ‘denatahvildari’ or ‘brunasv’. Together with you, we will implement the new addition for the next release.
Overview
An overview of the Core Metadata Schema is shown in the UML diagram below (Fig. 1). The diagram illustrates the main classes (entities) and does not show detailed definitions such as rdfs:label and rdfs:comment. Each block represents a class and contains a list of the attributes (properties) of that class. If a class is connected to another class by a closed arrow, it inherits all properties from the other class (e.g. dcat:DatasetSeries inherits from dcat:Dataset, which inherits from dcat:Resource). The other arrows represent relations and show the type of relation (e.g. dcat:Dataset is connected to a dcat:DatasetSeries via the predicate dcat:inSeries) and the cardinality (e.g. dcat:Dataset can be connected via dcat:inSeries to zero or more dcat:DatasetSeries).
Recommended Versus Mandatory
In line with the DCAT-AP specification, we distinguish between recommended and mandatory elements (classes and properties). In the following sections, classes and properties are grouped under the headings ‘mandatory’ and ‘recommended’. In the future we might add a third category, ‘optional’.
In the data exchange scenario, these terms have the following meaning:
Mandatory class: a receiver of data MUST be able to process information about instances of the class; a sender of data MUST provide information about instances of the class
Recommended class: a sender of data SHOULD provide information about instances of the class; a sender of data MUST provide information about instances of the class, if such information is available; a receiver of data MUST be able to process information about instances of the class.
Optional class: a receiver MUST be able to process information about instances of the class; a sender MAY provide the information but is not obliged to do so.
Mandatory property: a receiver MUST be able to process the information for that property; a sender MUST provide the information for that property.
Recommended property: a receiver MUST be able to process the information for that property; a sender SHOULD provide the information for that property if it is available.
Optional property: a receiver MUST be able to process the information for that property; a sender MAY provide the information for that property but is not obliged to do so.
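One possible way to make the distinction above machine-checkable is a SHACL shape, sketched below in Turtle. The use of SHACL is an assumption of this example and is not prescribed by this document; the shape IRI in the `ex:` namespace and the message text are hypothetical. A missing mandatory property is reported as a violation (the SHACL default severity), whereas a missing recommended property only produces a warning.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/shapes/> .   # hypothetical namespace, for illustration only

ex:DatasetShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    # Mandatory property with cardinality 1..1: exactly one identifier.
    sh:property [
        sh:path dct:identifier ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    # Mandatory property with cardinality 1..*: at least one publisher.
    sh:property [
        sh:path dct:publisher ;
        sh:minCount 1 ;
    ] ;
    # Recommended property (0..*): absence is flagged as a warning, not a violation.
    sh:property [
        sh:path dcat:distribution ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;
        sh:message "A distribution SHOULD be provided if it is available." ;
    ] .
```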
Diagram
...
Terminology
According to DCAT-AP:
An Application Profile is a specification that reuses terms from one or more base standards, adding more specificity by identifying mandatory, recommended and optional elements to be used for a particular application, as well as recommendations for controlled vocabularies to be used.
A Dataset is a collection of data, published or curated by a single source, and available for access or download in one or more formats. A Data Portal is a Web-based system that contains a data catalogue with descriptions of datasets and provides services enabling discovery and reuse of the datasets.
Used Prefixes
Prefix | Namespace IRI | Source |
---|---|---|
dct | http://purl.org/dc/terms/ | [DCT] |
foaf | http://xmlns.com/foaf/0.1/ | [FOAF] |
time | http://www.w3.org/2006/time# | [OWL-TIME] |
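For convenience, the prefixes that appear in the class and property tables of this document can be declared in Turtle as follows. This is a sketch listing only the standard namespace IRIs of the prefixes used in this document; note that `dct:` and `dcterms:` are both used in the tables and refer to the same namespace.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dct:     <http://purl.org/dc/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .            # same namespace as dct:
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
```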
Core Metadata Schema Classes
Mandatory Classes
Class name | Definition | Usage Note | URI |
---|---|---|---|
Dataset | A resource type. | Used to describe one or more datasets. This describes details about the dataset(s). However, a single dataset can be made available to potential users in different ways. How the data in a dataset can be accessed is defined in the Distribution. | dcat:Dataset |
Catalogue | A catalog that is listed in the National Catalog. | Used to describe a bundle of datasets, data services, biobanks, patient registries, or guidelines together under a single title. | dcat:Catalog |
Agent | An entity that is associated with Catalogues and/or Datasets. If the Agent is an organisation, the use of the Organization Ontology is recommended. | Used to describe a person or organisation that is related to, or describes, the resources. | foaf:Agent |
Resource | Anything described by RDF. | This is an abstract class. We do not use this class directly; instead we use specialisations of it (e.g. Dataset). It mainly serves high-level grouping and the reuse of properties. | rdfs:Resource |
Recommended Classes
Class name | Definition | Usage Note | URI |
---|---|---|---|
Distribution | A physical embodiment of the Dataset in a particular media format. | Used to describe the different ways in which a single dataset can be made available, i.e. it can be downloaded or accessed online in one or more distributions (e.g. one as a downloadable .csv file, another via an access or query webpage). | dcat:Distribution |
Dataset Series | A resource type. Dataset series are defined in [ISO-19115] as a collection of datasets […] sharing common characteristics. However, their use is not limited to geospatial data, although in other domains they can be named differently (e.g., time series, data slices) and defined more or less strictly (see, e.g., the notion of "dataset slice" in [VOCAB-DATA-CUBE]). | With "dataset series" we refer to data, somehow interrelated, that are published separately. An example is budget data split by year and/or country, instead of being made available in a single dataset. | dcat:DatasetSeries |
Data Service | A resource type. | The kind of service can be indicated using the DRAFT EXAMPLE: Health-RI offers ?? distinct service categories: one that allows querying of secure data/metadata through an interface, and another that offers analytical or statistical insights without directly serving the data, especially if it's sensitive. The first type is linked to the Dataset record it derives from, while the second is connected to a Catalog. | dcat:DataService |
Project | A project (a collective endeavour of some kind). | Used to describe a project that is connected to one or more datasets. A resource type. | foaf:Project |
Abstract Classes that DO NOT instantiate (do not populate)
Class name | Definition | Usage Note | URI |
---|---|---|---|
Resource | | Resource is a generic concept from the DCAT vocabulary, which means that you rarely use this class directly, but indirectly through its extensions. We recommend that you avoid using dcat:Resource directly for your document unless the type that you are looking for is not available in this schema. | dcat:Resource |
Core Metadata Schema Properties per Class
Catalog
A curated collection of metadata about resources. A web-based data catalog is typically represented as a single instance of this class.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
dataset | Relates every catalog to the datasets it contains. | dcat:dataset | | The connection to the one or more datasets that this catalog describes. | 1..* |
description | A free-text account of the record. | dct:description | | A brief description of the catalogue. It can consist of multiple strings. For example: "This catalogue describes breast cancer imaging datasets." | 1..* |
publisher | The entity responsible for making the resource available. | dct:publisher | | The organisation or person that has published the catalog. | 1..* |
title | A name given to the resource. | dct:title | | The name of the catalog. This is a required field and needs to be unique. | 1..* |
Recommended Properties
No recommended properties are identified for this release.
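A catalogue that carries the four mandatory Catalog properties could look like the Turtle sketch below. The `ex:` namespace, the IRIs and the literal values are hypothetical; the publisher and the datasets are assumed to be described in full elsewhere.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:catalog-1 a dcat:Catalog ;
    dct:title "Example breast cancer imaging catalogue" ;
    dct:description "This catalogue describes breast cancer imaging datasets." ;
    dct:publisher ex:agent-1 ;                     # a foaf:Agent described elsewhere
    dcat:dataset ex:dataset-1 , ex:dataset-2 .     # one or more datasets (1..*)
```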
Dataset
A collection of data, published or curated by a single agent, and available for access or download in one or more representations.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
contact point | Relevant contact information for the cataloged resource. | dcat:contactPoint | foaf:Agent | Contact information that can be used, for example, for sending requests for further information or access to the Dataset. | 1..* |
creator | The entity responsible for producing the resource. | dct:creator | foaf:Agent | An agent (person or organisation) responsible for producing the dataset. | 1..* |
description | A free-text account of the record. | dct:description | rdfs:Literal, xsd:string | A free-text description of the Dataset. This property can be repeated for parallel language versions of the description. | 1..* |
release date | Date of formal issuance (e.g., publication) of the resource. | dct:issued | rdfs:Literal typed as xsd:date or xsd:dateTime | | 1..* |
identifier | A unique identifier of the resource being described or cataloged. | dct:identifier | xsd:string | The main identifier for the Dataset, e.g. the URI or other unique identifier in the context of the Catalogue. | 1..1 |
modified | Most recent date on which the catalog entry was changed, updated or modified. | dct:modified | xsd:dateTime | The most recent date on which the Dataset was changed or modified. | 1..* |
publisher | The entity responsible for making the resource available. | dct:publisher | foaf:Agent | An agent (organisation or person) responsible for making the Dataset available. | 1..* |
theme | A main category of the resource. A resource can have multiple themes. | dcat:theme | skos:Concept | It consists of 1 or more IRIs (links) separated by commas. When set, it specifies relevant ontology concepts that classify the dataset. Typically, these can be looked up using the Ontology Lookup Service (OLS) or Bioportal. | 1..* |
title | A name given to the record. | dct:title | xsd:string, rdfs:Literal | A name given to the Dataset. This property can be repeated for parallel language versions of the name. | 1..* |
type | The nature or genre of the resource. | dct:type | skos:Concept | A type of the Dataset. A recommended controlled vocabulary data-type is foreseen. | 1..* |
license | A legal document under which the resource is made available. | dct:license | dcterms:LicenseDocument | This should contain a URL that provides details regarding the license that is applicable to this dataset. | 1..* |
relation | | dct:relation | foaf:Project | | 1..* |
Recommended Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
dataset distribution | An available distribution of the dataset. | dcat:distribution | dcat:Distribution | Use this property to point to the distribution of this dataset when a distribution is available. | 0..* |
project | Connects the dataset to the corresponding projects. | foaf:Project | xsd:string | Use this property to point to the related project of this dataset when a project is available. | 0..* |
has version | This resource has a more specific, versioned resource [PAV]. | dct:hasVersion | xsd:string | This property refers to a related Dataset that is a version, edition, or adaptation of the described Dataset. | 0..* |
in series | A dataset series of which the dataset is part. | dcat:inSeries | dcat:DatasetSeries | | 0..* |
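The Dataset properties can be combined into a single description, sketched below in Turtle. All IRIs in the `ex:` namespace, the literal values and the chosen license URL are hypothetical. The theme is expressed here with `dcat:theme`, which is an assumption of this example: it is the DCAT property whose definition matches "A main category of the resource". The agents, concepts, project, distribution and series that are referenced are assumed to be described elsewhere.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:dataset-1 a dcat:Dataset ;
    # Mandatory properties
    dct:identifier "https://example.org/dataset-1" ;
    dct:title "Example breast cancer imaging dataset"@en ;
    dct:description "Hypothetical MRI scans, described for illustration only."@en ;
    dct:issued "2023-05-01"^^xsd:date ;
    dct:modified "2023-09-15T10:30:00Z"^^xsd:dateTime ;
    dct:creator ex:agent-1 ;
    dct:publisher ex:agent-1 ;
    dcat:contactPoint ex:agent-1 ;
    dcat:theme ex:concept-breast-cancer ;          # a skos:Concept, e.g. looked up via OLS
    dct:type ex:concept-imaging-data ;             # a skos:Concept
    dct:license <https://creativecommons.org/licenses/by/4.0/> ;
    dct:relation ex:project-1 ;                    # a foaf:Project
    # Recommended properties
    dcat:distribution ex:distribution-1 ;
    dcat:inSeries ex:series-1 .
```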
Data Service
A collection of operations that provides access to one or more datasets or data processing functions.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
end point URL | The root location or primary endpoint of the service (a Web-resolvable IRI). | dcat:endpointURL | | NA | 1..* |
title | A name given to the distribution. | dct:title | | NA | 1..* |
serves dataset | A collection of data that this data service can distribute. | dcat:servesDataset | | NA | 1..* |
Recommended Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
end point description | A description of the services available via the end-points, including their operations, parameters etc. | dcat:endpointDescription | | An endpoint description may be expressed in a machine-readable form, such as an OpenAPI (Swagger) description [OpenAPI], an OGC | 0..* |
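A Data Service description with the properties above might be sketched in Turtle as follows. The `ex:` namespace and the endpoint IRIs are hypothetical. Writing the endpoint URL and endpoint description as IRIs is an assumption of this example that follows the definition "a Web-resolvable IRI".

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:service-1 a dcat:DataService ;
    # Mandatory properties
    dct:title "Example metadata query service" ;
    dcat:endpointURL <https://example.org/api> ;
    dcat:servesDataset ex:dataset-1 ;
    # Recommended property: e.g. a machine-readable OpenAPI description
    dcat:endpointDescription <https://example.org/api/openapi.json> .
```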
Distribution
A physical embodiment of the Dataset in a particular format.
Mandatory Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
access URL | A URL of the resource that gives access to a distribution of the dataset. E.g., landing page, feed, SPARQL endpoint. | dcat:accessURL | xsd:string | This property contains a URL that gives access to a Distribution of the Dataset. The resource at the access URL may contain information about how to get the Dataset. | 1..* |
media type | The media type of the distribution as defined by IANA [IANA-MEDIA-TYPES]. | dcat:mediaType | xsd:string | This property SHOULD be used when the media type of the distribution is defined in IANA [IANA-MEDIA-TYPES], otherwise dcterms:format MAY be used with different values. | 1..* |
title | A name given to the resource. | dct:title | xsd:string | | 1..* |
description | A free-text account of the distribution. | dct:description | xsd:string | | 1..* |
Recommended Properties
Property name | Definition | URI | rdfs:Range | Usage Note | Cardinality |
---|---|---|---|---|---|
access service | A data service that gives access to the distribution of the dataset. | dcat:accessService | dcat:DataService | dcat:accessService SHOULD be used to link to a description of a dcat:DataService that can provide access to this distribution. | 0..* |
download URL | The URL of the downloadable file in a given format. E.g., CSV file or RDF file. The format is indicated by the distribution's dcterms:format and/or dcat:mediaType. | dcat:downloadURL | rdfs:Resource, xsd:string | | 0..* |
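To illustrate the Distribution properties, the Turtle fragment below sketches one downloadable CSV distribution. The `ex:` namespace, the URLs and the literal values are hypothetical. The URLs and the media type are written as string literals because the tables above list xsd:string as their range; the DCAT vocabulary itself models the access and download URLs as resources (IRIs), so this choice is an assumption of the example.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace, for illustration only

ex:distribution-1 a dcat:Distribution ;
    # Mandatory properties
    dct:title "CSV export of the example dataset" ;
    dct:description "A downloadable CSV file containing the hypothetical example data." ;
    dcat:accessURL "https://example.org/dataset-1/access" ;
    dcat:mediaType "text/csv" ;                    # IANA media type
    # Recommended properties
    dcat:downloadURL "https://example.org/dataset-1/data.csv" ;
    dcat:accessService ex:service-1 .              # a dcat:DataService described elsewhere
```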
This page has been moved together with technical specifications into the Metadata Github. You can find the contents here.
If there are any issues with continuity, please contact HRI service desk: 📧 servicedesk@health-ri.nl