Img2Catalog: onboarding image collections

DATE: 23-08-2024 STATUS: ADOPTED

Manually onboarding datasets into a catalog is a time-consuming process. To simplify the registration of image datasets, Health-RI has developed the registration tool img2catalog. The tool forwards metadata automatically and periodically, which removes the need for manual entry. Manual entry is more error-prone, more labor-intensive, and less future-proof.

Context

The Health-RI ecosystem uses FAIR Data Points (FDPs) to get the metadata of datasets into the catalog. The National Health Data Catalog is linked to these FDPs and makes the descriptive metadata of data collections findable.

Each node can decide whether to use its own FDP(s) and/or catalog, or to use the national (Health-RI managed) FDP directly.
A node's own FDP can be registered with the National Health Data Catalog as well as with other (e.g. regional or European) catalogs. Other configurations are conceivable (see for example Figure 1).

Figure 1: Examples of how the process of registering image collections can be set up.

The registration tool “img2catalog” can register image collections directly at a Health-RI FAIR Data Point (FDP) (Node A), as well as at a local FDP that can then be linked to the national health data catalog (Node B).

Features

Img2Catalog is a command-line tool and supports the following features:

  • generating metadata from an image dataset in XNAT or Grandchallenge

  • adding or updating metadata on a FAIR Data Point

Mapping

The current version of img2catalog maps the DCAT-AP fields required in version <TBD>. For XNAT as the image storage platform, the mapping is as follows:

dcat:Catalog:

  • title: from configuration file

  • description: from configuration file

  • Dataset: IRIs of the datasets in the XNAT instance

dcat:Dataset:

  • title: Title of XNAT project

  • description: Description of XNAT project

  • identifier: XNAT project id

  • keyword: XNAT keywords, where each keyword is separated by a space

  • contactPoint: from the configuration file; for now, one fixed contact point for an entire XNAT instance

  • creator: Principal Investigator data from XNAT
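
The mapping above can be sketched in a few lines of Python. This is a minimal illustration only, not the img2catalog implementation: the XnatProject class, the configuration keys (catalog_title, catalog_description, contact_point) and the way dataset IRIs are built from a base IRI are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class XnatProject:
    """Hypothetical, simplified view of an XNAT project (not the real XNAT API)."""
    project_id: str
    title: str
    description: str
    keywords: str  # keywords separated by whitespace
    pi_name: str   # Principal Investigator


def project_to_dataset(project: XnatProject, config: dict) -> dict:
    """Map one XNAT project to the dcat:Dataset fields listed above."""
    return {
        "title": project.title,
        "description": project.description,
        "identifier": project.project_id,
        # Each whitespace-separated token becomes its own keyword.
        "keyword": project.keywords.split(),
        # One fixed contact point for the whole XNAT instance (assumed config key).
        "contactPoint": config["contact_point"],
        "creator": project.pi_name,
    }


def build_catalog(projects: list, config: dict, base_iri: str) -> dict:
    """Map a list of XNAT projects to the dcat:Catalog fields listed above."""
    base = base_iri.rstrip("/")
    return {
        "title": config["catalog_title"],
        "description": config["catalog_description"],
        # Assumed IRI scheme for this sketch: <base IRI>/<XNAT project id>.
        "dataset": [f"{base}/{p.project_id}" for p in projects],
    }


if __name__ == "__main__":
    example_config = {
        "catalog_title": "Example XNAT catalog",
        "catalog_description": "Image collections hosted on an example XNAT",
        "contact_point": "mailto:servicedesk@example.org",
    }
    example_project = XnatProject(
        project_id="demo01",
        title="Demo project",
        description="A demonstration image collection",
        keywords="mri brain",
        pi_name="J. Doe",
    )
    print(project_to_dataset(example_project, example_config))
    print(build_catalog([example_project], example_config, "https://xnat.example.org"))
```

Because keywords are split on whitespace, a multi-word keyword such as "brain tumor" ends up as two separate keywords in the catalog.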

Img2Catalog also works with Grandchallenge as an image storage platform and will be made suitable for connection to a DWH in a future version.

If the set of mandatory DCAT-AP fields changes in the future, those changes will be incorporated in the next version of Img2Catalog.

For details regarding installation and use, see the GitHub repository Health-RI/img2catalog.