Storyline: Making data available

DATE: 23-08-2024 STATUS: ADOPTED

This article describes the storyline for making data available for multiple use. It covers both research data and healthcare data that has been transformed into research data.

A data provider makes a dataset available within the Health-RI ecosystem for multiple use. The Local review committee data providing and the Technology Transfer Office (TTO) first review whether, and under which terms of use, the dataset may be made available. The data provider then draws up the delivery, sets up the delivery process and publishes the metadata of the dataset on the data guide.

See Storyline: Making imaging data available for a more detailed version of this storyline for the data category imaging.

See https://health-ri.atlassian.net/wiki/spaces/HA/pages/164497847 for the collaboration diagram of this storyline.

Comments

This storyline only shows the initial availability of a dataset that is, or will be, made suitable for multiple use such as research, for example data from an EDC or EPD. Because datasets can grow over time, a synchronization mechanism must be put in place to keep the datasets up to date.

Most systems are expanded daily with new data.

The care dataset is dynamic; it is more of a data definition than a dataset. Over time, patients who meet the data definition may be added.
For existing patients, new data points emerge over time. The metadata of the dataset will be dynamically available in the portal.
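
The idea that a care dataset is a data definition rather than a fixed set of records can be illustrated with a minimal Python sketch. All names here (`Patient`, `DatasetDefinition`, the diagnosis code "X01") are hypothetical and not part of the Health-RI architecture. The sketch only shows that evaluating the same definition at a later moment can yield a larger dataset.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Patient:
    """Hypothetical, minimal patient record used only for illustration."""
    patient_id: str
    diagnosis_codes: frozenset


@dataclass(frozen=True)
class DatasetDefinition:
    """A dataset described by an inclusion rule instead of a fixed list of records."""
    title: str
    includes: Callable[[Patient], bool]

    def materialise(self, source: Iterable[Patient]) -> List[Patient]:
        # Evaluate the inclusion rule against the current state of the source system.
        return [p for p in source if self.includes(p)]


# Assumed example criterion: patients with the made-up diagnosis code "X01".
definition = DatasetDefinition(
    title="Example cohort",
    includes=lambda p: "X01" in p.diagnosis_codes,
)

source_day_1 = [
    Patient("p1", frozenset({"X01"})),
    Patient("p2", frozenset({"Y02"})),
]
# Later, a new patient who meets the definition has been registered.
source_day_2 = source_day_1 + [Patient("p3", frozenset({"X01", "Y02"}))]

snapshot_1 = definition.materialise(source_day_1)
snapshot_2 = definition.materialise(source_day_2)
```

The definition itself does not change between the two snapshots; only the source system does, which is why a synchronization mechanism is needed on top of the initial availability.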

Data taken directly from the production environments of healthcare systems will often not meet the founder unity of language. It will require a persistent data platform (Data Repository / Middleware), where the (meta)data is prepared for publication in (national) metadata catalogs according to the founder unity of language.

Note: this article talks about 'FAIR made data'. This means that the data is provided with:

- Context information
- Coding and modelling according to the founder unity of language, preferably according to guidelines of the data governance committee
- Terms of Use established by the Local review committee data providing
- Assessment of user consent
- Characteristics for quality or usability
- Version control
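
As an illustration, the six elements listed in the note above could be tracked as a simple checklist. The following Python sketch is an assumption made for this article, not a Health-RI data model: the class `FairMadeDataset`, its field names and the helper `missing_elements` are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FairMadeDataset:
    """Hypothetical record of the six elements that make data 'FAIR made'."""
    context_information: Optional[str] = None
    coding_and_modelling: Optional[str] = None
    terms_of_use: Optional[str] = None
    user_consent_assessment: Optional[str] = None
    quality_characteristics: Optional[str] = None
    version: Optional[str] = None


def missing_elements(dataset: FairMadeDataset) -> List[str]:
    """Return the names of the elements that have not been provided yet."""
    return [f.name for f in fields(dataset) if not getattr(dataset, f.name)]


# Example: a draft dataset for which only two elements are available so far.
draft = FairMadeDataset(context_information="Cohort description", version="1.0")
still_missing = missing_elements(draft)
```

A dataset would count as sufficiently prepared in this sketch only when `missing_elements` returns an empty list; the real assessment is made by the committees named in this article.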

Precondition

There is a dataset that a data holder wants to offer for multiple use. It can be:
- research data (e.g. generated by means of an EDC tool) or a result dataset of a central or federated analysis
  - The data is generally already made FAIR in Storyline: Generating research data
  - The terms of use have already been agreed in Storyline: Generating research data
- Healthcare data transformed into available research data

Process model

The data producer asks the Local review committee data providing whether and under which conditions the data may be offered for multiple use.
The Local review committee data providing determines, based on the predefined data policy, whether and under which conditions the data may be offered.
The data governance committee indicates, for previously defined data, how to make it FAIR and which coding and modelling to apply if the data is not, or not sufficiently, FAIR.
The data producer produces, creates and stores the original health data on the basis of agreements made and guidelines for, among other things, storage of data, ownership, intellectual property, terms of use, selection and retention. The data producer preferably produces the data FAIR (see note above).
The Data Preparator and the Data Producer use an existing (reusable) interface, or create a new reusable interface (preferably in collaboration with the software supplier of the data producer), between the data producer and the data preparator.
Based on the data-centric principle, the Data Preparator separates the original health data from the original application and prepares the original health data, where necessary, for multiple use (including for use for research and innovation) in a persistent data platform.
- If the data producer has not made the data sufficiently FAIR, the Data Preparator will, where possible, further make it FAIR (see note above)
- If the data producer has made the data sufficiently FAIR, the Data Preparator will implement the data-centric principle from a persistent data platform, which acts as a transport layer towards the data provider.
- The data preparator ensures that the metadata of the dataset is up-to-date
The data provider sets up the dataset delivery process in accordance with the authorization metadata of the dataset recorded by the data preparator and, if necessary, sets up a target transformation (e.g. if pseudonymisation still needs to be performed).
The data provider registers and publishes the metadata at a FAIR data point of their choice, either as a separate application or integrated into an application with data repository or metadata catalog functionality.
The data provider registers the FAIR data point used with a FAIR data point register (see e.g. FAIR Data Point) if this has not already been done. It is the responsibility of the data guide to retrieve the latest state of the metadata from the FAIR data points, regularly and/or when triggered by a search query (see Storyline: Search data in metadata).
The data provider makes the dataset immediately available if the dataset is public.
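
The registration and retrieval steps above can be sketched in Python. This is a simplified, in-memory illustration under stated assumptions: the classes `FairDataPointRegister` and `DataGuide`, the URL and the metadata fields are hypothetical, and the network call to a FAIR data point is replaced by a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A fetcher returns the current metadata records of one FAIR data point,
# keyed by dataset identifier. In practice this would be a network call.
Fetcher = Callable[[], Dict[str, dict]]


@dataclass
class FairDataPointRegister:
    """Hypothetical register that maps FAIR data point URLs to fetchers."""
    entries: Dict[str, Fetcher] = field(default_factory=dict)

    def register(self, url: str, fetcher: Fetcher) -> bool:
        # Register only if this has not already been done.
        if url in self.entries:
            return False
        self.entries[url] = fetcher
        return True


@dataclass
class DataGuide:
    """Hypothetical data guide that keeps a local copy of harvested metadata."""
    fdp_register: FairDataPointRegister
    catalog: Dict[str, dict] = field(default_factory=dict)

    def harvest(self) -> None:
        # Rebuild the catalog from the latest state of every registered data point.
        latest: Dict[str, dict] = {}
        for fetch in self.fdp_register.entries.values():
            latest.update(fetch())
        self.catalog = latest

    def search(self, term: str, refresh: bool = True) -> List[str]:
        # A search query may trigger a harvest before the catalog is consulted.
        if refresh:
            self.harvest()
        return sorted(
            dataset_id
            for dataset_id, meta in self.catalog.items()
            if term.lower() in meta.get("title", "").lower()
        )


# Example: one data point whose metadata grows after the first search.
fdp_state = {"ds-1": {"title": "Imaging cohort"}}
fdp_register = FairDataPointRegister()
first = fdp_register.register("https://fdp.example.org", lambda: dict(fdp_state))
second = fdp_register.register("https://fdp.example.org", lambda: dict(fdp_state))

guide = DataGuide(fdp_register)
before = guide.search("cohort")
fdp_state["ds-2"] = {"title": "Lab cohort"}
stale = guide.search("cohort", refresh=False)
after = guide.search("cohort")
```

The `refresh` flag mirrors the "regularly and/or triggered by a search query" choice: with `refresh=False` the data guide answers from its last harvest, which a scheduled job would keep up to date.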

Postcondition

The data is available on a persistent data platform
The data is made FAIR
The data is:
- immediately available if it is public
- available after an approved request and additional processing (e.g. data minimisation and pseudonymisation) by the data provider
The metadata of the data is published and can be found
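
The two availability routes in the postcondition can be sketched as a small decision function. This Python example is illustrative only: the function names, the `patient_id` field and the use of a keyed hash (HMAC-SHA256) for pseudonymisation are assumptions, and the actual technique is up to the data provider.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List, Optional, Set


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), one possible technique."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimise(record: Dict[str, str], allowed: Set[str]) -> Dict[str, str]:
    """Keep only the fields covered by the approved request."""
    return {key: value for key, value in record.items() if key in allowed}


def deliver(
    records: List[Dict[str, str]],
    is_public: bool,
    request_approved: bool = False,
    allowed_fields: Iterable[str] = (),
    secret_key: bytes = b"",
) -> Optional[List[Dict[str, str]]]:
    """Return the deliverable records, or None when the data may not be released."""
    if is_public:
        # Public datasets are available immediately, without further processing.
        return records
    if not request_approved:
        return None
    if not secret_key:
        raise ValueError("a secret key is required for pseudonymisation")
    allowed = set(allowed_fields)
    delivered = []
    for record in records:
        reduced = minimise(record, allowed)
        # "patient_id" is an assumed name for the directly identifying field.
        if "patient_id" in reduced:
            reduced["patient_id"] = pseudonymise(reduced["patient_id"], secret_key)
        delivered.append(reduced)
    return delivered


# Example: a non-public dataset with an approved request for two fields.
records = [{"patient_id": "p1", "age": "54", "address": "Main St 1"}]
released = deliver(records, False, True, ["patient_id", "age"], b"example-key")
```

Because `minimise` builds a new dictionary for every record, the stored dataset is left unchanged and only the delivered copy is reduced and pseudonymised.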

Process diagram "Making data available"