FAQ Information and FAIR Data

date: 23-08-2024 Status: ADOPTED

Information and FAIR Data

Nr

CR

Question

Answer

1

V1

What types of data does Health-RI distinguish?

In the Health-RI ecosystem, the following types of data are distinguished:

  1. Clinical data

  2. Imaging data

  3. Omics data

These types occur in healthcare data, in research data and in all other data relevant to health (e.g. IoT data and citizen-generated data). All of it falls under the heading of health data.

2

V1

How does Health-RI ensure that a request for which 10 hospitals will provide data does not have to be handled 10 times, once in every hospital?

Health-RI is currently involved in several projects that address this (e.g. the nWMO assessment framework and mutual recognition). The 'stacked assessment' is also included as an obstacle in the Obstacle Removal Trajectory (OVT).

3

V1

The meta-data must now be registered with either one or the other. Why were two options chosen, instead of one? My concern is that this could lead to confusion.

The focus is on a federated system in which the metadata is recorded in a standardized way at the source (e.g. in a FAIR Data Point).

4

V1

Consider a researcher who, after four years of PhD work, has collected a nice clinical dataset and, two months before obtaining the PhD, has to register it in the Health-RI landscape for findability, among other things. I foresee a lot of resistance to the number of steps, the correspondence and the manpower this requires. How will you anticipate this?

This is a known obstacle.

That is why we try to provide as many tools and aids as possible in the ecosystem for a researcher who wants to share data, for example by helping researchers with training and tools to create a reusable dataset from the earliest possible stage of the research. In general, researchers benefit from reusable data that they can use as input for their own research.

5

V1

We ask a lot of the researcher. Researchers from cohorts and registries still see the benefits of reuse, but how do you think you can get the masses who do smaller-scale and more self-contained research on board? And this despite pressure from, for example, ZonMw?

Reuse of data is also in the interest of researchers. If it is properly arranged, small-scale studies can also benefit from it. This also makes it easier to contribute to it. Health-RI and ZonMw are also exploring how researchers can be encouraged to use the services of HRI (e.g. in the field of data sharing).

Scientists should also take into account European legislation on the mandatory availability of data financed with public money; scientists are also often subject to the Data Guidelines, the Reuse of Government Information Act and the Data Governance Act. The problem is that too little thought has been given in the Netherlands to the fact that making data available costs time and therefore money. This should be financed from the central pot, just as a cycle path does not have to be financed by the user or by the person who builds it. Public funds should be available for the construction of a data infrastructure; a focus on individual researchers (or pieces of cycle path) seems to us to be an approach from the wrong side. We need to think from system to detail, not the other way around.

6

V1

Is it possible to perform queries?

Yes, provided you are authenticated for this. The metadata also has several layers: some information can already be found publicly, while after logging in more specific metadata can be found, feasibility studies can be run or synthetic data can be used.

7

V1

What minimum level of FAIRification is needed for this to work?

Which levels of FAIRification are needed must be agreed in the joint Health-RI work process for FAIR. To clarify what the joint Health-RI approach to FAIR data is, we have made an inventory of various FAIR models, with the aim of agreeing on a concrete and practical work process around FAIR. This work process will provide more clarity. This is a work in progress via this page.

In the short term, a limited basic set of DCAT is required for the metadata in order to connect as many sources as possible. In the longer term, we want to raise this level and also give the catalog the ability to filter on it.

We expect HealthDCAT-AP to be published in early July 2024.
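
To make the idea of a limited basic set of DCAT more concrete, the sketch below checks a dataset description for a handful of DCAT properties. The selection of required properties and the function name `missing_properties` are assumptions made for illustration only; the actual mandatory set is defined by the Health-RI core metadata schema, not by this example.

```python
# Hypothetical minimal set, chosen for illustration; the real mandatory
# properties are defined by the Health-RI core metadata schema.
REQUIRED_PROPERTIES = ("dct:title", "dct:description", "dct:publisher", "dcat:theme")


def missing_properties(dataset: dict) -> list:
    """Return the required properties that are absent or empty in a dataset record."""
    return [prop for prop in REQUIRED_PROPERTIES if not dataset.get(prop)]


# Made-up dataset description that lacks a theme.
example_dataset = {
    "dct:title": "Example cohort",
    "dct:description": "Illustrative dataset description.",
    "dct:publisher": "Example hospital",
}

print(missing_properties(example_dataset))
```

A check like this could run at the source before metadata is exposed, so that incomplete records are caught before they reach the catalog.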

8

V1

Does this process also apply to the development of AI models, or does it require a specific process?

Additional steps may be needed here. The process must be approved at the start, and additional steps may be required to cover the learning curve.

9

V1

In order to train and market AI models (to bring them to the patient), start-ups and companies need the data to include in their certification file for CE/FDA approval. UMCs are probably not going to act as manufacturers themselves; how is this taken into account?

The innovator is also a user and can use the infrastructure to train AI algorithms. There is one condition: every data request must be traceable to a natural person (who gave the order?).

10

V1

If training AI models has to run on datasets that remain with the data holder, how does Health-RI see this, especially financially (storage & compute)?

Clear agreements will have to be made about who pays for storage & compute for federated training of AI models or other forms of federated analysis. At the moment, such agreements do not yet exist at national level; each consortium determines how these costs are passed on.

11

V1

How many databases will be made redundant by FAIRification? Copies become redundant, reducing storage and increasing compute. This could lead to resistance. What are we going to do about this?

There are discussions about this with, for example, quality registrations. Data is usually duplicated because there is no other solution. There could be resistance at first; communication from OVT cluster 1 can help with this. Parties also have to do less to make data suitable for multiple use.

12

V1

How do we prevent low-quality data from burdening and polluting the system?

 

The EHDS mentions quality and usability indicators, which we want to explore further in subsequent versions.

Quality is also subjective; what is useful for one may not be useful for another.

By listing these characteristics in the catalogue, we enable a researcher to choose whether or not to use the data in question.

13

V1

Are researchers also trained and guided step by step, as happens, for example, with Google Cloud services?

Attention must indeed be paid to this, for example by means of e-learning. This involves reusing existing e-learning components.

The FAIR service desk will house a portfolio of existing training courses and materials. If necessary (and possible), we will also organize workshops and training sessions on making data FAIR for specific data types or communities.

14

V1

Is (meta)data stored in the cloud? What about security?

(Meta)data can be stored in the cloud. Everything must be secured and comply with laws and regulations.

15

V1

I am not a researcher, but I can imagine that a researcher also provides provenance or an explanation of the data used (selection), and especially of the cleaning and use of the data. How does this work when part of the cleaning is no longer in the researcher's hands?

Under the GDPR, a distinction is made between data holders (who make independent decisions about data) and data processors, who work with data from others but do not decide on it. A data generator is not a separate group. Ergo, a researcher who has generated data is the data holder and is responsible. However, if the researcher has transferred the data correctly, the receiving party becomes an independent data holder and therefore has its own obligations under the law: the same obligations as the original data holder. This is only different if data is not transferred but made available. For example, companies make data available to their accountant, but the accountant is not allowed to decide independently what happens with it. So when a scientist makes data available, it is important to assess whether the recipient becomes an independent data holder (who is then independently responsible for complying with the law) or a data processor (in which case it must be agreed what exactly the processor may and may not do).

16

V1

Am I hearing correctly that realistic synthetic data is also part of Health-RI? Realistic as in statistically correct.

Health-RI includes all forms of data in the infrastructure, such as pseudonymized, anonymized and synthesized data. That does not mean that Health-RI generates such data itself, but Health-RI does stimulate the necessary developments.

17

V1

What is the timeline for this to become operational? Until then, databases will continue to be needed.

We are at the beginning. Unfortunately, it is impossible to predict when we will no longer need databases.

18

V1

Is 'differential privacy' being looked at? This allows you to make data 'non-identifiable' and therefore publicly available without approval, while maintaining reliability.

These techniques can be used. However, we also need a trust model for the use of these techniques.
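
As a rough illustration of what such a technique involves, the sketch below releases a count with Laplace noise, one common way to obtain differential privacy for counting queries. It is not a description of anything Health-RI has implemented; the function name `noisy_count` and the chosen `epsilon` are assumptions for the example.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise; a counting query has sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon
    # The difference of two independent exponential samples with the same
    # rate is Laplace-distributed with location 0 and the given scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(42)  # fixed seed only to make the illustration repeatable
print(noisy_count(120, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier results, which is one reason why agreements on acceptable parameters (part of a trust model) are still needed.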

19

V1

In practice, we notice that hospitals are reluctant to request data because they are "waiting for initiatives such as Health-RI for the provision of data". It currently takes us a lot of effort to approach all Dutch hospitals and ambulance services to obtain their data manually. Is there a way for us to anticipate the developments and obtain this data "under the banner of Health-RI", i.e. according to the "Health-RI" standard? Is there any discussion with Health-RI about this?

Yes, this can be talked about. We have to look at how this data can be made FAIR and put through the car wash.

Currently, the connection process is at the top of the agenda. Health-RI wants to define a connection process in which we can help and steer as many parties as possible at the same time in making data suitable for multiple use.

 

20

V1

Does the entire communication, training and campaign plan to get researchers on board also fall under the umbrella of Health-RI? Or is the expectation that, e.g., the academic institutions will do that themselves?

ZonMw is now exploring how researchers can be stimulated; the idea is to expand this further to other research funders. Together with ZonMw, Health-RI is working on an overview of services in the FAIR service desk, for example. Furthermore, Health-RI contributes to the training of data stewards at the regional nodes, in order to increase capacity and expertise in the field of data reuse. Data stewards play a role in raising awareness among researchers in their research institutions, and in supporting and training them.

21

V1

If I take edited data from a data holder and I do not have access to the raw data, how can I be sure that no errors have occurred when editing the data?

If the editing process and other metadata are open and transparent, errors can be detected, otherwise not.

Furthermore, if the errors cannot be corrected by the data holder, for example because they stem from errors in the source data, then the errors remain and must be corrected at another level. Once that is done, the data user still benefits from the correction.

22

V1

Is there an alternative scenario if copying data is not ideal (e.g. due to the size of the dataset or because it is streaming data)? Can you use the API to process the data in a different way?

Not via the API (at the moment). However, the research can be done in a different way via federated processing or a data-visiting solution (e.g. FAIR data trains).
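
The idea behind federated processing can be sketched in a few lines: each data holder computes an aggregate on its own data, and only those aggregates are combined centrally. This is a minimal sketch with made-up values; the names `LocalSummary`, `summarize_locally` and `combine` are hypothetical and do not refer to an existing Health-RI or FAIR data train API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalSummary:
    """Aggregate computed at a data holder; individual records stay at the site."""
    count: int
    total: float


def summarize_locally(values: List[float]) -> LocalSummary:
    """Runs at each data holder on its own data."""
    return LocalSummary(count=len(values), total=sum(values))


def combine(summaries: List[LocalSummary]) -> float:
    """Runs at the coordinating party: pooled mean from per-site aggregates."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations")
    return sum(s.total for s in summaries) / n


# Made-up measurements held at two different sites.
site_a = summarize_locally([60.0, 70.0])
site_b = summarize_locally([80.0])
print(combine([site_a, site_b]))  # 70.0
```

In practice, aggregates over very small groups can still reveal information about individuals, so a real deployment would need additional safeguards such as minimum group sizes.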

23

V2

FAIR Data Points (FDPs): there are several implementations. What matters is the specification, which can be implemented in multiple ways. With an FDP, the machine knows what I mean. FDPs use resolvable identifiers (and can therefore allow the use of multiple ontologies, a bit like Google Translate for data).

I agree.

24

V2

Where can I find the DCAT schema?

It can be found on the wiki; DCAT-AP-NL can be found via Geonovum.

25

V2

Who is involved in the development of national data models? For example, from or within CumuluZ?

Consultations are now taking place with the nodes and with system parties via OVT.

For example, we look in the sunflower at how the EU Patient Summary (EPS) can be used in the minimal dataset; the same applies to the metadata schema.

We currently see that various parties are active in this area. Health-RI is actively working towards a unity of language that is supported by all users of health data. As soon as more complete results can be reported, they will be published in subsequent versions.

26

V2

Not everything will be readily available. It will be a step-by-step process. Where necessary, we should be able to go back to a data holder/source to further enrich existing data. Is there a process for this?

True. In the approach, we will make a distinction between existing data and new data to be generated.

This works as follows:

  • For new data to be generated, we want to apply the principle of 'mopping with the tap closed', i.e. ensuring that, from a mutually agreed moment, new data meets the requirements of unity of language and technology.

  • Old data will initially be made findable; as soon as it is reused, more effort will be put into making it FAIR.

These are ideas that are currently being worked out. As soon as these lead to supported solutions, we will publish them in future versions of the Wiki.

27

V2

The DCAT-AP version that Health-RI has now implemented for the catalog is not yet very extensive. Is there anything to say about the extensions that Health-RI will be working on?

The HealthDCAT-AP extension could be available for the next round of consultations. More information will follow soon on the page 'Extension of DCAT-AP: HealthDCAT-AP' on the official EHDS2 Pilot website.

28

V2

I know that a lot of work is also being done on the metadata of imaging data and omics data. Why aren't these communities using the GitHub repository used for the core metadata (Health-RI/health-ri-metadata) yet? That way, other developers could monitor what is happening in real time.

We encourage the use of the GitHub repository.

29

V2

The presentation included the mandatory metadata fields for datasets. What about mandatory metadata fields for distribution and catalogs?

There is still some discrepancy at the moment. We are working on this and need the help of the community.

30

V2

There is a request for comment on the model, what should we comment on? What is specific to Health-RI? We do not review DCAT.

We keep an eye on DCAT-AP NL and HealthDCAT-AP ourselves, but we would like feedback on the specific models that we need to describe specific data.

We also need feedback on the subset of DCAT we are using, and whether it meets the requirements of the node. For example, do you have things other than datasets to expose? Do you have any modification dates? Also: we can add other models later to represent our core! DCAT is not complete, but we need DCAT for addressability.

We are also working on formal models, for example in the form of SHACL rules. We would like to hear feedback on those too: https://github.com/Health-RI/health-ri-metadata/blob/master/Metadata%20Schemas%20(Formal%20models)/Core%20Metadata%20Model/core.shapes.ttl

31

V2

The data provider is responsible for the data. Agreements are needed for this. There is a role for ontology and knowledge model (unity of meaning).

Modelling the cancer process, for example, is not reflected in SNOMED. Can such gaps be filled with other terminologies or reference models?

We strive for unity of technology and unity of language. As for terminologies: where necessary or useful, several terminologies can be used alongside each other or complement each other.

32

V2

Is the data that comes from operational care and administrative processes (care and business management processes) also in the scope of the Health-RI ecosystem?

This is related to the definition of Health Data. We still have to decide whether administrative data falls within the scope of Health-RI.

33

V3

I'm missing a link between the overview of omics data standards and the FAIR data section of the wiki. I would prefer to have a single point of access to Health-RI data and metadata standards. Is it intended to be combined?

In the long run, yes: we are now opting for separate pages because the Health-RI wiki is updated twice a year and the FAIR data Wiki more often. The FAIR data wiki is more up-to-date.

34

V3

When preparing data, is modelling done, and if so, how? Semantically or otherwise? And with which semantic model? OMOP?

We would like to connect with healthcare and make data definitions together. This consists of a codebook, but also a metadata schema for the dataset and the data points. We want to set up good system management for this.

 

35

V3

In terms of modelling information, EJP-RD also does a lot.

We are also working on the EJP-RD platform (including semantic modelling). We would like to know how to connect and make the right choices.

We try to use FDP and the semantic descriptions not only in Health-RI, but we also coordinate with projects such as EUCAIM and GDI. We look for generic solutions.

36

V3

Are we perhaps waiting for EHDS to make decisions about the modelling of data?

EHDS is not going to make any decisions about the data description. This must be coordinated in the specific fields.

37

V3

I assume that you are also in discussion with the EHR (EPD) suppliers about modelling data? It would make a huge difference, in particular at the 'general' hospitals, assuming that more than 50% of the care is provided by the 26 STZ hospitals alone.

Not yet sufficiently. We are in the process of arriving at data definitions, or semantic unification.

38

V3

Nictiz is about the standards and ZIBs, isn't it? How does that relate to your approach to this metadata?

We have an orchestration role and are in consultation with parties such as Nictiz to empower everyone.

39

V3

Is DAMA's DMBOK model used for data governance?

That has yet to be determined by the appropriate parties.

40

V3

About identifiers: not what you're asking, but if there are different systems for creating identifiers, isn't there still a chance of duplicates? Or does everyone really have a different system?

That is a concern for us, whether we organize it centrally or in a federated way. We want to get in touch with parties that have experience in this, so that datasets can be uniquely identified. Think along with us!

 

41

V3

Is there a risk that different parties will create the same DOI?

There are good and bad systems for identifiers. Classically, "numbers" are used, which have a different meaning in each system. We want to work towards a system of truly persistent identifiers (PIDs).
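
One way to see why a well-designed scheme avoids duplicates: a DOI consists of a prefix assigned to the registrant and a suffix chosen by that registrant, so two parties minting under their own prefixes cannot produce the same identifier, even if they use the same local numbering. The sketch below illustrates that prefix/suffix idea only; the function `mint_identifier` and both prefixes are made up for the example and are not registered anywhere.

```python
import uuid
from typing import Optional


def mint_identifier(registrant_prefix: str, local_suffix: Optional[str] = None) -> str:
    """Build a DOI-style identifier: a registrant prefix plus a local suffix."""
    if local_suffix is None:
        local_suffix = uuid.uuid4().hex  # random suffix when no local scheme exists
    return f"{registrant_prefix}/{local_suffix}"


# Both prefixes are hypothetical and used for illustration only.
hospital_a = mint_identifier("10.99991", "dataset-001")
hospital_b = mint_identifier("10.99992", "dataset-001")
print(hospital_a != hospital_b)  # True
```

Uniqueness within one prefix remains the responsibility of the party that owns it, which is where local agreements or random suffixes come in.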

 

42

V3

In our hospital, they find it very important to check with EVERY data issuance whether the patient's consent is still up to date and to check this even more frequently (in order to be able to withdraw the data from those patients during studies).

In principle, before each issue for a specific research project, it is checked whether the issue is in line with the patient's say. When it concerns the reuse of healthcare data, the national say register contains the current say statement. In the case of reuse of a historical collection, it is assessed whether the new research question and methodology fit within the informed consent that participants previously gave and whether participants have withdrawn their consent. Under the EHDS, this last route in particular may change. This will be mapped out in the coming period on the basis of further concretization from EHDS and HDAB-NL.

 

43

V3

If a dataset with a DOI is going to be issued, would such a consent check also be done? Or is the intention in the context of the EHDS that, if the dataset contains patient data for which there was consent at the time, the dataset can continue to be issued 'as is'? Or will a check for up-to-date consent status still be done?

At this point, it depends on the specific context whether a new consent check needs to be done. When it comes to reproducing an analysis, this is usually not the case; when it comes to a new research question, the check is usually performed again. For the future, we still have to map out how this will work once the EHDS has entered into force.

 

44

V3

What about the right to erasure? Or is that something else?

If data subjects want data that concerns them to be deleted, then in principle this must be honoured. Exceptions can sometimes be made, for example because the data has been issued for a specific analysis that must remain reproducible. If data is completely anonymous, this right does not apply.

45

V3

The core metadata schema now adheres to DCAT-AP. I think compatibility with the HealthDCAT-AP extension will be important later on. This extension will likely be more restrictive on some features of DCAT-AP (e.g., the use of controlled vocabularies). How will this be handled?

That's indeed one of the things we're going to do in the current plateau of Health-RI.

So until the end of the year, we will incorporate HealthDCAT-AP, but also the Dutch application profile DCAT-AP NL. Neither has been officially finalized yet, but in the next phase we want to implement the pre-release or draft versions as far as possible in our core metadata schema and health extension.

46

V3

Isn't data minimization also limiting the variables that you send?

Indeed, we distinguish between horizontal and vertical data minimization.

47

V3

Who is involved in thinking about national data models?

We currently have the Ministry of Health, Welfare and Sport as the system holder and Nictiz as the temporary system manager.

 
