Omics working group
This article contains the profile of the Omics working group. The Omics profile contains the specific agreements that apply to the Omics data category.
Title
Title | Omics data |
General
Profile metadata | 2023-09-26 Version 0.0.1 |
Release-information | 2023-09-26 Author: R.W.W. Hooft |
Law and regulations
Legal basis | Human heredity data, which in any case includes both genomics and genetics data (the study of all genes or of specific genes), belongs with specific exceptions to the special categories of personal data and often cannot be anonymized. Processing therefore requires not only a legal basis under Art 6 GDPR, but also an exception to the prohibition on processing under Art 9 GDPR. In the Netherlands, the basic principle is that genetic personal data may only be processed if the data subject has given explicit consent, i.e. via an opt-in variant of consent. In the event of a compelling medical interest or for scientific research, genetic data can sometimes be used without consent. This is only permitted if it is impossible to obtain consent, or if requesting consent requires a disproportionate effort. For other forms of omics, such as proteomics and metabolomics, the data does come from a person, but with the current state of technology it cannot be traced back to that person without other data. As a consequence, only the combination with other traceable data poses potential risks, and adding this type of omics data to such a combination does not increase the risk. Some people expect that tracing such data back to a person will become possible in some cases in the future, and are therefore more cautious. |
Organizational policy
Roles and actors | The article roles describes the generic roles within the Health-RI ecosystem. Within Omics there are specific specifications for the following roles. Dataproducer Omics data is collected by three groups:
In addition, individuals sometimes have their own omics data collected, for example by commercial providers of such services; at this time, these data are not yet in scope. Data Governance Committee There is currently not much coordination of data governance for omics data. As part of the European 1+MG initiative and the GDI project, an infrastructure for sharing high-quality human genome data is being developed. This is still in an early phase of realization. It is already clear that this infrastructure will set up a central European Data Access Commission that will decide on the release of data. The European Health Data Space Regulation (EHDS) will also regulate the exchange of genomics and proteomics data, including for secondary use: countries will be able to impose their own national conditions for the use of this data in addition to the conditions set by the EHDS regulation. |
Inclusion and exclusion criteria for participants | There are different types of omics, each with specific properties of the data. The first focus is on “genomics”. For genomics data, the consensus is that anonymization is not possible; re-identification is relatively simple. That is why the GDPR always applies to genomics data, and on top of that the Art 9 GDPR prohibition (because it concerns a special category of personal data) and also the clearly stated principle that DNA data may only be processed on the basis of consent (opt-in). Genomics data, especially in raw form, is very large in volume (hundreds of gigabytes for a whole-genome sequencing dataset for a single person). Genomics data is grouped into “cohorts” based on the measurement technique used and especially on the associated phenotypic data: the genome data itself has largely the same form regardless of the purpose of the measurement, so it is the other available data about the person that determines the grouping (e.g. all patients from the cardiology department at the UMCU, for whom very comparable other data is also available). |
Information
Metadata | The article minimal (meta)dataset describes the generic (meta)dataset within the Health-RI ecosystem. The following addition exists within Omics. The minimum is currently DCAT version 2.0. All domain-specific metadata fields described below will only become part of the metadata model in later plateaus. For all omics data, the following common metadata fields apply on top of the Health-RI metadata:
The genomics sunflower petal contains the following additional metadata fields:
|
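As a rough illustration of the DCAT 2.0 minimum mentioned above, the sketch below builds a small dataset description as JSON-LD. It only uses a handful of general DCAT and Dublin Core properties; the function name `dcat_dataset`, the example values and the URL are hypothetical, and the omics-specific fields are deliberately left out because they are not part of the metadata model yet.

```python
import json


def dcat_dataset(title, description, keywords, access_url):
    """Return a minimal DCAT 2 dataset description as a JSON-LD dictionary.

    Illustrative only: the selection of properties is an assumption, not the
    Health-RI minimal (meta)dataset.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": access_url},
            }
        ],
    }


# Hypothetical example values.
record = dcat_dataset(
    title="Example genomics cohort",
    description="Toy description of a whole-genome sequencing cohort.",
    keywords=["omics", "genomics"],
    access_url="https://example.org/datasets/example-cohort",
)
print(json.dumps(record, indent=2))
```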
Information standards | File formats for genomics are:
For more information, see the article on omics datatypes and standards. Coverage:
|
Application / IT-infrastructure
Ways of data exchange | Preference for federated processing (in proximity to storage by the data holder) due to
There is a special standardized protocol “htsget” that can provide specific access to the necessary parts of genomic data, so that as little copying as possible is required. This has not yet been worked out for other omics data. A European infrastructure for exchanging human genome data is being built in the GDI project in which Health-RI participates on behalf of the Netherlands. |
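The way htsget limits copying can be sketched as follows: the client asks for one genomic region, the server answers with a JSON "ticket" listing the data blocks that cover that region, and the client concatenates those blocks in order. The sketch below is a minimal illustration, not a full client: the server address `https://htsget.example.org`, the dataset identifier `sample-001` and the helper names are hypothetical, and only blocks embedded as `data:` URIs are handled so that the example runs without a network connection.

```python
import base64
from urllib.parse import urlencode, unquote_to_bytes


def build_htsget_url(base_url, dataset_id, reference_name, start, end, fmt="BAM"):
    """Build an htsget 'reads' request URL for a single genomic region.

    In htsget, 'start' is 0-based inclusive and 'end' is 0-based exclusive.
    """
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/reads/{dataset_id}?{query}"


def decode_data_uri(uri):
    """Decode the payload of a 'data:' URI (base64 or percent-encoded) to bytes."""
    header, _, payload = uri.partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)


def inline_blocks(ticket):
    """Concatenate, in ticket order, all blocks that are embedded as data: URIs.

    Blocks with http(s) URLs would have to be fetched over the network using
    the headers listed in the ticket; this sketch raises instead.
    """
    parts = []
    for block in ticket["htsget"]["urls"]:
        url = block["url"]
        if not url.startswith("data:"):
            raise NotImplementedError("remote block: fetch it with block.get('headers')")
        parts.append(decode_data_uri(url))
    return b"".join(parts)


# Hypothetical request and a toy ticket with two inline blocks.
example_url = build_htsget_url("https://htsget.example.org", "sample-001", "chr1", 100000, 200000)
prefix = "data:application/octet-stream;base64,"
example_ticket = {
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": prefix + base64.b64encode(b"HEADER").decode("ascii")},
            {"url": prefix + base64.b64encode(b"BODY").decode("ascii")},
        ],
    }
}
```

Because the ticket only references the blocks that overlap the requested region, the bulk of a whole-genome file never has to leave the data holder.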
Implementation | There are many tools used for Omics data, including:
|
Security
Assessment of anonymization | For this type of data, anonymizing individual data is impossible: the data concerns not only a person but also their immediate family, which makes it easier to obtain enough contextual information to identify a subject. Anonymization through aggregation is possible: when it is indicated for a (sufficiently large) group of data subjects which genetic variants have been observed in the group, it is no longer possible to trace them back to individuals. The “beacon” protocol is a global standard for querying genome data. For V1 of this protocol it has been determined that when more than approximately 200 queries are asked, it becomes possible to re-identify a subject based on the answers. Such an analysis is not yet formally known for V2 of the protocol, but it is already clear that the number of required queries will be significantly lower, perhaps around 20. |
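The aggregation idea described above can be sketched in a few lines: instead of releasing per-person genotypes, only the set of variants seen anywhere in the group is reported, and nothing is reported when the group is too small. This is a minimal sketch under stated assumptions: the function name, the variant identifiers and the threshold value used in the example are hypothetical, and what counts as "sufficiently large" is a policy decision that this text does not specify.

```python
def observed_variants(group, min_group_size):
    """Return the sorted list of variants observed anywhere in the group.

    group: dict mapping a subject identifier to an iterable of variant identifiers.
    min_group_size: illustrative threshold; the real value is a policy choice.
    Raises ValueError when the group is smaller than the threshold, so that
    no aggregate is released for small groups.
    """
    if len(group) < min_group_size:
        raise ValueError("group too small to release aggregate results")
    observed = set()
    for variants in group.values():
        observed |= set(variants)
    # Only the union is returned: no subject identifiers, no per-person data.
    return sorted(observed)


# Toy data with hypothetical variant identifiers; the threshold of 3 is only
# chosen so that the example runs and is far too low for real use.
toy_group = {
    "subject-a": ["chr1:1000A>G", "chr2:500C>T"],
    "subject-b": ["chr1:1000A>G"],
    "subject-c": ["chr7:42G>A"],
}
summary = observed_variants(toy_group, min_group_size=3)
```

Note that this only illustrates the release step; as the query counts for the beacon protocol show, repeated targeted queries against aggregates can still leak information, so aggregation alone is not a complete safeguard.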
Additional privacy measures | For identification, authentication and authorization, the use of passports and visas is recommended; these must be supported by LS AAI and (within the Netherlands) SRAM. Furthermore, the systems for international exchange of genetic data rely on encryption: data is stored encrypted where possible, and granting access mainly consists of temporarily providing a decryption key with which only the necessary parts of the data can be decrypted. |