Health-RI wiki v4.0 -> consultation (open until 03-12-2024)
Metadata regarding terms of use, authentication/authorization and ELSI aspects
This article proposes 'ELSI metadata': metadata about the conditions and restrictions that apply to the reuse of data, the ELSI elements that matter when submitting a data request, and the ELSI elements that matter when approving a data request, with a focus on authentication and authorization.
Metadata for authorization and ELSI aspects are important in different parts of the Health-RI infrastructure. We looked at the following user stories:
Find datasets
Request datasets
Analyze datasets
Publish datasets in the catalog
It is important to take into account legal obligations at both national and European level. The following laws and regulations were mainly examined: GDPR, WMO, EHDS and the Code of Conduct for Health Research. In addition, the NFU template MTA/DTA and consent forms of various hospitals were examined.
We propose the following metadata elements and, where possible, give a suggestion for the appropriate ontology systems.
Catalogue
It is important to indicate in the catalogue what the conditions for use are so that a user can assess at an early stage whether a request for this dataset is promising. In addition, it is important for the process to have information about who can decide on an application.
Conditions for use are based on laws and regulations (UAVG, Data Governance Act, Wet hergebruik overheidsinformatie, Databankenwet, AVG/GDPR, EHDS, Data Act, etc.), on conditions that follow from a verification of intellectual property (e.g. copyright and database rights), and on the rights of the citizen: consent (opt-in) and/or objection (opt-out). Many of these elements are already defined in HealthDCAT-AP, which leaves us only limited room for our own interpretation.
Metadata-element | Explanation | Recommended Coding | Examples/comments |
dct:publisher Publisher | Who is the provider of the data? | foaf:Agent | The hospital; this will usually also be the data controller. |
dpv:hasLegalBasis Legal basis | The category of legal basis on which the data was collected (as in GDPR art. 6.1 a to f). | dpv:LegalBasis | For example, consent (e.g. in the case of scientific research) or legal obligation (e.g. in the case of quality registers). |
dcatap:applicableLegislation dpv:hasApplicableLaw Applicable legislation | If the legal basis (see above) is a legal obligation, this field can be used to indicate which legislation it derives from. | Persistent URL for the law | dpv:hasApplicableLaw is an alternative, but dcatap:applicableLegislation is part of HealthDCAT-AP. |
dpv:hasJurisdiction Jurisdiction | Jurisdiction under which the dataset falls. | dpv:Location | Within the Netherlands, this will probably be the Netherlands and the EU. |
dpv:hasPersonalData Contains personal data | Data categories (which can be linked to a person) that this dataset contains. | dpv:PersonalData | Location, SexualOrientation, MedicalHistory |
dct:accessRights Access Rights | Indication of whether a dataset is open or not. If a dataset is not publicly accessible, the fields below can be used to further specify the conditions or restrictions. | dct:RightsStatement | Restricted or sensitive (in case of identifying data); in that case, use the fields below to specify conditions/restrictions. Public (in case of anonymous data). In DCAT-AP Health limited to the options from the EU-Voc access rights vocabulary. |
Conditions of use | The main conditions, encoded in a machine-readable format. Can be used to filter or to ask additional questions during the application, for example whether costs will be charged. | DUO, DUC/CCE or ODRL | DUO or DUC/CCE can also serve as syntactic sugar and be converted to ODRL under the hood. Predicate has yet to be defined. |
Link to terms and conditions document | For example an MTA or consent forms. | Persistent URL | Predicate has yet to be defined. Possibly use foaf:page? |
Contains intellectual property | Does the dataset contain intellectual property that needs to be protected? | Boolean | This will almost always be no, but it is defined in the EHDS because measures may be taken to protect intellectual property. This may be important later in the application process. Predicate has yet to be defined. |
healthdcatap:hdab Health Data Access Body | Health Data Access Body that can give approval for secondary use under the terms of the EHDS. | foaf:Agent | This field will only be implemented once the HDAB is in place. Until then, the data controller will be the contact person for assessment. |
Linking keys | Keys that are available to link with other datasets. | | These may be available in addition to the dataset and may not be delivered. E.g. direct or indirect identifiers, such as name and address details. Predicate and range for object still need to be defined. |
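To illustrate how the catalogue fields above could support an early "is a request promising?" check, here is a minimal sketch in Python. It is not a proposed data model: the records, the `id` key, the unprefixed keys (standing in for predicates that have yet to be defined) and the function names are all hypothetical. The DUO identifiers are given as we understand them and should be verified against the current DUO release.

```python
from typing import Any

# Hypothetical catalogue records using a subset of the fields from the table.
# Unprefixed keys are placeholders for predicates that are not yet defined.
CATALOGUE: list[dict[str, Any]] = [
    {
        "id": "cohort-a",  # hypothetical dataset identifier
        "dpv:hasLegalBasis": "dpv:Consent",
        "dpv:hasJurisdiction": ["NL", "EU"],
        "dpv:hasPersonalData": ["MedicalHistory", "Location"],
        "dct:accessRights": "RESTRICTED",
        # Assumed meaning: DUO:0000007 disease specific research,
        # DUO:0000021 ethics approval required.
        "conditionsOfUse": {"DUO:0000007", "DUO:0000021"},
        "containsIntellectualProperty": False,
    },
    {
        "id": "registry-b",
        "dpv:hasLegalBasis": "dpv:LegalObligation",
        "dpv:hasJurisdiction": ["NL", "EU"],
        "dpv:hasPersonalData": [],
        "dct:accessRights": "PUBLIC",
        "conditionsOfUse": set(),
        "containsIntellectualProperty": False,
    },
    {
        "id": "cohort-c",
        "dpv:hasLegalBasis": "dpv:Consent",
        "dpv:hasJurisdiction": ["NL", "EU"],
        "dpv:hasPersonalData": ["MedicalHistory"],
        "dct:accessRights": "RESTRICTED",
        # Assumed meaning: DUO:0000018 not for profit, non commercial use only.
        "conditionsOfUse": {"DUO:0000018"},
        "containsIntellectualProperty": False,
    },
]


def is_promising(record: dict[str, Any], acceptable: set[str]) -> bool:
    """Return True if the dataset is public, or if every condition of use
    on the record is one the requester says they can meet."""
    if record["dct:accessRights"] == "PUBLIC":
        return True
    return record["conditionsOfUse"] <= acceptable


def promising_ids(catalogue: list[dict[str, Any]], acceptable: set[str]) -> list[str]:
    """Filter the catalogue down to datasets worth requesting."""
    return [r["id"] for r in catalogue if is_promising(r, acceptable)]
```

A requester who can meet the two conditions on `cohort-a` but is a commercial party would see `cohort-a` and `registry-b`, and not `cohort-c`. Conditions that cannot be decided automatically could instead trigger additional questions in the request procedure.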
Request procedure
For the request procedure, we have tried to define a logical workflow for the questions that are asked (not published here because it is currently outdated). Based on this preliminary workflow, possible metadata elements were examined.
Metadata-element | Explanation | Recommended Coding |
Purpose limitation | Purpose for which the data is requested. | DPV. Categories from EHDS art. 34 and GDPR art. 5.1, plus free text for purposes outside of those. On the basis of the purpose limitation, it can be assessed whether EHDS may be applicable. |
Objectives/research question | Description of the purpose/research question for which the data is requested. | Text or a document |
Applicant | Data of the person (see metadata description below). | |
Request from organization or individual | It is relevant to know who is authorized to sign. | Boolean value |
Organization | Filled in if the request is made on behalf of an organization by virtue of the requestor's role. | ROR identifier, foaf:Agent |
Position of requestor | Function of the person within the organization (see metadata description below). | |
Personal data of the contact person who can verify the authority of the requestor | Data of the person (see metadata description below). | |
Principal Investigator | Data of the person (see metadata description below). | |
dpv:hasImpactAssessment DPIA | Data Protection Impact Assessment, only necessary if the requested dataset contains personal data within the meaning of the GDPR (requested in EHDS). | dpv:PIA |
Data Management plan | Data management plan (requested in EHDS). | |
Desired format | Requested by the EHDS. | |
Specification | Requested by the EHDS. Mainly the number of people and their distribution, plus the variables. | |
Financial data | Questions to be asked if costs are charged (to be specified in the catalogue under conditions of use). | |
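The conditional elements in the request table (an organization only for organizational requests, a DPIA only when personal data is involved) can be sketched as a simple completeness check. The sketch below is an illustration under assumptions, not a definitive model: the class, field and function names are hypothetical, the persons are reduced to plain strings, and the ROR value in the usage note is a placeholder rather than a real identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataRequest:
    """Hypothetical, simplified representation of a data request."""
    purpose: str                      # purpose limitation category or free text
    research_question: str            # text, or a reference to a document
    applicant: str                    # reference to the person record
    on_behalf_of_organization: bool   # request from organization or individual
    organization_ror: Optional[str] = None
    dpia_reference: Optional[str] = None
    data_management_plan: Optional[str] = None
    desired_format: Optional[str] = None


def missing_elements(request: DataRequest, dataset_has_personal_data: bool) -> list[str]:
    """List the conditional elements that are required but not filled in."""
    missing: list[str] = []
    # An organization is only needed when the request is made on its behalf.
    if request.on_behalf_of_organization and not request.organization_ror:
        missing.append("organization")
    # A DPIA is only needed when the dataset contains personal data (GDPR).
    if dataset_has_personal_data and not request.dpia_reference:
        missing.append("dpia")
    return missing
```

For example, `missing_elements(DataRequest("scientific research", "Q1", "person-1", True), True)` reports both `organization` and `dpia`, while the same request with `organization_ror="https://ror.org/placeholder"` and a DPIA reference reports nothing.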
Authentication and authorization metadata
It is important to unambiguously determine who is requesting the data and who has access to data. There are several options for establishing identity. What applies depends on the degree of certainty that is required and is a consideration that must be made by Health-RI and the data holders involved. The table below defines a number of metadata fields that can be used to maintain control over this. We emphasize that this list is not a data model. When modeling, one should take into account that people may have different roles in different requests and may authenticate themselves in different ways depending on the role.
Authentication metadata
Metadata-element | Explanation | Recommended Coding |
Full name | Full name of the person in the usual spelling. Note: Names are not unique and may change over time. | |
Identifier | Unique identifier of the person within the system. | |
Role | Role of the person within the process, e.g. Requester, PI, Data Manager, Privacy officer. EHDS requires that the people with access to the data are listed in the application and data permit. | DPV may need extensions |
E-mail address | Contact e-mail address of the person. This must be a personal address, not a group address. | FOAF or vCard, DCAT uses both in different places |
Telephone number | Contact phone number, preferably a direct number. | |
Authentication Method | How the person logged in for this session (e.g., username/password, federated login). | |
Time of login | The moment of login. | ISO 8601 |
Method of establishing identity | Method by which the identity is established, e.g. via online methods such as federated login, or offline methods such as passport checks. | A code list must be established for this. |
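Because people may have different roles in different requests and may authenticate in different ways, one way to model the fields above is to keep the person, the role per request, and the login event as separate records. The following Python sketch shows that separation; all class, field and function names are hypothetical, and the role and method values are free text here because the code lists still need to be established.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Person:
    identifier: str   # unique within the system; names are not unique
    full_name: str
    email: str        # personal address, not a group address


@dataclass(frozen=True)
class RoleAssignment:
    person_id: str
    request_id: str
    role: str         # e.g. "Requester", "PI", "Data Manager"


@dataclass(frozen=True)
class LoginEvent:
    person_id: str
    authentication_method: str   # e.g. "federated login"
    time_of_login: datetime      # should be timezone-aware

    def time_of_login_iso(self) -> str:
        """Serialize the login time as an ISO 8601 string."""
        return self.time_of_login.isoformat()


def roles_by_request(person_id: str, assignments: list[RoleAssignment]) -> dict[str, list[str]]:
    """Collect the roles one person holds, grouped by request."""
    roles: dict[str, list[str]] = {}
    for assignment in assignments:
        if assignment.person_id == person_id:
            roles.setdefault(assignment.request_id, []).append(assignment.role)
    return roles
```

With this split, the same `Person` can be a requester in one request and a PI in another without duplicating name or contact details, and each session keeps its own authentication method and login time.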
Authorization/data permit
If a person is authorized to use the requested data, this can be recorded in a data permit (EHDS). Article 46.6 of the EHDS describes what a data permit must contain. We propose to always use this structure, mutatis mutandis.
Metadata-element | Explanation | Recommended Coding |
Categories | The EHDS has established a list of categories of health data. | |
Specification | Specification of the data. Selection criteria, variables, and number of data points. Possibly also distribution. | |
Format | Format in which the data is made available. | |
Pseudonymized data? | Is it identifying data or pseudonymized or anonymized data? | Antoinette Vlieger proposed to start from four categories:
This concerns the context in which the data is used. Data can be identifying in the context of a hospital, but not identifying in the context of research because the linking key is not accessible there. |
Description of the purpose | A description of the purpose for which the data was requested; this can be taken from the application itself. | |
Authorized Persons | EHDS requires a list of all authorized persons who have access to the secure processing environment, but the PI is a good start. | |
Length of time that the data permit is valid | | ISO 8601 start/end date or duration from agreement |
Information about the tools available in the Secure Processing Environment and their characteristics | EHDS asks for details of the environment, but in the first instance it seems sufficient to indicate which SPE is used, possibly with a reference to a description of the SPE elsewhere. Include tools explicitly only if additional tools are installed for the user. | |
Cost | | |
Additional terms and conditions | Agreed conditions, for example from an MTA/DTA or DAA. | |
Link to MTA/DTA/DAA | Link to the documents. | |
Date of agreement | Date on which the permit is approved by all parties. | |
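As an illustration of how a data permit with these elements could be used to answer "may this person access the data today?", here is a minimal Python sketch. It assumes the validity period is stored as a start and end date and that authorized persons are referenced by their system identifier; the class, field and method names are hypothetical and only a subset of the elements from the table is included.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataPermit:
    """Hypothetical, simplified data permit."""
    categories: list[str]            # EHDS categories of health data
    purpose: str                     # taken from the application
    data_format: str                 # format in which the data is made available
    spe: str                         # which Secure Processing Environment is used
    valid_from: date
    valid_until: date
    date_of_agreement: date
    authorized_person_ids: set[str] = field(default_factory=set)

    def allows(self, person_id: str, on: date) -> bool:
        """True if the person is listed on the permit and the permit is
        valid on the given date (both end dates inclusive)."""
        is_listed = person_id in self.authorized_person_ids
        is_valid = self.valid_from <= on <= self.valid_until
        return is_listed and is_valid
```

Starting with only the PI in `authorized_person_ids` matches the suggestion in the table; further persons can be added later without changing the structure.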
Resources
Code of Conduct for Health Research: https://elsi.health-ri.nl/sites/elsi/files/2022-01/Gedragscode_Gezondheidsonderzoek_2022.pdf
NFU Template MTA/DTA: https://elsi.health-ri.nl/sites/elsi/files/2020-07/20.30817%20OenO%20Template%20-%20Material%20and%20associated%20Data%20Transfer%20Agreement.pdf
Template Clinical Trial Agreement
DCAT-AP Health draft (template on GitHub)
Code Systems:
Data Use Ontology (DUO): “The Data Use Ontology to streamline responsible access to human biomedical datasets”
Digital Use Conditions https://zenodo.org/records/8200044 (pre-print)
Common Conditions of Use Elements https://zenodo.org/records/8200079 (pre-print)
ROR https://ror.org
DPV LEGAL-EU https://w3c.github.io/dpv/legal/eu/
“Enhancing Data Use Ontology (DUO) for health-data sharing by extending it with ODRL and DPV” https://doi.org/10.3233/SW-243583
DCAT-AP
EU Vocabularies (Publications Office of the EU)
Legislation, among others:
AVG & UAVG
EHDS
WMO
WGBO