Requirements for a secure processing environment

DATE: 23-08-2024 STATUS: ADOPTED

This article contains the requirements for a secure processing environment (SPE).

Assumptions

  • Considering the complex landscape of needs and solutions, different sets of requirements are being developed. Every secure processing environment offered within the Health-RI ecosystem must meet the minimum requirements. In addition, a dataset may only be processed in an environment that meets the security requirements matching the confidentiality level of the data.

  • There is no one-size-fits-all solution that meets all use cases, which is why it is important to provide insight into the existing solutions and the (security) requirements they meet.

  • The various solutions can be offered via the Health-RI Hub or one of the Health-RI nodes, as long as it is clear who has the processing responsibility and control over access.

Framework of requirements

This program of requirements defines requirements for secure processing environments at three levels:

  • Minimum requirements: Every central analysis environment used within the Health-RI ecosystem must meet a set of minimum requirements. This set forms the framework within which various security and user requirements are set for a secure processing environment, depending on the use case.

  • Requirements for data security and access policy: Depending on the sensitivity of the data processed for a use case, more or less strict requirements may be imposed on data security and access policy. These requirements are derived from the terms of use associated with the dataset and therefore come from the data supplier.

  • Functionality requirements: Different use cases place different requirements on secure processing environments, for example in terms of computing capacity and storage capacity. These requirements come from the data user. This framework of requirements should support data users in selecting the most suitable secure processing environment for their use case.

Minimum requirements

The minimum requirements that a central analysis environment must meet are shown in Table 1 (this overview is not yet complete and will be supplemented in a later version).

ID   Description of the minimum requirement
1    Control over access to the SPE work environment must be exercisable by the data holder or a trusted (third) party.
2    SPE must be interoperable with the central Health-RI data exchange solution.
3    SPE must offer a facility for removing the work environment and/or withdrawing access to the data when the permission to use the data has ended.
4    SPE must be able to use the agreed generic identification, authentication and authorization solution within the Health-RI ecosystem.
5    SPE must offer a facility for role-based authorization in order to implement the agreements on access to data as described in the “data access agreement”.
6    SPE must offer different roles for users to guarantee data access without the controller losing control over the data.
7    SPE must log and monitor access to the data sufficiently to provide the required information regarding access to and use of the data to the data subjects.
8    SPE must be able to provide an advance estimate of the computation and storage costs for different configurations.
9    SPE must provide functionality for monitoring and controlling usage costs during the analysis process.
10   SPE must provide functionality for publishing and archiving results, as well as the data and analysis scripts necessary to ensure reproducibility of results. SPE provides a process in which each export is controlled by the operating environment owner.
11   SPE provides documentation on the security level of the working environment, and this security level is guaranteed by the supplier.
12   SPE provides documentation on where the data is stored.
13   SPE provides documentation about the offered computing and storage capacity.
14   SPE offers an onboarding process for new users as well as one of the following options:
       • Standard configurations with documentation on which requirements these configurations meet
       • Documentation on how to configure the SPE for technical and non-technical users

Table 1: Draft minimum requirements for central analysis processing environment 
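
Requirements 3, 5 and 6 in Table 1 can be illustrated with a minimal sketch of role-based authorization combined with an end date for the permission to use the data. The role names, the permissions per role and the AccessGrant structure are assumptions made for this illustration only; the actual roles follow from the data access agreement and the agreed Health-RI authorization solution.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical roles and permissions, chosen for illustration only.
ROLE_PERMISSIONS = {
    "data_holder": {"grant_access", "revoke_access", "approve_export"},
    "analyst": {"read_data", "run_analysis"},
    "reviewer": {"read_results"},
}


@dataclass(frozen=True)
class AccessGrant:
    """A user's role in a work environment and the date the permission ends."""
    user: str
    role: str
    valid_until: date


def is_allowed(grant: AccessGrant, action: str, today: date) -> bool:
    """Allow an action only while the grant is valid and the role permits it."""
    if today > grant.valid_until:
        # The permission to use the data has ended, so access is withdrawn.
        return False
    # Unknown roles get no permissions at all.
    return action in ROLE_PERMISSIONS.get(grant.role, set())
```

In this sketch an analyst can read data and run analyses until the end date of the grant, while only the data holder role can approve an export, so the controller keeps control over the data.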

Requirements for data security and access policy (data holder)

Depending on the confidentiality of the data that is analyzed in a secure processing environment, different requirements may be imposed on data security and access policies. The confidentiality level is determined based on the risk of identification of data subjects (see reference card for researchers - LCRDM). A high risk of identification requires more data protection measures than a low risk. Completely anonymous data falls outside the scope of the GDPR and does not require additional data protection measures. In practice, however, health data is very rarely completely anonymous.

Appropriate data protection measures are defined based on the Five Safes framework. The Five Safes framework views data management decisions as problem solving in five dimensions:

  1. Safe projects: Is the use of the data appropriate?

  2. Safe people: Can the users be trusted to use it appropriately?

  3. Safe settings: Does the access facility restrict unauthorized use?

  4. Safe data: Is there a risk of disclosure in the data itself?

  5. Safe outputs: Are the statistical results non-revealing?

As a rule, these requirements for a secure processing environment are set by the data holder. The sensitivity level of the data is recorded in the metadata. To ensure that an analysis environment is suitable for processing a dataset with a given sensitivity level, the following preconditions must be met:

  • The confidentiality levels for datasets are clearly defined, including the types of data to which each level applies

  • The requirements for processing environments are clearly defined per confidentiality level

  • The requirements imposed on processing environments for each confidentiality level meet the requirements set by the GDPR for processing data with this confidentiality level

  • Each supplier of processing environments offers documentation that clearly describes which set of security requirements (for data with which confidentiality level) the processing environment meets

The first three of the above conditions will be part of the program of requirements for secure processing environments within Health-RI and will be elaborated as such by the analysis working group.

The program of requirements to be developed will stipulate that not only the processing environment but also the entire chain, from data delivery to this environment through export and archiving from it, must meet the stated security level.
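
The chain rule above can be sketched as a simple check. This is a minimal illustration under stated assumptions: the three-step Level scale and the names of the chain links are hypothetical, since the actual confidentiality levels are still to be defined by the analysis working group.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordered confidentiality scale; higher means stricter."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def may_process(dataset_level: Level, chain_levels: dict[str, Level]) -> bool:
    """Return True only if every link in the chain meets the dataset's level.

    chain_levels maps each link (for example delivery, processing, export)
    to the highest confidentiality level that link is documented to support.
    An empty chain is rejected, because nothing has been documented for it.
    """
    if not chain_levels:
        return False
    return all(level >= dataset_level for level in chain_levels.values())
```

With a chain in which delivery and processing support HIGH but export supports only MEDIUM, a MEDIUM dataset passes the check and a HIGH dataset does not, because the weakest link determines the outcome.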

Requirements for functionality (data user)

Different use cases place different demands on the functionality and user-friendliness of a processing environment. These include, for example, requirements for storage and computing capacity, the analysis software offered, options to install your own analysis software or have it installed, and user-friendliness.

Depending on the confidentiality of the data to be processed, not all requirements of the data user can always be met, especially when the data user requires a more open environment but works with highly confidential data that can only be processed in a closed environment.

The Dutch Federation of University Medical Centers (NFU) has drawn up profiles of researchers in the field of health and care in the Data4LifeSciences program (Figure 1). These personas combine area of interest, role, processes, data, applications and typical infrastructure in use. The profiles are related to the degree of support by IT functionalities and services and to the need for computing and storage capacity and flexibility. They serve as a guideline for classifying existing secure processing environments, with the aim of guiding researchers and support staff in selecting the appropriate secure processing environment for a project.

Figure 1: Researcher profiles in the health domain

For the program of requirements, the data user requirements for functionality and user-friendliness will be further inventoried.

Cost

The costs of a secure processing environment can be a precondition for use. However, cost is not a functional requirement that makes an environment more or less suitable for a particular use case. To make a well-informed decision, the data user has two requirements:

  • Transparent cost structure: The data user must be able to estimate the expected costs before the project starts

  • Accessible payment model: Settlement must also be feasible for small projects

These requirements are included in the minimum requirements for any secure processing environment.
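
A transparent cost structure means that the expected costs can be calculated up front from the chosen configuration. The sketch below shows one possible linear estimate. The Configuration fields and the two rates are hypothetical; real environments may price differently, for example with fixed fees or tiered rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    """Hypothetical description of the resources a project expects to use."""
    name: str
    cpu_hours: float
    storage_gb_months: float


def estimate_cost(config: Configuration,
                  cpu_hour_rate: float,
                  storage_gb_month_rate: float) -> float:
    """Estimate project costs as computation costs plus storage costs."""
    compute = config.cpu_hours * cpu_hour_rate
    storage = config.storage_gb_months * storage_gb_month_rate
    return compute + storage


def within_budget(estimate: float, budget: float) -> bool:
    """Check an estimate against the project budget before the project starts."""
    return estimate <= budget
```

Comparing the estimates for a small and a large configuration before the project starts lets the data user see whether settlement is feasible for a small project as well.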

Classification of secure processing environments

Existing secure processing environments will be classified based on data security requirements and functionality requirements.

This classification helps:

  • data users gain insight into the requirements that the different central analysis environments meet, which makes it easier for them to choose the most suitable central analysis processing environment

  • data holders have clarity about and confidence in the security level of each central analysis environment offered, which makes it easier for them to include these requirements in the terms of use of datasets

Based on the classification of secure processing environments, support services for data users can be developed, such as a decision tree to arrive at the most suitable processing environment for a project. Furthermore, the classification provides insight into how well existing solutions fit user requirements. If a needed solution is missing, this gap forms a basis for discussions with providers of secure processing environments about the options available to meet the need.
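
A selection aid of this kind could start as a simple filter over the classified environments. The sketch below is an illustration only: the Environment fields, the integer confidentiality scale and the example criteria are assumptions, not part of the classification itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical classification record for one secure processing environment."""
    name: str
    max_confidentiality: int   # assumed ordered scale, higher means stricter
    cpu_cores: int
    storage_tb: float
    allows_own_software: bool


def suitable_environments(envs: list[Environment],
                          confidentiality: int,
                          min_cores: int,
                          min_storage_tb: float,
                          needs_own_software: bool) -> list[str]:
    """Return the names of environments that meet all stated requirements.

    The data security requirement (from the data holder) is checked first,
    followed by the functionality requirements (from the data user).
    """
    matches = []
    for env in envs:
        if env.max_confidentiality < confidentiality:
            continue  # does not meet the security level the dataset requires
        if env.cpu_cores < min_cores or env.storage_tb < min_storage_tb:
            continue  # not enough computing or storage capacity
        if needs_own_software and not env.allows_own_software:
            continue  # the user cannot install their own analysis software
        matches.append(env.name)
    return matches
```

An empty result corresponds to the situation described above in which a needed solution is missing, which is the starting point for a discussion with providers.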