Storyline: Generating research data

This article describes the storyline of generating research data.

This storyline describes how a data source generates new data points during a study. After a secure processing environment has been requested and configured, and the necessary software (for example Castor or XNAT) has been made available, the data user can start data generation. Examples of data generation include having citizens complete questionnaires, having images annotated by experts, or transferring data from an EPD (electronic patient record) to an EDC (electronic data capture) system. Once data generation is complete, the newly generated data can be made available within the Health-RI ecosystem and archived.

See https://health-ri.atlassian.net/wiki/spaces/HA/pages/164497905 for the collaboration diagram of this storyline.

Comments 

  • The underlying idea of this storyline is that researchers can more easily make research data directly available within the Health-RI ecosystem, by generating new data within the ecosystem.

  • This storyline was already described in the previous version of the wiki. In this version, business objects have been added to make the steps in the storyline clearer.

Precondition

  • The researcher has an approved research proposal that describes, among other things, which data may be generated and how.

Process Model

  1. The researcher, in the role of data producer, selects the (Data Capture) tools specified in the research proposal from the tool supplier, or brings their own tools, in order to:

    1. draw up a data management plan for reusable research results

    2. establish the citizen's say (informed consent / e-consent / opt-out)

    3. generate data

    4. manage and curate data.

  2. The Local Data Access Committee of the relevant data holder indicates under which conditions of use data may be recorded at the data source for multiple use.

  3. The data source records these terms of use in a data management plan and defines which data will be recorded and how, using code tables, metadata dictionaries and metadata templates prescribed by the data governance committee.

  4. The data source configures the selected tool to create a workable environment.

  5. If new data is generated on top of existing data (for example annotating images) or supplements existing data (for example setting up a survey based on previously done blood tests), the existing data is obtained as follows:

    1. existing (care) data is extracted from a care system

    2. an existing research dataset is requested via the request data process.

  6. If existing data is used, the data source collects and curates the existing data to make it suitable for enrichment ('research dataset to be enriched').

  7. The data source uses the enrichment tool to configure the data enrichment.

  8. The data source starts generating new data points through measurements, analyses, questionnaires and observations. These data, in combination with optional input (for example a healthcare dataset or a requested research dataset), lead to an enrichment dataset.

  9. When the enrichment dataset has been generated, the data user receives the generated data to curate and convert to a new research dataset (generated research dataset).

  10. The data user makes the generated research dataset available within the Health-RI ecosystem through the process of making data available to the desired data holder.
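
The ten steps above can be read as an ordered workflow with one optional branch (steps 5 and 6 only apply when existing data is enriched). The following is a minimal sketch of that reading, not part of the Health-RI architecture itself. The names `Step`, `STEPS` and `plan` are hypothetical, the step descriptions are paraphrased, and the actor of step 5 is marked as unspecified because the process model does not name one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int    # step number in the process model
    actor: str     # role that performs the step
    action: str    # short paraphrase of the step
    needs_existing_data: bool = False  # True only for the optional branch


# Paraphrased from the process model; wording is illustrative only.
STEPS = [
    Step(1, "researcher (data producer)", "select or bring data capture tools"),
    Step(2, "Local Data Access Committee", "indicate the conditions of use"),
    Step(3, "data source", "record terms of use in the data management plan"),
    Step(4, "data source", "configure the selected tool"),
    Step(5, "unspecified", "obtain existing data", needs_existing_data=True),
    Step(6, "data source", "collect and curate existing data", needs_existing_data=True),
    Step(7, "data source", "configure the data enrichment"),
    Step(8, "data source", "generate new data points"),
    Step(9, "data user", "curate and convert to a generated research dataset"),
    Step(10, "data user", "make the generated research dataset available"),
]


def plan(uses_existing_data: bool) -> list[Step]:
    """Return the applicable steps in order, skipping steps 5 and 6 when no existing data is used."""
    return [s for s in STEPS if uses_existing_data or not s.needs_existing_data]


if __name__ == "__main__":
    for step in plan(uses_existing_data=False):
        print(f"{step.number}. [{step.actor}] {step.action}")
```

Calling `plan(True)` yields all ten steps, while `plan(False)` describes a study that generates data from scratch, for example a new questionnaire without any prior healthcare dataset.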

Postcondition

  • There is a new (generated) research dataset that can be made available for multiple use.

  • The conditions of use under which the data may be recorded for multiple use are known.

 

image-20240606-085047.png
Process diagram “Generating research data”

 

The following business objects are used in this storyline:

  • Research proposal: A research proposal is a structured document that provides the basis for planning and conducting a study.

  • Data Capture Tooling: Tooling for generating research data, including recording ownership. Typical examples are EDC systems (Castor, REDCap, LibreClinica, XNAT, OMERO). An EPD component, for example a questionnaire module, can also be part of this.

  • Terms of Use: Terms of Use are the rules and regulations that govern the relationship between a user and the provider of a product. They describe the rights, obligations and responsibilities of both the user and the provider regarding the use of the product.

  • Data management plan: A data management plan (DMP) is a formal document developed at the beginning of a research project that describes all aspects of data management, both during and after the project.

  • Code table: A code table is a collection of codes used to represent certain information, often in a structured form.

  • Standard: A standard is an established norm or specification that is used as a reference point for evaluating or producing something.

  • Metadata dictionary: A metadata dictionary is a structured collection of metadata elements and their definitions used to manage and describe metadata within a specific domain or context.

  • Inclusion and exclusion criteria: criteria used to perform a data extraction from a patient file or population database. The resulting participant list (inclusion list) serves as the basis for generating research data.

  • Inclusion list: list of citizens/patients/participants who meet (clinical) inclusion and exclusion criteria.

  • Citizen participation: the citizen (participant in research) can indicate that he or she agrees with, or does not object to, the collection and use of data for scientific research.

  • Healthcare data: existing patient data in clinical care systems, such as an EPD or a clinical PACS.

  • Healthcare dataset: explicitly compiled/selected set of data extracted from healthcare systems, based on the inclusion list. (For prospective scenarios (implicit selection) this is a dynamic/growing set, but for the storyline, a retrospective scenario is assumed.)

  • Healthcare system: clinical system used to provide care to patients.

  • Enrichment Configuration:

    • Registration protocol

    • Questionnaire

    • Analysis protocol

    • Follow up

    • AI model

  • Enrichment dataset: the data resulting from the enrichment (based on an enrichment configuration). If applicable, this data is combined with the data on which the enrichment was carried out (for example a healthcare dataset or requested research dataset).

  • Research dataset: the research dataset has the following variants

    • requested (optional): research dataset that has been requested and is used as a source for enrichment within this process

    • to enrich: research dataset that is enriched within this process

    • generated: resulting set of data (ready for release within this storyline), compiled explicitly for the research project
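
To make the relation between some of these business objects more concrete, the sketch below models the research dataset variants, the derivation of an inclusion list from inclusion and exclusion criteria, and the combination of new data points with an optional dataset to enrich. It is an illustration under stated assumptions only: all names (`Variant`, `ResearchDataset`, `inclusion_list`, `build_generated_dataset`) and the record layout (a dictionary keyed by participant id) are hypothetical, and the curation that the storyline places between the enrichment dataset and the generated research dataset is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional

Record = dict  # hypothetical record layout: field name -> value


class Variant(Enum):
    REQUESTED = "requested"   # optional source for enrichment
    TO_ENRICH = "to enrich"   # dataset that is enriched within this process
    GENERATED = "generated"   # resulting dataset, ready for release


@dataclass
class ResearchDataset:
    variant: Variant
    records: dict[str, Record] = field(default_factory=dict)  # participant id -> record


def inclusion_list(
    population: dict[str, Record],
    include: Callable[[Record], bool],
    exclude: Callable[[Record], bool],
) -> list[str]:
    """Return ids of participants who meet the inclusion and not the exclusion criteria."""
    return [pid for pid, rec in population.items() if include(rec) and not exclude(rec)]


def build_generated_dataset(
    base: Optional[ResearchDataset],
    new_points: dict[str, Record],
) -> ResearchDataset:
    """Combine newly generated data points with the optional dataset to enrich.

    The base dataset is copied, not modified. Curation is omitted in this sketch.
    """
    merged = {pid: dict(rec) for pid, rec in (base.records if base else {}).items()}
    for pid, values in new_points.items():
        merged.setdefault(pid, {}).update(values)
    return ResearchDataset(Variant.GENERATED, merged)


if __name__ == "__main__":
    population = {
        "p1": {"age": 54, "opt_out": False},
        "p2": {"age": 17, "opt_out": False},
        "p3": {"age": 61, "opt_out": True},
    }
    included = inclusion_list(
        population,
        include=lambda r: r["age"] >= 18,   # illustrative inclusion criterion
        exclude=lambda r: r["opt_out"],     # illustrative exclusion criterion
    )
    base = ResearchDataset(Variant.TO_ENRICH, {pid: population[pid] for pid in included})
    result = build_generated_dataset(base, {"p1": {"questionnaire_score": 7}})
    print(included, result.variant.value, result.records)
```

Passing `None` as the base corresponds to the case in which no existing data is enriched and the generated research dataset consists of the new data points only.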
