Metroline Step: Design eCRF (data collection)

status: Future work

Short description 

Traditionally, clinical data was captured on paper case report forms (CRFs) and digitised manually afterwards. Nowadays, data is collected digitally in electronic Case Report Forms (eCRFs) within Electronic Data Capture systems (EDCs). Many EDCs exist, such as OpenClinica / LibreClinica, Castor EDC and REDCap, and the majority of the Dutch UMCs have a site license for such tools. EDCs offer a variety of functionality, such as data entry validation, import/export and monitoring [hri_edc].

The quality of an eCRF depends on design aspects such as the visibility of data elements, dependencies between fields, and validations.

To make clinical data semantically interoperable, the eCRF should be built with interoperability in mind. Instead of using your own local definitions for your data items or writing definitions from scratch, consider reusing existing definitions. Many initiatives provide templates for this purpose, such as CDASH, provided by CDISC, and Common Data Elements (CDEs) offered by, for example, the National Institutes of Health. Furthermore, data definitions (codebooks) may be available for reuse in online solutions such as ART-DECOR or OpenEHR [iCRF].

 

[hri_edc] https://www.health-ri.nl/electronic-data-capture-edc-or-electronic-case-record-form-ecrf-systems 

[iCRF] https://f1000research.com/articles/9-81  

 

------------------- 

[De novo]

Castor EDC [33], the vendor of the EDC system used in our project, developed the technology to facilitate the de novo FAIRification of the VASCA registry (phase ii in Fig. 1). The eCRF designed for the CDEs, including the technology to translate it to a machine-readable format, is reusable (Additional file 1: Supplementary Methods, steps 6 and 7). The eCRF can be copied directly to a new database within the EDC system, to initiate a new ERN registry. Some ERN-specific adaptations may be necessary. For instance, diagnosis is registered using a drop-down menu focusing on vascular anomalies and should therefore be adjusted for an ERN with a different focus. The ontologies used in the CDE semantic data model are not limited to an area of disease. The developed (eCRF to RDF) data transformation application (Additional file 1: Supplementary Methods, step 6 onwards; [17]) is generic and can be reused by other registries and clinical trials, ensuring that new FAIRification projects can easily be set up within the EDC system. Likewise, other registries in the EDC system can reuse the FAIR Data Point structure and query functionalities developed for the VASCA registry (Additional file 1: Supplementary Methods, steps 8, 12, 13, 14, and 15). Furthermore, we have made our eCRF interoperable and reusable, as the codebook describing the eCRF templates containing the CDEs and the ontologies to annotate them is openly available in ART-DECOR [34]. Via the openly available iCRF Generator tool [35], the codebook can be directly implemented in other EDC systems such as OpenClinica and REDCap.

[De novo supp]

Step 4 - Design the eCRF in the EDC system 

The eCRF was designed to collect data for the CDEs (described in step 2) in the Castor EDC system [5]. Several dependencies, e.g. only show ‘Date of death’ when the patient is deceased, and validations, e.g. validate whether the entered Online Mendelian Inheritance in Man (OMIM) genetic disorder code follows the OMIM standard, were included in order to collect high-quality data (the eCRF questions can be found in [6]). To this end, we mostly worked with closed questions and/or drop-down menus and prevented entering free text as much as possible. An example from the eCRF is shown in Figure S1A. The eCRF template containing the CDEs and the ontologies to annotate them (see step 5) was described in a codebook. This codebook was made openly available in ART-DECOR, a platform from Nictiz, the Dutch competence centre for electronic exchange of health and care information [7], and can be directly implemented in the Castor EDC system or other EDC systems using the openly available iCRF Generator tool [8]. 
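
The two kinds of rules mentioned in this excerpt (a validation and a dependency) can be sketched in Python. This is only an illustration, not how Castor EDC is configured: the function names and status values are hypothetical, and the check assumes that an OMIM number is written as exactly six digits.

```python
import re

# Assumption: an OMIM (MIM) number is written as exactly six digits.
OMIM_PATTERN = re.compile(r"\d{6}")


def is_valid_omim(code: str) -> bool:
    """Validation: accept only input that looks like a six-digit OMIM number."""
    return OMIM_PATTERN.fullmatch(code.strip()) is not None


def show_date_of_death(patient_status: str) -> bool:
    """Dependency: only show 'Date of death' when the patient is deceased.

    The status value "deceased" is a hypothetical option label.
    """
    return patient_status == "deceased"
```

In an EDC, rules like these are normally configured in the form builder rather than written as code; the sketch only makes the logic explicit.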

Why is this step important 

The quality of the eCRF will greatly influence the quality of the collected data. Furthermore, by reusing existing definitions to build these eCRFs, the collected data will be more interoperable. <say something about semantic modelling?>

How to 

If common data elements (CDEs) are used (Step X), these should be used as the basis for your eCRFs, since the items in the CDEs have been unambiguously defined. Furthermore, if the CDEs have been annotated with ontologies, these should be used in the eCRF. 

 

To design an eCRF, a number of things must be kept in mind: 

  • What are the questions and possible answers? 

  • Can questions and answers be annotated with ontologies? 

  • What field types do you want to use? 

  • How should data be validated during data entry? 

  • Is there dependency between fields and are fields hidden initially? 

  • Are there calculations? 
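
One way to keep track of the answers to these questions is to write them down per item before building the form. The sketch below is a hypothetical Python structure, not the data model of any particular EDC; the class `FieldSpec`, its attributes, and the example items (smoking status, cigarettes per day, BMI) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class FieldSpec:
    """Hypothetical description of a single eCRF item (not an EDC API)."""

    name: str                                    # variable name in the export
    question: str                                # text shown during data entry
    field_type: str                              # e.g. "radio", "number", "calculation"
    options: List[str] = field(default_factory=list)               # possible answers
    ontology_code: Optional[str] = None                             # annotation, if any
    validate: Optional[Callable[[Any], bool]] = None                # entry-time check
    visible_if: Optional[Callable[[Dict[str, Any]], bool]] = None   # dependency
    calculate: Optional[Callable[[Dict[str, Any]], float]] = None   # derived value


smoker = FieldSpec("smoker", "Does the participant smoke?", "radio",
                   options=["yes", "no"])

# Hidden initially; shown and range-checked only when the participant smokes.
cigarettes = FieldSpec(
    "cigarettes_per_day", "Cigarettes per day", "number",
    validate=lambda value: 0 <= value <= 200,
    visible_if=lambda record: record.get("smoker") == "yes",
)

# Calculated field derived from two other (hypothetical) items.
bmi = FieldSpec(
    "bmi", "Body mass index (kg/m^2)", "calculation",
    calculate=lambda record: round(record["weight_kg"] / record["height_m"] ** 2, 1),
)
```

Writing the specification down in one place like this makes it easier to review with the study team before anything is configured in the EDC.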

Source?

 

[my own thoughts… probably more how to related] 

When designing the eCRF itself, there are numerous things to keep in mind: 

  • An eCRF typically handles one topic, e.g. “Anamnesis”, “EQ-5D”, etc.

  • For each item in the eCRF, carefully consider practical usability.

    • For example, if an item has 5 options, a radio button works well. If the item has 100 options, a dropdown is more suitable.

  • Try to avoid multi-select items, as these tend to cause many issues.

  • Add validations to fields to improve data quality.

  • Add conditional visibility to your fields.

    • E.g. if you have a question concerning the menstrual cycle, you could hide it for men, provided you are also collecting gender data.

  • If someone else will be entering the data, consider letting that person test the eCRFs.
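
Two of the points above, the choice of input type and conditional visibility, can be expressed as small rules. The Python sketch below is illustrative only: the threshold of 7 options and the value "male" are assumptions, not recommendations from a cited source.

```python
def suggest_widget(n_options: int, radio_max: int = 7) -> str:
    """Suggest an input type based on the number of answer options.

    The cut-off `radio_max` is an assumed, adjustable threshold.
    """
    if n_options < 2:
        raise ValueError("a closed question needs at least two options")
    return "radio" if n_options <= radio_max else "dropdown"


def show_menstrual_cycle_question(record: dict) -> bool:
    """Hide the menstrual cycle question when gender is recorded as male."""
    return record.get("gender") != "male"
```

For example, `suggest_widget(5)` suggests a radio button and `suggest_widget(100)` a dropdown, matching the rule of thumb in the list.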

Furthermore, to make clinical data semantically interoperable, the eCRF should be built with interoperability in mind. Assuming common data elements (CDEs) are used (Step 12), these should be used as the basis for your eCRFs, since the items in the CDEs have been unambiguously defined. Furthermore, if the CDEs have been annotated with ontologies, these should be used in the eCRF. For example, if your CDE has a field with an option “Laser eye surgery” with SNOMED CT code 608849003, you should also use this in your eCRF.  
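
As a minimal sketch of what carrying the annotation into the eCRF could look like, the Python snippet below stores the ontology code together with the answer option. The class `CodedOption` and the function `export_value` are hypothetical names; only the label and the SNOMED CT code are taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedOption:
    """An answer option stored together with its ontology annotation."""

    label: str   # text shown in the eCRF
    system: str  # name of the terminology
    code: str    # code within that terminology


laser_eye_surgery = CodedOption("Laser eye surgery", "SNOMED CT", "608849003")


def export_value(option: CodedOption) -> dict:
    """Export the code alongside the label, so the meaning survives export."""
    return {"label": option.label, "system": option.system, "code": option.code}
```

Keeping the code next to the label means that exported data can still be interpreted when the local label is changed or translated.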

If the definitions you intend to use are available in ART-DECOR or OpenEHR, one easy way to create a basis for your eCRFs is the iCRF Generator [source]. This freely available Java tool allows you to reuse online codebooks and generate eCRFs for a variety of EDCs, such as REDCap, Castor and OpenClinica. The generated eCRFs can be edited with a suitable editor, or imported into the EDC, where they can be further improved if necessary.

 

When setting up your EDC, it is important to first think about the events/visits the participants will have and then think about the eCRFs that will be involved in each of these events.  
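
A simple way to plan this is to list the events first and then assign eCRFs to each of them. The Python sketch below assumes a hypothetical study: the event names and the "Demographics" form are made up for illustration, and the structure is not tied to any specific EDC.

```python
from typing import Dict, List

# Hypothetical study plan: each event (visit) lists the eCRFs filled in at that event.
study_structure: Dict[str, List[str]] = {
    "Baseline": ["Demographics", "Anamnesis", "EQ-5D"],
    "Follow-up (6 months)": ["EQ-5D"],
    "Follow-up (12 months)": ["EQ-5D"],
}


def events_using(form: str, structure: Dict[str, List[str]]) -> List[str]:
    """Return the events in which a given eCRF is used."""
    return [event for event, forms in structure.items() if form in forms]
```

An overview like this shows at a glance which eCRFs are reused across events and therefore only need to be designed once.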

Depending on your project, data collection could take place in one or more EDCs.

 

The How to section should:

  • be split into easy to follow steps;

    • Step 1

    • Step 2

    • etc.

  • help the reader to complete the step;

  • aspire to be readable for everyone, but, depending on the topic, may require specialised knowledge;

  • be a general, widely applicable approach;

  • if possible / applicable, add (links to) the solution necessary for onboarding in the Health-RI National Catalogue;

  • aim to be practical and simple, while keeping in mind: if I would come to this page looking for a solution to this problem, would this How-to actually help me solve this problem;

  • contain references to solutions such as those provided by FAIR Cookbook, RMDkit, Turing way and FAIR Sharing;

  • contain custom recipes/best-practices written by/together with experts from the field if necessary. 

Expertise requirements for this step 

This section could describe the expertise required. Perhaps the Build Your Team step could then be an aggregation of all the “Expertise requirements for this step” sections that someone needs to fulfil their FAIRification goals.

 

EDC system specialist: Individual who has experience with and knowledge of Electronic Data Capture (EDC) systems, such as Castor EDC, REDCap or OpenClinica. They are in charge of setting up user access, data validation checks and electronic case report forms in the EDC system. They offer technical help to researchers and ensure data integrity and regulatory compliance.

 

Practical examples from the community 

This section should show the step applied in a real project. Links to demonstrator projects. 

Training

Add links to training resources relevant for this step. Since the training aspect is still under development, currently many steps have “Relevant training will be added in the future if available.”

Suggestions

Visit our How to contribute page for information on how to get in touch if you have any suggestions about this page.