Current developments
- 1 Introduction
- 2 29 October 2024
- 3 HDAB-NL
- 4 GDI
- 5 October 16 2024
- 5.1 Onboarding process
- 5.2 Catalogue
- 6 October 3 2024
- 6.1 Onboarding process
- 6.2 Catalogue
- 7 September 4, 2024
- 7.1 Onboarding process
- 7.2 Catalogue
- 7.3 HDAB-NL
- 8 August 20, 2024
- 8.1 Onboarding process
- 8.2 Catalogue
- 9 August 6, 2024
- 9.1 Onboarding process
- 9.2 Catalogue
- 9.3 HDAB
- 10 July 24, 2024
- 10.1 Onboarding process
- 10.2 Catalogue
- 10.3 Request application
- 11 July 11, 2024
- 11.1 Onboarding process
- 11.2 Catalogue
- 11.3 Request application
- 11.4 HDAB
- 12 June 27, 2024
- 12.1 Onboarding process
- 12.2 Catalogue
- 12.3 HDAB
- 13 June 11, 2024
- 13.1 Onboarding process
- 13.2 Catalogue
- 13.3 HDAB
Introduction
On this page you can find the latest developments and issues our team is currently working on. For known issues and the recommended solutions, please visit Known issues. For further information, please contact the HRI service desk at servicedesk@health-ri.nl and we will assist you.
29 October 2024
Update on ongoing work:
Including HealthDCAT-AP mandatory fields into Health-RI metadata model v2
All mandatory fields from HealthDCAT-AP have been included in version 2 of the Health-RI model, which the data team delivered to the implementation team. The implementation team has been investigating the impact of the changes.
This model now has to be included in the catalogue, but some properties still depend on the CKAN community, e.g. the implementation of Dataset Series in CKAN (CKAN is the technical backend of our catalogue).
SHACL shapes for this v2 model are being created and need to be integrated into the FDP and the CKAN backend of the catalogue. Some changes to the front-end will likely follow from this.
The CKAN community still has to respond to some questions from Health-RI about including HealthDCAT as a profile in CKAN.
Authentication method for request application
Investigation is ongoing to allow a user to register with a CO (Collaborative Organization) within SRAM, so that the user can have a verified identity within SRAM, as needed to be able to request access to datasets.
Request forms configuration
We are still investigating the configurability of the request forms. They should be easy to update with new or changed questions. We are also looking at how dynamic the form can be, so that the questions can differ per type of dataset and follow-up questions can be determined by earlier answers. We expect a common form builder to support the needed functionality, but this needs to be verified. Our main objective is to be able to configure the request forms ourselves, without depending on external parties for changes to the questions. Following this investigation, the proposal is to move to the Microsoft Customer Voice form generator and thereby conclude the current investigation. This will need to be linked to a CRM that can store the results.
FDP improvement options in architecture
The team made a drawing of a probable improved architecture for the FDP (FAIR Data Point). Generic central functions and specific local functions can be split up more, and FDP bridges (scripts) can be used to get metadata across in a way that lightens the burden at the data holder side. Our aim is to design this in Q4 of 2024 and use Q1 of 2025 to implement new solutions.
Improvements of the catalogue
Missing fields in the DCAT extension of CKAN - picking it up with the CKAN community.
When using the Dutch version of the catalogue, not all labels were resolved. This is because a service we use, Pearl.org, does not have a Dutch word for every term. In that case, we want a fallback mechanism that shows the English word if it is available. When onboarding, the provided metadata needs to be tagged with a language tag, so that the catalogue knows which words to show based on the combination of language tags in the metadata and the language set by the user in the catalogue.
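The fallback behaviour described above can be sketched in a few lines. This is a minimal illustration, not the catalogue's actual code: the function name `pick_label`, the example labels, and the idea of holding labels in a dictionary keyed by language tag are all assumptions made for the example.

```python
from typing import Optional


def pick_label(labels: dict[str, str], user_lang: str, fallback_lang: str = "en") -> Optional[str]:
    """Return the label in the user's language, else the fallback language, else None."""
    if user_lang in labels:
        return labels[user_lang]
    return labels.get(fallback_lang)


# Hypothetical term labels keyed by language tag.
theme_labels = {"en": "Health", "nl": "Gezondheid"}
format_labels = {"en": "Spreadsheet"}  # no Dutch label available for this term

print(pick_label(theme_labels, "nl"))   # Dutch label exists, so it is used
print(pick_label(format_labels, "nl"))  # falls back to the English label
```

Returning `None` when neither language is present leaves it to the caller to decide what to display, for example the raw term identifier.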
Identification linked to request dashboard
Together with GAC (provider of our request tool MVP), the team worked on connecting Keycloak (with SRAM connected) to the Health-RI Dashboard for the Request Application. Information shown here will be provided by the CRM backend in real time, so that the dashboard can display the status of requests to the user.
HDAB-NL
The Healthdata@EU pilot presented their second release. In the coming year the code will become available, and we will investigate the benefits of that code and whether we should use (parts of) it directly in our infrastructure.
Release 2 includes several new features to enhance the central platform:
User Identification and Authentication: Users can now log in using their EU login credentials, ensuring secure and clear identification.
User Authorization: A new feature allows the creation of different user roles, with easy tools to manage and update user access.
Data Permit Application Status: The system now supports better tracking and management of application statuses between Contact Points and the EU Dataset Catalogue.
Health DCAT-AP Validation according to the constraint file: This service checks that all incoming datasets meet the Health DCAT-AP standard, ensuring consistency and accuracy.
See:
GDI
GDI National Node deployment
The EGA rare diseases dataset is used as a use case for creating metadata from the beacon information.
A clinical dataset that comes as a test set with the beacon reference implementation will be used to fill the metadata for the dataset. This will convert the BFF format provided by the beacon to the RDF format that the FDP needs.
The beacon was installed. The reference implementation and the production implementation had both been installed before, but the production beacon version is now used by everyone, as decided in the pressure cooker session, of which a report is available.
In the session, an attempt was made to load the clinical dataset, but the scripts were not sufficient and refused to process certain files, with a different error message on each try. This seems to be a script problem and not a data problem. Fixing the script would be too time-consuming. With the help of an expert, an alternative way to add the genomevariation file will be tried. This will be tried for the cancer dataset that is the actual use case.
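As a rough illustration of the BFF-to-FDP conversion step mentioned above, the sketch below turns a BFF-style dataset record into N-Triples, one of the RDF serialisations. It is not the team's bridge script: the record fields `id`, `name` and `description`, the mapping to Dublin Core terms, and the `https://example.org/dataset/` base IRI are assumptions chosen for the example.

```python
import json

DCT = "http://purl.org/dc/terms/"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from BFF-style dataset fields to Dublin Core terms.
FIELD_TO_PREDICATE = {
    "name": DCT + "title",
    "description": DCT + "description",
}


def literal(value: str) -> str:
    """Quote and escape a plain string (sufficient for simple text in this sketch)."""
    return json.dumps(value)


def bff_dataset_to_ntriples(record: dict, base_iri: str) -> list[str]:
    """Build one N-Triples line per mapped field, plus a dcat:Dataset type statement."""
    subject = f"<{base_iri}{record['id']}>"
    triples = [f"{subject} <{RDF_TYPE}> <{DCAT_DATASET}> ."]
    for field, predicate in FIELD_TO_PREDICATE.items():
        if field in record:
            triples.append(f"{subject} <{predicate}> {literal(record[field])} .")
    return triples


# Hypothetical record, standing in for one entry of a BFF datasets file.
record = {
    "id": "synthetic-clinical-001",
    "name": "Synthetic clinical test set",
    "description": "Test data shipped with a beacon installation.",
}
for line in bff_dataset_to_ntriples(record, "https://example.org/dataset/"):
    print(line)
```

A real bridge would cover many more fields and would normally use an RDF library rather than string formatting; the point here is only the shape of the mapping.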
Some more info and questions to investigate within GDI:
The beacon has an API to get the metadata for the FAIR Data Point.
The beacon should be queryable from the catalogue. A beacon is a protocol in which you can do queries and get aggregated data back.
The source of the catalogue is an FDP, and we now want to build a bridge between the beacon and the FDP.
So how will the catalogue know that there is a beacon and not a dataset?
A beacon holds many datasets, and may not give you answers on a specific one when you are referred from a dataset to that beacon as a dataservice for it.
October 16 2024
Onboarding process
The Netherlands Parkinson Cohort has been approved as a use case to map cohort data and onboard it into the catalogue (2025).
During the 2024 Health-RI conference the HRI team held an onboarding session. We are currently processing the feedback.
Metadata
The team is currently incorporating feedback on the domain metadata development process and co-developing with the Imaging and Omics working groups on their respective petals.
The metadata mapping page has been updated to contain an example of data compliant with our model.
Catalogue
Several issues in the national catalogue were solved so that we could have the new front-end ready before the Health-RI Conference of 10 October.
There were a number of issues with the COVID portal. We’ll be contacting the stakeholders of the COVID portal to see whether this information can reach the national catalogue via other routes that could replace the current one.
The ontology that ZonMw projects use was offline due to hacked systems that we depend on. The person who was managing this before cannot be reached. As Health-RI we have no influence on this, so we will need to see whether the systems come online again or consider alternative ways to support the ontology. Choosing an alternative would, however, mean that all the forms need to be adjusted to refer to something new. Health-RI does not have access to all of those forms, so we would need to reach out to PIs to change their forms. Therefore we prefer to first see whether this system comes online again.
Projects that people tried to add to the COVID portal did not show up. It turned out that new templates had been created that we were not aware of. New templates are never shown automatically on the COVID portal, as it has been designed to show information from forms for a specific set of template IDs from Cedar. Updates to templates are fine, but new templates will never show up out of the box and weren’t envisioned in the original process.
October 3 2024
Onboarding process
On domain-specific metadata: four-weekly meetings with metadata experts from the working groups are planned until the end of the year, and the first pages/steps of the process are ready to be reviewed.
The intake to onboard five new projects has started.
Three pages of the FAIR Metroline are up for external review (until October 18th): Define FAIRification objectives, Have a FAIR data steward on board, Pre-FAIR assessment.
The core / generic health metadata draft for plateau 2 has been in review since this week, until October 18th. Everything can be found in the develop branch on GitHub, and input/feedback is also collected there.
A new scenario for onboarding a biobank to the National Health Data Catalogue is available: Scenario UMCG1.1: Make Biobank Findable in HRI Catalogue - Core
Catalogue
The new front-end for our catalogue is live! See https://catalogus.healthdata.nl/
There are still some things we want to change, but we are pleased with the general look and feel and hope you will be, too!
Detailed changes in the National Health Data Catalogue and other efforts of the last sprint:
Until now, the “Last updated” field reflected the date of the latest harvest. It now reflects the last moment the metadata was updated, according to what the data holder submits.
Filters were shown that did not contain any values. Now filters without values will not be shown anymore.
A placeholder website was connected to Keycloak to test using SRAM via Keycloak as a login method.
All core metadata fields have been made present in the FDP which can provide an example for our nodes. Not all of those mandatory fields are yet visible in the front-end of the catalogue. This will be added later. The example data used came from the metadata GitHub.
The DCAT extension of the CKAN catalogue backend was upgraded to a 2.0 version, which now supports DCAT AP v3. It was published on GitHub so that anyone can benefit from these efforts. This entailed:
Support for multiple contacts, creators, and publishers.
Addition of missing fields
Cleaning up some technical debt.
Support for DCAT APv3 means that metadata compliant to DCAT-NL will also be a good fit, however we do not validate on the stricter conditions set by DCAT-NL, but allow all metadata that complies to DCAT APv3.
GDI
GDI national node deployment: A synthetic dataset about rare diseases from EGA was downloaded, and the intended follow-up is to make it findable in a catalogue.
September 4, 2024
Onboarding process
Improvements in the SOP for TopDesk Onboarding Support and GitHub release management.
Ongoing help for onboarding projects via Walk-in hour and 1-on-1 talks.
Metadata requirements for core and generic health metadata from HRI Funders have been processed.
FDP connection tested with UMCG biobank.
GDI: Work on the minimal dataset is continued, and now the classes Patient, Disease History, Submitter and Sample have been almost finalized. The draft of the Treatment class will be reviewed by the working group. A first draft of a diagram is in progress to aid the discussions on the minimal dataset for genomics data in GDI.
Catalogue
Connecting to healthdata@eu has been investigated. They have two portals, a current one and one in development, that is available for member states to connect to and provide feedback on. For that one the team has investigated their harvesting process and sent some questions on their current set-up.
It turned out that the CKAN community has created support for DCAT-AP v3 after all. Because they had not responded to our questions about it, the team had started working on it themselves as well. A follow-up in the next sprint will show how it works and whether it is usable. Dataset series are not yet supported, but this is being investigated by the CKAN community. Jeroen and Hannah will ask Geonovum (leader of DCAT NL) if they will add support for the Dutch implementation.
The front-end Betawerk developed was tested by the team and issues found have been added to their backlog. Sorting was one of the issues looked into.
Dynamic filtering options will be further investigated to see how we can deal with changes and differences in metadata models in the future.
Next priorities:
Finishing up the new front-end for the catalogue so that it can go live on Sept 30. This will still be based on the plateau 1 metadata model.
Metadata model v2 will be released by the data team around September - October. After that the implementation team can start integrating it into the catalogue. Betawerk will adapt to changes in the front-end from November. Mind that once metadata standard v2 becomes available, data holders will still need to do work to update their datasets to more complete metadata. Version 1 datasets can still be present in the catalogue. How the front-end will deal with this is a subject of further exploration. Also, the behaviour of the dynamic filters might look unexpected after an extra metadata model is added. Users should be able to filter on the schemes that metadata complies with and will need to be informed about the behaviour of the filters.
The request tool MVP implementation will be carried out by GAC between August and November. Tasneem will be working on the design of the request form and request portal with GAC. An update can be given in every sprint review. Tasneem will invite them for a demo if possible. HRI will be trained so that the case management can be configured. Any training will be recorded. In this plateau the tool will not yet be connected to internal processes in our nodes. The role of data holders in the process is also a topic to investigate.
HDAB-NL
On 3 September there was a kick-off meeting of the HDAB-NL program. The program was introduced and the status of the EHDS explained. The work packages in the HDAB-NL program organized break-outs, and stakeholders and future (end) users were asked to think along using a community platform: HDAB-NL community
August 20, 2024
Onboarding process
Github documentation improved.
SOPs made for troubleshooting of onboarding.
Strategy for persistent identifiers and metadata updates is currently a priority.
Requirements for expansion of the metadata model have been collected.
Developments in the strategy for Petal (domain-specific) metadata schema development. V1 of the process is expected at the end of Q4 2024.
Catalogue
The implementation team spent the sprint investigating the way forward until the end of 2024, plus general housekeeping.
Missing fields originating from the PRISMA metadata onboarding were added.
Synthetic GDI dataset is findable in the catalogue
Betawerk cooperation for FE of catalogue (May – August)
Implementation phase is almost complete
Delivery will be next sprint
Technical testing in September
National Health Data Request tool MVP implementation by GAC (August – November)
The OTAP environments have been created
Implementation of the Case management system where the request will be handled started
Functional design of the request form and request portal will begin in the next weeks
Training will take place in Sept/Oct
August 6, 2024
Onboarding process
Version V1 of the metadata model and the shapes has been released on GitHub.
The modelling group is finalizing requirements for expansion of the HRI metadata model.
The Data team is recruiting new onboarding projects. If you are interested in onboarding, please contact servicedesk@health-ri.nl
Catalogue
The implementation team, together with the onboarding team, is helping to get the metadata for two datasets from UMCG into the catalogue (one of which is Parelsnoer). Some minor validation issues were found in the metadata, which are being looked into further. The harvester was triggered and the datasets were made visible in the test version of the catalogue.
A metadata model release procedure document was written to smooth out the process for releasing new metadata models by Health-RI. The validator website now automatically gets the latest version as well, so no discrepancies will occur anymore between this website and github.
The Health-RI FDP was downgraded to the same version our nodes are using, because of issues in the newest version.
Catalogue entities in the FDP were harvested as catalogues into the national catalogue. This was investigated. We’re working on a solution for this, which for now will be a setting to decide to not harvest catalogue entities, or harvest them as a dataset. By default they will not be harvested, because the catalogue doesn’t have a way yet to deal with catalogues differently than as datasets.
A video was created to help in harvesting metadata
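The harvest setting for catalogue entities described above could look roughly like the following. This is a sketch under assumptions: the names `CatalogueHandling` and `select_for_harvest` and the dictionary shape of an entity are hypothetical and do not come from the actual harvester.

```python
from enum import Enum


class CatalogueHandling(Enum):
    SKIP = "skip"              # default: catalogue entities are not harvested
    AS_DATASET = "as_dataset"  # harvest catalogue entities as if they were datasets


def select_for_harvest(entities, catalogue_handling=CatalogueHandling.SKIP):
    """Return the entities to harvest, applying the setting to catalogue entities."""
    selected = []
    for entity in entities:
        if entity["type"] == "catalog":
            if catalogue_handling is CatalogueHandling.SKIP:
                continue
            # Copy the entity so the caller's data is left unchanged.
            entity = {**entity, "type": "dataset"}
        selected.append(entity)
    return selected


# Hypothetical FDP content: one catalogue entity and one dataset.
fdp_entities = [
    {"id": "cat-1", "type": "catalog"},
    {"id": "ds-1", "type": "dataset"},
]
print(select_for_harvest(fdp_entities))
print(select_for_harvest(fdp_entities, CatalogueHandling.AS_DATASET))
```

Making skipping the default mirrors the reasoning in the text: until the catalogue can treat catalogues differently from datasets, leaving them out is the safer behaviour.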
HDAB
Within the HDAB-NL project we’re working towards the kick-off on 3 September.
With other European countries we’re working together in the CoP (Community of Practice) to interpret the information that is available about the EHDS, share knowledge in various subgroups, and investigate what member states will need to do in their respective HDAB programmes.
July 24, 2024
Onboarding process
TopDesk training was done for the Data Team and Implementation team. Incoming issues for onboarding will now be processed via the management system to ensure better service for data providers and data holders.
The landing page for Onboarding has been updated to better describe the steps of the onboarding process.
Expanded description and examples of variables for the core metadata model have been added to the model description on GitHub:
Modelling group started implementation of mandatory items DCAT-AP NL and HealthDCAT-AP.
First draft of Process for development of domain-specific metadata has undergone an internal review. The nodes will be invited to provide feedback after a kick-off on 24th of July.
Prisma dataset is now available on the National Health Data Catalogue.
Catalogue
User interface of catalogue
The following designs are completed and Betawerk is currently implementing them:
Data catalogue landing page
Data catalogue dataset page
Data catalogue dataset details page
Data catalogue about and FAQs page
Data catalogue basket page
The following features are out of scope currently for the release in September (before Health-RI conference)
Login
Content Management System (CMS) based on Drupal
Multi-lingual pages
Dynamic filtering
Request application
Tasneem had an initial design discussion with GAC consultant in July. Based on that, the following is the solution direction:
Form directly after requesting a data set
A web form built by a developer: either a Health-RI developer builds it, or a Functional Design is created and a developer at GAC builds it.
Must meet:
Dataset name can be added in the form
From the form a case is created for each requested dataset ( can be multiple)
Fields from the form are integrated with fields on case, contact, account
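The "one case per requested dataset" requirement above can be illustrated as follows. This is only a sketch: the `Case` fields, the form keys (`datasets`, `email`, `organisation`) and the function name are hypothetical and do not describe the CRM or the GAC implementation.

```python
from dataclasses import dataclass


@dataclass
class Case:
    dataset_name: str
    contact_email: str   # taken from the form, stored on the contact
    account_name: str    # taken from the form, stored on the account
    status: str = "new"


def cases_from_form(form: dict) -> list[Case]:
    """Create one case for each dataset named in a submitted request form."""
    return [
        Case(
            dataset_name=name,
            contact_email=form["email"],
            account_name=form["organisation"],
        )
        for name in form["datasets"]
    ]


# Hypothetical form submission requesting two datasets at once.
submission = {
    "email": "researcher@example.org",
    "organisation": "Example University",
    "datasets": ["Dataset A", "Dataset B"],
}
for case in cases_from_form(submission):
    print(case)
```

Keeping a separate case per dataset means each request can follow its own status and be assigned to a different data provider.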
CRM
This is where contact/account and Case information will be stored. Validation needs to be done at Health-RI (is the requester obliged to request the dataset)
Specified fields are complemented
Business process Flow, account, contact, case forms, fields are personalized for Health-RI
Portal
Requesters can access their case information. See what the status is and communicate via portal notes with Health-RI or Dataprovider
Must meet: Dataprovider is also able to access the cases when they’re assigned to it
The pilot will start in August and we expect to start with getting a training in mid-August
July 11, 2024
Onboarding process
Content strategy for onboarding has been discussed with Tactical coordinators, the working groups, and Heads of nodes and FAIR coordinators. More detail on the strategy per node will be developed in the coming period.
Training has been done for the Data Team and Implementation team for the management system TopDesk to provide more efficient support to data holders looking to improve. Testing of the system is planned for the coming weeks.
FDP pages in the Confluence guide have been updated to reflect the current status.
Metadata model specifications have been moved to GitHub together with the rest of the technical specifications.
Ongoing intake of new onboarding projects is underway.
Lessons learned of pilot onboarding projects are being collected (Prisma and Biobanks UMCG/AUMC)
Catalogue
Most relevant for our piloting data holders:
The Prisma dataset has been included on the test environment of the catalogue. A few remaining issues identified by this test are still to be solved (see below). Part of the changes needed will be picked up by Betawerk in the new front-end they are designing and building.
We identified an issue with datatype being mandatory in the W3C specifications but not on the FAIR Data Point; this led to a discrepancy between our model and our SHACL shapes. It will be solved by relaxing this obligation for now, as it is also not present at all pilot data holders at the moment. We will be investigating in the coming months how to deal with changing metadata models, as we do not intend to drop datasets from the catalogues while we develop our metadata models if we can prevent it.
There was also an issue with vCard because of the discrepancy between the FDP and W3C specifications; it will be fixed without expected impact on the piloting data holders.
Other updates on catalogue and FDP:
Housekeeping has been done on the FDP extensions: Applying best practices like having a readme, a ‘how to contribute’, unit test reports, etc.
The login possibility is now also disabled on the 'add to the basket' page of the catalogue (as long as we don't have a finished login procedure). We'll go through a step-by-step development process with the SRAM login option before we enable login again. In the short term, users will be added manually to a cooperation so that logging in becomes possible. After that, the process will become more automated. Eventually all medical institutions should be able to log in to the catalogue for the functions that need a login.
A standard Azure plan has been implemented so that there is backup support available for the catalogue. In practice it means that for the CKAN backend we can set up a call with Microsoft if we need support.
There was a missing Agent shape in the Dataseries entity, this has been fixed.
Publisher is now mapped correctly and dashes are removed from the titles.
In the EUCAIM program, we successfully carried out a proof of concept, having Molgenis harvest the FDP. In the demo it was shown how metadata that was harvested before, was edited, and then the edited items became visible in the catalogue. This was a pure proof of concept and not production-suitable, there will be follow-up development within the EUCAIM project.
User interface of catalogue
With Betawerk we discussed the designs for the catalogue user interface, they will be working further on creating it for us.
Request application
We expect to start with getting a training in August and start doing a pilot in the months after that.
HDAB
Healthdata@eu demonstrated their first release of catalogue and request form. We have been invited to test connecting to the central catalogue and provide feedback to healthdata@eu. In our next sprint we will investigate this.
Furthermore, preparations for the September kick-off meeting are in progress and we are preparing to set up the document structure that we will need, in order to be able to deliver first draft requirements and specifications at the end of November 2024.
June 27, 2024
Onboarding process
The first meeting of the weekly modelling session for the Health expansion of the metadata set has been held.
Kick-off meeting on 20th of June.
Comparison of DCAT-AP NL and healthDCAT-AP drafted after the release of DCAT-AP v3 by the European Commission, you can find the draft here.
Work done on the issues relating to the SHACL shapes for the core metadata schema as preparation for release V1 in early July.
Walk-in hour now with sign-up sheet to ensure relevant expertise present.
Catalogue
Further work has been done to help the first onboarding onto the catalogue;
Shapes for the metadata have been made available on the test environment of the FDP (which is linked to the test environment of the catalogue)
Some more focus on test automation is still needed in the coming period
Further formalization of documentation and testing the shapes before releasing them is a focus point, so that people will not look into work in progress on GitHub, but can look into official releases from Health-RI.
New designs have been created for an improvement of the user interface of the catalogue. The new design make the catalogue more intuitive and modern. We decided on an incremental approach: First improve the catalogue user interface based on common best practices and incrementally improve once it’s there and we can do user tests and learn from experiences.
HDAB
In the HDAB-NL program we’re joining Community of Practice meetings with the member states that are also working toward HDAB requirements. Through these meetings we are informed that
Towards the end of 2024 the EHDS is expected to become definitive.
Furthermore, in September HealthDCAT will be published by the pilot with cardinalities, from there on TEHDAS2 project will start working with it
A number of releases from the development of the European portal will take place, and based on the technologies proven, implementing acts will make certain choices mandatory for the European member states to implement in their HDABs. So we will be following these developments and adjusting our choices accordingly.
In the kick-off on 3 September for the HDAB-NL project we will share more information on EHDS and HDAB-NL and ask our stakeholders to co-create with us.
June 11, 2024
Onboarding process
Work is being done on expanding metadata definitions of the core metadata schema to add clarity for mapping. Expected to be in production by end of month.
Ongoing GitHub cleanup to allow better traceability of code versions, especially of the SHACL shapes.
Ongoing intake of new onboarding projects for Plateau 2.
Fix done on Radboud Prisma FDP, workaround added to Known issues.
An overview of existing issues coming in via GitHub, email, and other sources has been made. Most of them were solved, or a workaround was created. The long-term plan is to switch to a TopDesk management system for tracking of issues.
Two ongoing onboarding projects (PRISMA and UMCG biobank) currently both have a functioning FDP. We are awaiting data entry and harvesting. As a follow-up, a procedure will be written for future onboarding reference.
Catalogue
A new beta version is available on Health-RI - Nationale gezondheidsdatacatalogus . Meanwhile, a newer, more user friendly user interface is being developed for it.
We are in contact with a small group of data owners to overcome the problems of exchanging metadata with the catalogue, via the Fair Data Point (FDP).
Currently it contains our datasets that we also had on the COVID19 portal, imported from the metadata entered in Cedar. We’re working hard to onboard the first datasets via Fair Data Points at our nodes, see for more information: Data Onboarding.
Some more details about the recent implementation work:
Previously, harvesting had to be triggered manually. Now a cronjob has been put in place to make sure that the harvester will regularly check for updates during the day. This makes sure that updates from connected Fair Data Points will be harvested into the catalogue. Including a new Fair Data Point to be harvested remains a manual process for now. We may also want to automate that in the future.
Keycloak was connected with SRAM. Health-RI now has its own organization within the SRAM domain. Other organisations will now be able to allow their users to login to Health-RI services, as a preparation for when we add logged-in functions to the catalogue (like a previously selected dataset list), and for when we will add a request application.
Main priorities for catalogue implementation with a time indication (not a promise).
Join the data team in helping data holders get their metadata into the catalogue (in progress)
Cooperating with Betawerk to create a more user friendly design of the catalogue (May-August)
Looking into everything necessary to connect to a request application (June-August)
Commitments within European projects like the EUCAIM and GDI projects, where the same infrastructure is being set up but for specific domains and we contribute to setting up the same processes for imaging and genomics.
HDAB
In the HDAB-NL program, we’re working with VWS, ICTU, RIVM and CBS on the infrastructure for a Dutch HDAB (Health Data Access Body), which is required by EHDS legislation. The goal is that researchers will be able to search in a European catalogue for health data that is spread among European member states, in the same way that this has been realized for geospatial data. For this, each member state is to deliver a national node, to connect to the central European one. National HDABs need to be able to handle requests for data that is within the national node.
This year, we are working on requirements and specification, based on EHDS requirements and HealthDCAT specifications, both of which are not final yet. They will not be final until after the summer of 2024.
We are in an exploratory phase right now. Stakeholder engagement will kick-off in September. All stakeholders that are affected by EHDS will be welcome to think along with us. So keep an eye out for the invitations that will be coming soon.