Highlights from the first 18 months of the European Genomic Data Infrastructure Project

The European 1+Million Genomes (1+MG) Initiative aims to enable personalised medicine and health by providing secure, cross-border access to high-quality genomic data and integration with related health data. Since its inception in 2018, the initiative has gained support from 25 EU countries, as well as the UK and Norway.

The European Genomic Data Infrastructure (GDI) project began in 2022 with the goal of implementing the vision of the 1+MG initiative. The project is a collaboration between 24 European countries, four of which formally joined in April 2024.

GDI builds on the preparatory work of the 1+MG working groups, the Beyond 1 Million Genomes (B1MG) project and investments by EU countries. The project is creating and deploying the technical capacity for accessing genomic and related data by establishing a federated, sustainable and secure technical infrastructure.

As of the end of April 2024, the GDI project is 18 months old. Here we highlight its key achievements so far and look ahead to the remaining two and a half years.

Technology deployment

In June 2023, the initial version of the infrastructure was released as the GDI Starter Kit. The Starter Kit is a set of software components and infrastructure, based on open standards, that gives GDI nodes the technical capability to access synthetic genomic and phenotypic data across borders.

Also in June 2023, the seven early adopter nodes (Belgium, Finland, Luxembourg, Netherlands, Norway, Spain and Sweden) demonstrated the infrastructure in operation with synthetic data.

Efforts have since focused on enabling operational data access across national borders among the same early adopter nodes, based on a defined workflow:

  • User discovers aggregate data on phenotypes of interest in 1+MG data
  • User authenticates and discovers both a genomic variant and a treatment regime and/or phenotype of interest
  • User applies for access to 1+MG data
  • Data Access Committee grants specific access to a data collection
  • User runs the analysis, as a controlled-access user, on this virtual cohort across federated locations

Examples of the types of queries supported in this initial functionality are:

Rare diseases: Do you have any individuals with a mutation in the RYR1 gene and a phenotype similar to congenital myasthenic syndrome?

Cancer: Do you have any individuals with a mutation in the PTEN gene who have BRAF biomarkers and are being treated with vemurafenib?

Infectious diseases: Do you have any individuals with a mutation in the TLR7 gene who have been diagnosed with COVID-19?

Access to realistic synthetic data across national borders (Portugal, Spain, Sweden, Finland, Norway and Luxembourg) was demonstrated at the end of April 2024, and a public video of the process will be made available.

Sustainability and governance

Following an evaluation of possible legal frameworks, the creation of a European Digital Infrastructure Consortium (EDIC) has been proposed as the vehicle for implementing the 1+MG infrastructure.

The EDIC provides the legal basis for data processing and allows the exploration of additional capabilities needed for secondary use, which would otherwise require a change of national legislation in some countries. It also has the advantage of supporting secondary use in healthcare and policy development, in addition to research.

With the EDIC selected as the most promising solution, initial steps have been taken to assess the financial requirements. Costs have been defined for each functional element of the infrastructure, including estimates of how they might scale with infrastructure size.

Costing predictions have taken into account the need for secure processing environments (due to the sensitive nature of the data) and the substantial requirements for data storage. The infrastructure will also need the financial means to provide support to a large number of different users working in research, healthcare and innovation.

Looking forward

During the next two and a half years, the GDI project will work at the technical, operational and governance levels to develop and expand the 1+MG infrastructure. This will include defining and delivering a minimum viable product which demonstrates the infrastructure capabilities to a range of stakeholders.

Deployment of open, community-based solutions will continue, with the solutions validated on population genomics (Genome of Europe), cancer, infectious diseases and rare diseases, and tested on synthetic data (and on real data where possible). The infrastructure will be expanded to more use cases, with additional functionalities such as federated analytics and federated learning.

GDI nodes will continue to be supported at the onboarding, deployment and operations stages. The GDI project will also provide communication resources for engaging with key stakeholders, including citizens and healthcare providers at the national level.

The GDI project will work to support the establishment of the 1+MG EDIC, and will extend our cost framework analysis by exploring possible sources of financing for maintaining the identified functions.

Finally, we will continue to work closely with other European projects and initiatives, such as the Genome of Europe (GoE), the European Health Data Space (EHDS) and the European Federation for Cancer Images (EUCAIM), and we will make sure we remain aligned with the rapidly changing regulatory landscape.

In closing, we share a summary of what the 1+MG infrastructure will offer to the evolving European health and research ecosystem.

The 1+MG infrastructure will provide:

  • An EU-wide data collection with critical mass to support evidence-based healthcare;
  • Harmonised data, prepared for secondary use;
  • A single data access process that will be cost-effective, efficient and rapid;
  • A high degree of transparency, and consent mechanisms that include opt-in/opt-out options and easy ways to object.

Posted: 3 May 2024