Abstract
Real-world evidence from multinational disease registries is becoming increasingly important not only for confirming the results of randomised controlled trials, but also for identifying phenotypes, monitoring disease progression, predicting response to new drugs and early detection of rare side-effects. With new open-access technologies, it has become feasible to harmonise patient data from different disease registries and use it for data analysis without compromising privacy rules. Here, we provide a blueprint for how a clinical research collaboration can successfully use real-world data from existing disease registries to perform federated analyses. We describe how the European severe asthma clinical research collaboration SHARP (Severe Heterogeneous Asthma Research collaboration, Patient-centred) fulfilled the harmonisation process from nonstandardised clinical registry data to the Observational Medical Outcomes Partnership Common Data Model and built a strong network of collaborators from multiple disciplines and countries. The blueprint covers organisational, financial, conceptual, technical, analytical and research aspects, and discusses both the challenges and the lessons learned. All in all, setting up a federated data network is a complex process that requires thorough preparation, but above all, it is a worthwhile investment for all clinical research collaborations, especially in view of the emerging applications of artificial intelligence and federated learning.
Harmonising real-world patient data from diverse registries to allow federated analyses is a complex process that requires thorough preparation but is above all a valuable investment, especially in view of emerging applications of artificial intelligence https://bit.ly/3NEKKnV
Introduction
Targeted biologic therapies have significantly improved the lives of many patients with chronic inflammatory diseases such as rheumatoid arthritis, ulcerative colitis and asthma [1–3]. Unfortunately, biologic therapies are expensive and it is often unclear which patients benefit most from a particular biological agent [4–6]. National disease registries have therefore been set up in many countries at the initiative of governments, insurers or medical associations to monitor the effectiveness, costs and side-effects of biologics [7].
In the case of severe asthma, individual national registries have yielded interesting publications, although many important research questions including rare adverse effects or comparative effectiveness of different biologics could not be answered due to a lack of sufficient statistical power and reproducibility [8–12]. In addition, real-world evidence from multinational disease registries became increasingly important not only for confirming the results of randomised controlled trials, but also for identifying phenotypes, monitoring disease progression and targeting the right biologic to the right patient [13].
Meanwhile, the European Respiratory Society (ERS) had encouraged and financially supported the establishment of a clinical research collaboration (CRC) called SHARP (Severe Heterogeneous Asthma Research collaboration, Patient-centred) [14]. The ambition of SHARP was to connect all existing severe asthma registries in Europe. To that end, patient data from different registries had to be harmonised to allow data analyses in such a way that would not compromise the privacy of patients. Because some registries were reluctant to transfer patient data outside the institution where it was collected, SHARP opted for a federated analysis approach, which uses patient-level data from different sources without actually pooling the data together in a central database.
Several harmonisation and federation approaches, platforms and structures were considered [15–20]. SHARP decided to use the open-source Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), developed by Observational Health Data Sciences and Informatics (OHDSI), which is currently one of the top-rated models for sharing medical data [21]. This model best meets criteria such as content coverage, integrity, flexibility, ease of retrieval, compatibility of standards, ease and scope of implementation, privacy and connectivity [22, 23]. Importantly, the OHDSI/OMOP CDM is the standard used by the European Health Data & Evidence Network (EHDEN), a key initiative that sets the pace for federated analytics in Europe and the USA [24]. Thus, the OHDSI/OMOP CDM offered great potential for connection to this fast-growing network.
Here, we describe the harmonisation process that SHARP has gone through and provide a blueprint for how to successfully use real-world data from existing disease registries to perform federated analysis. The blueprint covers organisational, financial, conceptual, technical, analytical and research aspects, and discusses both the challenges and the lessons learned. The blueprint can be used as a guide for other clinical research networks with a similar ambition to link registries containing patient data.
Harmonisation of severe asthma registries
SHARP's initiative to link data from disease registries from different countries was not only ambitious, but also innovative and unique, as no examples of this had previously been published. Initially, the whole project seemed unfeasible due to the incompatibility of the local data models. Each country had its own electronic case report forms and database structure, in its own language. In addition, legal and regulatory requirements and strict data protection and privacy regulations (e.g. the European Union General Data Protection Regulation (GDPR)) restricted the transfer of patient-level data outside a healthcare provider [25]. Transfer of data outside the country of origin was excluded.
With the OHDSI/OMOP CDM it seemed feasible to meet these challenges [21]. Following the initiative of EHDEN, research studies would be conducted in a federated manner so that personal data would remain at the local sites, which would thus retain full control over what happened to their data and which studies they participated in [24]. In particular, the harmonisation process would remove patient identifiers and, furthermore, only aggregated summary statistics would be exported for meta-analysis. Since aggregated data are privacy-proof by nature, federated analyses comply with the GDPR and ethical research guidelines.
Without previous examples on how to harmonise nonstandardised disease registries and build a federated analysis platform (FAP), SHARP was not quite sure what to expect. On paper, the procedure seemed simple (figure 1): match the field names from the local database with concepts in the CDM; create an Extract, Transform, Load (ETL) procedure to automate the mapping of the local database to a unified format; make the translated data available for local analysis; perform an identical analysis on each registry; combine the aggregated results. However, the reality was that we had to overcome challenges at the organisational, financial, conceptual, technical, analytical and research levels.
Figure 1. Architecture of the federated analysis platform. Field names of the different national registries are mapped to concepts in the Observational Health Data Sciences and Informatics (OHDSI)/Observational Medical Outcomes Partnership (OMOP) Common Data Model. An Extract, Transform, Load (ETL) procedure is created to automate the mapping from the local database into a unified format; the harmonised data are made available for local analysis using the OHDSI toolset or R code; an identical analysis is run on each registry; the results are combined using federated analysis tools. SHARP: Severe Heterogeneous Asthma Research collaboration, Patient-centred.
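The last two steps of this procedure (an identical analysis on each registry, followed by combination of the aggregated results) can be illustrated with a minimal Python sketch. It is not SHARP's implementation, which relied on the OHDSI toolset and R code; the function names, the choice of variable and all values below are hypothetical. Each registry exports only a count, a sum and a sum of squares, from which the coordinator recovers the overall mean and standard deviation without seeing any patient-level rows.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates a registry may export; no patient-level rows leave the site."""
    n: int
    total: float
    total_sq: float


def local_summary(values):
    """Run inside each registry on its own harmonised data."""
    observed = [v for v in values if v is not None]
    return SiteSummary(
        n=len(observed),
        total=sum(observed),
        total_sq=sum(v * v for v in observed),
    )


def combine(summaries):
    """Run by the coordinator on the exported aggregates only."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return n, mean, sqrt(max(variance, 0.0))


# Hypothetical FEV1 (% predicted) values held at three registries.
registry_a = [62.0, 71.5, None, 80.0]
registry_b = [55.0, 90.0]
registry_c = [68.0, 74.0, 77.5]

exports = [local_summary(r) for r in (registry_a, registry_b, registry_c)]
n, mean, sd = combine(exports)
print(f"n={n} mean={mean:.2f} sd={sd:.2f}")
```

The pooled mean and standard deviation obtained this way are identical to those that would be computed on the stacked patient-level data, which is why simple descriptive analyses lend themselves well to a federated set-up.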
Key learnings
In the course of the harmonisation process, SHARP learned a number of important lessons, which it would like to share here with other CRCs that also have the ambition to implement such harmonisation. These lessons are listed in the following subsections by category.
Basic operational prerequisites
In order for a harmonisation process between existing disease registries to be successful, a number of general preconditions must be met. These concern professional project management, availability of sufficient financial resources and signed collaboration agreements between all parties. In addition, it must be ensured that the local ethics committees and institutions have given approval, and that the patients have given written informed consent, for the use of their medical data for scientific research.
As the first to gain experience with this complex harmonisation process, SHARP was not well prepared for these preconditions. Until then, it had only collected summary data from the various European registries with little financial support [26]. The administrative burden quickly became a challenge given the limited support of the ERS, and a dedicated, full-time project manager had to be appointed. In addition, legal services to establish service and research agreements, a professional statistician, and EHDEN-trained small and medium-sized enterprises (SMEs), responsible for mapping variables in the local databases to the OHDSI/OMOP CDM and for building the FAP, were all necessary, and all had to be paid for. All in all, a budget of around EUR 200 000 per annum was required to cover these expenses.
Understanding the OHDSI/OMOP CDM
An absolute requirement for successfully building a FAP is that every stakeholder understands the harmonisation concept well and has no doubts or hesitation in participating in its implementation.
For SHARP, the use of the OHDSI/OMOP CDM for the harmonisation of patient-level data was new and conceptually different from the traditional use of such data for scientific research [27]. Time and again, SHARP encountered lack of confidence in the OHDSI/OMOP concept. This was mainly due to insufficient familiarity with the concept, and lack of knowledge and understanding. Clinicians were concerned that patients’ privacy was not sufficiently guaranteed. Local legal officers were unsure whether the data handling was secure enough, registry owners were unsure about data ownership, researchers were concerned that their data could be misused by competitors and information technology (IT) administrators were reluctant to give third parties access to their servers, due to regulatory concerns or internal IT procedures. Only intensive and repeated education and communication allowed the various parties and partners to ultimately be convinced and enthusiastically take part in the project.
Mapping registry data to the OHDSI/OMOP CDM
A key part of the harmonisation process is the mapping of source data to the OHDSI/OMOP CDM. Due to the diversity of format and language of the SHARP registries, this had to be manually conducted for each registry, one at a time. The process required fluent and efficient collaboration between the project manager, clinical expert, source data expert, medical terminologist/mapping expert, developer/tester and statistician.
Not surprisingly, the mapping process faced several challenges, including incomplete recording at source, e.g. missing start and stop dates of medications and missing dates of procedures. Ideally, the mapping process should be performed on the basis of a registry “data dictionary”, i.e. a file containing variable names, data types, units of measure, etc., because this enables the use of existing mapping tools. In SHARP, the registries could not provide such a data dictionary. The mapping process therefore required a more “iterative” approach than expected, as there were many “mismatches” between the data types and the actual content of the source. All these issues could only be resolved by joining forces. Unfortunately for SHARP, in-person communication was severely hampered by the coronavirus disease 2019 (COVID-19) pandemic and the lockdown measures.
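To make the role of a data dictionary concrete, the following Python sketch checks a source record against a small dictionary and reports the kind of mismatches described above. It is an illustration under stated assumptions: the source variable names, the dictionary layout and the `check_record` helper are hypothetical, and only the target names (`person.gender_concept_id`, `person.year_of_birth`, `measurement.value_as_number`) correspond to fields of the OMOP CDM.

```python
# Hypothetical data dictionary: source variable -> declared type and CDM target.
DATA_DICTIONARY = {
    "geslacht": {"type": "category", "target": "person.gender_concept_id"},
    "geboortejaar": {"type": "int", "target": "person.year_of_birth"},
    "fev1_pct": {"type": "float", "target": "measurement.value_as_number"},
}

PARSERS = {"int": int, "float": float, "category": str}


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of (field, problem) pairs for one source record."""
    problems = []
    for field, raw in record.items():
        entry = dictionary.get(field)
        if entry is None:
            problems.append((field, "not in data dictionary"))
            continue
        if raw in ("", None):
            continue  # missingness is a separate data quality question
        try:
            PARSERS[entry["type"]](raw)
        except (ValueError, TypeError):
            problems.append((field, f"expected {entry['type']}, got {raw!r}"))
    return problems


# Hypothetical source record with a type mismatch and an undocumented field.
record = {
    "geslacht": "V",
    "geboortejaar": "onbekend",
    "fev1_pct": "71.5",
    "opmerking": "vrije tekst",
}
problems = check_record(record)
for field, problem in problems:
    print(f"{field}: {problem}")
```

Running such a check over a snapshot of the source data gives the mapping team a concrete list of discrepancies to discuss with the clinical and source data experts in each iteration.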
IT requirements and data access
The mapping of source data to the OHDSI/OMOP CDM is automated in an ETL procedure, reading the source data and writing the harmonised data into an OHDSI/OMOP CDM-compatible database. Smooth operation requires a server located in the registry's data centre (or in a cloud environment, if local IT regulations allow) for taking snapshots of de-identified source data. The server can also host the analytical tools (R environment, OHDSI tools); alternatively, these tools can be hosted in a dedicated environment. Of course, the local servers should be accessible by the SME, but for SHARP this proved to be difficult in some cases due to local IT regulations. Nevertheless, it is highly recommended to establish access for the SME, since otherwise local IT teams have to be trained to fulfil the job.
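The structure of such an ETL procedure can be sketched in a few lines of Python. This is a simplified illustration and not the procedure built for SHARP: the source rows and source codes are hypothetical, an in-memory SQLite database stands in for the registry's CDM-compatible database, and the `person` table shown is only a subset of the columns of the OMOP CDM table of that name. The gender concept identifiers 8507 (male) and 8532 (female), and the use of 0 when no matching concept exists, follow the OMOP standard vocabulary conventions.

```python
import sqlite3

# Standard OMOP gender concepts; the source codes on the left are hypothetical.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}

SOURCE_ROWS = [  # hypothetical extract from a local registry
    {"patient_no": "NL-0012", "name": "(withheld)", "sex": "F", "birth_year": "1968"},
    {"patient_no": "NL-0047", "name": "(withheld)", "sex": "M", "birth_year": "1975"},
    {"patient_no": "NL-0051", "name": "(withheld)", "sex": "?", "birth_year": "1981"},
]


def transform(row, person_id):
    """Map one source row to a de-identified, simplified person record."""
    return (
        person_id,                           # surrogate key, not the registry number
        GENDER_CONCEPTS.get(row["sex"], 0),  # 0 = no matching concept
        int(row["birth_year"]),
    )


def run_etl(rows, connection):
    """Extract rows, transform them and load them into a person table."""
    connection.execute(
        "CREATE TABLE person ("
        "person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER NOT NULL, "
        "year_of_birth INTEGER NOT NULL)"
    )
    records = [transform(row, i) for i, row in enumerate(rows, start=1)]
    connection.executemany("INSERT INTO person VALUES (?, ?, ?)", records)
    connection.commit()
    return len(records)


conn = sqlite3.connect(":memory:")
loaded = run_etl(SOURCE_ROWS, conn)
people = conn.execute("SELECT * FROM person ORDER BY person_id").fetchall()
print(loaded, people)
```

Note that the patient name and registry number never reach the target table, in line with the removal of patient identifiers during harmonisation.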
Data quality assessments
In order to obtain the best quality of harmonised data and minimal loss of original data, it is important that source data comply with the rules of the data dictionary, which was not always the case. For a successful mapping between registry data and the OHDSI/OMOP CDM, it is therefore important to test and validate the data quality. To this end, SHARP deployed a professional statistician who could form a bridge between the clinicians and the mapping and source data experts. This statistician wrote R scripts for descriptive statistical analysis that could be run automatically by all local registries. Due to the diversity of the registry structures and the different levels of completeness of each variable considered, the R script at each stage had to include checks on the numerical range and to account for high levels of (or complete) missing data. The local registries were then presented with their own data overviews in well-arranged tables and graphic displays. Ideally, such checks should eventually be performed on all variables of each registry before finalising the mapping.
At SHARP, quality checks revealed unexpected missing data codes, impossible values and some mismatches due to the use of free-text fields by the clinicians who had entered data. Where necessary, changes were made to the mapping schema and in some cases to the source data in the local registry database. Again, these solutions required time and close collaboration between clinicians, source data experts, mapping experts and data analysts.
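The kinds of checks described above (numerical range, missing data codes and free-text entries in numeric fields) can be sketched as follows. SHARP's checks were written in R and are not reproduced here; this Python version is only an illustration, and the plausibility ranges, missing data codes and example rows are hypothetical.

```python
# Hypothetical plausibility ranges and missing data codes; real values would
# be agreed with the clinicians and source data experts of each registry.
RANGES = {"age": (18, 110), "fev1_pct": (10.0, 150.0)}
MISSING_CODES = {None, "", "NA", -99, 999}


def quality_report(rows, ranges=RANGES, missing_codes=MISSING_CODES):
    """Summarise missingness, free-text entries and out-of-range values."""
    report = {}
    for variable, (low, high) in ranges.items():
        values = [row.get(variable) for row in rows]
        present = [v for v in values if v not in missing_codes]
        numeric = [v for v in present if isinstance(v, (int, float))]
        n_missing = len(values) - len(present)
        report[variable] = {
            "n": len(values),
            # A variable with no rows at all is reported as completely missing.
            "missing_fraction": n_missing / len(values) if values else 1.0,
            "non_numeric": [v for v in present if not isinstance(v, (int, float))],
            "out_of_range": [v for v in numeric if not low <= v <= high],
        }
    return report


# Hypothetical harmonised rows from one registry.
rows = [
    {"age": 54, "fev1_pct": 71.5},
    {"age": 999, "fev1_pct": 230.0},        # 999 used as a missing data code
    {"age": 7, "fev1_pct": "see notes"},    # impossible value and free text
    {"age": 61},                            # variable absent altogether
]
report = quality_report(rows)
for variable, summary in report.items():
    print(variable, summary)
```

Because the report contains only counts and flagged values per variable, it can be returned to each registry as an overview for review by the clinicians who know the source data.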
Data analytical aspects
Using a FAP and analysing real-world data from different disease registries in different countries requires strong analytical skills. In fact, the person in question must unite epidemiological, biostatistical and observational data science expertise, be a confident programmer, and be willing to learn the ins and outs of the OHDSI/OMOP CDM. Also, the statistician should be able to perform an appropriate meta-analysis of summary statistics to draw conclusions from all participating registries. Of course, and luckily, more than one person may fulfil the different aspects of this role in the studies.
While processing data from the SHARP registries, it became clear that a statistician should be engaged at the outset of the project and be involved in the writing of all protocols and analysis plans. This helps to ensure that the necessary data are available and mapped across all relevant registries, and that any local categorisation of data does not preclude the planned analysis.
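As an example of a meta-analysis of summary statistics, the sketch below pools registry-level estimates with fixed-effect inverse-variance weighting. This is one common method, shown here for illustration; the text does not state which meta-analytic model SHARP used, and a random-effects model may be more appropriate when registries are heterogeneous. The function name and the input values are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(estimates, standard_errors):
    """Pool per-registry estimates using inverse-variance weights."""
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    # Approximate 95% confidence interval under a normal assumption.
    interval = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, interval


# Hypothetical effect estimates (e.g. log odds ratios) from three registries.
estimates = [0.30, 0.10, 0.20]
standard_errors = [0.10, 0.20, 0.10]

pooled, pooled_se, interval = fixed_effect_meta(estimates, standard_errors)
print(f"pooled={pooled:.3f} se={pooled_se:.3f} "
      f"ci=({interval[0]:.3f}, {interval[1]:.3f})")
```

Only the estimate and its standard error need to leave each registry, so this step fits naturally with the export of aggregated results.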
Recommendations and blueprint
Table 1 shows the blueprint with recommendations for an optimal harmonisation process between disease registry data and the OHDSI/OMOP CDM for multinational federated analyses.
Table 1. Blueprint for harmonising disease registries using the Observational Health Data Sciences and Informatics (OHDSI)/Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)
A schematic summary of required steps for harmonising disease registries using the OHDSI/OMOP CDM is given in figure 2. An estimate of the time required per item is given in supplementary table E1 and supplementary figure E1. Registries that are currently connected or in the process of being connected are listed in supplementary table E2.
Figure 2. Schematic summary of steps to be taken for a successful harmonisation process of local nonstandardised disease registries to the Observational Health Data Sciences and Informatics (OHDSI)/Observational Medical Outcomes Partnership (OMOP) Common Data Model for federated analyses. SME: small and medium-sized enterprise; IT: information technology; FAP: federated analysis platform.
Discussion
Here, we described our experience in harmonising patient data from different European severe asthma registries using the OHDSI/OMOP CDM. Based on the lessons learned, we put together a blueprint that can be used by researchers in other disease areas where there is a desire to establish federated data networks of real-world patient data already collected in nonstandardised registries. The harmonisation process was not without challenges, but it was above all a unique experience to connect colleagues and partners from different countries, specialties and disciplines in one large federated project.
To date, most studies on the OHDSI/OMOP CDM were related to architectural concepts and tool development [27]. However, over the last couple of years, an increasing number of publications have appeared using the OHDSI/OMOP CDM in prospective network studies with observational patient data, in particular related to the COVID-19 pandemic [28–32]. Other studies have used large administrative claims databases [33, 34] or electronic medical records databases [35, 36]. A recent study described the technicalities of harmonising standardised disease registries [37]. Our study is the first that used the OHDSI/OMOP CDM to harmonise nonstandardised national disease registries.
When the SHARP CRC was founded in 2017, its vision was to incrementally change the research culture across Europe, emphasising ambitions that serve the collective needs of the asthma research community and making it a reality that people with asthma are at the centre of the research environment [14]. SHARP's goals included better understanding the mechanisms of severe asthma, improving treatment for severe asthma and exploring ways to prevent severe asthma. It wanted to achieve this by establishing a platform that would allow the integration of local national asthma registries into a pan-European multicentre registry of patients with severe asthma [21]. At the same time, the scientific community expressed the increasing need for more large-scale real-world research; not only for confirming the results of randomised controlled trials, but also for identifying phenotypes, monitoring disease progression, predicting response to new drugs and detecting rare side-effects [38, 39]. However, due to concerns regarding data privacy, data security, data access rights and data ownership, some SHARP registries were reluctant to transfer patient-level data to one central database, as was the case with other international registries such as the International Severe Asthma Registry [40]. In order not to lose the precious data from these existing registries, it was therefore decided to establish a federated data platform and use the OHDSI/OMOP CDM to harmonise the databases [21].
At that time, the use of the OHDSI/OMOP CDM was relatively new and had never been applied to existing disease registries. Since there was no example of how to approach the harmonisation process, it was not surprising that SHARP encountered multiple challenges and obstacles, from which it ultimately learned a lot.
In retrospect, the unfamiliarity and misunderstanding of the OHDSI/OMOP CDM concept among doctors, researchers, legal entities and IT administrators was perhaps the main reason why the process was sometimes unnecessarily delayed. There were concerns that data privacy would not be guaranteed, data would fall into the wrong hands and the security of data centres would be compromised. Therefore, we cannot emphasise enough the need to repeatedly explain the concept and process of harmonisation to all stakeholders, through meetings, presentations and personal discussions.
Furthermore, it appeared that collaboration between clinicians, IT technicians, registry owners and legal entities was essential, and that they all should be able to devote sufficient time and attention to the project. Not only for the initial harmonisation process, but also prior to any future research project, such multidisciplinary dedicated teams should be set up for each registry. Team members should be able to consult each other easily and ad hoc, preferably by mobile phone.
Investing in building the FAP and achieving the harmonisation of severe asthma registries has brought many benefits to the SHARP CRC. First, thanks to the joint effort and overcoming adversity, it has created a strong and solid partnership between many stakeholders, including patients, clinicians, researchers, pharmaceutical industries, IT technicians, data analysts and consultants. Second, it now features a state-of-the-art platform that allows for innovative and large-scale real-world studies with relatively little effort. Finally, and perhaps most importantly, because of its privacy-protected structure, scalability and generalisability, the SHARP FAP is now perfectly equipped for the future in which artificial intelligence and federated learning will play an increasingly important role in generating evidence with real-world data [41–43].
Conclusions
We have provided a blueprint for what it takes as a nonprofit CRC to successfully use real-world data from existing disease registries for executing federated analyses. The open-access OHDSI/OMOP CDM has enabled patient data from different disease registries to be harmonised and used for data analysis without compromising privacy rules. We have learned that building a FAP to enable large-scale analysis of patient-level data from nonstandardised registries is a complex process, and can only be successful if all parties fully understand and support the concept. At the same time, it ensures strong collaboration and builds an enriching network that enhances the knowledge and interrelationships of all partners with the common goal of using real-world data efficiently. We believe that, especially given the increasing adoption of artificial intelligence and federated learning, the harmonisation of disease registry data to a CDM is a worthwhile investment, which we can certainly recommend to other CRCs. Ultimately, the rewards of such efforts will manifest in terms of improved disease understanding and better patient care.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material 00168-2022.SUPPLEMENT
Acknowledgement
We wish to recognise the contribution of the National Leads of the SHARP countries for their help to build the network and for the communication with all the actors involved in the process of building the SHARP federated analysis platform.
Footnotes
Provenance: Submitted article, peer reviewed.
Conflict of interest: J.A. Kroes reports grants from AstraZeneca BV outside the submitted work. A.T. Bansal has nothing to disclose. E. Berret is an employee of the European Respiratory Society. N. Christian is an employee of ITTM SA. A. Kremer is an employee of ITTM SA. A. Alloni is an employee of ITTM SA. M. Gabetta is an employee of Biomeris SRL. C. Marshall has nothing to disclose. S. Wagers reports personal fees from King's College Hospital NHS Foundation Trust, Academic Medical Research, AMC Medical Research BV, Asthma UK, Athens Medical School, Boehringer Ingelheim International GmbH, CHU de Toulouse, CIRO, DS Biologicals Ltd, École Polytechnique Fédérale de Lausanne, European Respiratory Society, FISEVI, Fluidic Analytics Ltd, Fraunhofer IGB, Fraunhofer ITEM, GlaxoSmithKline R&D Ltd, Holland & Knight, Karolinska Institutet Fakturor, KU Leuven, Longfonds, National Heart and Lung Institute, Novartis Pharma AG, Owlstone Medical Ltd, PExA AB, UCB Biopharma SPRL, Umeå University, University Hospital Southampton NHS Foundation Trust, Università Campus Bio-Medico di Roma, Universita Cattolica del Sacro Cuore, Universität Ulm, University of Bern, University of Edinburgh, University of Hull, University of Leicester, University of Loughborough, University of Manchester, University of Nottingham, Vlaams Brabant, Dienst Europa, Imperial College London, Boehringer Ingelheim, Breathomix, Gossamer Bio, AstraZeneca, CIBER, OncoRadiomics, University of Leiden, University of Wurzburg, Chiesi Pharmaceutical, University of Liege, Teva Pharmaceuticals, Sanofi, Pulmonary Fibrosis Foundation and Three Lakes Foundation, outside the submitted work. R. Djukanovic has received a grant from Novartis for a CI-led project that the funder agreed to support without any restrictions or influence on its contents, analysis or publication; has received consultancy fees from Teva Pharmaceuticals, Sanofi, Boehringer, Novartis and Synairgen; has received grants paid to his institution from the IMI-funded EU project U-BIOPRED, the MERC-funded RASP-UK project, the EME/MRC-funded BEAT Severe Asthma project and NIHR BRC; payment for lectures on the mechanisms of action of Xolair from Novartis and mechanisms of asthma from Teva; and has stock in a University of Southampton company, Synairgen. C. Porsbjerg has received grants and consulting fees paid to her institution, and personal honoraria from AstraZeneca, GlaxoSmithKline, Novartis, Teva, Sanofi, Chiesi and ALK. D. Hamerlijnck has nothing to disclose. O. Fulton has nothing to disclose. A. ten Brinke has received grants paid to her institution from AstraZeneca, GlaxoSmithKline and Teva; and fees paid to her institution for advisory boards and lectures from AstraZeneca, GlaxoSmithKline, Novartis, Teva and Sanofi/Genzyme, all outside the submitted work. E.H. Bel has received grants paid to her institution from GlaxoSmithKline and Teva; and consulting fees from AstraZeneca UK Ltd, GlaxoSmithKline Services UnLtd, Sterna Biologicals, Chiesi Pharmaceuticals, Sanofi/Regeneron and Teva Pharmaceuticals. J.K. Sont has received a grant from GlaxoSmithKline, outside the submitted work.
Support statement: The European Respiratory Society SHARP CRC is supported by GlaxoSmithKline, Teva Pharmaceutical Industries, Novartis, Sanofi and Chiesi Farmaceutici. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received April 4, 2022.
- Accepted June 29, 2022.
- Copyright ©The authors 2022
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org