Abstract
Background Patients with chronic obstructive pulmonary disease (COPD) often suffer episodes of exacerbation of symptoms (ECOPD) that may eventually require hospitalisation due to several, often overlapping, causes. We aimed to analyse the characteristics of patients hospitalised because of ECOPD in a real-life setting using a “big data” approach.
Methods The study population included all patients over 40 years old with a diagnosis of COPD (n=69 359; prevalence 3.72%) registered from 1 January 2011 to 1 March 2020 in the database of the public healthcare service (SESCAM) of Castilla-La Mancha (Spain) (n=1 863 759 subjects). We used natural language processing (Savana Manager version 3.0) to identify those who were hospitalised during this period for any cause, including ECOPD.
Results During the study 26 453 COPD patients (38.1%) were hospitalised (at least once). Main diagnoses at discharge were respiratory infection (51%), heart failure (38%) or pneumonia (19%). ECOPD was the main diagnosis at discharge (or hospital death) in 8331 patients (12.0% of the entire COPD population and 31.5% of those hospitalised). In-hospital ECOPD-related mortality rate was 3.11%. These patients were hospitalised 2.36 times per patient, with a mean hospital stay of 6.1 days. Heart failure was the most frequent comorbidity in patients hospitalised because of ECOPD (52.6%).
Conclusions This analysis shows that, in a real-life setting, ECOPD hospitalisations are prevalent, complex (particularly in relation to heart failure), repetitive and associated with significant in-hospital mortality.
Introduction
Chronic obstructive pulmonary disease (COPD) is a prevalent and complex disease, associated with significant morbidity and mortality [1]. During the course of their disease, COPD patients often present episodes of symptom worsening, known as exacerbations (ECOPD), that sometimes require hospitalisation [2–5]. These ECOPD episodes are in themselves heterogeneous and can be mimicked and/or aggravated by coexisting multimorbidity [6–8]. As a result, it is difficult to ascertain from standard hospital registries (and discharge diagnoses) the burden of hospitalisations due to ECOPD and the characteristics of these patients in a real-life setting. Moreover, information available from randomised clinical trials (RCTs) is not generalisable because RCTs study highly selected populations [9].
New technologies such as natural language processing (NLP) and different artificial intelligence (AI) techniques allow the analysis of very large populations of patients in real-life settings. Savana Manager is a platform able to analyse free text and interpret the content of electronic medical records (EMRs), regardless of the specific clinical information system used in each hospital [10]. Based on our previous experience using NLP [11–13], in this study we sought to describe the characteristics of COPD patients requiring hospitalisation for any acute condition (including but not limited to ECOPD, in order to have a wider perspective) and/or ECOPD (as a specific discharge diagnosis or cause of in-hospital death) identified by the Savana Manager platform over the past decade in our community.
Methods
Study design and ethics
This retrospective, observational, noninterventional study used the Savana Manager version 3.0 platform to capture data from free-text in the EMRs registered in the public healthcare system (SESCAM) of Castilla-La Mancha (Spain) from 1 January 2011 to 1 March 2020. After this date, hospital admissions were significantly affected by the coronavirus disease 2019 (COVID-19) pandemic, so we decided to censor the analysis then. To avoid missing data from COPD patients hospitalised for ECOPD but coded with a different diagnosis, we first included in the analysis all patients over 40 years old with a diagnosis of COPD in the SESCAM database who required acute hospitalisation for any cause during the study period; thereafter, we compared these results with those observed in patients hospitalised because of the specific diagnosis of ECOPD. The SESCAM database includes information on specialised care institutions (hospitalisation, emergency care and outpatient consultations) and primary care clinics in the entire community of Castilla-La Mancha. This study follows the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines for observational studies [14] and was approved by the Guadalajara regional Research Ethics Committee.
Data analysis
Savana Manager is a data extraction system based on AI (NLP) and “big data” techniques [10]. This system can extract unstructured clinical information from natural language or free-text in EMRs and transform it into usable information for research, while maintaining patient anonymity. In addition, using computational linguistic techniques, the clinical context is detected and scientifically validated with the SNOMED CT (Clinical Terms) tool [15].
Data management and protection
The information services of each hospital were responsible for processing and anonymisation of the data, which were subsequently sent to Savana, so the latter never received identifiable data. Furthermore, an algorithm was used during data extraction that randomly introduced confounding information per patient, while at the same time only retrieving part of the individual information. The end result of this methodology is an anonymous database, completely dissociated from the EMRs. Results, therefore, relate only to aggregated data, with no possibility of identifying either patients or physicians. According to the European Data Protection Authority, once an anonymous clinical record is separated from personal data, the European Union General Data Protection Regulation no longer applies.
Evaluation of the quantity and quality of the extracted information
Using EHRead technology, the free-text contained in the EMRs was analysed and processed with NLP techniques [10]. To evaluate the performance and accuracy of the Savana system to identify records that mention the main study variables, a gold standard “annotated corpus” of clinical documents (n=560) was developed where these variables were manually curated by three clinical experts. Using this corpus as the gold standard, we could then calculate the precision (P), recall (R) and F-measure (F) of Savana's performance as [16]:
$$P = \frac{tp}{tp + fp}, \qquad R = \frac{tp}{tp + fn}, \qquad F = \frac{2 \times P \times R}{P + R}$$
which indicate the reliability of the system to retrieve information, the amount of information that the system retrieves and the overall data retrieval performance, respectively. A true positive (tp) corresponds to a correctly identified record, a false positive (fp) to an incorrectly identified record and a false negative (fn) to a record that should have been identified but was not.
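As an illustration only, the three performance measures can be computed from tp, fp and fn counts as in the following minimal Python sketch. The function names and the example counts are hypothetical (they are not taken from the annotated corpus), and the F-measure is assumed to be the balanced harmonic mean of precision and recall.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) from record counts.

    tp: records correctly identified
    fp: records incorrectly identified
    fn: records that should have been identified but were not
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f_measure(precision, recall)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall (assumed form)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
    print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Under this assumed definition, a precision of 0.93 and a recall of 0.90 give an F-measure that rounds to 0.91.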
Statistical analysis
Qualitative variables are presented as absolute frequency and percentage, and quantitative variables as mean, 95% confidence interval and standard deviation.
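For readers less familiar with these summaries, the following Python sketch shows one common way to obtain a mean, standard deviation and 95% confidence interval. It assumes a normal approximation (mean ± 1.96 standard errors), which is not necessarily the method used by the Savana Manager platform, and the ages in the example are invented.

```python
import math
import statistics


def mean_sd_ci95(values: list[float]) -> tuple[float, float, float, float]:
    """Return (mean, sample SD, CI lower bound, CI upper bound).

    The interval uses a normal approximation (z = 1.96), an assumption
    that is reasonable for large samples.
    """
    if len(values) < 2:
        raise ValueError("at least two observations are required")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    half_width = 1.96 * sd / math.sqrt(len(values))
    return mean, sd, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Invented ages, for illustration only.
    ages = [70.0, 72.0, 74.0, 76.0]
    mean, sd, low, high = mean_sd_ci95(ages)
    print(f"mean={mean:.1f} SD={sd:.2f} 95% CI {low:.1f}-{high:.1f}")
```

Because the half-width shrinks with the square root of the sample size, very large populations yield very narrow intervals around the mean.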
The most common diagnoses in our study population (patients with COPD over 40 years old) were determined by the total number of patients who had been diagnosed with that specific pathology during their clinical course. The prevalence was calculated based on the total study population. Furthermore, the degree of association between the most common diagnoses and COPD was analysed by the Savana Manager version 3.0 platform with a Chi-squared test, using a correlation matrix adjusted by age and sex. Those with a significant correlation (p<0.05) are presented.
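The logic of a Chi-squared test of association can be sketched for a single 2×2 table as follows. This is a simplified, unadjusted illustration: it does not reproduce the age- and sex-adjusted analysis performed by the platform, the function name is hypothetical and the counts are invented.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson Chi-squared test (no continuity correction) for a 2x2 table.

                    diagnosis present   diagnosis absent
    group 1               a                   b
    group 2               c                   d

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Invented counts, for illustration only.
    stat, p = chi2_2x2(30, 10, 10, 30)
    print(f"chi2={stat:.2f} p={p:.2g} significant={p < 0.05}")
```

An association would be flagged as significant at the threshold used here when the returned p-value is below 0.05.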
Results
Figure 1 shows the patient flow diagram of this analysis. From 1 January 2011 to 1 March 2020, the SESCAM database included 1 863 759 subjects over 40 years old with a total of 24 316 255 clinical documents. Among these subjects, 69 359 (3.72%) were diagnosed and treated for COPD at different healthcare levels of SESCAM. The precision, recall and F-measure of a diagnosis of COPD were 0.93, 0.90 and 0.91, respectively, indicating that the diagnosis of COPD was accurately detected in our population. F-values of other terms included in this analysis, such as comorbidities, ranged between 0.92 and 0.97. Table 1 presents the main demographic and clinical characteristics of these 69 359 COPD patients. Mean age was 72.9 (95% CI 72.8–73.0) years and 77.1% were male. Cardiovascular and metabolic comorbidities were often present, and most patients were prescribed inhaled therapies in different combinations (table 1).
Figure 1. Flow diagram of the study. EMR: electronic medical record; COPD: chronic obstructive pulmonary disease.
Table 1. Demographics and main clinical characteristics of the total population of chronic obstructive pulmonary disease (COPD) patients identified in the study, those acutely hospitalised because of any medical condition (including exacerbation of COPD (ECOPD)) and those with a main hospital discharge diagnosis of ECOPD
Acute hospitalisations due to any cause
Among the 69 359 COPD patients identified in the SESCAM database, 26 453 patients (38.1%) were hospitalised (at least once) during the study period (10 years) because of an acute condition, as identified by the discharge diagnosis in the database (table 2). Hospitalised COPD patients were slightly older (mean 76.2 versus 72.9 years) with a higher prevalence of males (85.1% versus 77.1%) than in the total population (table 1). However, the prevalence of comorbidities was similar to that seen in the population of COPD patients at large. Interestingly, the proportion of patients receiving chronic inhaled therapies was nominally lower in hospitalised patients (table 1). Their mean hospital stay was 5.7 days. In total, these patients generated 56 972 hospitalisation events during the study period (mean 2.15 per patient), with a mortality rate of 4.7% per hospital admission. Table 2 shows that the two most frequent diagnoses at discharge in this population were respiratory infection (51.7%) and heart failure (38.1%). Finally, table 3 shows that most of these patients (47.8%) were hospitalised in general medical (internal medicine) wards, 20.1% in pulmonology wards and 11.5% in geriatric wards. Patients admitted to the pulmonology ward tended to be younger and suffer heart failure less often (table 3).
Table 2. Diagnoses (at discharge or death) in chronic obstructive pulmonary disease (COPD) patients hospitalised because of any cause or exacerbations of COPD (ECOPD)
Table 3. Demographics and main diagnosis at discharge of chronic obstructive pulmonary disease (COPD) patients requiring acute hospitalisation during the study period due to any cause, by hospital service attending them (n=21 007#)
Hospitalisations due to ECOPD
ECOPD was identified as the main diagnosis at hospital discharge or as a cause of death during hospitalisation in 8331 patients (31.5% of all COPD patients hospitalised during the study period). The distributions of age, gender and comorbidities were similar in the two hospitalised COPD patient groups, except for a higher prevalence of heart failure in patients hospitalised because of ECOPD (52.6% versus 38.0%, respectively). Of note, these latter patients received inhaled treatment more frequently than the former (table 1). Patients hospitalised because of ECOPD generated a total of 19 674 hospitalisation events (mean 2.36 admissions per patient), with a mean hospital stay of 6.1 days and a hospital mortality rate of 3.11% per hospital admission. Table 2 shows the main diagnoses (besides ECOPD) in this population.
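The headline proportions and rates reported in this section follow directly from the counts given in the text, as the short Python check below illustrates. The variable and function names are our own; only the counts come from the results.

```python
# Counts as reported in the Results section.
POPULATION_OVER_40 = 1_863_759   # subjects over 40 years old in the database
COPD_PATIENTS = 69_359           # diagnosed COPD patients
HOSPITALISED_ANY = 26_453        # COPD patients hospitalised for any cause
ADMISSIONS_ANY = 56_972          # hospitalisation events, any cause
HOSPITALISED_ECOPD = 8_331       # patients with ECOPD as main diagnosis
ADMISSIONS_ECOPD = 19_674        # hospitalisation events, ECOPD


def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


summary = {
    "COPD prevalence (%)": pct(COPD_PATIENTS, POPULATION_OVER_40, 2),
    "Hospitalised, any cause (% of COPD)": pct(HOSPITALISED_ANY, COPD_PATIENTS),
    "Hospitalised, ECOPD (% of COPD)": pct(HOSPITALISED_ECOPD, COPD_PATIENTS),
    "Hospitalised, ECOPD (% of hospitalised)": pct(HOSPITALISED_ECOPD, HOSPITALISED_ANY),
    "Admissions per patient, any cause": round(ADMISSIONS_ANY / HOSPITALISED_ANY, 2),
    "Admissions per patient, ECOPD": round(ADMISSIONS_ECOPD / HOSPITALISED_ECOPD, 2),
}

if __name__ == "__main__":
    for label, value in summary.items():
        print(f"{label}: {value}")
```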
Discussion
There are three main observations of this real-life analysis that spans over a decade in the community of Castilla-La Mancha in Spain. 1) The prevalence of diagnosed and treated COPD in the community was very low (3.72%), and diagnosed patients are older (72.9 years) and predominantly males (77.1%). 2) Hospitalisation events for acute conditions (including ECOPD) in this population are frequent (38.1%), recurrent (2.15 per patient) and associated with significant in-hospital mortality (4.7% per hospital admission). Of note, hospitalised patients appear undertreated before hospitalisation compared with the population of COPD patients at large. Furthermore, only a few hospitalised patients (20.1%) are treated by pulmonary specialists. 3) The prevalence of heart failure is particularly high in patients hospitalised because of ECOPD (52.6%).
Previous studies
Hospitalised ECOPD episodes worsen the health status and prognosis of COPD patients, and generate a high economic burden for healthcare systems [17]. Because ECOPD are heterogeneous and can be mimicked and/or aggravated by coexisting comorbidities [8], their characteristics and true impact on healthcare systems are not precisely defined [18–20]. A previous study in another region of Spain (Catalunya), based on diagnosis at discharge, showed that 23% of patients hospitalised for the first time because of ECOPD died within 1 year after hospital discharge and that in the remaining patients all-cause mortality was related to the number of re-hospitalisations, particularly early (<30 days) readmissions [21]. To the best of our knowledge, the current study is the first to investigate hospitalisations in COPD using AI and NLP techniques [10].
Interpretation of novel findings
We found that the prevalence of COPD in our community was 3.72%. This figure is much lower than that reported recently in an epidemiological study in the Spanish general population over 40 years old (11.8%) [22]. However, this same epidemiological study reported an underdiagnosis of COPD of 74.7% and, applying this proportion of underdiagnosis to our community, we would expect to find a prevalence of COPD in our community of 3.99%, which is actually quite close to the observed figure. Hence, this real-life analysis confirms that underdiagnosis of COPD in Spain (and likely elsewhere) is very high. Similarly, we observed that diagnosed patients are quite old (72.9 years) and predominantly males (77.1%). This supports recent calls to diagnose and treat younger patients, both males and females, with COPD [23, 24].
Our results also show that episodes of hospitalisation for any cause (including ECOPD) in COPD patients in our community are frequent (38.1%), recurrent (2.15 per patient) and associated with significant in-hospital mortality (4.7% per hospital admission). This illustrates the associated disease burden and impact of COPD on the healthcare system. Of note, we also observed that, compared with the population of diagnosed COPD patients at large, those hospitalised for any cause appear undertreated before hospitalisation (table 1). Furthermore, only a few hospitalised patients (20.1%) are treated by pulmonary specialists. These two observations open a window of opportunity to improve the care of these patients, both before and during hospitalisation [25].
Heart failure was the most frequent diagnosis in patients hospitalised because of ECOPD (52.6%). These observations support previous studies showing the clinical relevance of concomitant heart failure in patients hospitalised because of ECOPD [26–35].
Finally, in 26% of patients hospitalised because of ECOPD, pneumonia was also diagnosed. Whether this should be considered a different event from an ECOPD or included in the same concept is arguable [32, 36].
Strengths and potential limitations
The size of the population studied and the use of novel analysis methodologies are clear strengths of this study. Furthermore, the fact that this is a real-life study and that it extends over a decade are novel aspects. However, such a large size of the study population limits a more granular analysis of some clinical variables of potential interest, such as the specific type of respiratory infection that may have triggered the ECOPD event or the impact of treatments received during hospitalisation. Likewise, we acknowledge that our analysis is based on diagnostic codes and not on spirometric measurements, which may underestimate the true prevalence of the disease in the population.
Conclusions
This study in a large population studied in a real-life setting shows that: 1) there is a huge underdiagnosis of COPD in our community; 2) COPD is diagnosed too late and rarely in females; 3) hospitalisation events for acute conditions (including ECOPD) in this population are frequent, recurrent and associated with significant in-hospital mortality; and 4) heart failure is particularly prevalent in patients hospitalised because of ECOPD.
Footnotes
Provenance: Submitted article, peer reviewed.
Conflicts of interest: All authors declare no conflicts of interest.
Support statement: Supported by the Chair of Inflammatory Diseases of the Airways, University of Alcalá (Alcalá de Henares, Spain). Funding information for this article has been deposited with the Crossref Funder Registry.
- Received March 22, 2022.
- Accepted June 6, 2022.
- Copyright ©The authors 2022
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions@ersnet.org