Idiopathic pulmonary fibrosis (IPF) has an unpredictable course and prognostic factors are incompletely understood. We aimed to identify prognostic factors, including multidimensional indices from a significant IPF cohort at the Bristol Interstitial Lung Disease Centre in the UK.
Patients diagnosed with IPF between 2007 and 2014 were identified. Longitudinal pulmonary physiology and exercise testing results were collated, with all-cause mortality used as the primary outcome. Factors influencing overall, 12- and 24-month survival were identified using Cox proportional hazards modelling and receiver operating characteristic curve analysis.
We found that, in this real-world cohort of 167 patients, diffusing capacity for carbon monoxide (DLCO) and initiation of long-term oxygen were independent markers of poor prognosis. Exercise testing results predicted 12-month mortality as well as DLCO, but did not perform as well for overall survival. The Composite Physiological Index was the best performing multidimensional index, but did not outperform DLCO. Our data confirmed that patients who experienced a fall in forced vital capacity (FVC) >10% had significantly worse survival after that point (p=0.024).
Our data from longitudinal follow-up in IPF show that DLCO is the best individual prognostic marker, outperforming FVC. Exercise testing is important in predicting early poor outcome. Regular and complete review should be conducted to ensure appropriate care is delivered in a timely fashion.
DLCO is a powerful prognostic marker in IPF http://ow.ly/EaEr307VTRN
Idiopathic pulmonary fibrosis (IPF) is the most common and most aggressive of the idiopathic interstitial pneumonias. It is a distinct clinical entity, associated with unexplained chronic exertional dyspnoea, cough, bibasal crackles on auscultation of the chest and finger clubbing, with radiological and histopathological appearances consistent with usual interstitial pneumonia (UIP).
The Bristol Interstitial Lung Disease (ILD) Service at North Bristol NHS Trust manages patients with IPF from around the south west of England. We performed a retrospective analysis of all patients diagnosed with IPF between 2007 and 2014. Our hypotheses were that exercise testing, and exertional desaturation in particular, significantly improves prognostication and that significant changes in lung function are detected with follow-up intervals of <4 months. In addition, we sought to compare published multidimensional indices for outcome prediction in this cohort and to compare the clinical course of patients following a decline of >10% in forced vital capacity (FVC) with those not suffering such a decline.
Our understanding of the natural history of IPF is limited by its highly variable clinical course and a lack of large-scale epidemiological studies. Median survival in IPF ranges from 2.5 to 3.5 years [1, 2]; however, the clinical course of individual patients is highly variable.
Symptoms are recognised to precede diagnosis by 1–2 years. More recently, incidental detection of subclinical IPF has become more frequent, resulting from thoracic imaging performed for other reasons. This has led to a wide range of reported mean and median survival estimates, between 2 and 4 years from time of diagnosis [4–6]. There is a lack of consensus, and considerable debate, surrounding appropriate outcome measures to assess treatment response in clinical trials [7–9] and to aid prognostication on routine clinical review.
The majority of studies examining prognostic indicators in IPF have been retrospective, or from clinical trial cohorts, which may be limited by their exclusion of those with more advanced disease [10–14]. Their findings have been heterogeneous, with conflicting data as to the most valuable variables in predicting outcome. Additionally, those studies examining the value of pulmonary function measures, including FVC, have not examined exercise testing variables in parallel [15, 16].
Several groups have developed multidimensional indices to help outcome prediction in IPF [10–12, 17]. No single scoring system has yet been widely accepted and these indices have not been compared in an independent cohort. The most accepted criterion for clinical trials remains a change in FVC.
Clinical guidelines have highlighted the lack of data to inform the ideal interval between clinical review for these patients. In the UK there has been controversy surrounding the application of a “stopping rule” for use of novel antifibrotic agents following decline in FVC of >10%, and restriction of these new drugs based on FVC criteria [19, 20].
Patient and data identification
Patients were identified from review of the Bristol ILD Service multidisciplinary team (MDT) database and clinical records from September 1, 2007 to December 31, 2014. Inclusion criteria for this study were an MDT consensus diagnosis of IPF according to American Thoracic Society (ATS)/European Respiratory Society (ERS) criteria made between these dates and ≥12 months of follow-up. Diagnoses made before publication of the 2011 criteria were reviewed at an MDT meeting for confirmation. Patients with a “working diagnosis” of IPF were diagnosed as such based on clinicoradiological parameters in the context of MDT discussion. Where a confident diagnosis could not be made, patients were referred for surgical lung biopsy following discussion of the risks and benefits of such an approach and considering the clinical condition of the patient, in accordance with national guidelines. Ethical approval for this work was given by the East of England research ethics committee (reference 15/EE/0023).
Demographic data were collected in addition to all-cause mortality and use of pirfenidone. Computed tomography of the chest was classified as “definite” or “possible” UIP pattern. Where echocardiography was performed around the time of the first clinical assessment, incidence of pulmonary hypertension was noted. Initiation of long-term oxygen therapy (LTOT) was documented at the time of the first visit. This was initiated in accordance with national guidelines only for those patients with arterial oxygen tension of <7.3 kPa or <8 kPa in the presence of evidence of pulmonary hypertension. Comorbidities, specifically those including past or current history of cancer or cardiac disease, were noted.
Pulmonary function and exercise testing were all performed in the same physiology department according to international criteria [22, 23]. Results from all spirometry, diffusing capacity of the lung for carbon monoxide (DLCO) and 6-min walk testing (6MWT) were collated over the follow-up period. Relative changes in FVC and DLCO were calculated, as suggested by Richeldi et al. Changes in FVC and DLCO on follow-up were noted, in addition to subsequent trends in spirometry results. The minimal clinically important difference (MCID) for IPF of 5% was used. Desaturation was defined as a fall in pulse oximetry on exertion of ≥4% to a level of <88%. Disease progression was defined as death, respiratory hospitalisation or a fall in FVC of >10%.
The indices compared were the Composite Physiological Index (CPI), the Gender, Age, Physiology (GAP) score and the distance–saturation product (DSP). The score described by du Bois et al., which incorporates longitudinal change in physiology values, was also assessed. These indices were selected based on the diversity of their underlying methodologies and the variables incorporated. Scores were calculated according to the methodology described by the authors.
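The definitions in this paragraph can be illustrated with a minimal Python sketch. It is an illustration only, not the code used for the analysis: the function names are hypothetical, the relative change is assumed to be computed as (follow-up − baseline)/baseline, and the "≥4%" fall in pulse oximetry is assumed to mean four percentage points of saturation.

```python
def relative_change(baseline, follow_up):
    """Relative change (%) between two measurements, e.g. FVC at two visits."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (follow_up - baseline) / baseline * 100.0


def desaturated(spo2_rest, spo2_nadir):
    """Exertional desaturation: a fall of >=4 points to a level below 88%.

    Assumption: the 4% fall is read as percentage points of SpO2.
    """
    return (spo2_rest - spo2_nadir) >= 4 and spo2_nadir < 88


def progressed(died, respiratory_admission, fvc_baseline, fvc_follow_up):
    """Disease progression: death, respiratory hospitalisation or FVC fall >10%."""
    fvc_fall = relative_change(fvc_baseline, fvc_follow_up) < -10.0
    return died or respiratory_admission or fvc_fall
```

For example, a patient whose FVC falls from 3.0 L to 2.6 L has a relative change of about −13% and would be classed as having progressed, even without death or hospitalisation.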
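As a rough guide to how three of these indices are calculated, the sketch below follows the formulae as they are commonly quoted from the original publications; it is not taken from this study, and the coefficients and cut-offs should be checked against the source papers before any use. The function names are hypothetical, all physiology inputs are assumed to be % predicted values, and the DSP is assumed to be walking distance in metres multiplied by the lowest saturation expressed as a fraction. The du Bois score is omitted because it needs longitudinal and hospitalisation data.

```python
def cpi(dlco_pct, fvc_pct, fev1_pct):
    """Composite Physiological Index (higher = more extensive disease)."""
    return 91.0 - 0.65 * dlco_pct - 0.53 * fvc_pct + 0.34 * fev1_pct


def dsp(distance_m, spo2_nadir_pct):
    """Distance-saturation product: 6MWT distance x lowest SpO2 (as fraction)."""
    return distance_m * spo2_nadir_pct / 100.0


def gap_points(male, age, fvc_pct, dlco_pct=None):
    """GAP points (0-8). dlco_pct=None means the DLCO test could not be performed."""
    points = 1 if male else 0
    if age > 65:
        points += 2
    elif age > 60:
        points += 1
    if fvc_pct < 50:
        points += 2
    elif fvc_pct <= 75:
        points += 1
    if dlco_pct is None:
        points += 3
    elif dlco_pct <= 35:
        points += 2
    elif dlco_pct <= 55:
        points += 1
    return points


def gap_stage(points):
    """GAP stage: I (0-3 points), II (4-5 points), III (6-8 points)."""
    if points <= 3:
        return "I"
    if points <= 5:
        return "II"
    return "III"
```

In this sketch a 70-year-old man with FVC 60% and DLCO 40% predicted scores 5 GAP points (stage II). Note that the GAP score gives the greatest weight to an inability to perform the DLCO manoeuvre.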
All data were tested for normality and are reported as mean (95% CI) or median (interquartile range).
An a priori statistical plan was approved, using a multivariable Cox proportional hazards model with backward selection to determine which individual variables were independently predictive of survival, both from baseline variables and after 12 months of follow-up. Separate analyses were performed for the cohort with follow-up values available, including for comparison of the du Bois multidimensional index. The full model contained all variables with a univariable p-value <0.2. Variables with a p-value >0.1 were removed stepwise, with model comparison by Bayesian information criterion (BIC). Data were assessed for multicollinearity and no variables were of concern. The proportional hazards assumption was checked by testing for a non-zero slope in a generalised linear regression of the scaled Schoenfeld residuals on functions of time.
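For reference, the quantities used in this model-building step can be written out explicitly. This is the standard textbook form of the Cox model and of the BIC, not a transcription of the software output; the symbols are notation introduced here, and whether $n$ is taken as the number of patients or the number of deaths is a convention that is not specified in the text.

```latex
\begin{align}
  h(t \mid x_i) &= h_0(t)\,\exp\!\left(\beta^{\top} x_i\right), \\
  \mathrm{BIC}  &= -2 \ln \hat{L} + k \ln n .
\end{align}
```

Here $h_0(t)$ is the unspecified baseline hazard, $x_i$ the covariates for patient $i$, $\exp(\beta_j)$ the hazard ratio for a one-unit increase in covariate $j$, $\hat{L}$ the maximised partial likelihood and $k$ the number of estimated coefficients. When two candidate models are fitted to the same patients, the one with the lower BIC is preferred, so an additional variable is retained only if it improves the fit by more than the $\ln n$ penalty.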
Individual variables and multidimensional indices were compared using Cox univariable analysis (BIC). Early poor outcome was assessed by comparison of the area under the receiver operating characteristic (ROC) curve (c-statistic) analysis for 12- and 24-month survival (for continuous variables). Multiple survival analysis methodologies were used to ensure robust comparison of scoring systems. Statistical analyses were conducted using Stata software (Stata/IC v14.1; StataCorp, College Station, TX, USA).
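The c-statistic for a fixed time horizon can be sketched in a few lines of Python. This is a simplified illustration, not the routine used in the analysis: it treats death by 12 (or 24) months as a binary outcome, which assumes every patient's status at that time point is known, and the function name is hypothetical.

```python
def c_statistic(scores, died):
    """Area under the ROC curve for a continuous score and a binary outcome.

    Equals the proportion of (died, survived) patient pairs in which the
    patient who died has the higher score; tied scores count as half.
    """
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

A value of 0.5 indicates no discrimination and 1.0 perfect discrimination. For variables where a lower value signals worse prognosis, such as DLCO or walking distance, the score would be negated before calling the function so that higher always means higher risk.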
167 patients were identified in the study period with ≥12 months of follow-up and included in the analysis. Demographic details are shown in table 1. Average follow-up was 22.6 months (range 12–87 months). 78% of patients had definite UIP on high-resolution computed tomography (HRCT), meaning 22% of patients had a “working diagnosis” of IPF, according to ATS/ERS criteria. Eight of these patients were diagnosed with IPF following surgical lung biopsy. The remaining patients were not deemed sufficiently robust to undergo this procedure, or declined biopsy following discussion. Pulmonary hypertension was observed in 18.6% of patients and this was associated with a significantly lower DLCO (between-group difference 10.98%, p=0.01). Overall mortality was 40.1%, with 13.8% 12-month mortality and a median survival of 35.4 (16.5–65.5) months. Repeat pulmonary physiology testing at 12 months was available for 126 patients.
Survival analysis from individual variables
Univariable Cox proportional hazards modelling identified pulmonary hypertension, forced expiratory volume in 1 s (FEV1)/FVC ratio, FVC, DLCO, inability to complete DLCO test, 6-min walking distance, exertional desaturation, baseline initiation of LTOT, interval change in DLCO and interval change in walking distance as having a significant effect on survival (p<0.05) (online supplementary table S1). Median survival after initiation of LTOT was 17.4 months (11.3–21.8), compared to 48.5 months where oxygen was not required (p<0.001 by log rank test) (figure 1). Pirfenidone initiation was associated with a worse survival by univariable proportional hazards modelling (hazard ratio 1.99, p=0.02), with a median survival of 20.4 months, compared to 44.7 months where this was not given. There was no difference in the rate of FVC decline >10% with pirfenidone use in this cohort (p=0.43).
On multivariable analysis using only baseline variables with p<0.2, male sex and DLCO were significant in the full model. The best performing model as assessed by BIC contained FVC, sex, DLCO, baseline initiation on LTOT and an inability to complete DLCO assessment (table 2).
When 12-month follow-up data were included in a multivariable model containing all variables with p<0.2, only baseline DLCO remained significant (table 2). The best performing model contained sex and baseline/change in DLCO.
Survival analysis from multidimensional indices
Baseline values and survival analysis results for CPI, GAP, DSP and du Bois scores are shown in table 3. Comparison of Cox models by BIC demonstrated that univariable models containing DLCO or CPI were the best performing. Using only FVC was worse than any of the multidimensional indices.
The du Bois prognostic score includes 12-month follow-up variables. When comparing this within the cohort with follow-up data available, the best-performing univariable Cox models remained the DLCO and CPI.
Analysis of the area under ROC curves for both 12- and 24-month mortality, to investigate prediction of early poor outcome, showed that FVC performed worse than DLCO or CPI (table 4). It was no better at predicting early poor outcome than age alone. Exercise capacity, as measured by 6MWT, had equivalent performance to DLCO for 12-month mortality, but this was not seen at 24 months.
Overall median progression-free survival was 21.8 (11.1–53.5) months. Univariable Cox proportional hazards analysis showed that age, detection of pulmonary hypertension, LTOT initiation, exertional desaturation, walking distance and DLCO or inability to complete DLCO were significant variables (online supplementary table S2). Following multivariable analysis, age, LTOT initiation and DLCO remained significant.
Of the multidimensional indices, CPI, GAP and DSP were significantly predictive of disease progression on Cox proportional hazards analysis, with no substantial differences in model performance as compared by BIC. These indices did not perform better than individual variables of oxygen use and DLCO; however, they were superior to FVC.
Clinical course after FVC decline >10%
89 patients had ≥24 months of follow-up, providing sufficient data for analysis of their clinical course after FVC decline of >10% in a 12-month period. Median follow-up following the 12-month result for this cohort was 18.6 (11.1–32.5) months. Median survival by log rank comparison was significantly worse for those who experienced FVC decline >10% (p=0.024) (figure 2). However, the “new” baseline for this group was significantly lower following decline for both FVC (68.4% versus 81.6%, p=0.001) and DLCO (42.3% versus 53.5%, p=0.003). After a fall in FVC of >10%, patients went on to deteriorate at a greater rate (4.5% per year versus 0.4% per year, p<0.001) (figure 3).
860 individual clinic visits were recorded from our cohort in the study period. Median interval of follow-up between visits was 4.1 (3.1–6.1) months, with a median change in FVC between visits of −1.1% (−5.5–2.7%). 27.6% of recorded values had fallen by more than the MCID (>5%) from the previous visit (22.6% in those with follow-up <4 months, 32.3% with follow-up >4 months; p<0.001). Change in FVC was significantly, but weakly, correlated with interval of follow-up (Spearman correlation coefficient −0.135, p<0.001) (figure 4). Where a decline of >5% in FVC was observed, this was significantly related to the interval of follow-up (p<0.001 by Mann–Whitney U-test).
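The two summary measures in this paragraph, the proportion of visits with a fall beyond the MCID and the Spearman correlation between change in FVC and follow-up interval, can be sketched as follows. This is a plain-Python illustration with hypothetical function names and no p-value calculation; it is not the Stata routine used for the reported figures.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def fraction_beyond_mcid(changes_pct, mcid=5.0):
    """Proportion of between-visit FVC changes that fell by more than the MCID."""
    return sum(1 for c in changes_pct if c < -mcid) / len(changes_pct)
```

A negative coefficient, as reported here, means that longer gaps between visits tend to go with larger falls in FVC. The sketch does not handle the degenerate case in which all values of one variable are identical.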
These data show that best prediction of poor outcome in IPF comes from DLCO. FVC, despite its widespread use to guide prescribing and in clinical trials, does not perform well as a marker for prognosis, early poor outcome or progression-free survival. LTOT initiation is also a particular marker of poor prognosis, with a median survival of <18 months after this event. Multidimensional indices are predictive; however, only the CPI performs as well and as consistently as DLCO. Exercise testing variables, including exertional desaturation, are good markers for early poor outcome and perform as consistently as the multidimensional indices tested.
In addition, we have demonstrated that following a fall in FVC of >10%, further deterioration does occur and this is a marker of worse survival. This may be accounted for by the association of FVC fall with lower DLCO values. In order to detect these changes in pulmonary function in a timely fashion, regular follow-up is required; over 20% of occasions at which clinically significant decline in FVC had occurred were at <4 months’ interval since the previous visit. While we cannot generalise from this limited cohort study, it would seem in this cohort that regular review at an interval more frequent than 6 months is needed to promptly detect clinically significant changes. Approaches such as home spirometry may enable monitoring of patients without the need for clinic visits in the future, but this requires further investigation.
These data are derived from a well-characterised, real-world cohort of IPF patients, with significant follow-up. We were able to conduct analyses both at baseline and after 12 months of follow-up in a significant number of patients. We have assessed the comparative strengths of indices for which no previous head-to-head comparison has been performed in an independent population. The use of a variety of methodologies gives strength to our conclusions, with consistency of results between the different approaches to survival analysis.
Our work also addresses an important question highlighted by guidelines, reporting the pattern of changes in FVC seen for different follow-up intervals. We have been able to observe the nature of FVC changes following a decline of 10%, which should inform policy decisions related to eligibility criteria for novel antifibrotic agents.
Our data are limited by their retrospective nature and resultant heterogeneity and attrition to follow-up. However, this is unavoidable in this area when attempting to gather such information about the real-world experience of IPF. In addition, there are issues related to the period covered by this study, during which antifibrotic treatment with pirfenidone became available. The group given this drug appeared to have a worse survival; however, we have interpreted this to reflect the eligibility criteria for pirfenidone in England, limiting its use to those with more severe disease (FVC 50–80% predicted). There did not appear to be any significant difference in the rate of FVC decline between the treated group and the remaining patients in this small cohort. It must be emphasised that this cohort was a real-world group and therefore not analogous to the groups used in randomised controlled studies of antifibrotic medications.
There has been controversy surrounding the use of prognostic indicators in clinical trials in IPF [8, 9]. Some have criticised the use of FVC, advocating the use of all-cause mortality or hospitalisation. There is heterogeneity in the findings of published studies exploring which variables are robust prognostic indicators, despite consistent methodologies.
A complete overview of all such studies is beyond our scope here and readers are directed to review articles [27, 28]. These studies have been conducted in cohorts from North America and South East Asia, in contrast to the demographics of the work presented here. This difference in population make-up may explain some of the contrasts in our findings. Where studies have examined pulmonary function measures as prognostic variables, they have not included exercise testing results [15, 16]. Where such results have been included, previous work has indicated the prognostic importance of both distance walked and desaturation during the 6MWT [14, 25, 29–31]. There is a consistent signal for DLCO, walking distance and change in FVC to be useful metrics; however, this is not uniform.
Our data support the routine use of exercise testing in clinical assessment of IPF and also lend weight to the use of DLCO. There are concerns surrounding the heterogeneity of test performance, limiting the use of these measures in clinical trials; however, this should not prevent their application as prognostic indicators for IPF.
In IPF, studies have shown only moderate correlation between exercise capacity, dyspnoea on exertion and pulmonary function measurements [32, 33]. This highlights the multifactorial contribution to the symptoms of exertional dyspnoea and exercise capacity. Similarly, it has been observed that pulmonary function measurements do not robustly correlate with the degree of parenchymal involvement on HRCT [34–36]. A well-validated staging system in IPF should improve prognostication in disease, improve future study design and also facilitate decision making around medications and lung transplantation.
Several groups have developed multidimensional indices for IPF [10–12, 17, 25]; however, no single scoring system has yet been widely accepted. In other diseases, multidimensional indices have been superior to individual predictors; the BODE (body mass index (BMI), airflow obstruction, dyspnoea and exercise capacity) index in chronic obstructive pulmonary disease includes BMI, FEV1 (% predicted), 6-min walking distance and Medical Research Council dyspnoea score, while the CURB-65 score (confusion, urea >7 mmol·L⁻¹, respiratory rate ≥30 breaths·min⁻¹, blood pressure <90 mmHg (systolic) or ≤60 mmHg (diastolic), age ≥65 years) is widely used in community-acquired pneumonia. Both of these indices are superior to individual components in isolation.
The ideal multidimensional index for IPF would be simple, making use of routinely collected clinical data that is easy to apply in day-to-day clinical practice. The indices compared in this work all fulfil these criteria. Our data suggest that, while performing well, these indices in IPF do not yet have superior performance to isolated elements of the clinical and physiological assessment.
This study confirms the validity of the DSP, GAP, CPI and du Bois multidimensional indices in prognostication for IPF; however, our data suggest that these do not consistently outperform exercise testing and DLCO. In addition, we have highlighted LTOT initiation as a significant prognostic indicator. Our data support clinical review of IPF patients at regular intervals and confirm that FVC decline >10%, while often an isolated event, is a harbinger of poor outcome in IPF. These data should inform the development of guidelines and policy decisions related to novel antifibrotic agents.
We would like to thank Paul White (University of the West of England, Bristol, UK) for his statistical advice and Giles Dixon (University Hospitals Bristol NHS Foundation Trust, Bristol) for his help in data entry.
This article has supplementary material available from openres.ersjournals.com
Conflict of interest: None declared.
- Received August 22, 2016.
- Accepted December 23, 2016.
- Copyright ©ERS 2017
This article is open access and distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0.