Abstract
The heterogeneity of interstitial lung disease (ILD) results in prognostic uncertainty concerning end-of-life discussions and optimal timing for transplantation. Effective prognostic markers and prediction models are needed. Cardiopulmonary exercise testing (CPET) provides a comprehensive assessment of the physiological changes in the respiratory, cardiovascular and musculoskeletal systems in a controlled laboratory environment. It has shown promise as a prognostic factor for other chronic respiratory conditions. We sought to evaluate the prognostic value of CPET in predicting outcomes in longitudinal studies of ILD.
MEDLINE, Embase and the Cochrane Database of Systematic Reviews were used to identify studies reporting the prognostic value of CPET in predicting outcomes in longitudinal studies of ILD. Study quality was assessed using the Quality in Prognosis Study risk of bias tool.
Thirteen studies were included that reported the prognostic value of CPET in ILD. All studies reported at least one CPET parameter predicting clinical outcomes in ILD, with survival being the principal outcome assessed. Maximum oxygen consumption, reduced ventilatory efficiency and exercise-induced hypoxaemia were all reported to have prognostic value in ILD. Issues with study design (primarily due to inherent problems of retrospective studies, patient selection and presentation of numerous CPET parameters), insufficient adjustment for important confounders and inadequate statistical analyses limit the strength of the conclusions that can be drawn at this stage.
There is insufficient evidence to confirm the value of CPET in facilitating “real-world” clinical decisions in ILD. Additional prospective studies are required to validate the putative prognostic associations reported in previous studies in carefully phenotyped patient populations.
Tweetable abstract
There is presently insufficient evidence to confirm the value of CPET in facilitating “real-world” clinical decisions in ILD. Additional prospective studies are required to validate the putative prognostic associations reported in previous studies. https://bit.ly/3dfp5kq
Introduction
The heterogeneity of interstitial lung disease (ILD) [1, 2] presents challenges for patients and clinicians in terms of treatment choices, the optimal timing of end-of-life discussions [3] and of referral for transplantation [4], and clinical trial design [5, 6].
Cardiopulmonary exercise testing (CPET) provides a comprehensive assessment of the physiological changes that occur in the respiratory, cardiovascular and musculoskeletal systems during exercise in a controlled laboratory environment [7, 8], and is considered the gold standard for evaluating maximal/symptom-limited exercise tolerance in patients with pulmonary and cardiac disease [9]. Although CPET has been available for decades, evidence supporting its use in the prognostication of chronic cardiopulmonary disease has emerged only recently [10, 11], with increasing interest in its application in ILD [12].
Maximum/peak oxygen consumption (VO2max or peak VO2) is a measure of the capacity for aerobic exercise and is determined by the variables of the Fick equation, namely cardiac output and the arteriovenous oxygen content difference [13]. In patients with ILD, exercise may be limited by one or more of the following: ventilatory mechanical limitation, when patients reach their ventilatory ceiling (typically taken to be 80% of maximal voluntary ventilation (MVV)); abnormal gas exchange, or reduced ventilatory efficiency, indicated by variables such as the increase in minute ventilation (VE) relative to carbon dioxide production (VE/VCO2); and diffusion limitation, indicated by variables such as a fall in oxygenation of >5% or hypoxaemia at the anaerobic threshold (AT) or at peak exercise [13].
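To make the quantities above concrete, the sketch below computes oxygen uptake from the Fick principle (VO2 = cardiac output × arteriovenous O2 content difference) and flags the ventilatory-ceiling and desaturation patterns using the thresholds quoted in the text. The function names, the units, and the reading of ">5%" as a fall of more than 5 percentage points in SpO2 are illustrative assumptions, not criteria taken from the included studies.

```python
from dataclasses import dataclass


def fick_vo2_ml_min(cardiac_output_l_min: float, cao2_ml_dl: float,
                    cvo2_ml_dl: float) -> float:
    """Oxygen uptake (mL/min) from the Fick principle: VO2 = Q x (CaO2 - CvO2).

    O2 contents are in mL O2 per dL of blood; multiplying by 10 converts them
    to mL O2 per L, so L/min x mL/L gives mL/min.
    """
    return cardiac_output_l_min * (cao2_ml_dl - cvo2_ml_dl) * 10.0


@dataclass
class LimitationFlags:
    ventilatory_limitation: bool  # peak VE at or above the assumed fraction of MVV
    exercise_desaturation: bool   # SpO2 fell by more than the assumed threshold


def limitation_flags(ve_peak_l_min: float, mvv_l_min: float,
                     spo2_rest_pct: float, spo2_peak_pct: float,
                     ceiling_fraction: float = 0.80,
                     desat_threshold_pts: float = 5.0) -> LimitationFlags:
    """Flag two exercise-limitation patterns using illustrative thresholds.

    ceiling_fraction=0.80 mirrors the ventilatory ceiling quoted in the text;
    treating '>5%' as a drop of more than 5 SpO2 percentage points is an assumption.
    """
    if mvv_l_min <= 0:
        raise ValueError("MVV must be positive")
    return LimitationFlags(
        ventilatory_limitation=ve_peak_l_min / mvv_l_min >= ceiling_fraction,
        exercise_desaturation=(spo2_rest_pct - spo2_peak_pct) > desat_threshold_pts,
    )


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(fick_vo2_ml_min(5.0, 20.0, 15.0))  # 250.0
    print(limitation_flags(68.0, 80.0, 96.0, 88.0))
```

With these hypothetical inputs, peak VE reaches 85% of MVV and SpO2 falls by 8 points, so both flags are set; interpreting a real test would depend on the full CPET profile.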
The primary objective of this systematic literature review was to evaluate the prognostic value of CPET in predicting disease-specific outcomes in longitudinal studies of ILD. If a prognostic role for CPET were confirmed, it could be used to guide earlier intervention for at-risk patients and support cohort enrichment for ILD clinical trials.
Materials and methods
The study protocol was prepared in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [14] and registered in the International Prospective Register of Systematic Reviews (PROSPERO 110198/2018) (study commencement date 1 November 2018, completion date 30 September 2019). In brief, eligible studies were cohort studies (retrospective or prospective) reporting the prognostic value of CPET results in adults with ILD.
The primary objective was to evaluate the prognostic value of CPET in predicting disease course and outcomes in longitudinal studies of ILD. We explored the relationship between CPET and a broad range of clinical outcomes including, but not limited to, disease outcomes (e.g. death, hospitalisation), potential surrogates of disease severity (e.g. worsening lung physiology), and future deterioration in health-related quality of life (HRQoL) and/or functional status. Where possible, the prognostic value of CPET was compared across different ILD subtypes.
Studies were excluded if the ILD cohort was not described and reported separately. Non-original research publications and abbreviated reports were also excluded. Randomised controlled trials (RCTs) were excluded because we did not consider this an appropriate design for assessing the prognostic value of CPET. An amendment to the originally registered protocol, which had restricted inclusion to English-language articles, enabled the inclusion of a relevant non-English (French) publication.
The search criteria were developed in accordance with recommendations for systematic reviews of evaluations of prognostic variables [15]. Electronic searches were performed in MEDLINE, Embase and the Cochrane Database of Systematic Reviews (CDSR), with no publication date or language restrictions. Full details of the search criteria are presented in supplementary material 1. All titles and abstracts were screened independently by two review authors (RD and CS), and agreement was assessed using Cohen's kappa statistic [16]. Disagreements were resolved by discussion between the reviewers, involving a third reviewer (SLB) when necessary.
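Cohen's kappa compares the observed agreement between two reviewers (p_o) with the agreement expected by chance from each reviewer's own inclusion rate (p_e): κ = (p_o − p_e)/(1 − p_e). A minimal sketch for a two-reviewer include/exclude screen is shown below; the function name is hypothetical and the counts in the example are invented for illustration, not the screening data from this review.

```python
def cohens_kappa(both_include: int, only_a_includes: int,
                 only_b_includes: int, both_exclude: int) -> float:
    """Cohen's kappa for two raters making a binary include/exclude decision."""
    n = both_include + only_a_includes + only_b_includes + both_exclude
    if n == 0:
        raise ValueError("no ratings supplied")
    p_observed = (both_include + both_exclude) / n
    # Marginal probability that each rater chooses "include".
    a_include = (both_include + only_a_includes) / n
    b_include = (both_include + only_b_includes) / n
    # Chance agreement: both include by chance plus both exclude by chance.
    p_expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical 100-record screen: 20 joint includes, 65 joint excludes,
# 10 records included only by reviewer A and 5 only by reviewer B.
print(round(cohens_kappa(20, 10, 5, 65), 3))  # 0.625
```

Here the reviewers agree on 85% of records, but because 60% agreement would be expected by chance alone, kappa is considerably lower than raw agreement.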
A formal systematic review management platform was not used for this study. EndNote (Alfasoft Limited) was used to combine the results from multiple databases and remove duplicates. Relevant study details were extracted from the selected studies using a standardised data extraction form, completed independently by RD and CS and subsequently verified by SLB.
A meta-analysis was planned if appropriate and feasible. A narrative, qualitative data synthesis was planned if wide heterogeneity in study design and CPET analysis precluded quantitative analysis. Study quality was assessed using the QUIPS (Quality in Prognosis Study) risk of bias tool by two reviewers (RD and CS) [17], with agreement measured using Cohen's kappa (supplementary material 2).
Results
Simultaneous searches of Embase (n=573) and MEDLINE (n=373), performed on 13 April 2019, identified 946 articles. A search of the CDSR did not identify any additional studies. As anticipated, no relevant RCTs were identified, so no studies were excluded on the basis of an RCT design. After removal of duplicates, 658 titles and abstracts were screened for eligibility. Initial agreement between the two reviewers was moderate (Cohen's kappa 0.462; supplementary material 3). Discordance for 20 studies arose because one reviewer without clinical training chose to include questionable studies for consideration; all of these were easily resolved through discussion and subsequently excluded. Given the nature of this discordance, we did not consider that retraining the reviewers and formally repeating the title and abstract screening would benefit the review.
Following full text review, 13 studies were eligible for full data extraction (figure 1).
Study selection flow diagram presented according to PRISMA statement. CDSR: Cochrane Database of Systematic Reviews; CPET: cardiopulmonary exercise testing.
Study design
Table 1 summarises the study design and reported findings of the final 13 studies.
Study characteristics of papers selected for full data extraction
The majority were retrospective cohort analyses (11/13, 85%). Follow-up varied: it was <4 years in most studies [7, 18–25] and 5 years in one study [26], ranging overall from 23 days to 20 years [19, 27].
There were two prospective studies [28, 29]. One investigated the relationship between CPET and survival in idiopathic pulmonary fibrosis (IPF), with follow-up varying between 9 and 64 months [28]. The other used CPET as part of a wider investigation into the role of exercise testing in the prognostication of ILD and followed patients for a fixed period of 40 months [29].
Patient populations
The majority (8/13, 62%) exclusively recruited patients with IPF, with retrospective assessment of 703 IPF patients and prospective assessment of 59 IPF patients. Classification of IPF was based on the criteria accepted at the time of enrolment: the 2000 American Thoracic Society (ATS) international consensus (IC) statement [1, 20, 21, 23, 24, 28] and the later 2002 ATS/ERS (European Respiratory Society) IC classification of the idiopathic interstitial pneumonias (including IPF) [19, 22, 30]. The updated 2011 ATS/ERS/JRS/ALAT guidelines for the diagnosis of IPF [31] were applied in all studies published after 2011 [7, 18, 28, 29] except one [20], a retrospective study that may have recruited patients before the 2011 guidelines were published.
Two retrospective studies explored the prognostic role of CPET in 144 patients with histologically confirmed sarcoidosis [25, 26], representing Scadding disease stages 1–4 [32]. Only one retrospective study has examined the prognostic role of CPET in systemic sclerosis-associated ILD (SSc-ILD) (n=83) [27]. Patients with SSc met the 1980 American Rheumatism Association classification criteria [33], and those with SSc sine scleroderma met the criteria proposed by Poormoghim and colleagues [34]. ILD was diagnosed on chest radiography in 60/83 patients [27].
The prognostic role of CPET in other secondary causes of ILD (such as myositis, occupational causes of ILD and hypersensitivity pneumonitis (HP)) and/or other forms of idiopathic interstitial pneumonias (IIP) has not been well studied. Two retrospective studies have reported the prognostic value of CPET in mixed ILD populations referred for lung transplantation [7, 19], but low patient numbers precluded useful subgroup analyses.
The majority of studies had a moderate (6/13, 46%) or high (4/13, 31%) [7, 19, 21, 25] risk of bias for participant selection. For example, generalisability in one study was limited by the lack of clearly defined clinical characteristics (e.g. Scadding disease stage) in the patients followed longitudinally (102/149) [30]. Studies enrolling from populations referred for lung transplantation selected cohorts of patients with advanced ILD [7, 19]. Others incorporated a priori patient grouping, for example by the presence of pulmonary hypertension (PH) [18], to enrich populations for those at higher risk of the outcomes of interest, or actively excluded relevant patients, e.g. those who died from a cause other than respiratory failure [21].
Study attrition was generally low, consistent with the retrospective nature of the majority of studies. The QUIPS risk of bias for study attrition was high in two studies. Over 25% of patients were excluded from the analyses by Lopes et al. [26] (due to smoking history, concomitant respiratory disease, cardiac disease and neuromuscular disease). In another study, 34% (80/238) of the original study population were excluded from the analysis because of incomplete data sets [23].
Prognostic factor measurement
CPET was the sole prognostic factor in the majority of studies (8/13, 62%), with a minority using CPET as part of a broader repertoire of exploratory physiological tests, including the 6-minute walk test (6MWT) [7, 19, 28] or lung function parameters [18]. One study combined CPET with other clinical, radiological and resting physiological assessments to devise a scoring system to predict survival in newly diagnosed IPF (the clinical–radiological–physiological score) [23].
In two studies, CPET was used principally as a means of achieving maximal exercise [25, 27], with arterial blood gas sampling or peripheral oxygenation measurements used to determine the effect of exercise on gas exchange. Neither study reported typical CPET measures such as peak VO2.
The bias rating for prognostic factor measurement using the QUIPS tool was generally low to moderate (figure 2), with the majority of studies reporting a standardised approach to CPET (albeit one specific to each study) and analyses that would be easily reproducible and unlikely to be susceptible to bias. Most studies provided a sufficient description of the CPET protocol used (6/10, 60%), adhering to the 2003 ATS statement on CPET [7, 18–20, 22, 28]. The use of supplemental oxygen during CPET varied: oxygen use was an inclusion criterion in one study [7], whilst in others supplemental oxygen was applied variably, depending on a pre-study requirement for home oxygen or saturation on room air <90% [19]. In 7 of 13 (54%) studies, blood gas analysis was used to assess the adequacy of gas exchange during exercise [21–27], whilst the remainder used pulse oximetry, considered by some experts to be a suboptimal substitute [13]. A broad range of quantitative CPET parameters was presented and analysed (summarised in table 1), raising the possibility of reporting bias (see later).
The Quality in Prognosis Study (QUIPS) risk of bias tool assessment of included studies. Green indicates low risk of bias, amber indicates a moderate risk and red indicates a high risk of bias.
All but one study used cycle ergometry. The remaining study used treadmill exercise testing, with exercise increments based on each patient's daily activities and resting pulmonary function parameters; this individualised approach raises concerns about variation between subjects [21]. Furthermore, the physiological responses to incremental exercise on the two modalities are known to differ, which makes direct comparison between them problematic [35, 36].
Outcome measurement
Eleven of 13 studies (85%) evaluated mortality. Most of these (10/11, 91%) examined all-cause mortality, with death or lung transplantation considered as a composite endpoint. The remaining study restricted the outcome to respiratory deaths only [21]. One study assessed the ability of CPET to discriminate patients who would die on the lung transplant waiting list before receiving a transplant [19]. Other outcomes included interceding PH [18], decline in forced vital capacity (FVC), decline in diffusing capacity of the lung for carbon monoxide (DLCO) and/or duration of immunosuppressive therapy in sarcoidosis [25, 26].
The risk of bias for outcome measure assessment was considered low to moderate across all studies (figure 2).
Reported prognostic associations of CPET in ILD
All studies reported at least one positive association between CPET and clinical outcomes, raising the possibility of positive reporting bias. A summary of the main findings is presented in table 2. Substantial heterogeneity in study design, study populations (and the classification criteria adopted), CPET protocols, CPET parameters and outcome definitions precluded a meta-analysis.
Reported associations between CPET parameters and outcomes in studies of ILD
Maximal oxygen consumption
The prognostic value of measures of maximal oxygen consumption during CPET for ILD outcomes has been reported in 10/13 (77%) studies (table 2).
Peak VO2·kg−1 was inversely correlated with 1-year mortality in two cohorts of patients with severe ILD referred for lung transplantation [7, 19], whilst peak VO2 thresholds ranging from <8.3 to <14.2 mL·kg−1·min−1 were reported to predict mortality in IPF [24, 28, 29]. These results contrast with the findings of other studies, which failed to identify any significant association [20–22].
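Because these cut-offs are expressed per kilogram of body mass, an absolute peak VO2 must be normalised before comparison. The minimal sketch below assumes peak VO2 is recorded in mL·min−1; the function names are hypothetical, and the threshold is deliberately a required argument because the reported cut-offs ranged from 8.3 to 14.2 mL·kg−1·min−1 and no single value is established.

```python
def peak_vo2_per_kg(peak_vo2_ml_min: float, body_mass_kg: float) -> float:
    """Convert absolute peak VO2 (mL/min) to mL/kg/min."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return peak_vo2_ml_min / body_mass_kg


def below_threshold(peak_vo2_ml_min: float, body_mass_kg: float,
                    threshold_ml_kg_min: float) -> bool:
    """True if weight-normalised peak VO2 falls below a study-specific cut-off."""
    return peak_vo2_per_kg(peak_vo2_ml_min, body_mass_kg) < threshold_ml_kg_min


# Hypothetical patient: 900 mL/min at 75 kg gives 12.0 mL/kg/min
print(peak_vo2_per_kg(900.0, 75.0))        # 12.0
print(below_threshold(900.0, 75.0, 8.3))   # False
print(below_threshold(900.0, 75.0, 14.2))  # True
```

The same hypothetical patient is classified as high or low risk depending on which published cut-off is applied, which illustrates why the spread of thresholds matters.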
Ventilatory efficiency
A ventilatory equivalent for CO2 at AT (VE/VCO2 at AT) above thresholds ranging from >34 to >46 was reported to predict survival in IPF [7, 19, 20], even after correction for the functional severity of ILD [20] (table 2).
By contrast, the ventilatory equivalent for oxygen at AT (VE/VO2 at AT) was reported to be a poor predictor of survival in IPF patients [21, 22]. Although VE/VO2 was associated with worse survival in IPF in the derivation cohort of the clinical–radiological–physiological multimodal score, even after adjustment for age and smoking status, it was not included as a parameter in the final model [21].
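VE/VCO2 is the ratio of minute ventilation to carbon dioxide output at the chosen time point (here, the anaerobic threshold); a higher value means more ventilation per unit of CO2 cleared, i.e. lower ventilatory efficiency. The sketch below assumes both quantities are in L·min−1 at AT, uses hypothetical function names, and takes the cut-off as an argument because the studies used values between 34 and 46.

```python
def ve_vco2(ve_l_min: float, vco2_l_min: float) -> float:
    """Ventilatory equivalent for CO2 (dimensionless ratio)."""
    if vco2_l_min <= 0:
        raise ValueError("VCO2 must be positive")
    return ve_l_min / vco2_l_min


def exceeds_cutoff(ve_l_min: float, vco2_l_min: float, cutoff: float) -> bool:
    """True if VE/VCO2 is strictly above a study-specific threshold."""
    return ve_vco2(ve_l_min, vco2_l_min) > cutoff


# Hypothetical values at AT: VE 40 L/min, VCO2 1.0 L/min
print(ve_vco2(40.0, 1.0))             # 40.0
print(exceeds_cutoff(40.0, 1.0, 34))  # True
print(exceeds_cutoff(40.0, 1.0, 46))  # False
```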
Diffusion limitation or exercise-induced hypoxaemia
Exercise-induced hypoxaemia was reported as a potential prognostic factor for survival in IPF [21, 23]. PaO2 at the end of maximal exercise was the only CPET-derived parameter included in the comprehensive clinical–radiological–physiological multimodal score predicting survival in IPF and, when weighted, accounted for as much as 10.5% of the maximum score in the final model [23].
In mixed populations of ILD patients with advanced disease and referred for lung transplantation [7, 19], desaturation during CPET was reported to be predictive of lung transplantation or death.
In two studies examining longitudinal outcomes in sarcoidosis, the alveolar–arterial oxygen pressure gradient during exercise (P(A-a)O2), a measure of the efficiency of pulmonary gas exchange, was independently associated with both the need for prolonged (>1 year) immunosuppressive therapy [25] and decline in pulmonary function at 5 years [26].
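P(A-a)O2 is the difference between the alveolar oxygen tension, usually estimated with the simplified alveolar gas equation PAO2 = FiO2 × (PB − PH2O) − PaCO2/R, and the measured arterial PaO2. The sketch below uses that standard equation with assumed defaults (barometric pressure 760 mmHg at sea level, water vapour pressure 47 mmHg, respiratory exchange ratio R = 0.8, a resting value). During CPET the measured exchange ratio can be passed instead; the included studies may have used different constants or units (e.g. kPa), and the example values are hypothetical.

```python
def alveolar_po2(fio2: float, paco2_mmhg: float, rer: float = 0.8,
                 barometric_mmhg: float = 760.0,
                 water_vapour_mmhg: float = 47.0) -> float:
    """Simplified alveolar gas equation: PAO2 = FiO2*(PB - PH2O) - PaCO2/R (mmHg)."""
    if rer <= 0:
        raise ValueError("respiratory exchange ratio must be positive")
    return fio2 * (barometric_mmhg - water_vapour_mmhg) - paco2_mmhg / rer


def a_a_gradient(fio2: float, pao2_mmhg: float, paco2_mmhg: float,
                 rer: float = 0.8) -> float:
    """Alveolar-arterial O2 gradient P(A-a)O2 in mmHg."""
    return alveolar_po2(fio2, paco2_mmhg, rer) - pao2_mmhg


# Hypothetical room-air sample: PaO2 90 mmHg, PaCO2 40 mmHg
print(round(a_a_gradient(0.21, 90.0, 40.0), 1))  # 9.7
```

A widening gradient with exercise, as opposed to a low PaO2 alone, points towards impaired gas exchange rather than hypoventilation.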
Finally, in the single study of SSc-ILD [27], as in the studies of sarcoidosis and IPF, diffusion limitation (measured in this study as the change in peripheral oxygen saturation (SpO2) during CPET) correlated with survival.
Study confounders
The majority of studies were considered to be at “high” risk of bias because of inadequate accounting for potential confounding factors or inadequate methods of statistical analysis and reporting (figure 2). The data used in most studies were obtained from existing databases and/or case note review (85%, n=11), and as such the contribution of important potential confounders such as comorbid disease [18, 21, 22, 24, 26, 27], body mass index [19–21, 24, 26, 28] and smoking status [18, 19, 22, 26] was not recorded. Baseline “disease severity” was specifically addressed as a potential confounder by only one study [20]. The use of variable levels of supplemental oxygen (or uncertain inspired oxygen concentrations) in some of the reviewed studies [7, 19] is also a major limitation that may affect the accuracy of peak VO2.
As discussed previously, restricting studies to subjects referred for transplantation selects cohorts with more advanced ILD and reduces the generalisability of the findings [7, 20]. Other studies focused on healthier ILD populations (e.g. patients not requiring supplemental oxygen during CPET), which unsurprisingly resulted in low numbers of deaths (n<10) [24, 28, 29].
Multiple logistic regression (MLR) was the dominant statistical method used to determine the relationship between CPET parameters and clinical outcomes in ILD. Whilst this approach adjusts for the effects of known confounders, most study sample sizes were smaller than the proposed minimum requirement for MLR analysis [37]. Only one study reported an a priori power calculation to determine sample size [29]; the others were underpowered to detect the proposed outcomes.
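One common way to judge whether a regression sample is large enough is the events-per-variable (EPV) ratio: the number of outcome events divided by the number of candidate predictors. The sketch below applies a frequently cited rule of thumb of about 10 events per variable; whether this matches the criterion proposed in [37] is an assumption, the function names are hypothetical, and the counts are invented to mirror the small numbers of deaths noted in some cohorts.

```python
import math


def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Outcome events available per candidate predictor."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors


def max_predictors(n_events: int, min_epv: float = 10.0) -> int:
    """Largest number of predictors keeping EPV >= min_epv (assumed rule of thumb)."""
    return math.floor(n_events / min_epv)


# Hypothetical cohort: 9 deaths, with three CPET variables modelled together
print(events_per_variable(9, 3))  # 3.0
print(max_predictors(9))          # 0
```

Under this heuristic, a cohort with fewer than 10 deaths cannot support even a single predictor, let alone adjustment for confounders.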
Stepwise multiple regression was used by some studies to select the model parameters that best predicted increased mortality [23, 28]. This approach relies on parameter inference, which may lead to over-fitting of some parameters or to the exclusion of confounders that do not reach statistical significance [38]. Furthermore, the number of candidate parameters and their order of entry (or deletion) can affect both the selected model [39] and the likelihood of type I error [38]. Only one study specifically attempted to reduce multicollinearity [23], which, if overlooked, inflates the standard errors of the affected coefficients and can increase the risk of type II error [40].
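Multicollinearity is often screened with the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. With exactly two predictors, R_j² reduces to the squared Pearson correlation between them, which the pure-Python sketch below uses. The function names and values are hypothetical, and this is one common screening approach rather than the method used in [23].

```python
import math
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a predictor with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x: Sequence[float], y: Sequence[float]) -> float:
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2).

    With exactly two predictors both share the same VIF; perfect
    collinearity (r^2 of 1, within rounding) returns infinity.
    """
    r_squared = pearson_r(x, y) ** 2
    if r_squared >= 1.0 - 1e-12:
        return math.inf
    return 1.0 / (1.0 - r_squared)


# Hypothetical predictor values (not data from the included studies)
print(round(vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 2))  # 2.78
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))                 # 1.0
```

A VIF of 1 means the predictors are uncorrelated; larger values indicate how much the variance of a coefficient estimate is inflated by its correlation with the other predictor.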
Discussion
Clinicians would benefit from reliable prognostic markers in patients with ILD to enable timelier referral for transplantation, to improve monitoring of existing therapies and to determine the efficacy of novel treatments in clinical trials [12, 41].
To our knowledge, this is the first study to systematically review and critically appraise studies that have reported the prognostic value of CPET in ILD. Thirteen studies were identified, with survival the principal clinical outcome measured. The variety of methodologies, CPET parameters and time points for mortality evaluation prevented the determination of definitive CPET thresholds for predicting outcomes in ILD. Because of the clinical diversity of the studies, and because every study had at least a moderate risk of bias in at least one QUIPS domain, meta-analysis was not possible. We also considered that a meta-analysis might overstate the findings of these small, poorly matched studies.
There were conflicting results regarding the prognostic role of maximal oxygen consumption in predicting survival, which may be partly attributable to the heterogeneity of the studies concerned. Whilst reduced ventilatory efficiency has been reported to predict both the presence of PH [42] and the development of interceding PH in IPF cohorts [18], its independent prognostic value in IPF patients has not been determined. The magnitude of hyperventilation at the ventilatory threshold does, however, warrant further exploration as a prognostic factor in ILD, particularly as a marker of concurrent cardiopulmonary vascular impairment. Exercise-induced hypoxaemia was another potential prognostic factor reported in several studies. A study directly comparing the longitudinal prognostic value of CPET with that of alternative forms of exercise testing, such as the 6-minute walk test, could therefore be justified.
Issues around study design (relating primarily to the inherent problems of retrospective studies, patient selection and presentation of numerous CPET parameters), insufficient adjustment for confounding variables and inadequate statistical analyses limit the strength of the conclusions that can be drawn from the studies undertaken to date. Whilst the associations presented shed important light on the potential role of CPET in disease prognostication in ILD, there is currently insufficient evidence to support its use in facilitating “real-world” clinical decisions, and larger prospective studies are required. In planning future clinical studies, rigorously phenotyped patient cohorts, characterised using standardised definitions, together with external validation or multicentre recruitment, will be imperative to overcome some of the challenges of studying heterogeneous ILD populations.
Several practical challenges of CPET will need to be addressed before it can be considered for clinical practice in ILD patient populations. These include the lack of measurement standardisation, non-uniform parameter availability across instrument manufacturers, provision of adequate training of personnel, availability of equipment in secondary care and establishment of the optimal exercise duration and ramping protocol, alongside individual patient safety considerations such as desaturation to prohibitive levels in advanced ILD. The absence of sufficient longitudinal data to identify a minimal clinically important difference (MCID) in CPET values in ILD is a further obstacle that will need to be overcome [12, 43].
This work has identified a number of considerations for future prognostic studies of CPET in ILD. As in many human diseases, disease progression in ILD is probably influenced by a complex interplay of patient, genetic, environmental and treatment factors. A multivariable approach to the design and analysis of future prognostic studies of ILD is therefore essential if a specific role for CPET in routine monitoring is to be confirmed. In contrast to RCTs, there are no robust standards requiring the registration or publication of protocols for prognostic research, so it is not always clear whether the statistical analysis was part of an a priori plan [44]. Almost all studies in this review examined multiple prognostic CPET variables, so there is potential for selective reporting bias; this could largely be overcome by more stringent protocol registration with pre-specified outcomes of interest. Future studies examining the prognostic value of CPET in ILD must take relevant confounders into account to establish whether CPET provides additional prognostic value beyond more easily obtainable clinical and physiological measures.
Conclusions: take-home message
CPET may have a role as a prognostic factor in ILD, but the quality of existing studies and the lack of established MCID values for CPET in ILD limit the conclusions that can be drawn at present. Large, carefully designed prospective studies are needed to establish the role of CPET in the longitudinal assessment of ILD.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material 00027-2020.SUPP1
QUIPS tool 00027-2020.SUPP2
Acknowledgements
Independent statistical advice was sought from Paul White (statistician at the University of the West of England, Bristol, UK) to confirm that a meta-analysis was not possible.
Footnotes
This article has supplementary material available from openres.ersjournals.com
Author contributions: S.L. Barratt is the guarantor of the content of the manuscript, including data and analysis. C. Sharp and R. Davis undertook the initial literature review and data extraction. This was then verified by S.L. Barratt and J.D. Pauling, who prepared the manuscript. R. Davis and C. Sharp verified the manuscript.
Conflict of interest: S.L. Barratt reports personal fees for an advisory board and financial support to attend an educational conference from Boehringer Ingelheim outside the submitted work.
Conflict of interest: R. Davis is an employee of Boehringer Ingelheim. Boehringer Ingelheim did not have any involvement in the research or the preparation of this manuscript.
Conflict of interest: C. Sharp has nothing to disclose.
Conflict of interest: J.D. Pauling reports personal fees from Boehringer Ingelheim; grants, personal fees and non-financial support for attendance at educational meetings from Actelion Pharmaceuticals; and personal fees from Sojournix Pharma, all outside the submitted work.
- Received January 18, 2020.
- Accepted April 22, 2020.
- Copyright ©ERS 2020
This article is open access and distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0.