Abstract
Background The 2022 ESC/ERS guidelines on pulmonary hypertension recommend noninvasive risk assessments based on three clinical variables during follow-up in patients with pulmonary arterial hypertension (PAH). We set out to test whether residual risk can be captured from routinely measured noninvasive clinical variables during follow-up in PAH.
Methods We retrospectively studied 298 incident PAH patients from a German pulmonary hypertension centre who underwent routine noninvasive follow-up assessments including exercise testing, echocardiography, electrocardiography, pulmonary function testing and biochemistry. To select variables, we used least absolute shrinkage and selection operator (LASSO)-regularised Cox regression models. Outcome was defined as mortality or lung transplant after first follow-up assessment.
Results Twelve noninvasive variables that were associated with outcomes in a training sub-cohort (n=208) after correction for multiple testing entered LASSO modelling. A model combining seven variables discriminated 1-year (area under the curve (AUC) 0.83, 95% confidence interval (CI) 0.68–0.99, p=8.4×10⁻⁶) and 3-year (AUC 0.81, 95% CI 0.70–0.92, p=2.9×10⁻⁸) outcome status in a replication sub-cohort (n=90). The model's discriminatory ability was comparable to that of the guideline approach in the replication sub-cohort. Of the individual model components, World Health Organization functional class, 6-min walking distance and the tricuspid annular plane systolic excursion to systolic pulmonary arterial pressure (TAPSE/sPAP) ratio were sensitive to treatment initiation. Addition of TAPSE/sPAP ratio to the guideline approach numerically increased its ability to discriminate outcome status.
Conclusion Our real-world data suggest that residual risk can be captured by noninvasive clinical procedures during routine follow-up assessments in patients with PAH and highlight the potential use of echocardiographic imaging to refine risk assessment.
A systematic approach using a supervised feature selection algorithm indicates that residual mortality risk can be captured from noninvasive assessments during routine follow-up in patients with pulmonary arterial hypertension https://bit.ly/3ZFfoDR
Introduction
Pulmonary arterial hypertension (PAH) is a rare pulmonary vasculopathy defined by increased pulmonary arterial pressure and vascular resistance [1]. The complex pathology involves progressive and obliterating remodelling of small pulmonary arteries [2]. The most important consequence of PAH is right-sided heart failure through chronically increased afterload to the right ventricle (RV), which causes a complex clinical syndrome affecting multiple organ systems through low cardiac output and systemic venous congestion, and results in high mortality if left untreated [3–5]. Clinical markers that relate not only to pulmonary vascular disease but also to the systemic consequences of PAH were able to predict disease severity and outcomes, and to inform treatment strategies with targeted drugs [1, 6–8].
Multi-variable risk models combining variables from noninvasive and mixed invasive/noninvasive clinical procedures can be used to stratify patients into risk groups for short-term mortality before and during treatment [6–10]. The 2015 and 2022 European Society of Cardiology (ESC)/European Respiratory Society (ERS) guidelines on pulmonary hypertension (PH) recommend risk assessment at diagnosis of PAH using a three-strata model that incorporates up to 18 different variables from clinical observations and mixed invasive/noninvasive procedures to classify patients at low, intermediate and high 1-year mortality risk [1, 11]. A limitation of this three-strata risk-assessment model is that up to 70% of the patients are classified at intermediate risk [1, 8, 10]. A four-strata model based on an abbreviated list of noninvasive clinical measures (Comparative, Prospective Registry of Newly Initiated Therapies for Pulmonary Hypertension (COMPERA) 2.0) was subsequently developed that provided better discrimination within the intermediate-risk group [12, 13]. This model was implemented in the recent version of the 2022 ESC/ERS guidelines and recommended to be used during follow-up visits in patients with PAH [1, 12, 13]. The model included two markers of exercise capacity (World Health Organisation functional class (WHO-FC) and 6-min walking distance (6MWD)) and one biochemical marker relating to myocardial stress (brain natriuretic peptide/N-terminal pro-brain-type natriuretic peptide (BNP/NT-proBNP)). Models based on these three measures, however, achieved only fair to good discriminatory accuracy to distinguish survivors from non-survivors (C-indices <0.80) [6–8, 10, 12–16].
As more therapeutic options become available to patients with PAH, clinical decisions should be guided by the best possible estimates of future outcome events and by risk assessments with sufficient granularity. This includes decisions on far-reaching therapeutic options such as the need for parenteral prostanoid therapy and evaluation for lung transplantation. In particular, active listing for lung transplantation requires precise estimates of short-term mortality with and without transplantation [17]. Complete knowledge of all possible contributors to mortality risk, including cut-off values, holds the promise of enhancing the predictive ability of contemporary risk stratification tools. Candidate-driven approaches have shown that the use of additional variables derived from echocardiography, cardiopulmonary exercise testing or gas exchange improved risk prediction [9, 18–20]. In addition, advances in machine learning methods might provide a rigorous framework to further refine risk models [21].
We set out to systematically assess the ability of clinical measures obtained from noninvasive procedures in routine follow-up visits to predict outcomes in patients with PAH. Using a supervised feature selection algorithm, we fitted a combinatory model that best predicted 1-year prognosis. Sensitivity to aetiology of PAH and to changes from baseline to follow-up were evaluated, and each model component was assessed individually.
Methods
Selection of patients
We retrospectively accessed data from all patients with newly diagnosed idiopathic, heritable, drug- or toxin-induced and associated PAH (connective tissue disease (CTD), infection with human immunodeficiency viruses (HIV), porto-pulmonary hypertension, congenital heart disease (CHD)) attending the University Medical Centre Hamburg-Eppendorf, Hamburg, Germany, between 1 January 2009 and 31 December 2019. PAH was diagnosed based on right heart catheterisation with a mean pulmonary arterial pressure (mPAP) ≥25 mmHg and pulmonary arterial wedge pressure (PAWP) ≤15 mmHg following the exclusion of other causes of precapillary PH. Diagnosis and classification of disease were consensus-based in an interdisciplinary team. Patients were started on one or more PAH-targeted drugs that were approved at the time, and follow-up visits were scheduled every 3–6 months (or whenever clinically indicated). The combined primary end-point was all-cause mortality or lung transplant. Follow-up was censored on 31 December 2021. Data collection was approved by the local ethics committee and complied with the Declaration of Helsinki.
Data collection
Clinical data were collected at diagnosis (within 3 months from diagnostic right heart catheterisation) and at the first comprehensive follow-up visit. Age at diagnosis, self-reported sex, height, weight, comorbidities and WHO-FC were recorded along with data from six clinical procedures (right heart catheterisation, echocardiography, electrocardiogram, pulmonary function test, 6-min walking test and routine biochemistry). A detailed list of all pre-defined noninvasive data points, rate of missing data and formulas for secondary data are provided in supplementary table S1. On average, 7.9% of data were missing at follow-up assessments. Data points from available clinical procedures were curated by two independent attending physicians and excluded if deemed implausible by consensus. If biochemical measurements were below the detection limit, the lowest measurable value of the assay was used for analysis.
Statistical analyses
In a landmark study approach, Cox regression analyses were performed using time in years from the date of follow-up assessment until occurrence of the primary end-point or censoring. To select a combination of clinical variables that best predicted outcomes, a least absolute shrinkage and selection operator (LASSO)-regularised Cox regression model was applied as implemented in the R-package glmnet (version 4.1.4). The regularisation parameter lambda was chosen from a grid of values through 10-fold cross-validation. To minimise overfitting, we used the largest lambda whose cross-validated error lay within one standard error of the minimum (one-standard-error rule). Calibration of the model was assessed visually using calibration plots and by the Hosmer–Lemeshow goodness-of-fit test, with p>0.05 indicating no significant deviation from perfect fit. Areas under the curve (AUCs) from receiver operating characteristic (ROC) analyses were compared using DeLong's test as implemented in the R-package pROC (version 1.18.0). To improve comparability with the four-strata COMPERA 2.0 risk score that is recommended by the 2022 ESC/ERS guidelines on PH [1], we divided the newly derived model into quartiles based on all available data (i.e., from baseline and follow-up assessments). We computed categorical net reclassification indices (NRI), which quantify the reclassification of patients into risk strata between models, as implemented in the R-package PredictABEL (version 1.2.4). The maximal Youden index for 1-year survival status from follow-up visits was used to compute optimal cut-offs. Kaplan–Meier survival curves with log-rank tests were generated with the R-package survival (version 3.4.0). Group comparisons were performed by chi-squared tests or Wilcoxon signed-rank tests as appropriate. Patients were randomly assigned to training or replication sub-cohorts using an in-built R-function (“sample”).
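For orientation, the LASSO-regularised Cox model can be written in a generic textbook form. This is a sketch rather than the exact parameterisation used by glmnet, which applies its own scaling of the likelihood. The symbols are introduced here for illustration only: x_i is the covariate vector of patient i, δ_i the event indicator, R(t_i) the set of patients still at risk at the event time t_i, p the number of candidate variables and λ the regularisation parameter.

```latex
\hat{\beta}(\lambda) = \arg\max_{\beta}
\left\{
  \sum_{i:\,\delta_i = 1}
  \left[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left( x_j^{\top}\beta \right) \right]
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

Larger values of λ shrink more coefficients to exactly zero, which is why the penalty performs variable selection in addition to estimation.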
Missing data were imputed using the predictive mean matching method as implemented in the R-package mice (version 3.14.0). To increase reliability, 25 imputed datasets were created (30 iterations each). Imputed values were constrained to the range between the minimal and maximal measured values. We included all clinical data but not the outcome variable in the imputation model. Results were averaged over all imputed datasets.
Data are presented as absolute numbers and percentages, or median and interquartile range (IQR). Results from regression analyses or ROC are presented along with 95% confidence intervals (CI). Correction for multiple testing was applied using the conservative Bonferroni method (q-values).
All statistical analyses were performed with R (version 4.2.1, R Foundation for Statistical Computing, Vienna, Austria).
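The Bonferroni correction described above can be illustrated with a minimal Python sketch. It is not the authors' R code; the function name and the example p-values are hypothetical. Each raw p-value is multiplied by the number of tests and capped at 1, so with 69 tested variables a corrected q<0.05 corresponds to a raw p-value below roughly 0.05/69.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: each p multiplied by the
    number of tests and capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three candidate variables (not study data).
raw = [0.0004, 0.02, 0.6]
adjusted = bonferroni_adjust(raw)

# A variable is retained when its adjusted value stays below 0.05.
significant = [q < 0.05 for q in adjusted]
```

In this toy example only the first variable would remain significant after correction.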
Results
Cohort description
We included 298 patients with PAH (figure 1). Median age at diagnosis was 64 years (IQR 48–74), 61% were female and more than half of the patients were classified as idiopathic PAH (52%). Upfront combination therapy with targeted drugs was initiated in half of the patients (dual in 42% and triple in 8%), while the other half received initial monotherapy. Median time between treatment initiation and first comprehensive follow-up visit was 4.9 months (IQR 3.4–7.5), and during a median observational period of 3.5 years (IQR 1.7–5.9) from follow-up visit 119 composite outcome events occurred (114 deaths and five lung transplants). The rate of transplant-free survivors was 90% after 1 year, 72% after 3 years and 63% after 5 years from diagnosis and 86%, 71% and 62% from follow-up, respectively. The basic characteristics of patients are provided in table 1.
For subsequent analysis, we randomly separated our cohort in a 2:1 ratio into training (n=208) and replication (n=90) sub-cohorts. Both sub-cohorts showed comparable basic characteristics (table 1).
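A random split of this kind can be sketched in Python as follows. The authors used R's sample function; this sketch is only an analogue, and the patient identifiers, the seed and the function name are assumptions made for illustration.

```python
import random


def split_cohort(patient_ids, n_training, seed):
    """Randomly split IDs into training and replication lists without overlap."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    training = set(rng.sample(patient_ids, n_training))
    train = [pid for pid in patient_ids if pid in training]
    replication = [pid for pid in patient_ids if pid not in training]
    return train, replication


# Hypothetical identifiers 1..298 standing in for the 298 patients.
ids = list(range(1, 299))
train_ids, replication_ids = split_cohort(ids, n_training=208, seed=2023)
```

Fixing the seed is a design choice that allows the same sub-cohorts to be regenerated later.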
Selection of outcome predictors from noninvasive follow-up assessment data
Using the training sub-cohort, we identified that 12 of 69 noninvasive clinical measures obtained at follow-up assessments were associated with outcomes in age- and sex-adjusted regular Cox regression models (Bonferroni-corrected q<0.05; table 2). All 12 entered LASSO-regularised modelling in the training sub-cohort, and seven variables were selected to create a single combinatory model (supplementary figure S1); namely, 6MWD, WHO-FC, tricuspid annular plane systolic excursion to systolic pulmonary arterial pressure ratio (TAPSE/sPAP), right atrial area index by the body surface area (RAAi), diffusing capacity of the lung for carbon monoxide (DLCO), total lung capacity (TLC) and aspartate transaminase to alanine transaminase ratio (AST/ALT; table 3).
Internal replication and comparison with established risk models
The model based on seven noninvasive variables was calculated from follow-up assessment data in the internal replication sub-cohort and accurately discriminated 1-year and 3-year transplant-free survivors from non-survivors after the follow-up visit with AUCs of 0.83 (95% CI 0.68–0.99, p=8.4×10⁻⁶) and 0.81 (95% CI 0.70–0.92, p=2.9×10⁻⁸), respectively (figure 2). The predicted and observed transplant-free survival rates matched closely, indicating acceptable model calibration (supplementary figure S2), and the Hosmer–Lemeshow test indicated a good model fit for 1-year (p=0.891) and 3-year (p=0.327) outcomes.
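As a reminder of what the AUC measures, it equals the probability that a randomly chosen patient with an outcome event has a higher risk score than a randomly chosen patient without one. The following Python sketch computes it by direct pairwise comparison. It is illustrative only and not the pROC implementation; the scores are hypothetical.

```python
def auc(event_scores, non_event_scores):
    """Probability that a patient with an event scores higher than a
    patient without an event; ties count as one half."""
    wins = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))


# Hypothetical risk scores (not study data).
events = [0.9, 0.8, 0.4]
non_events = [0.7, 0.3, 0.2, 0.1]
example_auc = auc(events, non_events)  # 11 of the 12 pairs are correctly ordered
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.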
The four-strata COMPERA 2.0 risk score also discriminated transplant-free survivors from non-survivors 1 and 3 years after follow-up in the replication sub-cohort (figure 2). Comparing the AUCs favoured the newly derived model for 1-year outcome status, both as continuous and as the four-strata version, but no significant difference occurred for 3-year outcome status (figure 2). In patients with an outcome event occurring within 1 year from follow-up, 33% (4 out of 12) were reclassified to a higher risk group based on the newly derived model; while 35% (25 out of 72) of patients without an outcome event within 1 year from follow-up entered a lower risk group (NRI for 1-year outcome status in the replication sub-cohort: 0.53, 95% CI 0.22–0.83, p<0.001; supplementary table S2).
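The categorical NRI combines two components: the net proportion of patients with an event who move to a higher risk group, and the net proportion of patients without an event who move to a lower risk group. A minimal Python sketch of this standard definition follows. It is not the PredictABEL implementation, and the counts in the example are hypothetical rather than taken from supplementary table S2.

```python
def categorical_nri(events_up, events_down, n_events,
                    non_events_up, non_events_down, n_non_events):
    """Categorical net reclassification index.

    Upward moves are desirable for patients with an event,
    downward moves for patients without an event.
    """
    event_component = (events_up - events_down) / n_events
    non_event_component = (non_events_down - non_events_up) / n_non_events
    return event_component + non_event_component


# Hypothetical counts: 10 patients with an event (5 moved up, 1 moved down)
# and 20 without an event (2 moved up, 6 moved down).
example_nri = categorical_nri(5, 1, 10, 2, 6, 20)
```

Here the event component is 0.4 and the non-event component is 0.2, giving an NRI of about 0.6.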
The noninvasive Registry to Evaluate Early And Long-term PAH Disease Management (REVEAL) 2.0 lite risk score also discriminated transplant-free survivors from non-survivors 1 or 3 years after follow-up in the replication sub-cohort (AUC 0.78 and 0.77). Comparisons of AUCs numerically favoured the newly derived model, albeit this was not statistically significant (DeLong's test p=0.151 for AUC comparisons of 1-year outcome status and p=0.296 for 3-year outcome status; supplementary figure S3).
Model sensitivity to aetiological subgroups of PAH
To assess the sensitivity of the newly derived model to aetiologies of PAH, we excluded patients with associated forms of PAH (APAH) using the entire cohort. The AUCs to discriminate transplant-free survivors from non-survivors 1 year from follow-up were 0.86 (95% CI 0.79–0.92, p=4.03×10⁻²⁵, n=268) following exclusion of patients with CHD-APAH and 0.90 (95% CI 0.84–0.96, p=2×10⁻⁴⁰, n=174) following exclusion of patients with any form of APAH. Similar results were obtained for the transplant-free survival status 3 years from follow-up with AUCs of 0.83 (95% CI 0.77–0.88, p=3.5×10⁻³¹) and 0.83 (95% CI 0.76–0.90, p=1.2×10⁻¹⁹), respectively. The four-strata version of the model based on quartiles likewise stratified outcomes from follow-up visits in Kaplan–Meier curves following the exclusion of patients with CHD-APAH and all APAH (figure 3).
Model and individual component sensitivity to changes from baseline to follow-up assessment
We assessed changes of the newly derived model divided into four strata based on overall quartiles between baseline and follow-up assessments. After treatment initiation, 49% of patients entered a different stratum of the model at follow-up (35% down and 14% up; figure 4). In Kaplan–Meier survival analyses, these changes of risk strata translated into a trend towards better transplant-free survival for patients who improved their risk stratum compared to those who entered a higher risk stratum (log-rank test on “improved” versus “worsened” p=0.081; supplementary figure S4).
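The quartile-based stratification and the labelling of changes between visits can be sketched in Python as below. Several points are assumptions made for illustration: a higher model score is taken to mean higher risk, a score equal to a cut-point is assigned to the lower stratum, and quartiles are computed with linear interpolation between the observed minimum and maximum. The scores are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_cutpoints(scores):
    """Three cut-points dividing the pooled scores into four strata."""
    return quantiles(scores, n=4, method="inclusive")


def stratum(score, cutpoints):
    """Stratum 1 (lowest risk) to 4 (highest risk); a score equal to a
    cut-point falls into the lower stratum."""
    return bisect_left(cutpoints, score) + 1


def change_category(baseline_stratum, follow_up_stratum):
    """Label the change in risk stratum between two visits."""
    if follow_up_stratum < baseline_stratum:
        return "improved"
    if follow_up_stratum > baseline_stratum:
        return "worsened"
    return "unchanged"


# Hypothetical pooled model scores from baseline and follow-up visits.
pooled_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts = quartile_cutpoints(pooled_scores)
```

A patient moving from stratum 3 at baseline to stratum 2 at follow-up would be labelled "improved" under these conventions.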
Assessing each model component individually showed that out of the seven components, 6MWD, WHO-FC and TAPSE/sPAP significantly changed from baseline to follow-up (all p<0.05 for paired group comparisons; supplementary figure S5). 6MWD and WHO-FC, but not TAPSE/sPAP, were components of the COMPERA 2.0 score that is recommended by current European guidelines during follow-up in PAH [1, 12, 13]. Using an optimal cut-off value from Youden's statistics (0.23 mm·mmHg⁻¹; supplementary table S3), changes from baseline to follow-up in TAPSE/sPAP were significantly associated with long-term outcomes in PAH (p=0.003 for log-rank test on “improved” versus “worsened”; figure 5).
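The Youden approach to finding an optimal cut-off can be sketched in Python as follows. The sketch assumes that lower values indicate higher risk, as for TAPSE/sPAP, and that a patient is predicted to have an event when the value is at or below the cut-off. The function name and the data are hypothetical and do not reproduce the reported cut-off.

```python
def youden_cutoff(values, events):
    """Return (cut-off, J) maximising Youden's J = sensitivity + specificity - 1.

    A patient is predicted to have an event when value <= cut-off, which
    assumes that lower values indicate higher risk.
    """
    n_events = sum(events)
    n_non_events = len(events) - n_events
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, e in zip(values, events) if e and v <= cutoff)
        true_neg = sum(1 for v, e in zip(values, events) if not e and v > cutoff)
        j = true_pos / n_events + true_neg / n_non_events - 1.0
        if j > best_j:  # on ties the lowest cut-off is kept
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical ratios with 1-year event status (1 = event); not study data.
ratios = [0.10, 0.15, 0.20, 0.30, 0.35, 0.40]
had_event = [1, 1, 1, 0, 0, 0]
cutoff, j_stat = youden_cutoff(ratios, had_event)
```

In this perfectly separated toy example the selected cut-off is the highest value among patients with an event.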
Adding TAPSE/sPAP to the four-strata COMPERA 2.0 risk score
All three components of the four-strata COMPERA 2.0 risk score including NT-proBNP were prognostic in our cohort (supplementary table S4). We next added TAPSE/sPAP to the COMPERA 2.0 risk score using the cut-offs from the 2022 ESC/ERS guidelines together with the cut-off identified in our data: >0.32 (low risk), 0.24–0.32 (intermediate–low risk), 0.19–0.23 (intermediate–high risk) and <0.19 (high risk; supplementary table S5). In the total cohort, the addition of TAPSE/sPAP numerically increased the model's discriminatory ability for 1-year (AUC from 0.80 to 0.81) and 3-year outcome status (AUC from 0.78 to 0.81), but this was not statistically significant (DeLong's test p=0.692 and p=0.110).
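The four TAPSE/sPAP categories listed above can be expressed as a small Python function. This is a sketch under stated assumptions: the ratio is rounded to two decimals before grading so that the published bands leave no gap between 0.23 and 0.24, and the numeric grades 1 to 4 are an illustrative encoding. How the grade is combined with the other COMPERA 2.0 components is not shown here.

```python
def tapse_spap_stratum(tapse_mm, spap_mmhg):
    """Grade the TAPSE/sPAP ratio (mm per mmHg) into four risk strata.

    1 = low, 2 = intermediate-low, 3 = intermediate-high, 4 = high risk.
    The ratio is rounded to two decimals first (an assumption).
    """
    ratio = round(tapse_mm / spap_mmhg, 2)
    if ratio > 0.32:
        return 1
    if ratio >= 0.24:
        return 2
    if ratio >= 0.19:
        return 3
    return 4


# Hypothetical measurements: TAPSE 20 mm and sPAP 100 mmHg give a ratio of 0.20.
example_grade = tapse_spap_stratum(20, 100)
```

With these example measurements the patient would fall into the intermediate–high risk category.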
Discussion
Precise risk assessment is key to clinical management of patients with PAH, and sequential assessments based on noninvasive procedures promise to inform treatment decisions over the course of the disease. We have used a feature selection algorithm to combine noninvasive variables that best predicted future outcome events from follow-up visits in patients with PAH. A model combining seven variables stratified short-term mortality risk well. Our systematic analyses indicate that residual risk can potentially be captured from noninvasive procedures during routine early follow-up visits in PAH and emphasise the need for extended data acquisition in large-scale registry studies.
Characteristics of our cohort including comorbidity seemed generally comparable to the two recent key studies on prognostic risk models from the two largest European registries on PAH, COMPERA and the French registry [12, 13]. Half of the patients received initial combination therapy, which was in line with recent data from the COMPERA registry, where the use of combination therapy increased from 2010 to 2019 to 46% during the first year of treatment [22]. COMPERA and our cohort contained predominantly cases with idiopathic disease [12]. In sensitivity analysis, we could not detect a reduction of the model's discriminatory ability following exclusion of associated forms of PAH. While patient numbers in registries vastly exceed those of a single-centre study, single-centre data could, through their phenotypic depth, pioneer variable selection.
The newly derived model achieved an AUC of 0.83 to discriminate 1-year outcome status in our unseen internal replication sub-cohort. The discriminative ability was comparable to Pulmonary Hypertension Outcomes Risk Assessment (PHORA) 2.0, which is based on a Bayesian network analysis in combination with random forest classification. In aggregated data from clinical trials on PAH, the noninvasive version of PHORA combining 12 variables achieved an AUC of 0.80 [23]. Interestingly, the addition of invasive measures yielded only a minimal increase, which underpins the potential of accurate risk prediction based on noninvasive clinical procedures. Similarly, in data from the French registry invasive haemodynamic variables were no longer significant following the introduction of BNP/NT-proBNP in the multivariate regression model [7]. Further biological markers that capture multiple pathophysiological processes, beyond myocardial stress, could enhance the biological component in risk models. Adding a six-protein score to NT-proBNP improved prediction of 5-year outcomes from AUC 0.76 to 0.82 in the combined UK National PAH Cohort Study and French Prospective Longitudinal Study of Patients With Idiopathic Pulmonary Arterial Hypertension, Family or Taking Anorectics (EFORT) [24]. Furthermore, imaging of the right heart can potentially refine risk assessment. The addition of a structural feature of the RV as assessed by cardiac magnetic resonance (CMR) imaging (indexed RV end-systolic volume) to the REVEAL 2.0 risk score increased AUC from 0.74 to 0.78 for 1-year outcome status in a single-centre database [20]. However, advanced biological phenotyping and CMR are not readily available in all PH centres. By contrast, our approach is based entirely on routine clinical variables that are time- and cost-effective.
The newly derived model combined seven noninvasive clinical variables, two of which overlap with the COMPERA 2.0 score (WHO-FC and 6MWD) [1], while NT-proBNP was not selected by LASSO modelling. For reasons of comparability and real-world applicability we used non-transformed values for each variable but cannot exclude that transforming the typically skewed distribution of NT-proBNP would have affected its selection. Additionally, (multi-)collinearity with other variables may have affected the selection of NT-proBNP. Next to exercise limitation as measured by WHO-FC and 6MWD, the seven variables in the LASSO model depict different pathophysiological aspects including impaired gas exchange (DLCO), restrictive ventilation pattern (TLC), impaired hepatic function (AST/ALT) and echocardiographic changes in right heart structure and function (RAAi, TAPSE/sPAP). Not all variables were sensitive to treatment initiation when measurements were compared between baseline and follow-up, indicating possible association with non-modifiable PAH subgroups that intrinsically impact prognosis such as porto-pulmonary PH, CTD-PAH or the recently proposed lung phenotype in idiopathic PAH [25, 26]. However, pulmonary vascular disease can also directly affect these variables. The liver can be affected by hepatic venous congestion and low cardiac output causing ischaemic hepatitis [5, 27]. DLCO can be moderately reduced possibly due to alveolar capillary membrane thickness and a reduction of pulmonary capillary blood volume caused by vascular remodelling [28, 29]. A mild restrictive pattern has been previously described as a common feature in PAH patients and is pronounced in patients with low DLCO [26, 30]. Nevertheless, TLC was generally within the normal range in our cohort.
Although the 2015 and 2022 ESC/ERS guidelines recommend imaging of the right heart by echocardiography for risk assessment during diagnostic workup, sufficient evidence is missing on whether imaging can improve risk prediction during follow-up [1, 11]. It was shown that the RAA strongly correlates with the body surface area [31]. The indexed RAA, but not “raw” RAA, emerged from the discovery phase of our analysis, but it needs to be further established whether adjustment for body size achieves an improvement over RAA. The ratio TAPSE/sPAP enables noninvasive estimation of the RV to pulmonary artery (PA) coupling [32, 33]. The TAPSE/sPAP showed an AUC of 0.83 to distinguish between RV-PA coupling and uncoupling as assessed by the gold-standard, pressure–volume relationships, with an optimal cut-off of 0.31 mm·mmHg⁻¹ [33]. We found an optimal cut-off for outcome discrimination of 0.23 mm·mmHg⁻¹, which is at the lower end of those previously suggested (0.19–0.33 mm·mmHg⁻¹) [1, 18, 33–36]. If TAPSE/sPAP increased during follow-up, patients showed significantly better long-term outcomes, indicating its potential use to capture modifiable risk in PAH. Adding TAPSE/sPAP to the four-strata COMPERA 2.0 risk score numerically increased the AUCs for discrimination of outcome status.
Our study has limitations that should be discussed. This was a single-centre study with limited sample size and limited geographical and ethnic diversity, and we cannot exclude referral bias of more severe cases and cases with a pulmonary phenotype to our tertiary centre in respiratory medicine. Owing to the retrospective study design based on real-world patient management, there were missing data which were imputed to avoid selection bias and to enable statistical analysis, but we cannot exclude variance that was introduced by imputation. Some biochemical measures showed higher rates of missing data points but none of these were selected for model development. Review of original echocardiographic images was not possible in all cases. We only included procedures/variables that were available at our centre and documented regularly during the study period. Thus, we were not able to include some predictive variables that were previously reported such as rate of hospitalisations before diagnosis of PAH [6]. External and ideally prospective validation on prevalent patients with PAH including different subgroups would be needed for this observational study to test for generalisability.
Our systematic approach indicates that incorporation of multi-modal routinely measured noninvasive clinical variables may increase accuracy of risk models in predicting future outcome events during follow-up in patients with PAH. We demonstrate that leveraging the phenotypic depth available at a single centre in combination with an unbiased feature selection algorithm can prioritise variables and inform large-scale registries and meta-analyses. In particular, easy-to-use variables obtained by echocardiographic imaging such as the TAPSE/sPAP ratio emerged as being dynamic and sensitive to changes in prognosis over time.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Acknowledgement
The authors thank Anja Paulsen (University Medical Centre Hamburg-Eppendorf, Hamburg, Germany) for her work managing the data for the local PH registry. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sector.
Footnotes
Provenance: Submitted article, peer reviewed.
Author contribution: Study design, data collection, statistical analyses and manuscript drafting were performed by J. Ostermann and L. Harbaum. All authors have contributed to data curation, interpretation of analyses and finalising the manuscript. All authors have approved the current version of the manuscript prior to submission and declared that the submitted work is original and has not been published, nor is it under consideration for publication, elsewhere (neither in English nor in any other language).
Conflict of interest: No financial conflict of interest has arisen from any author that interferes with the study design, methods or results.
- Received February 5, 2023.
- Accepted March 28, 2023.
- Copyright ©The authors 2023
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org