## Abstract

Identifying vulnerable groups and/or individuals’ cardiorespiratory fitness (CRF) is an important challenge for clinicians/researchers alike. To quantify CRF accurately, the assessment of several variables is now standard practice including maximal oxygen uptake (*V*ʹ_{CO2}) and ventilatory efficiency, the latter assessed using the minute ventilation/carbon dioxide production (*V*ʹ_{E}/*V*ʹ_{CO2}) slope. Recently, reference values (centiles) for *V*ʹ_{E}/*V*ʹ_{CO2} slopes for males and females aged 20 to 80 have been published, using cardiopulmonary exercise testing (CPX) data (treadmill protocol) from the Fitness Registry and the Importance of Exercise National Database (FRIEND Registry).

In the current observational study we provide centile curves for the FRIEND Registry *V*ʹ_{E}/*V*ʹ_{CO2} slopes, fitted using the generalised additive model for location, scale and shape (GAMLSS), to provide individuals with a more precise estimate of where their *V*ʹ_{E}/*V*ʹ_{CO2} slopes fall within the population. We also confirm that by adopting allometric models (incorporating a log transformation), the resulting ANCOVAs provided more normal and homoscedastic residuals, with superior goodness-of-fit using the Akaike information criterion (AIC)=14 671 (compared with traditional ANCOVA's AIC=15 008) that confirms allometric models are vastly superior to traditional ANCOVA models.

In conclusion, providing sex-by-age centile curves rather than referring to reference tables for ventilatory efficiency (*V*ʹ_{E}/*V*ʹ_{CO2} slopes) will provide more accurate estimates of where an individual's particular *V*ʹ_{E}/*V*ʹ_{CO2} slope falls within the population. Also, by adopting allometric models researchers are more likely to identify real and valid inferences when analysing population/group differences in *V*ʹ_{E}/*V*ʹ_{CO2} slopes.

## Abstract

**This article provides centile curves of Vʹ_{E}/Vʹ**

_{CO2}

**slopes and demonstrates that by adopting log-linear models, more trustworthy inferences with group differences will also be found**https://bit.ly/3uitACS

## Introduction

Numerous studies and subsequent position statements have confirmed the importance of cardiopulmonary exercise testing (CPX) as the gold-standard approach for the assessment of cardiorespiratory fitness [1–3]. The term cardiorespiratory fitness (CRF) refers to the ability of the circulatory and respiratory systems to supply oxygen to skeletal muscles during sustained physical activity. The clinical and research focus of CPX is usually focused on the measurement/quantification of peak or maximal oxygen uptake (*V*ʹ_{O2}). Accordingly, considerable effort has been made to establish reference values for peak/maximal *V*ʹ_{O2} across the lifespan in both males and females [4–6].

Although peak/maximal *V*ʹ_{O2} is the gold-standard measurement for exercise capacity specifically, it does not capture all of the characteristics associated with CRF, or all of the CPX responses that are useful in stratifying risk [3]. To quantify CPX responses more comprehensively, the assessment of other parallel variables is necessary. One such response is ventilatory efficiency, most commonly assessed as the minute ventilation/carbon dioxide production (*V*ʹ_{E}/*V*ʹ_{CO2}) slope, a measure that dramatically improves diagnostic and prognostic resolution as well as more accurately quantifies the degree of therapeutic efficacy for surgical, pharmacological and lifestyle interventions [1, 3].

In their recent study, Arena *et al*. [7] provide important reference values as discrete point estimates for *V*ʹ_{E}/*V*ʹ_{CO2} slopes across the lifespan in both males and females (reporting mean values at 10 per cent intervals per decile). Although a helpful guide, these discrete point estimates require a certain degree of “linear interpolation” to estimate precisely where an individual's *V*ʹ_{E}/*V*ʹ_{CO2} slope falls within the table (*e.g.*, somewhere between the 80th and 90th centile for a 68-year-old). An alternative **continuous** “growth” curve-fitting methodology for centile reference values (fitted using the Lamda-mu-sigma (LMS) proposed by Cole [8]) will provide a more accurate estimate.

Comparing population differences in *V*ʹ_{E}/*V*ʹ_{CO2} slopes is usually performed using ANOVA or ANCOVA. These analyses assume these data are normally distributed and homoscedastic. As far as the ratio variables such as peak/maximal *V*ʹ_{O2} (mL·kg^{−1}·min^{−1}) and *V*ʹ_{E}/*V*ʹ_{CO2} slopes are concerned, these assumptions are unlikely to be satisfied [6] with evidence of the heteroscedastic spread of the *V*ʹ_{E}/*V*ʹ_{CO2} slope data with age in the centile plots shown in figure 1a and b. Ratio data such as peak/maximal *V*ʹ_{O2} (mL·kg^{−1}·min^{−1}) and the *V*ʹ_{E}/*V*ʹ_{CO2} slope will invariably benefit from the log transformation [9]. Hence, the purpose of the current study is as follows: 1) to provide “individual” comparisons using the centile curves for *V*ʹ_{E}/*V*ʹ_{CO2} slope values rather than point estimates (to help overcome the linear interpolation issues described above), curves estimated using the generalised additive model for location, scale and shape (GAMLSS), fitted using the GAMLSS package [10] in R [11]; and 2) to compare population differences in the *V*ʹ_{E}/*V*ʹ_{CO2} slope using a more valid analysis (ANOVA/ANCOVA) with the help of an allometric model (incorporating a log transformation) that are likely to provide more normally and homoscedastic residuals, thus making any inference concerning population differences more trustworthy.

## Methods

### The FRIEND registry

The FRIEND Registry was established with the primary charge of establishing reference CRF values across the adult lifespan [4, 12]. The CPX laboratories are currently contributing data to the FRIEND Registry function consistent with recommendations provided in recently published guidelines including the use of valid and reliable calibration and testing procedures as well as employing experienced personnel qualified to conduct exercise tests [13, 14]. Participating CPX laboratories were responsible for obtaining local institutional review board approval, or exemption from review, for inclusion in the FRIEND Registry, providing documentation that they were authorised to submit de-identified, coded data to the core CPX laboratory housed at the University of Illinois at Chicago (UIC). Institutional review board approval for the core CPX laboratory was also obtained at UIC.

### Cohort characteristics, inclusion/exclusion criteria and study data points

The current analysis included 2511 tests from the 10 CPX laboratories listed in table 1. All CPX tests undertaken by the laboratories adopted a treadmill protocol. The mean±sd time to completion of the exercise tests was 11.23±3.06 min.

Inclusion criteria for the current analysis included CPX data on males and females: 1) aged >20 years; and 2) with a maximal CPX performed on a treadmill following established guidelines [1, 2]. Note that only treadmill protocols were performed in all 10 laboratories. Any subject identified as having a pre-existing medical condition was excluded. Thus, the cohort included apparently healthy subjects (*i.e.* subjects without a pre-existing medical condition) who achieved maximal exertion by established peak respiratory exchange ratio (RER) criteria [1]. Descriptive data for all subjects by sex and age groups are provided in table 2.

*V*ʹ_{O2}, peak RER, peak heart rate (HR) and the *V*ʹ_{E}/*V*ʹ_{CO2} slope were the CPX variables reported in the current study. Each participating CPX laboratory input VE and VCO_{2} data from the initiation of exercise to peak and calculated the *V*ʹ_{E}/*V*ʹ_{CO2} slope *via* least squares linear regression (y=mx+b, m=slope).

### Statistical methods (to assess individual differences using centile curves)

The centile curves for the *V*ʹ_{E}/*V*ʹ_{CO2} slope by age were produced using the lamda-mu-sigma (LMS) method. This method assumes that using a power transformation, data can be appropriately normalised, by stretching one tail of the distribution and shrinking the other and so removing skewness [8]. The power transformation applied in the present models used the Box–Cox t (BCT; µ, σ, ν, τ). This distribution is defined by Y^{v} having a shifted truncated t distribution with τ degrees of freedom, µ is the median of the distribution, σ⋅(τ/(τ−2))^{0.5} is approximation of the coefficient of variation, ν controls the skewness and τ the kurtosis of the distribution [15]. The generalised additive model for location, scale and shape (GAMLSS) allows each of the parameters of the distribution to be modelled as linear and/or nonlinear parametric, and/or smooth non-parametric functions of explanatory variables. A Fisher scoring algorithm was used to fit the model by maximising a penalised likelihood [10]. The analysis and centile curves were fitted and plotted using the GAMLSS package [16] in R [11]. Post-estimation diagnostics for these models included not only standard QQ-plots but also de-trended normal QQ-plots (worm plots) to check the age-conditional normality of the transformed data [17].

### Statistical methods (to assess population differences)

The additive, linear model for *V*ʹ_{E}/*V*ʹ_{CO2} slope proposed by [7] is given by
1where ε is an additive error term that is assumed to be both normally distributed and homoscedastic (remains constant throughout the range of observations). The intercept “a” was allowed to vary with sex.

An alternative multiplicative model for the *V*ʹ_{E}/*V*ʹ_{CO2} slope with allometric body size components originally proposed for *V*ʹ_{O2}_{max} by Nevill and Holder [18], and subsequently adapted by Nevill and Cooke [19] to include an age^{2} term, is given by
2where M=mass, H=height and “ε” is a multiplicative, error ratio that assumes the error will be in proportion to *V*ʹ_{E}/*V*ʹ_{CO2} slope.

The model (Eq. 2) can be linearised with a log transformation (using Ln=log_{e}). A linear regression or ANCOVA analysis on Ln(*V*ʹ_{E}/*V*ʹ_{CO2} slope) can then be used to estimate the unknown parameters in the log-transformed model, *i.e.* the transformed model (Eq. 3) is now additive that conforms with the assumptions associated with ordinary least squares and ANOVA:
3where the residual errors Ln(ε) are assumed to be normally distributed and homoscedastic. For researchers unfamiliar with analysing these allometric models (Eq. 3), the dependent variable is defined as the log-transformed *V*ʹ_{E}/*V*ʹ_{CO2} slopes, Ln(*V*ʹ_{E}/*V*ʹ_{CO2} slope), and Ln(M), Ln(H), age and age^{2} are incorporated as covariates. Population differences in *V*ʹ_{E}/*V*ʹ_{CO2} slope between groups, for example sex, can be assessed by allowing the intercept “a” to vary for each fixed factor in the ANOVA/ANCOVA analyses.

Normality was assessed using the Shapiro–Wilk and Kolmogorov–Smirnov tests. The Shapiro–Wilk statistic measures how well the data follow a normal distribution by correlating the association between the measured data and the calculated normal scores. If the correlation coefficient is near 1, the population is likely to be normal. Larger values for the Kolmogorov–Smirnov statistic (KS) indicate that the data do not follow the normal distribution.

Model comparison (goodness-of-fit) between the linear and allometric models was assessed using the Akaike information criterion (AIC). Traditionally, R^{2} is frequently used to measure goodness-of-fit. However, higher R^{2} values do not always indicate a better fit. Higher R^{2} can indicate overfitting, and adding noise variables will also inflate R^{2}. So R^{2}, while useful, is not necessarily the best method of comparing competing models. An alternative method of model comparison is to use the AIC. AIC can be conceptualised as a “distance” or error between the data and a model, lower values meaning a better model. Unlike R^{2}, which rewards added noise variables with a higher (better) value, adding noise variables will increase AIC which penalises its values (lower being better). A difference between two AIC values <2 is really irrelevant; differences >2 ≤6, evidence that the lower model is better/superior is somewhat positive; >6 ≤10, the evidence that the model with the lower AIC value is better/superior is strong; >10, the evidence that the model with the lower AIC is better/superior is very strong. Given differences between AIC values can be challenging to comprehend, beyond appreciating that one model is better than another, evidence ratios (see Nevill and Cooke [19]) will also be calculated to present the magnitude of the difference in terms of how many times better the best model is in terms of its predictive accuracy [20–22].

## Results

### Individual comparisons

The centile curves for the *V*ʹ_{E}/*V*ʹ_{CO2} slope by age are given for males and females separately in figure 1a and b, and for comparative purposes, the 50th percentile values for the male and female subjects by age in figure 1c.

These curves enable the reader to estimate an individual's *V*ʹ_{E}/*V*ʹ_{CO2} slope value within a population of healthy individuals (N=2511) for comparative purposes. Point-estimate centile tables by age for male and female individuals are also given in table 3.

### Population comparisons

#### Linear, additive models

Comparing population differences in *V*ʹ_{E}/*V*ʹ_{CO2} is usually performed using ANOVA or ANCOVA. To assess differences in sex using Eq. 1, we obtain
where the variable sex is incorporated as a (0, 1) indicator variable, given by male=0 and female=1. However, the total R value for this equation was 0.33 with an adjusted R-square of 0.10 and a standard error of the estimate (SEE) of 4.8. Parameter estimates and 95% confidence intervals are given in table 4.

When the residuals were plotted against the “fits” or predicted (*V*ʹ_{E}/*V*ʹ_{CO2} slope) values, the plot (figure 2a) illustrates a lack-of-fit (the “fits” quadratic term was B=0.26, t=8.14; p<0.001) and heteroscedasticity (confirmed when the absolute residuals were correlated against the predicted values (n=2511), r=0.089, p<0.001).

The Shapiro–Wilk statistic was r=0.967 and the Kolmogorov–Smirnov statistic KS=0.05, indicating that the residuals were not normally distributed (p<0.001).

#### Multiplicative allometric model

Fitting the multiplicative allometric model to the FRIEND's data using Eq. 3, we obtained the fitted parameters given in table 5.

In order to compare the quality of fit (AIC) values using the same parameters as linear model Eq. 1, but adopting the allometric model structure, we obtained AIC=14 671.04. Using the allometric model with age (but not age^{2}), height, weight and gender as predictors, AIC=14 672.33. Note that the AIC for the linear model Eq. 1 was 15 007.99 (table 4), a difference in AIC=336.95. The evidence ratio for this difference in AIC values indicates the allometric model is 3.921438×10^{71} times better in terms of model parsimony than the additive linear model. Parsimony is important because it helps discriminate signal from the noise, facilitating generalisation and increasing predictive ability for new data.

When the residuals were plotted against the “fits” or predicted Ln(*V*ʹ_{E}/*V*ʹ_{CO2} slope) values, the plot (figure 2b) illustrates acceptable fit (the “fits” quadratic term was B=−0.9, t=1.72; p=0.086) and homoscedasticity (confirmed when the absolute residuals were correlated against the predicted values (N=2511), r=0.000, p=0.998).

The Shapiro–Wilk statistic was r=0.992 (compared with r=0.967 using linear Eq. 1) and the Kolmogorov–Smirnov statistic KS=0.031 (compared with KS=0.05 using Eq. 1), indicating that the residuals were still not normally distributed (p<0.001) but behave considerably better (more normal) than the residuals obtained from the linear additive model (Eq. 1).

To illustrate the possible inferential pitfalls that might occur when investigating population differences using linear models, we re-examined the *V*ʹ_{E}/*V*ʹ_{CO2} slopes from the FRIEND dataset using allometric models but incorporating a sex-by-age interaction term as well as a sex main effect and allowing subjects to be nested within sites. The re-analysis revealed a significant age-by-sex interaction (p<0.001) that can be clearly seen in figure 1c, a finding that appears to have been overlooked by Arena *et al*. [7] using the linear additive model (Eq. 1).

## Discussion

The first aim/purpose of the current study was to provide centile “growth” curves (by age) for ventilatory efficiency as recorded by the *V*ʹ_{E}/*V*ʹ_{CO2} slope values. Arena *et al*. [7] provided discrete point-estimate centile reference values for the *V*ʹ_{E}/*V*ʹ_{CO2} slope by age (using deciles) and sex, when all treadmill-testing exercise data were used to calculate the slopes. In the current study we report **continuous** “growth” curves for the centile reference values (fitted using the lamda-mu-sigma (LMS) method proposed by Cole [8]) that provides the reader with the opportunity to more accurately **interpolate** an individual's reference values between the point estimates (deciles) reported by Arena *et al*. [7]. In addition to this inherent error (or lack of precision) associated from using point estimates provided by reference tables, the reference tables would only allow a CPX administrator to say how close to “normal” the patient's *V*ʹ_{E}/*V*ʹ_{CO2} slope value was. In comparison, the centile curves provide a valuable added context associated with the additional level of precision. The interpretation of centiles is straightforward. For example, in the case of an individual's *V*ʹ_{E}/*V*ʹ_{CO2} slope and age, if their estimate is on the 25th centile, it means that for every 100 individuals of that age, 25 would have a lower *V*ʹ_{E}/*V*ʹ_{CO2} slope and 75 a higher *V*ʹ_{E}/*V*ʹ_{CO2} slope. These results may prove useful in enhancing the interpretation of CPX results. We recognise that similar centile curves have been published elsewhere recently by Wagner *et al*. [23] but using a different CPX protocol (bicycle ergometry) and with a reduced sample size [24].

Comparing the centile curves reported by Wagner *et al.* [23] using bicycle ergometry with those reported in the current study using a treadmill protocol, we detected large differences. The 50th centiles for the male *V*ʹ_{E}/*V*ʹ_{CO2} slopes reported by Wagner *et al*. [23] ranged from 34.2 to 37.2 (for 30- to 60-year-old males, respectively). In contrast, 50th centiles for the male *V*ʹ_{E}/*V*ʹ_{CO2} slopes reported in the current study (treadmill protocol) ranged from 26.31 to 29.46 for 30- to 60-year-old males, respectively. A similar contrast is observed with female *V*ʹ_{E}/*V*ʹ_{CO2} slopes. The 50th centiles for the female *V*ʹ_{E}/*V*ʹ_{CO2} slope scores reported by Wagner *et al*. [23] ranged from 34.0 to 36.5 (30 - to 60 -year-old females, respectively). The 50th centiles for the female *V*ʹ_{E}/*V*ʹ_{CO2} slope scores reported in the current study (treadmill) ranged from 27.76 to 29.97 for 30- to 60-year-old females, respectively. These differences suggest that the protocol used to obtain the *V*ʹ_{E}/*V*ʹ_{CO2} slopes is crucial and outlines the necessity to refer to the appropriate centile curves depending on the CPX protocol used.

The second aim/purpose of the current study was to recommend how researchers should compare population differences in the *V*ʹ_{E}/*V*ʹ_{CO2} slopes. The majority of researchers exploring population differences in *V*ʹ_{E}/*V*ʹ_{CO2} slopes have used traditional ANOVA, ANCOVA or regression methods (such as Eq. 1) to explore differences between categorical variables such as age, sex, smoking habits, BMI categories and exercise habits [5, 7, 25, 26]. However, the range of ratio variables (such as *V*ʹ_{O2}_{max} (mL·kg^{−1}·min^{−1}) and the *V*ʹ_{E}/*V*ʹ_{CO2} slope) that are also associated with body size are invariably bounded by zero to the left (cannot be negative) but unbounded to the right. This leads its frequency distribution to be positively skewed and not normally distributed.

The solution to overcome such problems would appear to adopt an alternative multiplicative model with allometric body size components (Eq. 2). The model can be linearised with a log transformation (using Ln=log_{e}) and linear regression or ANCOVA analysis on Ln(*V*ʹ_{E}/*V*ʹ_{CO2} slope) can then be used to estimate the unknown parameters (see Eq. 3). The log transformation will naturally overcome both the positive skewness and heteroscedasticity, the latter seen clearly in the centile curves figure 1a and b, where the more extreme curves (*e.g.* 90% and 10% centile) diverge with increasing age in both the male and female figures.

The allometric model (Eq. 2) performs better than the linear model (Eq. 1) based on ALL model comparison criteria. 1) The explained variance was greater using the allometric model (R^{2}=0.151) compared with the linear models (R^{2}=0.105). 2) Both the Shapiro–Wilk statistic and Kolmogorov–Smirnov tests indicate that the saved residuals from the linear and allometric models were not normally distributed. However, the residuals from the allometric model were closer to a normal distribution than the residuals saved from the linear model, with the Q-Q plot correlations for the linear and allometric models being 0.967 and 0.992, respectively. 3) The residuals *versus* the predicted values (fits) plots provided evidence of curvature (lack-of-fit) as seen with the linear model in figure 2a. No such evidence was apparent with the allometric model in figure 2b. 4) The goodness-of-fit assessed using the AIC was much lower (AIC=14 679.3) for the allometric model compared with the linear additive model (AIC=15 008.0). The 328.7 difference between models points to a major improvement, some 71 orders of magnitude greater in terms of parsimony and therefore predictive accuracy.

Clearly, these results suggest that allometric models (incorporating a log transformation) are likely to provide a better fit to the *V*ʹ_{E}/*V*ʹ_{CO2} slope values with more normally and homoscedastic residuals, thus making any inference concerning population differences more valid and trustworthy. This was reinforced when the allometric model identified the sex-by-age interaction (see figure 1c) that appears to have been missed using linear additive models (Arena *et al*. 2020 [7]). Some authors [25] have also reported an age-by-sex interaction (in their table 4), but as yet no mechanism has been provided to explain why ventilatory efficiency (*V*ʹ_{E}/*V*ʹ_{CO2} slope) is higher in females than males in younger age groups but converges (little or no difference) in the older age groups. This can be considered a limitation of the study.

In summary, providing centile curves (by age and sex) rather than having to refer to mean reference tables for ventilatory efficiency (*V*ʹ_{E}/*V*ʹ_{CO2} slopes) should provide individuals with a more accurate estimate of where their particular *V*ʹ_{E}/*V*ʹ_{CO2} slope falls within the general population. Also, by adopting the allometric model (Eq. 2), fitted using the log-transformed model/equation (Eq. 3), researchers are more likely to identify real and/or valid inferences, leading to more trustworthy interpretations of population differences (and interactions) when analysing *V*ʹ_{E}/*V*ʹ_{CO2} slopes.

## Footnotes

Data availability: The data are available from R. Arena, Dept of Physical Therapy, College of Applied Sciences, University of Illinois at Chicago, Chicago, IL, USA.

Conflict of interest: A.M. Nevill has nothing to disclose.

Conflict of interest: J. Myers has nothing to disclose.

Conflict of interest: L.A. Kaminsky has nothing to disclose.

Conflict of interest: R. Arena has nothing to disclose.

Conflict of interest: T.D. Myers has nothing to disclose.

- Received February 10, 2021.
- Accepted May 20, 2021.

- Copyright ©The authors 2021

This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org