## Abstract

**Background** Over the past decade, the Global Lung Function Initiative (GLI) Network has published all-age reference equations on spirometry, diffusing capacity of the lung for carbon monoxide (*D*_{LCO}) and lung volumes.

**Methods** We evaluated the appropriateness of these equations in an adult Caucasian population. Retrospective lung function data on subjects who performed tests prior to a diagnostic sleep investigation were analysed. From the medical records, lung healthy, lifetime nonsmoking, nonobese subjects were selected, resulting in a population of 1311 subjects (68% male; age range 18–88 years).

**Results** Multiple linear regression analysis revealed that lung function z-scores did not differ between subjects with and without sleep apnoea but did depend on height and age. The average forced expiratory volume in 1 s (FEV_{1})/forced vital capacity (FVC) z-score was 0 but exhibited an inverse association with height in both sexes (p<0.01). Values of FEV_{1} and FVC in both sexes were larger than predicted (mean±sd z-score +0.30±0.96 or 104±13% pred; p<0.01). Overall, static lung volumes and *D*_{LCO} were adequately predicted. However, *D*_{LCO} z-scores were inversely associated with height in males and age in females (p<0.01). For all lung function indices, the observed scatter was reduced compared with the prediction. Therefore, for all indices <5% of the data were below the GLI-proposed lower limit of normal (LLN) threshold.

**Conclusion** GLI reference equations provide an adequate fit in Belgian adults. However, the GLI-proposed LLN is too low for our Antwerp population, resulting in underdiagnosis of disease. Furthermore, airway obstruction and diffusion disorders might be misclassified due to height and age associations.

## Abstract

**Overall, GLI reference equations for lung function appropriately describe the data in Belgian adults. However, airway obstruction and diffusion disorders might be misdiagnosed at age and height extremes, and the GLI LLN was too low in this population.** https://bit.ly/3jdauLE

## Introduction

Appropriate reference values are critical for the correct interpretation of pulmonary function tests (PFTs). Until recently, the 1993 European Respiratory Society (ERS) reference values [1, 2] were often used in Europe, although their use was not recommended by the 2005 American Thoracic Society (ATS)/ERS Task Force on Standardisation of Lung Function Testing [3]. This Task Force suggested a new Europe-wide study to derive updated reference values. Instead, the Global Lung Function Initiative (GLI) collated data on normal values and successively published new prediction equations on spirometry, diffusing capacity of the lung for carbon monoxide (*D*_{LCO}) and static lung volumes over the past decade [4–6].

The GLI spirometry reference equations were introduced nationwide in Belgium in 2018 [7]. Many studies have evaluated the GLI spirometry equations and recommended their use in clinical practice [8–12]. However, not all validation studies on Caucasians have concluded that GLI provides an accurate fit [13–15]. A comparison of the GLI spirometry reference equations with the 1993 ERS reference values convincingly demonstrated that the former outperformed the latter in describing the data of a large Swedish cohort [16].

The newly proposed GLI reference values on *D*_{LCO} are substantially lower than those of the 1993 ERS reference values [17, 18]. In a small sample of 150 control subjects, the GLI prediction showed better agreement with the measured data of both spirometry and diffusing capacity than the 1993 ERS did [19]. However, measured *D*_{LCO} values exceeded the GLI prediction in a large Swedish population [20]. Recently, the *D*_{LCO} equations were corrected after the discovery of an error in the GLI database [21]. To the best of our knowledge, the corrected *D*_{LCO} reference equations and the recently developed equations on lung volumes have not yet been independently validated in a large population of healthy controls.

The GLI reference equations were based on a collation of normative data provided by different centres worldwide. The origin and quantity of data differed greatly between the three different PFTs, which might affect consistency between the reference equations. In this study, we evaluate, for the first time, the appropriateness of all GLI reference equations in the same population.

## Methods

### Study population

We analysed pulmonary function data on subjects who underwent testing prior to a diagnostic sleep investigation at Antwerp University Hospital (Antwerp, Belgium) between 2009 and 2020. Never-smoking Caucasian adults with a body mass index (BMI) <30 kg·m^{−2} were included. Patients with a history of respiratory, cardiovascular or neurological disease, previous thoracic surgery, or malignancy were excluded. Information on the medical history was obtained from the medical records.

To ensure that the lung function of our sample was representative of a general population, spirometry data were compared with those of a “true” reference population. The latter consisted of healthy subjects recruited from the general population in a previous study in which our research group collaborated [22]. For this purpose, we used a subset (n=189; 48% males) of the original, multicentre population consisting of adults from Belgium or the Netherlands, *i.e.* at locations <200 km from Antwerp.

Approval of the study protocol was obtained from the Ethics Committee of the Antwerp University Hospital and the University of Antwerp (EDGE 001171, EC 20/28/383).

### Measurements

Height and weight were measured without shoes. Height was measured with a wall-mounted stadiometer (Seca type 222; Seca, Hannover, Germany). Subjects were asked to stand against a wall with feet together and the head aligned in the Frankfurt horizontal plane. Age was calculated by subtracting the date of birth from the visit date. All tests were performed on a Jaeger MasterScreen PFT (Pro) and Jaeger MasterScreen Body/Diff (RT) (Jaeger, Würzburg, Germany). The hardware equipment was renewed in December 2016. Until then, Jaeger software version Jlab 5.21 was used, which was replaced by SentrySuite version 2.19 (Vyaire, Chicago, IL, USA). After input of ambient conditions, the Lilly pneumotachographs, body plethysmographs and gas analysers were calibrated/verified each morning prior to the measurements. Lung function measurements were performed by experienced technicians according to the 2005 ATS/ERS guidelines [23–25].

Measured values of forced expiratory volume in 1 s (FEV_{1}), forced vital capacity (FVC), FEV_{1}/FVC, residual volume (RV), functional residual capacity (FRC), total lung capacity (TLC), vital capacity (VC), *D*_{LCO}, alveolar volume (*V*_{A}) and transfer coefficient of the lung for carbon monoxide (*K*_{CO}) were expressed as z-scores and percentage predicted according to GLI predictions. For diffusing capacity, the corrected GLI prediction equations were used as published in 2020 [21]. Analyses of the data on static lung volumes and diffusing capacity were limited to subjects <80 and <85 years of age, respectively, as GLI reference values are not available above these age limits [5, 6].
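As an illustration of how a z-score and a percentage predicted relate to a predicted value, the sketch below assumes the LMS form used by the GLI equations, in which a prediction consists of a median (M), a coefficient of variation (S) and a skewness parameter (L). The L, M and S values and the measured value are hypothetical placeholders, not GLI coefficients or study data, and the function names are illustrative.

```python
import math

LLN_Z = -1.64  # z-score threshold for the lower limit of normal used in this study


def z_score(measured: float, L: float, M: float, S: float) -> float:
    """LMS z-score: ((measured / M) ** L - 1) / (L * S); ln(measured / M) / S when L == 0."""
    if L == 0:
        return math.log(measured / M) / S
    return ((measured / M) ** L - 1.0) / (L * S)


def percent_predicted(measured: float, M: float) -> float:
    """Measured value as a percentage of the predicted median M."""
    return 100.0 * measured / M


def lower_limit_of_normal(L: float, M: float, S: float, z: float = LLN_Z) -> float:
    """Value of the index that corresponds to the z-score threshold z."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Hypothetical prediction for one subject (NOT real GLI coefficients).
L_HYP, M_HYP, S_HYP = 1.0, 4.0, 0.12
measured_fev1 = 4.16  # litres, hypothetical measurement

print("z-score:", z_score(measured_fev1, L_HYP, M_HYP, S_HYP))
print("% predicted:", percent_predicted(measured_fev1, M_HYP))
print("LLN (litres):", lower_limit_of_normal(L_HYP, M_HYP, S_HYP))
```

In practice L, M and S depend on sex, age, height and ethnic group and would be taken from the published GLI look-up tables or software rather than set by hand.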

### Clinical implication of a nonoptimal prediction

If the GLI prediction is optimal, the measured values of a healthy population, expressed as z-scores, would follow a normal distribution with an average value of 0 and a standard deviation of 1. Application of the concept of the lower limit of normal (LLN), defined as a z-score of −1.64, would result in 5% of the population being labelled abnormal [3]. This can be explained by the area under the Gaussian z-score curve from negative infinity to the LLN, which constitutes 5% of the total area under the curve.

An average z-score <0 would increase the probability of finding an abnormality, whereas the opposite occurs with an average z-score >0. The impact of a deviating average z-score was assessed by calculating the area under the Gaussian curve from negative infinity to (−1.64−average z-score), *i.e.* the fraction of a z-score distribution with a standard deviation of 1, centred on the average z-score, that falls below the LLN. In this way, the effect of a nonoptimal prediction was translated into a quantifiable index of the probability of detecting abnormality.
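The calculation described above can be sketched in a few lines. This is a minimal illustration that assumes normally distributed z-scores with a standard deviation of 1; the function name and the list of average z-scores are illustrative and not part of the study.

```python
from statistics import NormalDist

LLN_Z = -1.64  # lower limit of normal expressed as a z-score


def fraction_below_lln(mean_z: float, lln_z: float = LLN_Z) -> float:
    """Area under the standard Gaussian curve from -inf to (lln_z - mean_z)."""
    return NormalDist().cdf(lln_z - mean_z)


# Illustrative average z-scores; 0.0 corresponds to an optimal prediction.
for mean_z in (-0.5, -0.3, 0.0, 0.3, 0.5):
    share = 100.0 * fraction_below_lln(mean_z)
    print(f"average z-score {mean_z:+.1f}: {share:.1f}% below the LLN")
```

With an average z-score of 0 the function returns a value close to 5%, and the fraction rises as the average z-score becomes more negative.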

### Statistical analysis

Statistical analysis was performed on the z-scores of all lung function variables. Normal distribution was tested with the Kolmogorov–Smirnov test. Mean z-scores were compared with 0 using the one-sample t-test and one-sample Wilcoxon signed-rank test.
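The comparison of a mean z-score with 0 can be sketched with the Python standard library alone. This is not the SPSS procedure used in the study: the p-value below relies on a normal approximation to the t distribution, which is reasonable only for large samples, and the z-scores are simulated rather than study data.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, approximate two-sided p-value) for H0: mean == mu0."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    t = (mean(values) - mu0) / standard_error
    # Normal approximation to the t distribution (assumption: large n).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Simulated z-scores (hypothetical): 1000 draws with mean 0.30 and SD 0.96.
simulated = NormalDist(mu=0.30, sigma=0.96).samples(1000, seed=1)
t_stat, p_value = one_sample_t(simulated)
print(f"t = {t_stat:.2f}, approximate p = {p_value:.3g}")
```

For small samples, or when the distribution is clearly non-normal, an exact t distribution or the Wilcoxon signed-rank test from a statistics package would be the appropriate choice.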

Lung function z-scores of subjects without sleep apnoea (determined during the diagnostic sleep investigation) were compared with those of subjects with an increased apnoea–hypopnoea index (AHI ≥5 events·h^{−1}). For this purpose, a multiple linear regression analysis was performed on data in males and females separately, with adjustment for age and height.

The multiple linear regression analysis was repeated with age and height as the only explanatory variables to assess the impact of these characteristics on the lung function z-scores.
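A multiple linear regression of z-scores on age and height can be sketched as an ordinary least-squares fit. The implementation below solves the normal equations in plain Python and is only meant to show the structure of the model: it is not the SPSS procedure, it reports no standard errors or p-values, and the data and coefficients are synthetic.

```python
def fit_ols(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from (X'X) b = X'y."""
    X = [(1.0, *row) for row in predictors]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic subjects: (age in years, height in m); NOT study data.
subjects = [(20, 1.60), (35, 1.75), (50, 1.82), (65, 1.68), (80, 1.90)]
# Synthetic z-scores generated from known coefficients, without noise.
z_scores = [1.0 + 0.01 * age - 0.5 * height for age, height in subjects]

intercept, b_age, b_height = fit_ols(subjects, z_scores)
print(f"intercept={intercept:.3f}, age={b_age:.4f}, height={b_height:.3f}")
```

A group indicator (for example 0 for AHI <5 events·h^{−1} and 1 otherwise) could be added as a further column in each predictor tuple to mirror the adjusted comparison between subjects with and without sleep apnoea.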

Statistical analyses were performed using SPSS version 23.0 (IBM, Armonk, NY, USA).

## Results

Between 2009 and 2020, 7974 subjects performed a complete conventional PFT at our hospital prior to a diagnostic sleep investigation. Almost half of the subjects were excluded due to obesity (see figure 1 for the inclusion flowchart). Fourteen percent of the subjects were excluded due to comorbidities possibly affecting lung function, most frequently patient-reported asthma in the medical history or use of asthma medication. Of the remaining subjects, 56% were excluded because of their smoking history ((ex-)smokers or unknown smoking history). This left 1311 pulmonary healthy, nonsmoking, nonobese adults (68% males) in whom the appropriateness of the GLI reference equations was evaluated. Data on static lung volumes and diffusing capacity were not available in 1% of the subjects.

Age of the subjects ranged between 18 and 88 years (mean±sd 48±12 years) (table 1). Height range was 1.53–2.04 m in males and 1.46–1.82 m in females. Details on the age and height distributions can be found in table 2. Mean±sd BMI was 26.3±2.4 kg·m^{−2} in males and 25.0±3.1 kg·m^{−2} in females.

The study population consisted of subjects that were tested for sleep apnoea. In 160 males and 170 females, AHI <5 events·h^{−1} was observed. Results from the regression models revealed that the z-scores of all lung function indices, except for FRC in males, were not significantly different between subjects with AHI <5 events·h^{−1} and those with AHI ≥5 events·h^{−1}. Therefore, subsequent analyses were performed on the whole population regardless of the AHI.

The z-scores of all spirometry indices of the present clinical population were compared with those of a “true” reference population, as previously recruited from a general population. The comparison was performed for males and females separately by multiple linear regression analysis after adjustment for height and age. None of the indices were significantly different between the two populations (p>0.05).

The lung function of both sexes is summarised in table 1. Except for FRC and RV, mean z-scores of all lung function indices did not differ between the sexes.

### Spirometry

Overall, the GLI prediction of FEV_{1}/FVC was accurate with a mean±sd z-score of −0.01±0.89. However, measured values exceeded the prediction in short subjects, while the opposite was observed in tall subjects (figure 2). The nature of the association between FEV_{1}/FVC z-scores and height is such that in tall subjects of both sexes, airway obstruction will be observed in twice as many subjects as expected. Likewise, in short subjects, less than half of the expected 5% will have decreased FEV_{1}/FVC ratios (figure 3). No associations were found between FEV_{1}/FVC z-scores and age.

The measured values of FEV_{1} and FVC exceeded the GLI prediction by 4% on average. The results from the regression models revealed that the FEV_{1} z-scores of both sexes were positively correlated with age, whereas an inverse correlation with height was observed in males only. A significant relationship between FVC z-scores and age was observed in males (table 3). Figure 4 illustrates the relationships between FEV_{1} % pred and age and height in both sexes.

### Static lung volumes

Although the z-scores were significantly different from 0 (table 1), average values of TLC and FRC were close to those predicted by the GLI, with mean±sd TLC and FRC values in our population of 101±10% predicted and 101±18% predicted, respectively. The largest discrepancy between measured and predicted was observed for RV, where mean±sd values of 111±20% predicted and 119±22% predicted were found for males and females, respectively.

Multiple regression analysis revealed no significant correlations between static lung volume z-scores and age or height.

### Diffusing capacity measurement

Average predictions for *D*_{LCO}, *V*_{A} and *K*_{CO} were accurate in both sexes with average z-scores between −0.01 and −0.07 (table 1). However, the results from the regression models revealed inverse correlations between *D*_{LCO}, *V*_{A} and *K*_{CO} and height in males. In females, inverse associations between *D*_{LCO} and *K*_{CO} and age were observed (table 3). Figure 5 illustrates the *D*_{LCO} z-scores in our study population in relation to height and age. The inverse associations between *D*_{LCO} z-scores and height (in males) and age (in females) are such that, in both tall males and old females, diffusion disorders will be detected in twice as many subjects as expected (figure 6).

### Prevalence of observations below the GLI LLN

Except for VC as determined by body plethysmography, the prevalence of observations below the GLI LLN threshold value was <5% for all lung function indices (table 1).

## Discussion

This study demonstrated that overall, the GLI reference equations accurately described the lung function of pulmonary healthy, Belgian adults. However, the FEV_{1}/FVC z-score correlated inversely with height in both sexes, as did the *D*_{LCO} z-score in males. The *D*_{LCO} z-score in females correlated inversely with age. Lung volume z-scores of both sexes did not correlate with height or age. The prevalence of observations below the GLI value of the LLN in our population was below the expected 5% for all relevant lung function indices.

The ATS/ERS recommends that reference equations should be evaluated in a representative sample of local, healthy subjects prior to their implementation in a lung function laboratory [3]. For this study, we recruited pulmonary healthy subjects who attended our hospital for a diagnostic sleep investigation; the sample was therefore predominantly male. Although the subjects were recruited from a clinical setting, we believe that the selection did not affect our results. The average BMI was 25.9 kg·m^{−2}, which is comparable to the average BMI of the Belgian population (25.7 kg·m^{−2}) as reported in a recent Health Survey [26]. The lung function of subjects without sleep apnoea (n=330; 25% of the population) was not different from that of subjects with sleep apnoea (n=981). Additionally, the spirometry z-scores of our population were not different from those of the healthy subjects recruited from the general Belgian and Dutch population. Together, these findings strongly suggest that although our study population was recruited in a clinical setting, their lung function was not different from that of subjects recruited from a general population.

The average FEV_{1} and FVC values of our population exceeded the GLI prediction by 4%, with average z-scores of +0.31 and +0.29, respectively. Other studies on Caucasians of European descent have also reported larger values for FEV_{1} and FVC compared with the GLI prediction, with z-scores ranging from +0.08 to +0.42 [8–10, 16].

The observed FEV_{1}/FVC ratio in both sexes of our population was accurately predicted by the GLI (table 1). This finding is in agreement with most of the studies performed on Caucasians [8–10, 15], except for two papers that described lower measured FEV_{1}/FVC values than predicted in Scandinavian females and in Algerian subjects of both sexes [16, 27].

Overall, the GLI prediction for *D*_{LCO} was accurate for our population with an average z-score of −0.04. Of note, the recently published, corrected *D*_{LCO} reference equations [21] were used in the present study. This correction impacted especially the prediction in females, which is presently lower than the original 2017 GLI prediction [28]. A validation study in a Swedish cohort reported underestimation of the *D*_{LCO} values by the original GLI prediction [20]. The present *D*_{LCO} prediction would further increase the observed underestimation in females.

The GLI reference equations on lung volumes have only recently become available. In accordance with the findings in a small sample of Belgian adults [29], we have observed that the GLI prediction appropriately described the TLC values of our population, whereas the RV values were largely underestimated by the prediction (table 1). The underestimation of RV will result in the overdiagnosis of hyperinflation and has an impact on the selection of candidates for lung volume reduction interventions [30–32].

Although the reference equations on spirometry, diffusing capacity and lung volumes share the name “GLI”, the respective collated datasets differed greatly in size and origin [4–6]. The most recent GLI report has suggested that there is good internal consistency between dynamic and static lung volumes [6]. We have investigated this issue by comparing VC and FVC, determined by body plethysmography and spirometry, respectively. As expected in healthy subjects, absolute values were identical in our study population with a mean±sd difference of 3±151 mL. However, VC was significantly lower than predicted, whereas the opposite was observed for FVC, with average z-scores of −0.28 and +0.29, respectively (p<0.01) (table 3). Future expansion of the GLI datasets for the different PFTs might lead to a reduction in these differences and to more coherent predictions [33].

Members of the GLI Network have suggested that a goodness-of-fit between measured data and the prediction with an average residual (z-score) <|0.5| is physiologically or clinically not meaningful [8, 34, 35]. Quanjer *et al*. [34] have indicated that such a difference between data and prediction can be expected when evaluating a reference population comprising >150 and <1000 subjects. However, we do question the threshold value of |0.5| for a clinically relevant difference. With the use of our simulation model, we demonstrated that average z-scores of −0.32 and +0.36 theoretically lead to twice as many or half of the observations being below the LLN, respectively, instead of the 5% expected.

In the local validation of reference equations, a deviating average z-score >|0.3| is of particular importance for the FEV_{1}/FVC ratio, TLC and *D*_{LCO} since low values denote airway obstruction, restriction and diffusion disorders, respectively. Our analyses indicate that airway obstruction will be underdiagnosed in short subjects and overdiagnosed in tall subjects (figure 3), whereas diffusion disorders will be overdiagnosed in old females and in tall males (figure 6). Our population was too small to derive reliable percentiles in subgroups of extreme height or age categories. However, Backman *et al*. [16] recently reported an average z-score of −0.38 for FEV_{1}/FVC in their population of 244 females and detected almost twice the expected rate of airway obstruction (9.4% *versus* 5%). This observation is in line with our theoretical model and supports its conclusion that an average z-score deviating by 0.3 may lead to a clinically significant difference in the detection of disease.

When the average prediction is accurate but the distribution of residuals is narrower than expected, <5% of the observations will be found below the LLN as defined by the GLI threshold. Overall, the scatter of residuals in our study was ≤1, whereas the average z-scores were close to 0. This resulted in a <5% prevalence of airway obstruction, restrictive and diffusion disorders, respectively, in our study population (table 1). Other validation studies have also reported <5% of the observations below the LLN for FEV_{1}/VC [10, 20]. According to Quanjer *et al*. [34], the reduced scatter in validation studies can be attributed to a limited sample size. However, it might also result from our use of only one hardware measurement system. Another explanation is a more homogeneous composition of a local population compared with the GLI reference population. Greater variability in lung function of the GLI reference population leads to a wider scatter in the normal values and thus to a lower value of the LLN. A study in a Swedish cohort identified more subjects with respiratory burden by using a locally derived LLN for *D*_{LCO} compared with the lower GLI threshold value of LLN [20]. Additionally, a strikingly low prevalence of abnormal lung function results was recently reported in a large Swiss population study using the GLI [36]. Further studies on patient outcomes are needed to provide a definitive answer on the validity of the GLI threshold value of LLN.
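The effect of reduced scatter can be illustrated with the same Gaussian reasoning, now letting the standard deviation of the z-scores differ from 1. The sketch assumes normally distributed z-scores and the function name is illustrative; the example inputs are the mean±sd FEV_{1}/FVC z-score reported in the Results (−0.01±0.89).

```python
from statistics import NormalDist


def expected_fraction_below_lln(mean_z: float, sd_z: float, lln_z: float = -1.64) -> float:
    """Fraction of a Gaussian z-score distribution (mean_z, sd_z) that lies below lln_z."""
    return NormalDist(mu=mean_z, sigma=sd_z).cdf(lln_z)


ideal = expected_fraction_below_lln(0.0, 1.0)     # optimal prediction
narrow = expected_fraction_below_lln(-0.01, 0.89)  # reduced scatter, mean close to 0
print(f"optimal prediction: {100 * ideal:.1f}% below the LLN")
print(f"reduced scatter:    {100 * narrow:.1f}% below the LLN")
```

Under these assumptions a standard deviation below 1 lowers the expected share of observations under the LLN even when the average z-score is essentially 0.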

The GLI reference equations provide an all-age, global standard to report and interpret lung function. Our results indicate that the GLI reference values are acceptable for the Antwerp population. Since using a global standard enables comparison between centres, we prefer these standards over local normal values [37]. Our study also indicates that GLI equations do not perfectly fit the lung function of our population, especially at height and age extremes. Future expansion of the GLI datasets might refine the GLI predictions so that the previously and presently observed imperfections are resolved [33, 38, 39].

We conclude that, in general, the GLI reference equations for spirometry, lung volumes and diffusing capacity provide an adequate description of the lung function of the adult Antwerp population. However, the prediction is less accurate at age and height extremes. The GLI LLN to detect respiratory disorders needs to be interpreted with caution in our population.

## Footnotes

Provenance: Submitted article, peer reviewed.

Conflict of interest: J. Verbraecken is an Associate Editor of this journal. The other authors have no conflict of interest to disclose in relation to the submitted work.

- Received November 29, 2021.
- Accepted March 29, 2022.

- Copyright ©The authors 2022

This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions@ersnet.org