Abstract
Background Tidal flow–volume (TFV) loops are commonly recorded in infants during sleep, because breathing is more regular than in the awake state. Standardised deselection of loops outside pre-specified ranges relies on periods of regular breathing, whereas criteria and available software for visual evaluation of TFV loops are lacking. We aimed to determine the reliability of standardised criteria for manual selection of infant TFV loops.
Methods Using a pre-defined set of criteria, three independent raters manually evaluated TFV loops in 57 randomly selected healthy, awake 3-month-old infants with available TFV measurements in the Scandinavian Preventing Atopic Dermatitis and ALLergies in children (PreventADALL) study. The TFV loops were sampled using the Eco Medics Exhalyzer D. Criteria for selecting TFV loops included reproducible shape and volume with only one peak tidal expiratory flow (PTEF); loops with no clear PTEF or with uneven flow towards PTEF were excluded. The reliability of agreement between raters was determined by the intraclass correlation coefficient (ICC) for the ratio of time to PTEF (tPTEF) to expiratory time (tE) and for other TFV loop parameters.
Results Five infants had unsuccessful tests. Among the remaining 52 infants, the raters selected a median of 25, 26 and 15 loops per test. The ICCs (95% CI) were 0.97 (0.92–0.98) for tPTEF/tE, 0.99 (0.99–1.00) for respiratory rate, 0.98 (0.97–0.99) for tidal volume per kg and 0.98 (0.97–0.99) for expiratory volume, reflecting excellent agreement in all categories.
Conclusion Manual TFV loop selection using standardised criteria provides a reliable alternative for lung function measures in awake infants with interrupted breathing cycles in a real-life setting.
Shareable abstract
A pre-defined set of inclusion and exclusion criteria provides a reliable and standardised procedure for manual selection of tidal flow–volume loops in infants, which may be useful in clinical and research settings https://bit.ly/3NYMlp0
Introduction
Infant lung function testing has been used to assess lung development and the impact of environmental factors, and to detect lung disease. Because lung function tracks through childhood and adolescence, lung function in infancy is a major predictor of adult lung function [1–3]. Measures of lung function in awake young children include tidal breathing flow–volume loops [4], which represent compound measurements of lung function, including airway size, mechanical characteristics of the lung [5] and respiratory control [6]. During tidal flow–volume (TFV) loop sampling, abnormal breathing patterns and airway obstruction may be revealed [7]. TFV loop measurements correlate with forced expiratory measurements [8, 9], and lower values of the ratio of time to peak tidal expiratory flow (tPTEF) to expiratory time (tE) in infancy are associated with chronic lung disease, wheeze in infancy and asthma later in life [10–14].
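To make the definition of tPTEF/tE concrete, the sketch below computes the ratio for a single expiratory phase from a uniformly sampled flow signal. It is a minimal illustration, not the Spiroware algorithm: the function name `tptef_te_ratio` is hypothetical, and it assumes the expiratory phase has already been segmented so that the first sample marks the start of expiration and the last sample its end.

```python
from typing import Sequence


def tptef_te_ratio(exp_flow: Sequence[float], fs_hz: float) -> float:
    """Return tPTEF/tE for a single expiratory phase.

    Assumption (hypothetical segmentation): exp_flow holds one expiratory
    phase only, from the start of expiration (first sample, t = 0) to its
    end (last sample), sampled at a constant rate fs_hz.
    """
    if len(exp_flow) < 2:
        raise ValueError("an expiratory phase needs at least two samples")
    dt = 1.0 / fs_hz
    # Index of peak tidal expiratory flow (PTEF); the first maximum if tied.
    i_ptef = max(range(len(exp_flow)), key=lambda i: exp_flow[i])
    t_ptef = i_ptef * dt                # time from start of expiration to PTEF
    t_e = (len(exp_flow) - 1) * dt      # expiratory time
    return t_ptef / t_e


# Toy trace: PTEF at 0.4 s of a 1.0 s expiration sampled at 10 Hz.
flow = [0.0, 1.0, 2.0, 3.0, 4.0, 3.5, 3.0, 2.0, 1.5, 1.0, 0.0]
print(round(tptef_te_ratio(flow, fs_hz=10.0), 2))  # 0.4
```

Because both times are counted in the same sampling intervals, the ratio is dimensionless and does not depend on the sampling rate.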
Tidal breathing measures have been obtained in awake and naturally sleeping infants and children, and sometimes under sedation [15]. Although lung function can be tested in awake preschool children who actively participate, it has historically been challenging to assess lung function in infants without sedation or natural sleep. Chloral hydrate has commonly been used as a sedative [16] in inpatient and outpatient facilities [17], although it negatively affects normal ventilation [18] and has been associated with several cases of overdose, respiratory depression, cardiopulmonary arrest and fatal events [17]. Because TFV loops are often more easily obtained during sleep, when breathing cycles are less likely to be interrupted, measurements during sleep have been preferred in infants and young children [8, 19]. However, measurements in the awake state may be advantageous, as children are more likely to be awake than asleep at clinical investigations. Furthermore, lung function in older children is measured in the awake state, and measures obtained in awake infants may be less influenced by external factors than those obtained in sleeping infants [20]. Associations of tPTEF/tE with maternal smoking in utero and with future asthma have been observed in both the awake [11, 21] and sleeping states [22, 23]. The clinical value of TFV measures at the individual level is debated [24, 25], partly owing to the lack of reference values [26]. Guidelines for TFV measures have been established [26], largely based on examination of sleeping or sedated children [19], whereas sources of variability and criteria for selection of loops remain unclear. In the commonly used software, the only option for automatic selection of loops is to pre-define a threshold for the maximum deviation, in millilitres, from the median tidal volume (VT).
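The volume-only rule for automatic selection can be sketched as a filter on per-loop VT. This is a minimal illustration under stated assumptions, not the manufacturer's implementation: the function name `select_by_median_vt` is hypothetical, and we assume the threshold is applied as an absolute difference from the median VT of all loops in the test run. Because the rule ignores loop shape, a notched or double-peaked loop passes as long as its volume is close to the median.

```python
from statistics import median
from typing import List, Sequence


def select_by_median_vt(vt_ml: Sequence[float], max_dev_ml: float) -> List[int]:
    """Hypothetical volume-only selection: keep loops whose tidal volume (VT)
    lies within max_dev_ml millilitres of the median VT of the test run."""
    med = median(vt_ml)
    return [i for i, vt in enumerate(vt_ml) if abs(vt - med) <= max_dev_ml]


# Toy test run of six loops (VT in mL); the median is 43.5 mL.
vt = [42.0, 45.0, 44.0, 60.0, 43.0, 30.0]
print(select_by_median_vt(vt, max_dev_ml=5.0))  # [0, 1, 2, 4]
```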
The American Thoracic Society (ATS)/European Respiratory Society (ERS) guidelines state that automatic breath detection should be accompanied by a visual evaluation of the flow and volume signals; however, there is no clear consensus on criteria for manual inclusion or exclusion of flow–volume loops in a test [26]. There is a need for a validated standard operating procedure, with clear criteria for inclusion and exclusion of TFV loops, for use in clinical as well as research settings, regardless of arousal state. Therefore, the aim of the present study was to determine the reliability of a pre-defined set of criteria for manual selection of TFV loops in infants.
Material and methods
Study subjects
Three independent raters evaluated TFV loop measurements in 57 randomly selected infants with lung function measured at 3 months of age, who had been enrolled antenatally in the general population-based prospective mother–child birth cohort study Preventing Atopic Dermatitis and ALLergies in children (PreventADALL) [27].
Two raters from Oslo University Hospital and one from Karolinska University Hospital had access to, and evaluated, TFV measurements stored on a secure data server at the University of Oslo. All three raters were medical doctors with clinical experience in general paediatrics who were training in paediatric pulmonology. Two raters (K.E.S. Bains and H.K. Gudmundsdóttir) both performed infant lung function testing and evaluated the TFV loops, whereas the third (E. Amnö) participated only in the loop evaluation and selection process.
The PreventADALL study recruited 2697 pregnant women from Norway (Oslo University Hospital and Østfold Hospital Trust) and Sweden (the region of Stockholm) at ∼18 weeks of pregnancy from December 2014 to October 2016, and their healthy infants born at or after gestational week 35.0. In the present study, the source population consisted of healthy, awake infants with lung function measurements obtained by study personnel in Oslo.
The PreventADALL study was approved by the regional committee for medical and health research ethics in South-Eastern Norway (2014/518) and Sweden (2014/2242-31-4). The study was registered at ClinicalTrials.gov (NCT02449850). Written informed consent was obtained from the pregnant women at enrolment and from parents at inclusion of the newborn infants.
Procedures
Trained study personnel measured lung function in awake infants at the 3-month follow-up examination. Infants were calm and positioned supine, either in a stroller or on a firm pillow on the caregiver's arm or lap, with head and neck in the midline. The TFV loops were sampled using the Eco Medics Exhalyzer D (Duernten, Switzerland) with an ultrasonic flowmeter attached to a carbon dioxide adapter and a dead space reducer (Set 1) with Spirette (Eco Medics), and a tight-fitting face mask with an inflated cuff covering the nose and mouth to avoid air leaks. The equipment was calibrated daily for atmospheric pressure, temperature and channel, and flow calibration was performed between subjects. Analyses with Spiroware software version 3.2.1 were in line with international guidelines on infant lung function testing [15, 26]. A test run consisted of up to 100 consecutive TFV loops, the maximum defined in the software. Further details on lung function testing are outlined in supplementary material 1.
A set of pre-defined criteria for manual selection of TFV loops was outlined in a standard operating procedure, developed by the raters together with senior researchers in the field of tidal breathing measurements. Details of the standard operating procedure are given in supplementary material 2, and the main criteria are illustrated in figures 1 and 2. Briefly, loops had to be reproducible, with fairly even shape, similar volumes and only one peak in expiratory flow, while some normal variation, as expected in a healthy child, was allowed. Both consecutive and nonconsecutive breaths were saved when deemed reproducible and when volumes deviated little from the mean VT during expiration or inspiration. Loops were explicitly excluded if they had no clear peak tidal expiratory flow (PTEF) or an aborted or uneven flow towards PTEF at the beginning of the expiratory phase. Each lung function test was finally rated into one of three pre-defined quality categories: successful, partly successful or not successful. A successful test was defined as a test of good, reproducible quality, preferably including ≥10 loops. A partly successful test was defined as a test with fewer accepted loops, or in which reproducibility of loop shape or selected variables was uncertain, with greater variance between the accepted loops. Unsuccessful tests were of poor quality, such that it was uncertain whether the loops represented the infant's normal breathing, or contained no saved loops.
Criteria for inclusion and exclusion of loops in a test. VT: tidal volume; tE: expiratory time; tI: inspiratory time; PTEF: peak tidal expiratory flow.
Tidal flow–volume loops. a) Examples of excluded loops (red) due to deviating volume (left), notch on expiratory flow (middle) and two expiratory flow peaks (right); b) tidal flow–volume test before (left) and after (right) manual selection of loops. tPTEF: time to peak tidal expiratory flow; tE: expiratory time.
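The shape-based exclusion criteria (one clear PTEF; no aborted or uneven flow towards PTEF) are applied visually in the standard operating procedure. Purely to make them concrete, the sketch below encodes two of them as rules on a sampled expiratory flow trace. All function names and the thresholds `min_drop` and `notch_tol` are hypothetical and are not taken from the standard operating procedure; any automated rule of this kind would need its own validation against manual rating.

```python
from typing import Sequence


def count_flow_peaks(flow: Sequence[float], min_drop: float) -> int:
    """Count expiratory flow peaks; a peak only counts if flow then falls by
    at least min_drop (simple hysteresis with a hypothetical threshold)."""
    peaks = 0
    rising = True
    extreme = flow[0]  # running maximum while rising, running minimum while falling
    for f in flow[1:]:
        if rising:
            if f > extreme:
                extreme = f
            elif extreme - f >= min_drop:
                peaks += 1                 # peak confirmed by a sufficient fall
                rising, extreme = False, f
        else:
            if f < extreme:
                extreme = f
            elif f - extreme >= min_drop:
                rising, extreme = True, f  # flow rises again: possible second peak
    return peaks


def rises_evenly_to_ptef(flow: Sequence[float], notch_tol: float) -> bool:
    """False if flow dips by more than notch_tol before reaching PTEF."""
    i_ptef = max(range(len(flow)), key=lambda i: flow[i])
    running_max = flow[0]
    for f in flow[1:i_ptef + 1]:
        if f < running_max - notch_tol:
            return False                   # notch or aborted flow on the way to PTEF
        running_max = max(running_max, f)
    return True


def passes_shape_criteria(flow: Sequence[float], min_drop: float = 0.5,
                          notch_tol: float = 0.1) -> bool:
    """Hypothetical check: exactly one clear PTEF and an even rise towards it."""
    return count_flow_peaks(flow, min_drop) == 1 and rises_evenly_to_ptef(flow, notch_tol)


good = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
double_peak = [0.0, 2.0, 4.0, 2.0, 1.0, 3.8, 2.0, 0.0]
notched = [0.0, 1.0, 2.0, 1.8, 2.5, 4.0, 3.0, 1.0, 0.0]
print([passes_shape_criteria(f) for f in (good, double_peak, notched)])  # [True, False, False]
```

The two rules are kept separate because a small notch on the rising limb need not produce a second distinct peak, so a peak count alone would accept the notched trace above.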
All three raters independently evaluated, selected and rated all TFV loops sampled in the 57 infants and recorded their observations electronically. The infants were randomly selected from a list of all infants attending the 3-month follow-up visit at Oslo University Hospital, using random sampling in SPSS (IBM, Chicago, IL, USA). Each rater worked independently through the list to identify infants with lung function measurements in the awake state and then evaluated the available loops to classify and assess each test run. Each rater then independently deemed the test successful, partly successful or not successful, based on an overall evaluation of the test against the criteria in the standard operating procedure (figure 1 and supplementary material 2). Samples were scrutinised and stored securely at the Service for Sensitive Data unit at the University of Oslo [27].
For comparison of lung function variables between manual and automatic selection of loops, the software settings for automatic selection of loops were set to the manufacturer's defaults, as described in supplementary material 2. Results for software-selected loops were reported electronically without any manual correction.
Definitions and outcomes
The primary outcome was the level of agreement between raters for tPTEF/tE. Secondary outcomes were agreement between raters on the tPTEF/tE categories <0.20, 0.20 to <0.25 and ≥0.25, and on respiratory rate, tidal volume per kilogram (VT·kg−1) and expiratory volume (VE).
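The categorical secondary outcome can be expressed as a simple mapping. In this sketch the labels low, marginal and normal follow the terms used in the Discussion; the boundary handling (0.20 falls in the middle category, 0.25 in the upper one) follows from "<0.20" and "≥0.25". The function names are hypothetical.

```python
from typing import Sequence


def tptef_te_category(ratio: float) -> str:
    """Map a tPTEF/tE ratio to low (<0.20), marginal (0.20 to <0.25) or normal (>=0.25)."""
    if ratio < 0.20:
        return "low"
    if ratio < 0.25:
        return "marginal"
    return "normal"


def raters_agree_on_category(ratios: Sequence[float]) -> bool:
    """True if every rater's tPTEF/tE for one infant falls in the same category."""
    return len({tptef_te_category(r) for r in ratios}) == 1


print(tptef_te_category(0.20), tptef_te_category(0.25))  # marginal normal
print(raters_agree_on_category([0.39, 0.41, 0.39]))       # True
print(raters_agree_on_category([0.24, 0.26, 0.27]))       # False
```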
Statistical analysis
Descriptive statistics are presented as numbers and proportions for categorical variables and as mean or median with standard deviation or minimum and maximum for continuous variables.
For the reliability analysis of the continuous variables, we calculated the intraclass correlation coefficient (ICC) using a two-way random-effects model [28], because we were evaluating a rater-based clinical assessment method, the three raters had similar characteristics, and we planned to generalise our reliability results to other raters. We used the single-rater type, because the reliability experiment in this article compared the individual ratings of three independent raters. Absolute agreement for the outcomes was assessed by calculating ICC estimates and their 95% confidence intervals using SPSS (version 25). ICC values >0.90 indicate excellent reliability, values between 0.75 and 0.90 good reliability and values between 0.50 and 0.75 moderate reliability [28]. The null value of the ICC (P0) was set to 0.6.
Agreement between raters for categorical variables is reported descriptively. We investigated whether the number of loops selected differed between raters using linear regression with robust standard errors to account for clustering by infant. This analysis was performed in STATA (version 17.0) and included the 156 tests (52 infants × three raters) in the main analysis. To have a statistical power of 80% to detect significant agreement >0.74, with an α-level of 5% and three raters, the study required TFV loop tests from 53 subjects.
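For readers reproducing the analysis outside SPSS, the point estimate of the single-measures, two-way random-effects, absolute-agreement ICC (ICC(2,1) in the Shrout and Fleiss notation) can be computed from the two-way ANOVA mean squares. This is a minimal sketch of that standard formula, assuming complete data (every rater rates every infant); it should correspond to the SPSS "two-way random, absolute agreement, single measures" estimate, but it omits the confidence interval and the F test against P0. The function name is hypothetical, and the example uses the six-subject, four-rater table of Shrout and Fleiss (1979), for which ICC(2,1) is 0.29.

```python
from statistics import fmean
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Single-measures ICC, two-way random effects, absolute agreement.

    ratings[i][j] is the value for subject i from rater j (complete data).
    """
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Shrout and Fleiss (1979) example: 6 subjects (rows) rated by 4 raters (columns).
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(sf), 2))  # 0.29
```

The rater mean square enters the denominator, so systematic differences between raters (for example, one rater consistently recording higher tPTEF/tE) lower this absolute-agreement ICC, whereas a consistency ICC would ignore them.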
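The power calculation is stated without its formula. One common approach for ICC sample sizes is the approximation of Walter, Eliasziw and Donner (1998) for testing H0: ICC ≤ P0 against a target ICC with k raters. The sketch below implements that approximation under our own assumptions: P0 = 0.6 as stated, a target ICC of 0.75 (reading ">0.74" as the next value on a two-decimal scale), a one-sided α of 0.05 and 80% power. With these inputs it returns 53, matching the stated requirement, although this does not show that the authors used this method. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def walter_sample_size(rho0: float, rho1: float, k: int,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects to test H0: ICC <= rho0 when the true
    ICC is rho1, with k raters (Walter-Eliasziw-Donner approximation,
    one-sided alpha)."""
    z = NormalDist().inv_cdf
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + 2 * (z(1 - alpha) + z(power)) ** 2 * k / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)


print(walter_sample_size(0.6, 0.75, k=3))  # 53
```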
Results
The 57 infants (63% boys) had a mean (range) gestational age of 39.8 (36.6–42.9) weeks and a mean (range) weight at 3 months of age of 6.2 (4.6–8.2) kg (table 1). In five infants, no TFV loops were saved by at least one rater, and the lung function measurements of these infants were therefore not included in the ICC analysis (supplementary table S1). Among the remaining 52 infants, the median number of loops saved per test by raters 1, 2 and 3 was 25, 26 and 15, respectively (table 2), whereas the software selected a median of eight loops per test (table 3). The loops selected automatically by the software included examples with aborted flows and double PTEF. Rater 2 selected on average 2.3 loops more than rater 1 (coefficient 2.3, 95% CI 0.3–4.3; p<0.001), and rater 3 selected on average 7.6 loops fewer than rater 1 (coefficient −7.6, 95% CI −9.4 to −5.8; p<0.001).
Characteristics for the 57 infants with tidal flow–volume measures assessed by three independent raters
The number of loops evaluated per rater in the 52 tests that all three raters found appropriate for further analyses, and tests including ≥10 loops
Number of loops and ratio of time to peak tidal expiratory flow (tPTEF) to expiratory time (tE), respiratory rate, tidal volume per kilogram (VT·kg−1) and expiratory volume (VE) per rater for infants where data from tests were saved by all three raters, all infants where data were saved by one rater and data selected by the software
The mean±sd tPTEF/tE ratios were 0.39±0.08, 0.41±0.08 and 0.39±0.09 for the three raters (table 3), with an ICC of 0.97 (95% CI 0.92–0.98) (figure 3a). The corresponding mean±sd tPTEF/tE for loops selected by the software was 0.52±0.22 (table 3).
Rater agreement with the mean intraclass correlation (ICC) (95% CI) for a) ratio of time to peak tidal expiratory flow (tPTEF)/expiratory time (tE), b) respiratory rate, c) tidal volume (VT) and d) expiratory volume (VE). Data are presented as individual mean±sd.
The ICCs for respiratory rate, VT·kg−1 and VE were 0.99 (95% CI 0.99–1.0), 0.98 (95% CI 0.97–0.99) and 0.98 (95% CI 0.97–0.99), respectively (figure 3b–d). The ICCs for tPTEF/tE, respiratory rate, VT·kg−1 and VE for tests including ≥10 loops by each rater (n=37) were similar (supplementary table S2).
None of the 52 infants had a tPTEF/tE ratio <0.20, whereas two infants had a ratio of 0.20 to <0.25 according to one or two of the raters. All other infants (50 of 52; 96.2%) had a tPTEF/tE ratio ≥0.25 according to all raters.
All three raters agreed on the quality category in 41 (72%) out of the 57 infants, with three of these deemed as not successful by all (table 4). Selected TFV measurement parameters for infants where one, two or three raters disagreed on the quality of the test are listed in table 5.
Conclusion by raters on quality of tests from all 57 infants
An overview of all infants with rater disagreement on the quality of the test
Discussion
The reliability of a set of pre-defined criteria for manual selection of TFV loops among healthy 3-month-old infants was excellent, with an ICC of 0.97 (95% CI 0.92–0.98) for the tPTEF/tE ratio between three independent raters. Likewise, the ICC was >0.90 for respiratory rate, VT·kg−1 and VE.
To the best of our knowledge, this is the first study to validate a set of pre-defined criteria for individually evaluating TFV loops in awake infants, showing excellent agreement between three independent raters on the tPTEF/tE ratio despite the varying numbers of loops approved by each rater. In 25 sleeping infants in the first week of life, Yuksel et al. [25] reported good interobserver repeatability between two observers of tPTEF/tE measured in a whole-body plethysmograph, using the method of Bland and Altman.
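For contrast with the ICC used in the present study, the Bland–Altman approach summarises agreement between two observers on the measurement scale itself, as a mean difference (bias) with 95% limits of agreement. A minimal sketch of the standard computation follows, using hypothetical values that are not data from any study; it omits refinements such as checking whether the differences vary with the size of the measurement. The function name is hypothetical.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower, upper): the mean difference a - b and its 95%
    limits of agreement, bias +/- 1.96 x SD of the paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = fmean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired values from two observers.
obs1 = [10.0, 12.0, 14.0, 16.0]
obs2 = [9.0, 12.0, 13.0, 17.0]
bias, lower, upper = bland_altman(obs1, obs2)
print(round(bias, 2), round(lower, 2), round(upper, 2))  # 0.25 -1.63 2.13
```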
None of the infants had a tPTEF/tE <0.20; two infants were rated by one or two raters as having a tPTEF/tE between 0.20 and 0.25, and all raters agreed on a tPTEF/tE ratio ≥0.25 in 50 (96.2%) of the 52 infants. A tPTEF/tE cut-off value of <0.20 is associated with later bronchial obstruction [11, 29], whereas values ≥0.25 have been regarded as normal [8, 14, 25, 29, 30]. While a low tPTEF/tE appears clinically relevant, the PreventADALL study is based on a general population, and we assumed, as observed, that the majority of included infants would have lung function values in the normal range. Therefore, in addition to the exact values included in the ICC analyses, we categorised the tPTEF/tE ratio into low (<0.20), marginal (0.20 to <0.25) and normal (≥0.25) values for comparison.
The ICC for tPTEF/tE was consistent across varying numbers of tidal breathing loops: the three raters approved a median of 25, 26 and 15 loops, with rater 2 on average saving more, and rater 3 fewer, loops per test than rater 1. Compared with the three raters, automatic selection of loops by the software generally resulted in fewer loops, with a smaller range, a higher mean tPTEF/tE ratio, lower mean VT·kg−1 and VE and a similar mean respiratory rate. This large discrepancy suggests that faulty loops, such as those with double peaks or irregular shapes, were not deselected by the automatic process. We therefore suggest that manually selected loops are more likely to be representative and of higher quality than loops selected automatically. In 1994, Stocks et al. [19] suggested that 10 breath loops might be adequate for infants aged >6 weeks, whereas in younger infants a tPTEF/tE based on the mean of 15–20 loops would be a closer estimate of the true value, as within-subject variability decreases with age. However, one study has reported tidal flow–volume loop indices based on only four loops selected from a preview of eight loops, owing to the data storage capacity at the time [31]. The ATS/ERS guidelines on pulmonary function testing in preschool children suggest that a reliable tPTEF/tE should be based on ≥10 loops [15], and these criteria are widely used for TFV measurements in infants as well. However, because agreement remained excellent despite the varying numbers of loops included by the raters (table 2), we suggest that tests may be valid even with <10 loops.
We included tests deemed successful, partly successful and not successful, and found consistently excellent agreement for tPTEF/tE as well as for respiratory rate, VT·kg−1 and VE. The quality of a single test was judged from the visual shape and reproducibility of the loops, after manual removal of loops with poor technical quality or without a well-defined PTEF. There are no clear-cut criteria for evaluating test quality; however, the criteria used are outlined in the standard operating procedure for lung function analysis in supplementary material 2. This may explain why the raters placed tests in different quality categories and saved different numbers of loops, with rater 3 generally saving fewer loops per test and deeming more tests partly successful than the other two raters. Despite these discrepancies, the ICC was high for all continuous variables.
In a real-life setting, the criteria for manual selection of TFV loops were sufficiently robust to yield excellent agreement between raters, supporting their usefulness.
We are not aware of other studies that compare lung function variables from awake TFV measurements manually evaluated by several independent raters.
Strengths and limitations
The TFV tests were performed by trained personnel under standardised conditions in healthy, awake infants with characteristics reflecting a general population [5, 24]. The number of infants was pre-specified by a power calculation. The requirement of 53 infants to ensure a statistical power of 80% to detect significant agreement >0.74 was not met; however, with 52 infants included in the calculations and an ICC >0.90 for all variables, it is unlikely that including one more infant would have affected the outcome.
There was little variance across the pre-defined tPTEF/tE categories of <0.20, 0.20 to <0.25 and ≥0.25, which were defined to distinguish an assumed healthy infant from an infant with reduced lung function. The 0.20 to <0.25 category is somewhat arbitrary, but was pre-defined to capture the lower range of presumably normal TFV loops. The high ICCs, reflecting excellent agreement between raters, were evident for all continuous outcome variables.
The present study is a further step towards standardising TFV measures in epidemiological studies and clinical practice, in line with the need for further insight into lung function measurement techniques that allow repeated measurements in awake young children [4]. It remains unclear whether selecting loops by this method will be useful in clinical practice, and further studies are needed to validate its use in the long-term care of patients.
Conclusion
Using a set of pre-defined selection criteria, manual selection of TFV loops from healthy, awake 3-month-old infants resulted in excellent agreement on TFV parameters between three independent raters. Our study provides a feasible and reliable tool for selecting TFV loops in infants, which may be particularly useful in the absence of long sequences of regular breathing, such as in daily clinical practice.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material 00165-2022.SUPPLEMENT
Acknowledgements
We sincerely thank all families participating in the PreventADALL (Preventing Atopic Dermatitis and Allergies in Children) study as well as study personnel who have contributed to recruiting participants, performing lung function measurements and other relevant examinations, and generally managing the study: Mari Kjendsli, Ingvild Essén, Malén Gudbrandsgard (Division of Paediatric and Adolescent Medicine, Oslo University Hospital, Oslo, Norway); Live S. Nordhagen (Division of Paediatric and Adolescent Medicine, Oslo University Hospital and VID Specialized University, Oslo, Norway); Sandra Götberg, Kajsa Sedergren, Natasha Sedergren, Caroline-Aleksi Olsson Mägi, Sandra G. Tedner and Ellen Tegnerud (Astrid Lindgren Children's Hospital, Karolinska University Hospital and Dept of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden).
Footnotes
Provenance: Submitted article, peer reviewed.
The study was performed within ORAACLE (the Oslo Research Group of Asthma and Allergy in Childhood; the Lung and Environment).
This study is registered at www.clinicaltrials.gov with identifier number NCT02449850.
Author contributions: All authors have contributed substantially to the design and/or data collection and/or clinical follow-up of the PreventADALL study, revised the work critically for important intellectual content and approved the final version before submission.
Conflict of interest: E.M. Rehbinder has received honoraria for lectures from Sanofi-Genzyme, Novartis, Leo-Pharma, Perrigo and The Norwegian Asthma and Allergy Association. The other authors have nothing to disclose.
Support statement: The PreventADALL study has been funded by The Regional Health Board South East, The Norwegian Research Council, Oslo University Hospital, the University of Oslo, Health and Rehabilitation Norway, the Foundation for Healthcare and Allergy Research in Sweden (Vårdalstiftelsen), the Swedish Asthma and Allergy Association's Research Foundation, Swedish Research Council – the Initiative for Clinical Therapy Research, The Swedish Heart–Lung Foundation, SFO-V Karolinska Institutet, Østfold Hospital Trust, the European Union (MeDALL project), by unrestricted grants from the Norwegian Association of Asthma and Allergy, the Kloster FOUNDATION, Thermo-Fisher Uppsala, Sweden, by supplying allergen reagents, the Norwegian Society of Dermatology and Venerology, Arne Ingel Legat. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received April 2, 2022.
- Accepted July 3, 2022.
- Copyright ©The authors 2022
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org