Abstract
Making chILD diagnoses on CT is poorly reproducible, even amongst sub-specialists. CT might best improve diagnostic confidence in a multidisciplinary team setting when augmented with clinical, functional and haematological results. http://bit.ly/327jRCw
To the Editor:
Interstitial lung diseases (ILDs) that present in childhood (chILD) are seen far less frequently than ILDs presenting in adults, which are themselves rare disorders [1]. Histopathological [2, 3] and imaging [4] characterisation of chILD disease subtypes therefore lags behind that of adult ILDs. The field has also been constrained by comparisons with disease morphology in adults, despite developmental differences in growth and healing in the paediatric lung, which may alter disease patterns and distributions. The American Thoracic Society [5] and European [1] chILD management guidelines both specify a pivotal role for computed tomography (CT) imaging in the work-up of chILD patients to: 1) determine whether chILD is present; and 2) where possible, make a specific diagnosis of the underlying cause. For the second aim to be achieved, diagnostic reviews need to be reproducible between experts. Our study uniquely examined agreement between observers of varying experience in the CT evaluation of chILD, to establish whether CT imaging, at the current state of knowledge, can be diagnostic of specific chILDs. We hypothesised that observer agreement for chILD groups and diagnoses would be limited. The study was not designed to relate CT agreement to final diagnosis. As a secondary analysis, we examined how CT interpretation differed between observers in children under and over 2 years of age.
84 patients (<2 years, n=35) from 13 countries, referred to the Royal Brompton Hospital paediatric ILD unit from 1999 to 2014, were re-evaluated by 10 observers: three chILD sub-specialist radiologists with >25 years' experience (C.M. Owens, P. Garcia-Peña, A.S. Brody); four chILD pulmonologists with >20 years' experience (S. Cunningham, T.J. Vece, P. Aurora, A. Moreno-Galdó); and three radiologists with 5–10 years' chILD experience (A. Calder, P. Toma, T.A. Watson). Consecutive patients were included in the study if clinically indicated CTs in DICOM format were retrievable and of acceptable quality (high-resolution, ≤2 mm slice thickness) as determined by an independent radiologist (J. Jacob). Observers were given details of patient sex and age. Each observer provided up to three choices when assigning a chILD diagnostic group or individual chILD diagnosis from a preselected list. For each choice, a measure of group/diagnostic likelihood was also assigned, with confidence scores ranging from 0% to 100%: 100%=pathognomonic, 70–90%=high confidence, 40–60%=moderate confidence, 10–30%=low confidence, 0%=no confidence. The choices were based on previous histopathological work [2].
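The confidence bands listed above can be written as a small lookup. The following Python sketch is illustrative only: the function name `confidence_band` is hypothetical, and it assumes that scores were recorded in steps of 10%, which the text implies but does not state.

```python
def confidence_band(score: int) -> str:
    """Map a confidence score (0-100%) to the band named in the text.

    Assumption: scores are multiples of 10; other values are rejected
    because the text does not define bands for them.
    """
    if score < 0 or score > 100 or score % 10 != 0:
        raise ValueError("score must be a multiple of 10 between 0 and 100")
    if score == 100:
        return "pathognomonic"
    if score >= 70:
        return "high confidence"      # 70, 80, 90
    if score >= 40:
        return "moderate confidence"  # 40, 50, 60
    if score >= 10:
        return "low confidence"       # 10, 20, 30
    return "no confidence"            # 0


# Hypothetical example: one observer's three ranked choices for a case.
choices = [("airways disease", 70), ("infection", 20), ("others", 0)]
bands = [(label, confidence_band(score)) for label, score in choices]
```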
Fleiss' kappa was used to evaluate observer agreement for first-choice chILD groups/individual diagnoses. For chILD groups/individual diagnoses where agreement was at least fair, agreement on observer confidence was examined using linear weighted kappas [6]. Results were reanalysed according to patient age (<2 years versus >2 years, the conventional dichotomy in chILD classifications [7]) given the variable prevalence of chILD disease groups/individual diagnoses across these age ranges. Agreement was also examined in observer subgroups (senior radiologists versus junior radiologists versus senior pulmonologists). Statistical analyses were performed with R: A language and environment for statistical computing [8]. Approval for the study was obtained from the Institutional Research Committee of the Royal Brompton Hospital (Project 1157).
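For readers unfamiliar with the statistic, Fleiss' kappa compares the observed share of agreeing observer pairs with the share expected by chance from the overall category frequencies. The analyses in this study were run in R; the Python sketch below is not the authors' code. It is a minimal illustration of the standard formula, and the function name `fleiss_kappa` and the toy data are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for nominal labels.

    ratings: list of cases; each case is a list with one label per observer.
    Every case must be rated by the same number of observers.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    counts = [Counter(case) for case in ratings]
    categories = sorted({label for case in ratings for label in case})

    # Observed agreement: share of agreeing observer pairs, averaged over cases.
    pairs = n_raters * (n_raters - 1)
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters) / pairs for c in counts
    ) / n_cases

    # Chance agreement from the overall share of each category.
    total = n_cases * n_raters
    p_e = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_e == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 4 cases, 3 observers (not study data).
toy = [
    ["airways", "airways", "airways"],
    ["interstitial", "interstitial", "airways"],
    ["interstitial", "interstitial", "interstitial"],
    ["airways", "interstitial", "airways"],
]
kappa = fleiss_kappa(toy)
```

A per-category value can be obtained by recoding the labels as "category versus rest" before calling the function, which is algebraically equivalent to the category-specific form of the statistic. Whether the authors computed their per-group values this way is not stated in the text.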
840 first-choice chILD group/diagnostic assignations were made by the 10 observers (table 1). chILD groups/diagnoses whose frequency was <10% of the total number of first-choice diagnoses in patients under and over 2 years of age were grouped as “others”, in accordance with previous studies [9]. Four chILD groups were analysed: airways disease (n=247), interstitial pneumonia (n=326), developmental/undefined aetiology disorders (n=147) and others (n=120). Five chILD diagnoses could be analysed in patients <2 years: infection (n=48), chronic pneumonitis of infancy (CPI; n=47), pulmonary alveolar proteinosis (PAP; n=53), developmental/undefined aetiology disorders (n=83) and others (n=119). Four chILD diagnoses could be analysed in patients >2 years: obliterative bronchiolitis (n=48), infection (n=86), fibrotic nonspecific interstitial pneumonia (fNSIP; n=47) and others (n=309).
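The pooling rule described above (categories chosen in fewer than 10% of first-choice assignations are merged into "others") can be sketched as follows. This is an illustration in Python, not the authors' procedure: the function name `pool_rare` and the toy labels are hypothetical, and the sketch assumes the share is taken within whichever list of assignations is passed in, since the exact denominator is not spelled out in the text.

```python
from collections import Counter


def pool_rare(labels, threshold=0.10, other="others"):
    """Replace labels whose share of all assignations is below `threshold`."""
    counts = Counter(labels)
    total = len(labels)
    rare = {label for label, n in counts.items() if n / total < threshold}
    return [other if label in rare else label for label in labels]


# Hypothetical toy data: 20 first-choice assignations (not study data).
toy = ["infection"] * 12 + ["PAP"] * 7 + ["rare diagnosis"] * 1
pooled = Counter(pool_rare(toy))
```

In this toy example the single "rare diagnosis" assignation accounts for 5% of the total and is therefore relabelled, while the other two categories are kept.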
Table 1. Assignations of first-choice individual childhood interstitial lung disease (chILD) group or diagnosis in children under 2 years of age (n=35) and over 2 years of age (n=49) by 10 observers
When subanalysed according to patient age, observer agreement for first-choice chILD groups was moderate for airways disease and interstitial pneumonia in patients >2 years, but only fair in patients <2 years (table 2). There was generally less observer variation among senior radiologists than among pulmonologists. In patients <2 years, senior radiologists demonstrated better agreement than other observers. High-confidence chILD group assignations (>70%) were more common in children >2 years and increased with observer seniority (data not shown). Encouragingly, agreement for diagnostic groups was best for infection/airways disease, where conservative management without biopsy may be preferable.
Table 2. Fleiss kappa values for observer agreement for first-choice group assignation in the 84 study cases. Sub-analyses are shown for patients under 2 years of age (n=35) and over 2 years of age (n=49)
Observer agreement in assigning first-choice individual chILD diagnoses was moderate for obliterative bronchiolitis (in children >2 years), and fair for other diagnoses except CPI, where agreement was poor. Further analyses were only performed for individual diagnoses where agreement was at least fair. Observer agreement for diagnostic likelihoods was moderate for obliterative bronchiolitis (children >2 years of age), but was at best only fair for other diagnoses, regardless of patient age (data not shown). Observer agreement for diagnostic likelihood of infection in children <2 years was poor across all observer subgroups.
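Agreement on diagnostic likelihood is an ordinal comparison, which is why a weighted kappa is used: a disagreement of 10 percentage points is penalised less than one of 50. The Python sketch below illustrates a linear weighted kappa for one pair of observers. It is not the authors' R code, the function name `linear_weighted_kappa` and the toy scores are hypothetical, and how pairwise values were combined across the 10 observers is not described in the text.

```python
def linear_weighted_kappa(rater_a, rater_b, levels):
    """Linear weighted kappa for two raters scoring the same cases.

    levels: the ordered list of possible scores (lowest to highest).
    Disagreement weights are proportional to the distance between levels.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty case list")

    index = {level: i for i, level in enumerate(levels)}
    k = len(levels)
    n = len(rater_a)

    # Cross-tabulate the two raters' scores.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Mean weighted disagreement: observed, and expected from the marginals.
    obs = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k)) / n
    exp = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k)) / n**2

    if exp == 0:
        return float("nan")  # both raters used one identical level throughout
    return 1 - obs / exp


# Hypothetical toy data: confidence scores (%) from two observers for 4 cases.
levels = list(range(0, 101, 10))
kappa_w = linear_weighted_kappa([0, 30, 70, 100], [10, 30, 70, 100], levels)
```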
We report poor agreement between even experienced thoracic radiologists for the identification of most chILD groups and individual diagnoses across ages. The lower observer agreement for individual chILD diagnoses compared with chILD diagnostic groups may reflect differing methods by which radiologists classify CT appearances. Some observers might classify according to histological pattern and distinguish features of NSIP from PAP in a child >2 years, while others might classify according to suspected aetiology and recognise both patterns as suggesting a surfactant protein disorder. Our findings stress the need for a new, uniform descriptor language for CT appearances, because the present system is not reproducible and is therefore confusing.
The current classification system in chILD is based on histopathological diagnoses [2, 7], and as occurred with adult fibrosing lung diseases, it is hoped that histopathological descriptors will inform those CT features that are of importance in delineating childhood disease. Two early studies examining diagnostic accuracy using radiological–histopathological correlations in a small subset of chILD patients demonstrated that CT features could identify histopathological diagnoses with acceptable accuracy [4, 10]. However, prior to the evaluation of CT diagnostic accuracy it is essential to understand whether expert readers can coalesce on a single diagnosis with sufficient frequency that a single CT diagnosis can be compared to a single histopathological diagnosis.
Study limitations included the time frame over which cases were acquired, during which CT imaging protocols and quality varied widely. However, imaging heterogeneity is an unavoidable constraint when examining real-world paediatric data and we actively sought to draw real-world conclusions on chILD observer agreement. Limiting clinical information to patient age and sex might have increased observer disagreement. However, rather than examining diagnostic accuracy, our a priori study aim was to understand the limitations of CT interrogation of chILD and identify those conditions where pattern recognition alone is insufficient to make a confident diagnosis. Though the study case mix from a tertiary centre is likely to have been weighted towards more challenging presentations of disease, chILD cases are only seen with sufficient frequency to allow analysis in such centres.
Our study is the largest examination of CT imaging in chILD cases to date, scored by the largest number of observers, who importantly were from different specialties. We demonstrate that making chILD diagnoses on CT is difficult, even amongst sub-specialists. Considering the current state of knowledge in the field, we found that agreement for chILD diagnostic groups was generally better than for individual diagnoses and was highest amongst senior radiologists. Further detailed work is needed to understand whether, with further studies and training, improvements in CT diagnostic agreement are possible. Additionally, the ability of the multidisciplinary team to improve diagnostic confidence by augmenting radiology with clinical, functional and haematological results, as per guideline recommendations [1, 5], requires further study.
Footnotes
Author contributions: JJ, CMO, ASB, TS, TAW, AC, PG-P, PT, AM-G, PA, AD, HW, TJV, SC, AA, AUW, AGN, AR, AB were involved in either the acquisition, or analysis or interpretation of data for the study. JJ, CMO and AB were also involved in the conception and design of the study. All authors revised the work for important intellectual content and gave final approval for the version to be published. All authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Conflict of interest: J. Jacob reports personal fees from Boehringer Ingelheim (advisory board fees) and Roche, outside the submitted work.
Conflict of interest: C.M. Owens reports personal fees from Boehringer Ingelheim, outside the submitted work.
Conflict of interest: A.S. Brody reports personal fees from Vertex, outside the submitted work.
Conflict of interest: T. Semple reports personal fees from Vertex, outside the submitted work.
Conflict of interest: T.A. Watson has nothing to disclose.
Conflict of interest: A. Calder has nothing to disclose.
Conflict of interest: P. Garcia-Peña has nothing to disclose.
Conflict of interest: P. Toma has nothing to disclose.
Conflict of interest: A. Devaraj reports personal fees from Boehringer Ingelheim, Roche, and GSK, outside the submitted work.
Conflict of interest: H. Walton has nothing to disclose.
Conflict of interest: A. Moreno-Galdó reports personal fees from AbbVie, Actelion, and Novartis, outside the submitted work.
Conflict of interest: P. Aurora has nothing to disclose.
Conflict of interest: A. Rice reports personal fees from AbbVie, outside the submitted work.
Conflict of interest: T.J. Vece has nothing to disclose.
Conflict of interest: S. Cunningham reports personal fees from Boehringer Ingelheim, outside the submitted work.
Conflict of interest: A. Altmann has nothing to disclose.
Conflict of interest: A.U. Wells reports personal fees from Intermune (advisory board and speaker fees), Boehringer Ingelheim (advisory board and speaker fees), Gilead (advisory board fees), MSD (advisory board fees), Roche (advisory board and speaker fees), Bayer (advisory board and speaker fees), and Chiesi (speaker fees), outside the submitted work.
Conflict of interest: A.G. Nicholson reports personal fees from Boehringer Ingelheim (advisory board fees), Roche, Medical Quantitative Image analysis, and Galapagos, outside the submitted work.
Conflict of interest: A. Bush has nothing to disclose.
Support statement: Joseph Jacob was supported by Wellcome Trust Clinical Research Career Development Fellowship 209553/Z/17/Z. Andrew Bush is an Emeritus NIHR Senior Investigator and is supported by chILD-EU (FP7, No: 305653) and the European Cooperation in Science and Technology COST A16125. Andre Altmann holds an MRC eMedLab Medical Bioinformatics Career Development Fellowship. This work was supported by the Medical Research Council (grant number MR/L016311/1). Funding information for this article has been deposited with the Crossref Funder Registry.
- Received April 20, 2019.
- Accepted June 6, 2019.
- Copyright ©ERS 2019
This article is open access and distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0.