Abstract
Objectives Disease-specific, well-defined and validated clinical outcome measures are essential in designing research studies. Poorly defined outcome measures hamper pooling of data and comparisons between studies. We aimed to identify and describe pulmonary outcome measures that could be used for follow-up of patients with primary ciliary dyskinesia (PCD).
Methods We conducted a scoping review by systematically searching MEDLINE, Embase and the Cochrane Database of Systematic Reviews online databases for studies published from 1996 to 2020 that included ≥10 PCD adult and/or paediatric patients.
Results We included 102 studies (7289 patients). 83 studies reported on spirometry, 11 on body plethysmography, 15 on multiple-breath washout, 36 on high-resolution computed tomography (HRCT), 57 on microbiology and 17 on health-related quality of life. Measurement and reporting of outcomes varied considerably between studies (e.g. different scoring systems for chest HRCT scans). Additionally, definitions of outcome measures varied (e.g. definition of chronic colonisation by respiratory pathogen), impeding direct comparisons of results.
Conclusions This review highlights the need for standardisation of measurements and reporting of outcome measures to enable comparisons between studies. Defining a core set of clinical outcome measures is necessary to ensure reproducibility of results and for use in future trials and prospective cohorts.
Measurement and reporting of lower airway outcome measures in primary ciliary dyskinesia research are not standardised. Validated, disease-specific clinical outcomes are needed to monitor disease progression in future trials and prospective cohorts. https://bit.ly/3yHERQm
Introduction
Primary ciliary dyskinesia (PCD) is a rare, genetic, multisystem disease that affects motile cilia lining the upper and lower airways, and the eustachian and fallopian tubes [1, 2]. Symptoms start in early infancy, with progressive suppurative lung disease invariably leading to bronchiectasis [3, 4]. Current management of patients with PCD broadly follows that used for patients with cystic fibrosis (CF) and bronchiectasis (formerly non-CF bronchiectasis) [5–8]. Therefore, studies have adopted similar outcome measures to monitor the natural history and disease progression in PCD, even though PCD has a unique pathophysiology and disease pattern [9].
There is no minimum core set of disease-specific outcome measures in PCD research. This is particularly problematic because the choice of outcome measures informs the selection of data sources from which study information can be collected; the appropriateness, frequency and length of follow-up measurements; and the required number of patients. Appropriate sample size relies on the expected frequency and natural variability of outcomes, and on the effect of interest (or the minimal clinically important difference) [10]. The quality of the knowledge generated by research strongly relies on the selection of appropriate outcomes.
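The dependence of sample size on outcome variability and on the minimal clinically important difference can be illustrated with the standard normal-approximation formula for comparing two group means. The sketch below is illustrative only: the function name and the standard deviation and difference used in the usage note are hypothetical, not values taken from any PCD study.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sd, delta, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-sample comparison of means.

    Normal-approximation formula:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 * sd**2 / delta**2

    sd    -- assumed between-patient standard deviation of the outcome
    delta -- smallest difference worth detecting, e.g. a minimal clinically
             important difference (hypothetical input, not from the review)
    """
    if sd <= 0 or delta <= 0:
        raise ValueError("sd and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)                            # round up to whole patients
```

With a hypothetical standard deviation of 1.0 and a target difference of 0.5 (both in z-score units), this returns 63 patients per group; halving the detectable difference to 0.25 raises it to 252. This is one way to see why an insensitive or highly variable outcome measure is costly in a rare disease.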
Well-defined and validated disease-specific outcome measures are the most efficient and accurate way to assess new therapies and management options. While true for all diseases, this is particularly pertinent for rare diseases, where the number of patients available is limited [11]. An outcome measure that is valid for another disease might not be appropriate to measure the effect of interest, or sensitive enough to detect a subtle effect. For example, spirometry is routinely used to monitor disease progression, but is thought to be an insensitive surrogate marker for early lung disease in CF and, more recently, in PCD [12–15].
The aim of this scoping review was to systematically identify and describe the evidence on clinical outcome measures used in PCD research. In addition, we aimed to highlight the most commonly used pulmonary and related outcomes and to examine the consistency of definitions across studies and the variations in the use and reporting of clinical outcome measures in the PCD literature.
Methodology
Search strategy
We followed the a priori scoping review protocol, which is available from the authors upon request. A pilot search included only terms related to the disease (items 1–4 of search terms, supplementary box 1) and one reviewer (BR) scanned the first 1000 abstracts to identify key terms that could be used to build the full search strategy, designed for use in Embase and adapted to MEDLINE. We used Embase subject headings (Emtree) and Medical Subject Headings (MeSH) along with individual terms to develop the search strategy, with limitations applied (supplementary box 1). We used EndNote (version 9.2; Thomson Reuters, Philadelphia, PA, USA) as citation manager.
We performed the search on 2 June 2020. We used a standardised data extraction form developed a priori in Excel, which was piloted on five randomly selected studies and then refined. Data were recorded for the following: publication details (authors, title, year of publication, country and journal), study characteristics (data collection period, study design, countries that contributed with data, inclusion criteria, clinic type, sample size, population characteristics and diagnostic data), and outcome details (outcomes reported, definitions used, correlation between different outcome measures, equipment used and measurement details).
Two reviewers independently assessed titles and abstracts for eligibility. Full text was obtained for all studies deemed relevant by either reviewer or if there was uncertainty on eligibility. Where disagreements remained after full text review, the manuscripts were discussed with a third person. One reviewer manually searched the reference lists of all eligible studies for additional manuscripts. Three reviewers extracted data for a third of the eligible studies each. A fourth reviewer extracted data from 10% of the total manuscripts included in the study, and their extractions were compared with those extracted by the other three reviewers to ensure consistency.
Inclusion and exclusion criteria
We included studies describing clinical outcome measures in PCD if they 1) had a study population of ⩾10 PCD patients, 2) were published in English, 3) were published after 1996 and 4) were conducted on humans. We did not include studies prior to 1996 because the diagnosis of PCD has changed in the past 20 years; older manuscripts may therefore contain a high proportion of patients who would no longer fulfil the current diagnostic criteria [16]. Details of diagnostic data for each of the included studies were recorded (supplementary table 1).
We excluded conference abstracts, studies that were not original research and those for which full texts were irretrievable. Studies reporting on multiple disease groups were excluded if the PCD data could not be clearly identified. Manuscripts that reported exclusively on ear, nose and throat symptoms were also excluded.
Definition of outcome measures and classification into subgroups
Outcome measures were defined as any clinical measure used to monitor patients over time or as a marker of disease severity. Outcome measures were classified as 1) study outcomes, defined a priori as study outcome measures; or 2) study population descriptors. The latter indicates measures that were used to characterise the study population (e.g. baseline measures of forced expiratory volume in 1 s (FEV1)) and those that could potentially be used in future studies (e.g. cough frequency). The supplementary tables contain detailed information on study characteristics and definitions of outcome measures for all studies included in this review.
Statistical analysis
Critical appraisal of individual studies was not conducted because our focus was on the variety of outcomes used and how these were measured and reported, and not on disease progression, prognosis or treatment effects [17, 18]. Information about scoping reviews is outlined in the supplementary files.
Descriptive and summary data were analysed using the R statistical package (version 3.2.3; www.r-project.org). Continuous variables were reported as median and interquartile range (IQR). Categorical variables were reported as proportions. Results were reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews checklist [19]. Figures were plotted in R and Tableau 2019 v4.0.
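The descriptive summaries described above were produced in R. As a rough illustration of the same calculations, the following is a minimal Python sketch. The function names are hypothetical, and quartile conventions differ between software packages, so the quartiles may not match another tool exactly.

```python
from statistics import median, quantiles


def summarise_continuous(values):
    """Return (median, lower quartile, upper quartile) for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # n=4 splits the data into quartiles; the middle cut point is the median
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def proportion(count, total):
    """Return a count as a percentage of the total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)
```

For example, `proportion(83, 102)` expresses the 83 of 102 included studies that reported spirometry as 81.4%.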
Results
3150 abstracts were identified, of which 2706 were reviewed after exclusion of 444 duplicates. 198 manuscripts were reviewed in full, of which 102 met the inclusion criteria and were therefore included (supplementary figure 1) [12, 20–120].
Study characteristics
The manuscripts included information on 7289 patients with PCD, with a median of 32 PCD patients per manuscript (IQR 20–62, range 10–1609; supplementary table 1). However, some patients were described in several studies and were included more than once if the studies described different outcome measures.
Manuscripts contained data collected from 23 different countries, but most (89%) presented single-centre data. Publications with a higher number of PCD patients were from multicentre studies, with the two largest studies containing data derived from a large international PCD cohort study [121].
Outcome measures reported
83 studies reported on a total of 23 main study outcomes (table 1, figure 1), and the remaining 19 presented data exclusively on population descriptors. 67 presented both study outcomes and population descriptors (table 1, supplementary table 2).
Spirometry-derived parameters were the most frequently reported clinical outcome measures, followed by chest high-resolution computed tomography (HRCT). Microbiology and anthropometric measures were more often reported as descriptors than as study outcomes (figure 1).
Standardised definitions
Definitions of outcome measures varied considerably between studies (supplementary table 2). Of 18 studies reporting on microbiology as study outcomes, 14 provided data on chronic colonisation by potentially pathogenic bacteria [29, 31, 34, 38, 59, 64, 65, 70, 75, 82, 96, 100]. The terms chronic colonisation and chronic infection were sometimes used interchangeably, and the classification used to define them varied. Four studies used the Leeds CF criteria (or a modified version) [38, 64, 65, 70]; two used the European consensus for antibiotic therapy against Pseudomonas aeruginosa in CF [75, 104]; and one used the Copenhagen criteria [100]. The remaining studies developed a study-specific criterion, such as pathogen cultured in ⩾50% of samples from the past year [46, 47] or isolation of the same pathogen on at least two occasions, ⩾3 months apart in a 1-year period [93].
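To illustrate how the choice of definition can change a patient's classification, the sketch below encodes the two study-specific criteria quoted above. It is a simplified illustration, not the implementation used by any of the cited studies: the function names and the sample data are hypothetical, and 3 months and 1 year are approximated as 90 and 365 days.

```python
from datetime import date


def half_of_samples_positive(samples, pathogen):
    """Criterion A: pathogen cultured in >=50% of samples.

    samples -- list of (sampling_date, set_of_pathogen_names); the caller is
               assumed to pass only samples from the past year
    """
    if not samples:
        return False
    positives = sum(1 for _, found in samples if pathogen in found)
    return positives / len(samples) >= 0.5


def two_isolates_three_months_apart(samples, pathogen,
                                    min_gap_days=90, window_days=365):
    """Criterion B: same pathogen on >=2 occasions, >=3 months apart, within 1 year.

    The 90-day and 365-day values are assumptions made for this sketch.
    """
    dates = sorted(d for d, found in samples if pathogen in found)
    for i, first in enumerate(dates):
        for later in dates[i + 1:]:
            gap = (later - first).days
            if min_gap_days <= gap <= window_days:
                return True
    return False


# Hypothetical culture history for one patient
samples = [
    (date(2019, 1, 10), {"Haemophilus influenzae"}),
    (date(2019, 3, 5), {"Pseudomonas aeruginosa"}),
    (date(2019, 6, 20), set()),
    (date(2019, 9, 15), {"Pseudomonas aeruginosa"}),
    (date(2019, 12, 1), set()),
]
```

In this invented history, P. aeruginosa grows in two of five samples, so the patient is not chronically colonised under criterion A but is under criterion B, because the two positive cultures are more than 3 months apart within the same year.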
Sampling frequency of sputum and other microbiological specimens varied between studies. Some studies did not record whether patients had a pulmonary exacerbation at the time of sampling. These differences are likely to have biased results, particularly when reporting prevalence of different respiratory pathogens.
Details on the definitions of all clinical outcome measures reported as study outcomes or population descriptors are provided in supplementary table 2. In the next sections, we focus on the most frequently reported outcome measures.
Lung function
Spirometry
Of the 83 studies reporting spirometry data, 42 reported adherence to European Respiratory Society/American Thoracic Society guidelines [122] (supplementary table 2). 25 reported on FEV1 z-scores; 18 as a study outcome. FEV1 % predicted was used as an outcome as often as a descriptor (32 versus 31). Studies reporting on FEV1 z-scores were published more recently (figure 2). 14 studies reported calculating z-scores based on the Global Lung Function Initiative (GLI) reference equations [123], two used national references and eight used other equations, while the remainder did not detail the equation used (supplementary table 2).
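The difference between % predicted and z-scores can be made concrete with a short sketch. The GLI reference equations are published in LMS form (skewness L, median M, coefficient of variation S); the code below applies the usual LMS z-score formula. The coefficient values in the usage note are invented for illustration and are not taken from any reference equation.

```python
import math


def percent_predicted(measured, m):
    """Measured value as a percentage of the predicted median M."""
    if m <= 0:
        raise ValueError("predicted median must be positive")
    return 100.0 * measured / m


def lms_z_score(measured, l, m, s):
    """z = ((measured / M)**L - 1) / (L * S); for L == 0 the limit is ln(measured / M) / S.

    l, m, s -- LMS coefficients for the patient's age, height, sex and
               ethnic group (supplied by the caller; hypothetical here)
    """
    if measured <= 0 or m <= 0 or s <= 0:
        raise ValueError("measured, m and s must be positive")
    if l == 0:
        return math.log(measured / m) / s
    return ((measured / m) ** l - 1.0) / (l * s)
```

With an invented predicted median of 2.5 L and a measured FEV1 of 2.0 L, the result is 80% predicted in every case, but the z-score is about −2.0 when S is 0.10 and about −1.67 when S is 0.12. The same % predicted can therefore fall on either side of the commonly used lower limit of normal (z = −1.645), which is one reason results reported on the two scales are hard to pool.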
Five studies compared FEV1 before and after the use of bronchodilators. Studies used inhaled salbutamol [32, 119] or albuterol [28] to assess reversibility and one study performed methacholine challenge before and after the use of salbutamol and placebo [27]. One study did not report which bronchodilator was used [114].
Body plethysmography
11 studies reported on body plethysmography parameters as study outcomes (table 1, supplementary table 2). Lung residual volume % predicted was reported by all studies; total lung capacity % predicted was reported in four studies; and the remaining 19 parameters were rarely reported. Three of these studies additionally measured forced vital capacity (FVC) and FEV1 using plethysmography devices.
Multiple-breath washout
15 studies reported on parameters derived from the multiple-breath washout (MBW) test as study outcomes (table 1, supplementary table 2), of which 47% were published in the past 2 years. Lung clearance index (LCI) was most frequently reported. Eight studies presented z-scores for LCI, and five presented values for Scond and Sacin z-scores (representatives of ventilation inhomogeneity in small conducting and acinar airways, respectively). Studies used different inert tracer gases and equipment; nine of them used nitrogen (N2), five used 0.2% sulfur hexafluoride and the other study did not report which tracer gas was used.
Chest imaging
40 manuscripts reported on radiological findings, with 26 presenting them as study outcomes and 14 as population descriptors (table 1, supplementary table 2). Of the latter, five studies had spirometry measures as outcomes and provided information on presence or absence of bronchiectasis, diagnosed through chest HRCT/CT or radiography. One study mentioned chest radiography to determine the presence of bronchiectasis; however, the main outcomes were sleep activity and attention deficit scales [85]. The remaining studies did not report on specific outcomes, with data on descriptors only including spirometry, microbiology, anthropometric measurements and fertility.
Four studies reported on both chest radiography and HRCT and two studies on both chest HRCT and magnetic resonance imaging (MRI).
Radiography
Bronchiectasis, seen on chest radiography, was used in one study as study outcome and in six as population descriptor (table 1, supplementary table 2). Similar to other imaging modalities, there are no PCD-specific radiography scoring systems, so studies used different scales to report findings. For example, Jain et al. [54] used a modified version of the Chrispin–Norman score, which was developed for CF [124], while Kennedy et al. [55] developed a study-specific score for bronchiectasis severity.
HRCT
Chest HRCT and/or CT was used as study outcome in 25 studies, with an additional 10 studies reporting it as population descriptor (figure 1, table 1, supplementary table 2). Studies adopted modified versions of different scoring scales as there are no PCD-specific scoring systems available. Seven studies used modifications of the Brody score [125], six used the Bhalla score [126], another four applied the Helbich score [127] and a further six used other systems. Of the latter, four used a study-specific score, one by combining the Brody and Bhalla scores [59]. Two studies did not provide any detail on the scoring system used [37, 56] and therefore were not included in figure 3.
The use of different measurement scales resulted in inconsistent reporting of subscores (figure 3). For example, extent of bronchiectasis was measured by 1) number of bronchopulmonary segments affected; 2) percentage of each lobe involved; 3) scores from 0 to 3; 4) percentage of central lung and peripheral lung involvement; or 5) size of largest and average bronchopulmonary segment involved. Mucus plugging was measured as size of plug (i.e. small, large), location of plug (Brody score: largest airways, small airways, peripheral lung, central lung; or Helbich score: number of segments) or a mucus classification score (Bhalla score).
As illustrated in figure 3, not all studies using the Brody score reported on the same subscore components, probably due to study-specific modifications. For example, one study [12] classified the location of the mucus plug as small or large airways, while five other studies used number of central and peripheral lobes involved. In another study that used a modified combination of the Brody and Bhalla scores, the partition of lungs into different segments followed a regional approach as opposed to the commonly used pulmonary segmentation approach in order to reduce the time needed for scoring each scan in routine clinical practice [59].
Unsurprisingly, all 23 studies that presented information on their scoring system reported on bronchiectasis. Studies classified bronchiectasis as mild for those with airway diameter slightly greater than the diameter of the adjacent blood vessel, moderate for airway lumen two or three times the diameter of the vessel and severe for those more than three times the diameter of the vessel. However, the extent of bronchiectasis varied, with some reporting the number of bronchopulmonary segments, while others reported percentage of compromised area (figure 3). The second most common features described were airway wall thickening and mucus plugging (n=19).
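The severity grading described above can be written as a simple rule on the ratio of airway to vessel diameter. The sketch below is an illustration under stated assumptions: the text does not give a numerical cut-off for "slightly greater", so any ratio above 1 and below 2 is treated as mild, and a ratio of exactly 3 is assigned to the moderate band. The function name is hypothetical.

```python
def bronchiectasis_severity(airway_diameter_mm, vessel_diameter_mm):
    """Grade bronchiectasis from the airway-to-adjacent-vessel diameter ratio.

    Band boundaries are assumptions made for this sketch:
      ratio <= 1      -> "none"     (airway no wider than the vessel)
      1 < ratio < 2   -> "mild"     (slightly greater than the vessel)
      2 <= ratio <= 3 -> "moderate" (two to three times the vessel)
      ratio > 3       -> "severe"   (more than three times the vessel)
    """
    if airway_diameter_mm <= 0 or vessel_diameter_mm <= 0:
        raise ValueError("diameters must be positive")
    ratio = airway_diameter_mm / vessel_diameter_mm
    if ratio <= 1.0:
        return "none"
    if ratio < 2.0:
        return "mild"
    if ratio <= 3.0:
        return "moderate"
    return "severe"
```

A grade of this kind describes severity only. The extent of bronchiectasis, which studies reported either as a number of segments or as a percentage of area, needs a separate measure, and the two reporting styles cannot be converted into one another from the grade alone.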
Two studies comparing CT scores in PCD and CF patients found no differences in the global Brody score. A third study used a study-specific system to analyse CT scans from patients with PCD and CF and then assess results against the Brody and Bhalla scores [63]. They found that bronchial wall thickening, bronchiectasis, mucus plugging, atelectasis and air trapping, features commonly seen in CF patients, were even more common in patients with PCD.
Maglione et al. [57] reported a significantly higher subscore for severity of collapse or consolidation in PCD compared to CF, and Cohen-Cymberknoh et al. [82] found that the lower and middle lobes were more frequently affected in PCD compared to the typical upper-lobe involvement seen in CF [34, 55, 62, 82]. Tadd et al. [63] reported a higher frequency of extensive tree-in-bud pattern of mucus plugging, bronchocoeles or nodules, thickening of interlobar and interlobular septa, and atelectasis or collapse of whole lobes in PCD; these are uncommon in CF patients.
Magnetic resonance imaging
Only five studies reported on chest MRI as study outcome (table 1, supplementary table 2). Four studies applied a modified Helbich scoring system, while Smith et al. [50] developed a study-specific scoring system for three-dimensional volumetric hyperpolarised MRI. When looking at subscores, Maglione et al. [75] found no significant difference between total MRI scores and subscores in 20 PCD and 20 mild CF patients, aside from a higher score for severity of collapse/consolidation in PCD patients. In a smaller study of 11 PCD children, all presented with mostly small and heterogeneous abnormalities on ventilation MRI [50].
Microbiology
57 studies reported on microbiology; 18 as study outcomes and 39 as population descriptors (table 1, supplementary table 2). Studies reported most commonly on Haemophilus influenzae and P. aeruginosa, followed by Staphylococcus aureus. Some studies distinguished between mucoid and nonmucoid strains of P. aeruginosa, while others simply reported on P. aeruginosa infection. Similarly, S. aureus subtypes were inconsistently stratified across studies, with some reporting methicillin-sensitive and -resistant strains separately. Not all studies stratified pathogen prevalence by age group (supplementary figure 2).
Other outcome measures
Health-related quality of life scores
17 studies reported on health-related quality of life (HRQoL) as study outcomes (table 1). However, only two studies [74, 92] used QOL-PCD, as most were published before the disease-specific tool was validated [71, 128, 129]. The most common instruments adopted were the St George's Respiratory Questionnaire (SGRQ) (eight studies) and the 36-item short form survey (SF-36) (seven studies) (supplementary table 2).
Pulmonary exacerbations
Nine studies reported on pulmonary exacerbations; five as study outcomes. However, none used the PCD-specific consensus, as the studies included in this review pre-date it [130]. Two randomised controlled trials (RCTs) used pulmonary exacerbation as a primary outcome. Paff et al. [89] defined an exacerbation as respiratory symptoms that led to initiation of systemic antibiotic treatment irrespective of culture results, or a decline of ⩾10% in FEV1 % predicted compared to baseline at screening and randomisation, while Kobbernagel et al. [92] defined it as worsening of respiratory symptoms leading to initiation of antibiotic treatment in the week prior to the clinical appointment up to the day of the appointment. Ratjen et al. [90] studied a subset of patients who experienced an episode of exacerbation, defined as an increase in lower airway symptoms treated with oral antibiotics. Joensen et al. [100] applied a definition developed for CF studies [61] (supplementary table 2). Sunther et al. [35] only included patients with pulmonary exacerbation, defined as change in respiratory status for which intravenous antibiotics were needed.
Comparison between outcome measures
Most studies comparing outcome measures used spirometry as the reference to which the other outcomes were compared (figure 1). 12 studies describing imaging modalities reported on agreements or correlations with other outcome measures [12, 33, 43, 49, 53, 55, 57, 60, 93, 108, 119]. The most common comparison was between spirometry-derived FEV1 and HRCT, with studies presenting contradictory findings. Four studies [33, 53, 55, 93] found agreement between the two outcomes, one of which used an automated CT scoring for adults with PCD [53]. Three studies used a modified Bhalla system and the other a study-specific scoring system. The other four studies [43, 57, 62, 119] reported no association.
FEV1 was compared to MBW-derived LCI in eight studies, also with contradictory results. Two studies reported no association [41, 44], while the other six [12, 35, 42, 43, 46, 49] found correlations between some parameters. Both Boon et al. [12] and Kobbernagel et al. [46] found a significant negative correlation between LCI and both FEV1 and FEV1/FVC ratio z-scores, while Irving et al. [43] only found a correlation between LCI and forced expiratory flow at 25–75% of FVC (FEF25–75) z-scores. Green et al. [42] did not find any correlation between LCI and FEV1 z-scores in PCD patients, but reported a significant correlation between LCI2.5 (LCI at 2.5%, standard LCI) and both FEV1/FVC ratio and FEF25–75 z-scores.
MBW-derived LCI might be more sensitive in detecting early or milder disease. Nyilas et al. [49] found that more than half of the patients with abnormal LCI values and MRI scores had normal FEV1 z-scores. In another study, five (15%) patients had abnormal LCI, but normal FEV1 z-scores [43]. In addition, LCI was shown to be more sensitive than FEV1 in detecting lung structure abnormalities [12]. Boon et al. [12] reported that LCI z-scores were concordant with total CFCT scores (a variant of the Brody score) in 83% of the patients, while Kobbernagel et al. [46] and Irving et al. [43] found no correlation.
Studies comparing HRCT to indices derived from body plethysmography, chest MRI and microbiology found significant correlations. However, these were generally limited to subscores (e.g. bronchiectasis on HRCT and body plethysmography and collapse/consolidation on HRCT and MRI) as opposed to the global score.
Other associations between outcome measures are shown in figure 4.
Randomised controlled trials
Only five of the included studies were RCTs, of which four adopted a crossover design (supplementary table 3).
The efficacy of 6 months azithromycin maintenance therapy in reducing the number of respiratory exacerbations in patients with PCD was assessed in a double-blind, parallel-group placebo-controlled RCT at six European PCD centres [92]. Secondary outcomes included changes in spirometry, body plethysmography, N2 MBW, HRQoL, audiometry, sputum microbiology and inflammatory markers.
The effect of hypertonic saline on HRQoL in PCD adults was investigated in a 28-week double-blind crossover RCT with a washout period of 4 weeks. HRQoL was measured by the SGRQ and Quality of Life Questionnaire – Bronchiectasis [89].
Gokdemir et al. [24] assessed spirometry measurements (FEV1, FVC, peak expiratory flow and FEF25–75) in children with PCD using two different airway clearance methods. Half performed conventional pulmonary rehabilitation for 5 days in hospital followed by a 2-day washout period and then high-frequency chest-wall oscillation for another 5 days at home. However, techniques differed between the settings. Another crossover RCT investigated differences in FEV1 % predicted and in bronchial hyperresponsiveness after the use of salbutamol compared to placebo in children with PCD at both 3 and 6 weeks compared to pre-treatment measurements [27].
Noone et al. [106] reported on mean whole-lung clearance rates of a radionuclide marker after inhalation of uridine-5′-triphosphate compared to placebo during a series of controlled coughs to induce mucociliary clearance in PCD adolescents and adults.
Discussion
This scoping review identified 23 clinical outcome measures used in PCD research. We found a high degree of heterogeneity in the definitions of outcome measures.
Spirometry and chest HRCT were most frequently reported as study outcomes. Spirometry is widely available, relatively easy to perform and does not require expensive equipment [122, 123]; however, researchers have questioned its appropriateness as a measure to monitor disease progression in PCD [35, 49, 57]. A meta-analysis found that mean FEV1 ranged from 51% predicted to 96% predicted, with high heterogeneity between studies that could not be explained by age or other factors [131]. Studies that did not report on which reference values they used or those that did not provide information on quality control reported lower mean FEV1 values. Clinical status at the time of measurement was rarely reported and therefore could not be included in the meta-regression. The largest study to date investigating lung disease in PCD patients found consistently low FEV1 z-scores in patients with PCD compared to reference data, similar to those seen in CF patients [25].
To our knowledge, no study has investigated the timing of physiotherapy in relation to spirometry, which is a significant limitation as, anecdotally, airway clearance techniques can improve spirometric indices. An ongoing multicentre prospective cohort is investigating variability of lung function in stable PCD patients, adjusting for factors such as timing of inhaled medication and respiratory physiotherapy [132, 133] (https://clinicaltrials.gov/ct2/show/NCT03704896). Another potential source of variability in spirometry-derived measurements is the choice of reference equations, as values derived from the GLI and national reference equations can differ. Evidence from a longitudinal CF cohort highlighted the disparity between reference equations, demonstrating the need for a standardised approach to interpreting spirometric measurements to facilitate appropriate comparisons both within and between centres and countries [134].
Chest HRCT has been proposed as a surrogate outcome measure in the assessment of lung disease. However, there are no validated scoring systems for PCD. All studies included in this review used CF-derived scoring systems [125–127], despite significant pathophysiological differences between the two conditions [9, 135]. Additionally, studies do not report the lung volumes at which the CT scans are obtained, with no details on the standard operating procedures used to record the images.
Location, distribution and frequency of features seen in HRCT scans of patients with PCD differ from those with CF [135]. The weights applied to each feature might not be suitable for PCD as CF-derived scoring systems do not reflect the range and severity of structural changes in PCD. Studies found that extensive tree-in-bud pattern of mucus plugging, bronchocoeles or nodules, thickening of interlobar and interlobular septa and atelectasis mostly seen as collapse of whole lobes were frequently described in PCD, but uncommon in CF [52, 63, 135]. Reporting only the global CT scores might be misleading, as some components of the score might be more relevant to clinical outcome, particularly when using a non-disease-specific score. These findings underscore the need for disease-specific CT scoring systems. Hoang-Thi et al. [53] highlighted the fact that visual scores such as the ones routinely used in the assessment of PCD and CF patients can be highly subjective. In response, they developed an automated CT scoring for adults with PCD, which had moderate to good correlation with FEV1 and FVC.
MRI scans of the chest have historically been considered of limited value due to intrinsic characteristics of the pulmonary tissue, and the presence of physiological motion resulting in poor resolution and motion artefacts. Research has focused on improving techniques to obtain better-quality images [136, 137].
Lack of agreement between spirometry, HRCT, MBW and MRI parameters reported by some studies might reflect variations in the measurement and reporting of outcomes. Discrepancies could be explained by different scoring systems for HRCT, differences in tracer gas for MBW, variations in measurements, inability of some of the outcome measures to accurately monitor lung disease progression in PCD or true variability between populations (e.g. underlying genetics, differences in disease severity or treatment). Interpretation of findings was limited by the retrospective nature of most studies. In some cases, there was a significant time lag between measurements performed with the methods that were compared [12], or tests were applied to different subpopulations (e.g. HRCT scans conducted only in the older population with more severe lung disease [43] or conducted at different time points of clinical stability [57]). Contradictory results could also be attributed to variations in study design, inclusion criteria or small sample sizes, resulting in variability due to chance.
Recent studies have focused on MBW, with almost half of them published in the past 2 years. Nyilas et al. [49] found that LCI was not able to distinguish between reversible and irreversible lung damage, despite being more sensitive than spirometry in detecting changes. A limitation of LCI is the long washout time, and therefore test duration, which is particularly problematic for patients with compromised lung capacity and young children. Studies looking at shorter washout periods have shown promising results, with LCI5 providing a good alternative to the more conventional LCI2.5 [39, 41, 109]. However, as Nyilas et al. [49] demonstrated, combining different modalities (e.g. MRI and MBW) can be necessary to accurately capture changes in the lungs of PCD patients.
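Why LCI5 shortens the test can be seen from how the index is calculated: LCI is the cumulative expired volume needed to bring the end-tidal tracer concentration down to a fixed fraction of its starting value (2.5% for LCI2.5, 5% for LCI5), divided by functional residual capacity (FRC). The sketch below is a deliberate simplification with a hypothetical function name and invented breath data. It stops at the first breath at or below the target, takes FRC as a given input and ignores dead-space corrections, all of which a real analysis would handle differently.

```python
def lung_clearance_index(breaths, frc_litres, start_concentration,
                         threshold_fraction=0.025):
    """Return cumulative expired volume / FRC at the washout end-point, or None.

    breaths            -- list of (expired_volume_litres, end_tidal_concentration)
                          in washout order
    threshold_fraction -- 0.025 for LCI2.5, 0.05 for LCI5
    """
    if frc_litres <= 0 or start_concentration <= 0:
        raise ValueError("frc_litres and start_concentration must be positive")
    target = start_concentration * threshold_fraction
    cumulative_volume = 0.0
    for volume, concentration in breaths:
        cumulative_volume += volume
        if concentration <= target:  # simplified end-of-test rule (assumption)
            return cumulative_volume / frc_litres
    return None  # washout stopped before the target concentration was reached


# Invented washout: 0.5 L breaths, tracer concentration relative to a start value of 1.0
toy_breaths = [(0.5, c) for c in
               (0.60, 0.40, 0.25, 0.15, 0.09, 0.048, 0.04, 0.03, 0.024, 0.02)]
```

In this toy washout with an FRC of 1.0 L, the 5% target is reached after six breaths (index 3.0) and the 2.5% target after nine (index 4.5), so the shorter end-point would save three breaths. The numbers are illustrative only and are not physiologically representative.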
In terms of microbiological outcomes, studies should present a breakdown of pathogens by age group, since the prevalence of bacterial species changes with age [66, 85]. Studies included in this review were also limited by the lack of a universal panel that could be applied consistently across different centres, particularly when reporting the prevalence of each pathogen isolated. Rogers et al. [67] highlighted that some of the dominant bacterial genera found in the sputum of PCD patients are unlikely to be detected unless specific growth conditions are used. Variations in the frequency of specimen collection and type of specimen (e.g. expectorated sputum, cough swab, bronchoalveolar lavage) will also probably affect pathogen prevalence.
Small sample sizes were a common limitation in most studies, highlighting the importance of national and international disease registries, large collaborative multicentre studies and standardised definitions that enable pooling of data [121, 138–140]. Few studies included sample-size calculations, hampering the interpretation of statistically nonsignificant results, which may reflect underpowered samples.
The number of larger multicentre studies has increased in recent years, highlighting the important role of PCD networks such as Better Experimental Screening and Treatment for Primary Ciliary Dyskinesia (BESTCILIA), Better Experimental Approaches to Treat Primary Ciliary Dyskinesia (BEAT-PCD) and the Genetic Disorders of Mucociliary Clearance Consortium in advancing collaborative research in the field [132, 133, 141, 142]. Such collaborations were featured in two studies that used data derived from the international PCD cohort [121].
RCTs and prospective cohort studies with long follow-up periods are uncommon in rare diseases due to the small sample sizes available, high costs and limited commercial interest from the pharmaceutical industry [5, 143, 144]. As a result, the majority of PCD studies are cross-sectional case–control studies or small cohort studies with limited follow-up. Interventional studies are currently being designed, but will require close international collaboration and data sharing. The success of these and of future trials will depend on the selection of appropriate outcome measures.
Our review was limited by the quality of the information provided in the studies. As our aim was to identify the evidence available and describe definitions for clinical outcome measures used in PCD research, we opted to conduct a scoping review and therefore did not critically appraise the studies included. We did not perform quantitative analysis, as the studies were too heterogeneous for a formal meta-analysis. In fact, the aim of this review was to highlight this heterogeneity. Another limitation was that the broad nature of this review prevented us from focusing on any one clinical outcome measure, and therefore systematic reviews with or without meta-analysis are still needed for the more commonly used and promising outcome measures. A separate review to evaluate upper airway clinical outcome measures is underway, and therefore we deliberately excluded these studies from our scoping review. Despite our attempts not to restrict the search to specific outcomes, our review is limited to the clinical outcomes that were included as search terms.
Recommendations
We advocate that outcome measures for use in future prospective trials must fulfil the following criteria: 1) be measured across different studies in a standardised manner (e.g. using the standardised PCD data collection tool FOLLOW-PCD [145]); 2) be used and reported regularly by a sufficient number of studies; 3) use currently recommended definitions (e.g. z-scores based on GLI recommendations); and 4) be embedded within the current knowledge of PCD pathophysiology and natural history (table 2). This will require consensus statements, which are currently being developed by a BEAT-PCD working group.
Spirometry was the outcome most frequently used for disease monitoring, but there were major problems with standardisation of the measurement and reporting of FEV1. Large studies are needed to investigate the suitability of spirometry-derived parameters as accurate and sensitive surrogate markers.
Chest HRCT might be a good candidate for longitudinal follow-up of lung disease progression in PCD, particularly modalities using low radiation [146, 147]. However, a disease-specific scoring system must be developed. Agreement between HRCT and other outcomes was limited to subscores rather than the global score, emphasising the need for PCD-specific scores that consider the distribution, frequency and patterns of lung compromise in this population, and that can be easily applied by clinicians without being unnecessarily time-consuming.
The United States Food and Drug Administration and the European Medicines Agency encourage the inclusion of patient-centred outcome measures. A systematic review of patients' experience of PCD reported worsening of respiratory symptoms with age, which was also associated with decline in the physical and mental domains [148]. QOL-PCD, an HRQoL instrument, is the only validated disease- and age-specific cross-cultural clinical outcome measure in PCD [71, 128, 129, 149, 150]. QOL-PCD correlated well with the Sino-Nasal Outcome Test (SNOT-20) for upper airway symptoms, the COPD-specific SGRQ for lower airway symptoms and SF-36 for physical functioning, role functioning and mental health [71].
An expert consensus on the definition of pulmonary exacerbations in PCD for children and adults was developed recently [130]. The importance of disease-specific definitions was highlighted by the fact that studies that used exacerbations as an outcome adopted different definitions for pulmonary exacerbations [35, 90, 92]. There is now a need to validate the proposed definition and develop a separate definition for upper airway exacerbations.
Conclusions
This scoping review highlights the variety of outcomes and definitions used in PCD research. It also underscores significant differences in the measurement and reporting of outcomes. Validated disease-specific clinical outcome measures are needed to monitor disease progression in PCD in future prospective cohort studies and clinical trials. Appropriate outcomes need to be chosen based on the specific patient groups and the study intervention. New studies should aim to measure and report outcomes using standardised methods to build up the body of evidence needed to meaningfully compare results. Promising new outcome measures, such as MBW-derived LCI and microbiology, should also be used so that their appropriateness for long-term monitoring in PCD can be assessed and better understood.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material 00320-2021.SUPPLEMENT
Figure S1 00320-2021.figureS1
Figure S2 00320-2021.figureS2
Acknowledgement
This work was presented in a poster discussion session at the ERS International Congress on 29 September 2019.
Footnotes
Provenance: Submitted article, peer reviewed.
This article has supplementary material available from openres.ersjournals.com
Author contributions: B. Rubbo, P. Latzin and J.S. Lucas conceived the review; B. Rubbo, J. Thompson and C.L. Jackson performed the literature search; B. Rubbo, C.L. Jackson, E. Dehlink, P. Latzin and J.S. Lucas discussed and agreed on the inclusion and exclusion criteria; B. Rubbo, F. Gahleitner, J. Thompson and C.L. Jackson screened publications for eligibility; B. Rubbo, F. Gahleitner, J. Thompson, C.L. Jackson and A.P.L. Queiroz extracted the data; B. Rubbo, F. Gahleitner and J.F. Hueppe analysed the data; B. Rubbo and F. Halbeisen plotted the figures; B. Rubbo drafted the manuscript; F. Gahleitner, J. Thompson, C.L. Jackson, L. Behan, E. Dehlink, M. Goutaki, F. Halbeisen, G. Thouvenin, C. Kuehni, P. Latzin and J.S. Lucas critically reviewed the manuscript. All authors reviewed and approved the final manuscript.
Conflict of interest: M. Goutaki and C.E. Kuehni report that the BEAT-PCD COST Action supported their travel to and attendance of meetings during the period 2015–2019. P. Latzin reports personal fees from Vertex, Novartis, Roche, Polyphor, Vifor, Gilead, Schwabe, Zambon and Santhera, and grants from Vertex, outside the submitted work. J.S. Lucas reports that on 2 December 2019 she attended the Rare Lung and Airway Diseases on Childhood meeting at the Primary Ciliary Dyskinesia and Neonatal Interstitial Lung Diseases Conference in Jerusalem, Israel, to deliver an invited lecture on monitoring disease progression in PCD; her local travel and accommodation were arranged by the conference organisers, and her international travel was supported by BEAT-PCD. All other authors have nothing to disclose.
Support statement: B. Rubbo, J. Thompson, C.L. Jackson, J.F. Hueppe, L. Behan, E. Dehlink, M. Goutaki, F. Halbeisen, G. Thouvenin, C. Kuehni, P. Latzin and J.S. Lucas participate in the Better Evidence to Advance Therapeutic Options for PCD (BEAT-PCD) network (COST action BM 1407 and ERS Clinical Research Collaboration). B. Rubbo received a short-term scientific mission, provided through BEAT-PCD funding, to travel to Switzerland to discuss the review with M. Goutaki, F. Halbeisen and C. Kuehni.
- Received May 7, 2021.
- Accepted July 31, 2021.
- Copyright ©The authors 2021
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions@ersnet.org