Abstract
Background Phenotypes can be utilised in the clinical management of disorders. Approaches to phenotype disorders have evolved from subjective expert opinion to data-driven methodologies. A previous cluster analysis among working-age subjects with cough revealed a phenotype TBQ (triggers, background disorders, quality-of-life impairment), which included 38% of the subjects with cough. The present study was carried out to validate this phenotyping among elderly, retired subjects with cough.
Methods This was an observational cross-sectional study conducted via email among the members of the Finnish Pensioners’ Federation (n=26 205, 23.6% responded). The analysis included 1109 subjects with current cough (mean±sd age 72.9±5.3 years; 67.7% female). All filled in a comprehensive 86-item questionnaire including the Leicester Cough Questionnaire. Phenotypes were identified utilising k-means partitional clustering.
Results Two clusters were identified. Cluster A included 75.2% of the subjects and cluster B 24.8% of the subjects. The three most important variables to separate the clusters were the number of cough triggers (mean±sd 2.47±2.34 versus 7.08±3.16, respectively; p<0.001), Leicester Cough Questionnaire physical domain (5.38±0.68 versus 4.21±0.81, respectively; p<0.001) and the number of cough background disorders (0.82±0.78 versus 1.99±0.89 respectively; p<0.001).
Conclusion The phenotype TBQ could be identified also among elderly, retired subjects with cough, thus validating the previous phenotyping among working-age subjects. The main underlying pathophysiological feature separating the phenotype TBQ from the common cough phenotype is probably hypersensitivity of the cough reflex arc.
The previously described cough phenotype TBQ (triggers, background disorders, quality-of-life impairment) and the common cough phenotype can be identified in elderly, retired subjects with current cough https://bit.ly/3cygezO
Introduction
A phenotype is a single disease attribute, or a combination of attributes, that describes differences between individuals [1]. Recently, approaches to phenotype disorders have evolved from subjective expert opinion to data-driven methodologies like clustering [2]. These methods explore the data through an unsupervised separation of a dataset, with little or no ground truth, into a discrete set of hidden data structures [3]. This contrasts with the traditional methods based on human observation and testing of hypotheses using prior knowledge.
Cough is usually classified according to the length of the episode: acute (<3 weeks), subacute (3–8 weeks) or chronic (>8 weeks) [4–12]. However, this classification is one-dimensional and not based on data-driven analyses. We recently performed a cluster analysis in 975 working-age subjects with cough [13]. Two clusters were identified. The smaller one was especially characterised by several triggers of cough, many cough background disorders, and poor cough-related quality of life (TBQ). Those with the phenotype TBQ showed a high tendency for cough prolongation in the follow-up survey 12 months later [13]. The present study was carried out to validate that analysis in a different population, namely in aged, retired subjects, and under different conditions, namely in the middle of the coronavirus disease 2019 (COVID-19) pandemic.
Material and methods
Study design, setting and population
This was an observational, cross-sectional study conducted via email among the members of the Finnish Pensioners’ Federation. The sample size assessment was based on knowledge about the required sample size for cluster analysis, the response rates in email studies and the prevalence of cough in Finland [14–16]. The 26 205 members (mean age 72.7 years, 63.5% females) who had an email address were sent an invitation to participate along with information about the study. A pre-notification email was sent. The electronic questionnaires were sent as a hyperlink in an email in April 2021 and one reminder was sent 2 weeks later. The responses were recorded in an electronic datasheet.
The study was approved by the ethics committee of Kuopio University Hospital (289/2015; Kuopio, Finland). Permission to conduct the study was obtained from the Finnish Pensioners’ Federation. A subject's decision to reply was considered informed consent.
The questionnaire
The questionnaire was almost identical to the one used in our previous cluster analysis [13]. The 86 questions dealt with social background, lifestyle, general health, doctors’ diagnoses and visits, and medications. Appropriate symptom questions for asthma, chronic rhinosinusitis, gastro-oesophageal reflux disease (GORD), obstructive sleep apnoea (OSA) and depressive symptoms were included [17–21]. Respondents with current cough were asked to answer additional cough-related questions, including the Leicester Cough Questionnaire (LCQ). For the present study, new questions were added about symptoms of OSA, COVID-19 infection and vaccination, symptoms of flu at the beginning of the current cough episode and recurrence of cough episodes. The list of potential cough triggers was supplemented with speaking, laughing and deep inspiration. An English version of the questionnaire can be found in the supplementary material.
Definitions of variables derived from the raw data
Acute, subacute and chronic cough were defined as suggested in international guidelines [4–12]. Current asthma was defined as doctor's diagnosis of asthma at any age and wheezing during the past 12 months [17]. Chronic rhinosinusitis was defined as either nasal blockage or nasal discharge (anterior or posterior nasal drip) and either facial pain/pressure or reduction/loss of smell for >3 months [18]. GORD was defined as heartburn and/or regurgitation on ≥1 day per week during the past 3 months [19]. OSA was defined as presence of two or more of the following features: arterial hypertension, loud snoring, daytime somnolence or observed apnoeas [20]. These disorders, in addition to doctor's diagnoses of bronchiectasis and pulmonary fibrosis, were defined as cough background disorders. The number of cough background disorders was calculated by summing them up, giving a value from zero to six. Unexplained cough was defined as absence of any of these disorders. Autoimmune disorder was defined as presence of a doctor's diagnosis of hypothyroidism, rheumatoid arthritis or other autoimmune disorders. Presence of depressive symptoms was defined as a Patient Health Questionnaire-2 score of three or more [21]. Symptom sum was calculated by summing up all reported symptoms except those associated with airway disorders, giving a value from zero to 15. Trigger sum was calculated by summing up all reported cough triggers, giving a value from zero to 15. Allergy was defined as a self-reported allergy to pollens, animals or food. A family history of chronic cough was defined as the presence (now or in the past) of chronic (duration >8 weeks) cough in parents, sisters or brothers.
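The derived disorder variables can be illustrated with a minimal sketch. The field names below are hypothetical and are not taken from the questionnaire; only the rules for GORD, OSA, the number of cough background disorders and unexplained cough follow the definitions given above.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical field names; the questionnaire's own item labels are not used here.
    current_asthma: bool
    chronic_rhinosinusitis: bool
    reflux_days_per_week: int  # days per week with heartburn and/or regurgitation
    hypertension: bool
    loud_snoring: bool
    daytime_somnolence: bool
    observed_apnoeas: bool
    bronchiectasis: bool
    pulmonary_fibrosis: bool


def has_gord(r: Respondent) -> bool:
    # GORD: heartburn and/or regurgitation on at least 1 day per week.
    return r.reflux_days_per_week >= 1


def has_osa(r: Respondent) -> bool:
    # OSA: two or more of the four listed features.
    features = [r.hypertension, r.loud_snoring, r.daytime_somnolence, r.observed_apnoeas]
    return sum(features) >= 2


def background_disorder_count(r: Respondent) -> int:
    # Sum of the six cough background disorders, giving a value from 0 to 6.
    return sum([
        r.current_asthma,
        r.chronic_rhinosinusitis,
        has_gord(r),
        has_osa(r),
        r.bronchiectasis,
        r.pulmonary_fibrosis,
    ])


def unexplained_cough(r: Respondent) -> bool:
    # Unexplained cough: none of the background disorders is present.
    return background_disorder_count(r) == 0


# Made-up respondent: asthma, reflux on 2 days per week, hypertension and snoring.
example = Respondent(True, False, 2, True, True, False, False, False, False)
count = background_disorder_count(example)
```

In this made-up case the respondent has asthma, GORD and OSA, so the count is three and the cough is not classified as unexplained.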
Statistical analysis
All questions of the questionnaire plus the derived variables were included in partitional clustering using the k-means method [3] similarly to our previous cluster analysis [13]. Dimension reduction and cluster analysis steps were performed using R statistical software (version 3.5.1; R Foundation for Statistical Computing, Vienna, Austria) with diffusionMap, NbClust and cluster packages.
First, data were pre-processed to transform all variables to the same scale (0–1). Right-skewed (skewness >1) variables were normalised with the log(x+1) function, since zero values cannot be log-transformed. Ordinal and continuous variables were scaled into 0–1 intervals. Each variable's minimum value or the lowest class was assigned a value of 0 and the maximum value or the highest class was assigned a value of 1. Binary variables remained unchanged. Value 0 indicated the negative or “no” alternative and value 1 the positive or “yes” alternative.
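The pre-processing step can be sketched as follows. This is an illustration only: the skewness estimator (third standardised moment) and the example values are assumptions, since the text does not state which skewness formula the software used.

```python
import math


def skewness(values):
    """Skewness as the third standardised moment (an assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def preprocess(values):
    """Apply log(x+1) to right-skewed variables, then scale to the 0-1 interval."""
    if skewness(values) > 1:
        values = [math.log(v + 1) for v in values]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up, right-skewed counts (not study data).
trigger_counts = [0, 0, 1, 1, 2, 2, 3, 15]
scaled = preprocess(trigger_counts)
```

A binary 0/1 variable passes through the min-max step unchanged whenever both values occur, which matches the statement that binary variables remained unchanged.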
Second, clustering was applied. A distance matrix between observations with scaled variables was calculated using the Manhattan distance function. The diffusion map dimension-reduction algorithm was applied to extract diffusion coordinates from the distance matrix, using the default settings of the software. The number of clusters was evaluated by the 24 criteria provided by the software. After that, the extracted diffusion map coordinates were clustered into groups using the k-means method. To validate the clustering, the analyses were repeated after excluding those background variables with no plausible biological association with cough (such as hometown, years of education, alcohol consumption, etc.).
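The final k-means step can be illustrated with a small sketch of Lloyd's algorithm. It is not the R implementation used in the study: the diffusion map step is omitted, the two-dimensional coordinates are made up, and the centroids are seeded deterministically from the first k points, whereas k-means routines usually use random starts.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with squared Euclidean distance.

    Simplification: the first k points seed the centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = []
        for p in points:
            sq = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(sq.index(min(sq)))
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up coordinates standing in for diffusion coordinates.
coords = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.15), (0.85, 0.95)]
labels, centroids = kmeans(coords, k=2)
```

With these coordinates the points alternate between the two groups, so the labels come out as 0, 1, 0, 1, 0, 1.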
Third, cluster membership was added to the original data to compare the clusters, utilising the Mann–Whitney U-test or the Chi-squared test. The interrelationships of the variables were analysed with Spearman's correlation coefficient (rs) in SPSS software (SPSS Statistics for Windows, version 22.0; IBM, Armonk, NY, USA). Receiver operating characteristic (ROC) curves and the Youden index were utilised to define the cut-off values. The values are expressed as mean±sd, median (interquartile range) or percentages. A p-value <0.05 was accepted as the level of statistical significance.
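How the Youden index selects a cut-off can be shown with a short sketch. The function name and the data are hypothetical; the rule "value at or above the cut-off predicts cluster B" is assumed for a variable such as trigger sum, where higher values point to cluster B.

```python
def best_cutoff(values, is_cluster_b):
    """Scan candidate cut-offs and return the one maximising J = sens + spec - 1.

    Returns a tuple (cutoff, sensitivity, specificity, youden_j).
    """
    positives = sum(is_cluster_b)
    negatives = len(is_cluster_b) - positives
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, b in zip(values, is_cluster_b) if b and v >= cutoff)
        tn = sum(1 for v, b in zip(values, is_cluster_b) if not b and v < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Made-up trigger sums and cluster memberships (not study data).
trigger_sum = [0, 1, 2, 3, 6, 4, 5, 7, 8, 9]
cluster_b = [False, False, False, False, False, True, True, True, True, True]
cutoff, sens, spec, j = best_cutoff(trigger_sum, cluster_b)
```

For a variable where low values indicate cluster B, such as an LCQ domain score, the comparison would be reversed.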
Results
23.6% of the subjects responded (6189 respondents, mean±sd age 72.2±5.5 years, 66.4% female) (figure 1). 206 respondents were excluded from the analyses because of age <64 years. Of the remaining 5983, 1109 subjects suffered from current cough. They formed the population in which the clustering was applied. Their mean age was 72.9±5.3 years and 67.7% were female. The proportion of missing values was <2.5%, except for the two OSA-related questions (3.1–3.7%).
Of the 24 criteria provided by the R statistical software, nine suggested two as the best number of clusters, five suggested three clusters, three suggested five clusters, one criterion each suggested seven, eight, nine, 11 and 14 clusters, and two suggested 15 clusters. Therefore, the extracted diffusion map coordinates were clustered into two groups, called cluster A (834 subjects, 75.2%) and cluster B (275 subjects, 24.8%).
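The choice of two clusters amounts to a majority vote across the criteria, which can be tallied in a few lines. The sketch reads "one criterion" as one criterion for each of the five listed values, an interpretation that is consistent with the 24 criteria mentioned in the methods.

```python
from collections import Counter

# Number of criteria voting for each candidate number of clusters.
votes = Counter({2: 9, 3: 5, 5: 3, 7: 1, 8: 1, 9: 1, 11: 1, 14: 1, 15: 2})

total_criteria = sum(votes.values())
best_k, support = votes.most_common(1)[0]
```

The tally sums to 24 criteria, and two clusters receive the most votes (nine).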
The distribution of the clusters in acute, subacute, and chronic cough is presented in table 1. Cluster B was represented in all subtypes, although its proportion was largest in chronic cough.
Table 2 presents those 10 variables that most strongly separated the clusters, according to the p-value between the clusters, as well as 23 other variables of interest. Cluster B was especially characterised by several cough triggers, many cough background disorders, and low LCQ scores (figures 2 and 3). Of the various cough triggers, paints and fumes most strongly separated the clusters. Of the cough background disorders, asthma most strongly separated the clusters. Of the three LCQ domains, the physical domain most strongly separated the clusters.
Table 3 presents the best cut-off values for the 10 most important variables to identify cluster B and their sensitivity, specificity and area under the ROC values. After that, a ROC curve was constructed to evaluate the best number of the main determinants (trigger sum ≥5, LCQ physical domain ≤4.9, at least one cough background disorder) to separate the clusters (figure 4). The presence of at least two main determinants gave the best Youden index with a sensitivity of 0.96 and specificity of 0.72. The presence of all three main determinants gave a sensitivity of 0.61 and specificity of 0.97.
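The rule based on the three main determinants, and the Youden index comparison behind it, can be written out as a brief sketch. The function names are hypothetical; the thresholds and the sensitivity and specificity values are those reported above.

```python
def main_determinants(trigger_sum, lcq_physical, background_disorders):
    """Count how many of the three main determinants are present."""
    return sum([
        trigger_sum >= 5,
        lcq_physical <= 4.9,
        background_disorders >= 1,
    ])


def meets_two_of_three(trigger_sum, lcq_physical, background_disorders):
    """Rule with the best Youden index: at least two main determinants."""
    return main_determinants(trigger_sum, lcq_physical, background_disorders) >= 2


def youden(sensitivity, specificity):
    return sensitivity + specificity - 1


j_two = youden(0.96, 0.72)    # at least two determinants
j_three = youden(0.61, 0.97)  # all three determinants
```

The Youden index is about 0.68 for the two-determinant rule and about 0.58 for the three-determinant rule, which is why the former was preferred.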
Belonging to cluster B increased the likelihood of at least one doctor's visit due to cough in the past 12 months (OR 3.39, 95% CI 2.53–4.55) and the likelihood of having used cough medicines in the past 12 months (OR 1.88, 95% CI 1.41–2.50). The population was also divided according to the length of the cough episode. Presence of chronic cough (>8 weeks’ duration) slightly increased the likelihood of doctors’ visits (OR 1.91, 95% CI 1.36–2.70), but decreased the likelihood of using cough medicines (OR 0.66, 95% CI 0.50–0.89).
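For readers unfamiliar with odds ratios, the following sketch computes one from a 2×2 table together with a Wald-type 95% confidence interval. The text does not state how the reported intervals were obtained, so the Wald method is an assumption, and the counts are invented for illustration rather than taken from the study.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: exposed subjects with and without the outcome
    c, d: unexposed subjects with and without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Invented counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
odds_ratio, lower, upper = odds_ratio_wald(30, 70, 10, 90)
```

An interval that excludes 1, as in all four comparisons reported above, indicates a statistically significant association at the 0.05 level.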
The validation analysis, excluding those background variables with no plausible biological association with cough, gave almost identical results. The five most important variables in that analysis were trigger sum, LCQ physical domain, number of cough background disorders, LCQ question 9 (paints or fumes as a cough trigger) and dyspnoea with wheezing (data not shown).
There were significant interrelationships between the most important variables: the number of cough triggers was associated with the number of cough background disorders and the LCQ physical domain (rs=0.28, p<0.001 and rs= −0.34, p<0.001, respectively), and the number of cough background disorders was associated with the LCQ physical domain (rs= −0.40, p<0.001).
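Spearman's coefficient is the Pearson correlation of the rank-transformed values, as the sketch below illustrates. The study used SPSS; this pure-Python version and its made-up data only show the calculation, with tied values given their average rank.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values: more triggers paired with lower LCQ physical scores.
triggers = [1, 3, 2, 8, 6]
lcq_physical = [6.0, 5.5, 5.8, 4.0, 4.5]
rs = spearman(triggers, lcq_physical)
```

The made-up data are perfectly inversely ordered, so rs is −1 here; the study's coefficients (−0.34 and −0.40 for the LCQ physical domain) indicate the same direction with a weaker monotonic relationship.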
Discussion
This clustering, which was performed during the COVID-19 pandemic among 1109 elderly, retired subjects with current cough, validates our previous clustering among working-age, employed subjects [13]. Again, two clusters were found. Cluster B, consisting of 24.8% of the subjects, was especially characterised by several cough triggers, many cough background disorders, and poor cough-related quality of life. These features fit the cough phenotype TBQ, which was identified in our previous study. Cluster A, lacking these features, may be called the “common” cough phenotype.
Clustering is the task of grouping subjects in such a way that subjects in the same group (cluster) are more similar to each other than to those in other groups. This is achieved by measuring the distances between the subjects with respect to each variable. These distances are summed and placed on a two-dimensional table representing every possible pair of subjects, called the distance matrix. Different algorithms can then be applied to recognise the clusters of subjects with small distances to each other. Dimension reduction is often necessary to improve the observation-to-variable ratio, which makes the analysis more reliable.
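The construction of the distance matrix can be sketched in a few lines, using the Manhattan distance named in the methods (the sum of absolute differences over all variables). The three subjects and their scaled values are made up.

```python
def manhattan(a, b):
    """Sum of absolute per-variable differences between two subjects."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(rows):
    """Square table holding the distance for every pair of subjects."""
    return [[manhattan(a, b) for b in rows] for a in rows]


# Three made-up subjects with three variables scaled to 0-1.
subjects = [
    [0.0, 1.0, 0.5],
    [0.0, 1.0, 0.25],
    [1.0, 0.0, 1.0],
]
d = distance_matrix(subjects)
```

The first two subjects differ in a single variable and are 0.25 apart, whereas the third lies at a distance of 2.5 and 2.75 from them, so a clustering algorithm would tend to group the first two together.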
Both cluster analyses identified the number of cough triggers as the most important variable to separate the phenotypes. Both analyses also identified chemical triggers such as paints, fumes and strong scents as the most important types of triggers [13]. Several studies have shown that subjects with lower airway symptoms induced by chemical irritants are especially sensitive to the cough-provocation test with capsaicin [22–31]. Therefore, we hypothesise that the main underlying pathophysiological feature separating the phenotype TBQ from the common phenotype is hypersensitivity of the cough reflex. Thus, this phenotype might also represent a distinct endotype. Subjects with TBQ phenotype might especially benefit from medication that can decrease the sensitivity of the cough reflex. If that is the case, the phenotype TBQ could also provide a treatable trait. To investigate whether the phenotype TBQ is a distinct cough genotype, studies applying genome analyses should be performed. Of note, family history of chronic cough was far more common in the phenotype TBQ than in the common phenotype.
The common cough phenotype seems to be less associated with the cough reflex hypersensitivity than the phenotype TBQ. Other mechanisms, like excessive mucus production, may be more important in that phenotype [32].
The phenotype TBQ was more strongly associated with doctor's visits due to cough and the use of cough medicines than the presence of chronic cough. Furthermore, our previous study showed a strong tendency for cough prolongation in the phenotype TBQ [13]. Since the phenotype TBQ is related to clinically meaningful outcomes, it fulfils the criteria for clinical phenotype [1]. Its identification may serve as an indication for prompt and comprehensive clinical evaluation regardless of the duration of the cough episode.
The phenotype TBQ resembles the concept “cough hypersensitivity syndrome”, introduced by experienced clinicians [33]. Both emphasise the enhanced response to cough triggers. However, there seem to be two major discrepancies between the entities. First, cough hypersensitivity syndrome has been connected to chronic cough [33], but the present study shows that features of cough hypersensitivity can be present in acute and subacute cough as well. Second, it has been postulated that the cough hypersensitivity syndrome is present in the majority of subjects with chronic cough [33–35], whereas in the present analysis just 27.4% of subjects with chronic cough showed the features of the phenotype TBQ. These discrepancies may be best explained by the fact that cough hypersensitivity syndrome has been described among subjects attending special cough clinics [33], whereas our cluster analyses are based on community-based populations. Given the documented high tendency of the subjects with the phenotype TBQ to seek medical attention, they are probably overrepresented in the population attending special cough clinics. Despite the aforementioned differences, it is remarkable that unsupervised, data-driven analyses lead to similar conclusions to those drawn from clinical experience.
A recent study from Australia supports the present analysis. In that study, two clusters could be identified among subjects with various respiratory symptoms. The smaller cluster was characterised by symptoms of laryngeal hypersensitivity and a strong cough response to mannitol [36]. The characteristics of that cluster resemble those of the phenotype TBQ.
For clinical purposes, we calculated the best cut-off values for the most important variables to separate the clusters. They were almost identical to those reported in our previous cluster analysis [13]. Presence of at least two of the three main TBQ determinants gave the largest sum of sensitivity+specificity and is thus the most suitable clinical criterion for the cough phenotype TBQ. Reliable clinical demonstration of the phenotype TBQ requires that a comprehensive list of triggers is presented to the patient in a written form, since patients often forget some triggers when asked openly. The list of the 15 triggers asked in the present study can be found in the questionnaire (supplementary material).
There were slight differences in the questionnaires between the present and the previous study [13], which did not affect the main results. It has been shown that there are more cough background disorders in elderly than in younger subjects [37–39]. Therefore, bronchiectasis, pulmonary fibrosis and OSA were added to the variable “number of cough background disorders”. Questions about COVID-19 infection and vaccination, and symptoms of flu at the beginning of the current cough episode were added due to the current pandemic. We also asked how many cough episodes the subject had had in the past 12 months. This number was significantly higher in the phenotype TBQ than in the common phenotype. Of note, the number of the cough episodes was more strongly associated with the phenotype TBQ than the length of the current cough episode.
The present study has several limitations. The participation rate was relatively low, which is typical for academic email surveys [15] and of similar magnitude to our previous cluster analysis [13]. However, the age and gender distribution of the responders did not differ significantly from the original population. The high age of the population may have hindered the use of email in some individuals. It is possible that patients with severe cough have been more willing to participate than patients with mild cough. This may have led to an overrepresentation of the TBQ phenotype. The proportion of current smokers was small, which may have reduced the impact of smoking on the analysis. The prevalence of short, infection-associated cough subtypes was low in the present population, probably due to personal protective and social measures that were recommended during the pandemic era [39]. Finally, the analysis was based on the questionnaire data only; spirometry, laboratory and radiography data were missing.
The strengths of the present study include a large population of a type that was missing from our previous cluster analysis: elderly, retired subjects [13]. The survey was not limited to cough patients but was addressed to a community-based population. Therefore, even those coughing subjects who would never complain about their cough to doctors were included. The questionnaire was originally planned, and later supplemented, to investigate cough and associated conditions. It included a comprehensive list of cough triggers with both external triggers and those representing allotussia [40]. The interindividual variation in how subjects recognise and report symptoms was controlled by the variable “symptom sum”. Furthermore, the symptom questions to define important background disorders were those recommended for epidemiological studies. All raw data plus the derived variables were included in the cluster analysis without hypotheses using prior knowledge. In this way, all relevant features of cough and even undiagnosed but symptomatic background disorders were equally considered in the analysis.
Conclusions
The phenotype TBQ and the common phenotype could be identified among elderly, retired subjects with cough, thus validating the previous phenotyping among working-age subjects [13]. The phenotype TBQ was associated with frequent doctor's visits due to cough, use of cough medicines and a high tendency for cough prolongation [1, 13]. Clinical evaluation of a patient with cough should probably focus on the presence of cough triggers, background disorders and the quality of life rather than on the length of the cough episode.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material 00284-2022.SUPPLEMENT
Acknowledgements
We thank Seppo Hartikainen (Istekki Oy, Kuopio, Finland) for his assistance in creating the electronic questionnaire.
Footnotes
Provenance: Submitted article, peer reviewed.
Support statement: The study was supported by Kuopion Seudun Hengityssäätiö and Hengityssairauksien Tutkimussäätiö foundations. The funders had no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication. Funding information for this article has been deposited with the Crossref Funder Registry.
Conflict of interest: H.O. Koskela received support for the present manuscript from Kuopion Seudun Hengityssäätiö and Hengityssairauksien tutkimussäätiö; outside the submitted work, the following relationships have been reported: payment for lectures received from Boehringer Ingelheim Ltd and MSD Ltd, and stock owned in Orion Ltd. J.T. Kaulamo received support for the present manuscript from Foundation of the Finnish Anti-Tuberculosis Association/Suomen Tuberkuloosin vastustamisyhdistyksen Säätiö, Respiratory Foundation of Kuopio Region/Kuopion Seudun Hengityssäätiö, Research Foundation of the Pulmonary diseases/Hengityssairauksien tutkimussäätiö, and Väinö and Laina Kivi Foundation/Väinö ja Laina Kiven Säätiö; outside the submitted work, the following relationships have been reported: support for attending a scientific meeting received from Boehringer Ingelheim. T.A. Selander has nothing to disclose. A.M. Lätti received support for the present manuscript from Kuopion Seudun Hengityssäätiö, Hengityssairauksien tutkimussäätiö, KYS:n tutkimussäätiö, Suomen Tuberkuloosin Vastustamisyhdistyksen säätiö sr and Väinö ja Laina Kiven Säätiö; outside the submitted work, the following relationships have been reported: payment for lectures received from Farmasian oppimiskeskus/Pharmaceutical Learning Centre, payment for lectures received from GlaxoSmithKline, payment for group input meeting received from MSD, and support for attending scientific meetings received from Orion Oyj, Boehringer Ingelheim and Roche.
- Received June 9, 2022.
- Accepted August 23, 2022.
- Copyright ©The authors 2022
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions@ersnet.org