Persistent COVID-19 symptoms in a community study of 606,434 people in England

Whitaker, Matthew; Elliott, Joshua; Chadeau-Hyam, Marc; Riley, Steven; Darzi, Ara; Cooke, Graham; Ward, Helen; Elliott, Paul

doi:10.1038/s41467-022-29521-z


Article
Open access
Published: 12 April 2022


Nature Communications volume 13, Article number: 1957 (2022)


Abstract

Long COVID remains a broadly defined syndrome, with estimates of prevalence and duration varying widely. We use data from rounds 3–5 of the REACT-2 study (n = 508,707; September 2020 – February 2021), a representative community survey of adults in England, and replication data from round 6 (n = 97,727; May 2021) to estimate the prevalence and identify predictors of persistent symptoms lasting 12 weeks or more, and we use unsupervised learning to cluster individuals by reported symptoms. At 12 weeks in rounds 3–5, 37.7% experienced at least one symptom, falling to 21.6% in round 6. Female sex, increasing age, obesity, smoking, vaping, hospitalisation with COVID-19, deprivation, and being a healthcare worker are associated with higher probability of persistent symptoms in rounds 3–5, and Asian ethnicity with lower probability. Clustering analysis identifies a subset of participants with predominantly respiratory symptoms. Managing the long-term sequelae of COVID-19 will remain a major challenge for affected individuals and their families and for health services.


Introduction

The UK has experienced one of the largest epidemics of COVID-19 in Europe. Because COVID-19 is a new disease, its natural history beyond the immediate illness and its possible long-term sequelae remain largely unknown. As well as the acute risk of hospitalisation and death from COVID-19, some people who develop symptoms have a prolonged and debilitating illness that may continue for weeks or months^1,2,3,4,5. This has been called post-COVID syndrome⁶ or Long COVID, a term first coined by people sharing their experience of ongoing symptoms on social media and establishing support groups⁷.

The frequency, nature and duration of persistent symptoms from COVID-19 are poorly understood and represent a major knowledge gap if effective treatments and management strategies are to be developed. Reported symptoms include severe fatigue, breathlessness, chest pain or heaviness, fever, palpitations, cognitive impairment (‘brain fog’), loss of sense of smell (anosmia), loss of sense of taste (ageusia), skin rash and joint pain or swelling^1,2,3,4,5. Estimates of symptom prevalence and persistence vary substantially, arguably due to heterogeneous study designs and syndrome definitions^8,9,10,11. It has been suggested that Long COVID describes a group of disparate conditions, including post-viral syndromes, long-term tissue or organ damage and ongoing inflammation^3,9,12,13.

Occurrence of Long COVID appears to be associated with the severity of COVID-19; for example, high prevalence of persistent symptoms has been reported among people hospitalised with COVID-19^14,15,16. The number of acute symptoms has also been associated with risk of Long COVID, alongside older age and female sex⁸.

While many Long COVID studies so far have focused on hospitalised COVID-19 cases^14,15,16,17,18,19, here we report data from random community-based samples of the population in England. These involved more than 600,000 people who took part in rounds three to five (main analysis) and round six (replication) of the Real-Time Assessment of Community Transmission-2 (REACT-2) study between September 2020 and May 2021. Among participants reporting symptoms lasting 12 weeks or more following suspected or confirmed COVID-19, we estimate symptom prevalence, investigate co-occurrence of symptoms and assess risk factors for persistence of symptoms.

Results

Figure 1 shows the study design and population. A total of 508,707 people took part in REACT-2 rounds 3–5, and 97,727 in REACT-2 round 6 (excluding a ‘booster’ sample of additional people recruited at ages 55 years and over), with response rates of 29.4% and 29.9% respectively. Compared to responders, non-responders were more likely to be men, younger (18–24 years) or older (>75 years) adults and live in more deprived areas (Supplementary Table 1).

**Fig. 1: Study population flow chart.**

A total of 92,116 respondents reported previous COVID-19 in rounds 3–5, and 14,562 in round 6, giving a weighted prevalence of 19.2% (19.1,19.3) and 17.9% (17.7,18.0) respectively.

Prevalence of persistent symptoms

Table 1 shows the proportion of people with COVID-19 who still reported one or more, or three or more, of 29 symptoms at 12 weeks after symptom onset. At 12 weeks, 37.7% (37.4,38.1) of those in rounds 3–5 reported one or more symptoms, and 17.5% (17.2,17.7) reported three or more; in round 6, these figures were 21.6% (20.9,22.3) and 11.9% (11.4,12.5), respectively. For rounds 3–5, these translated to a weighted population prevalence of 5.80% (5.73,5.86) for having, or having had, one or more persistent symptoms for 12 weeks or more, and 2.23% (2.19,2.27) for three or more persistent symptoms. In round 6 the equivalent percentages were 3.06% (2.98,3.14) and 1.61% (1.56,1.67), respectively, for 27 symptoms in common with rounds 3–5 (Supplementary Table 5), increasing to 3.26% (3.18,3.34) and 1.86% (1.80,1.92) for one and three symptoms respectively if all 35 symptoms surveyed in round 6 are included (Supplementary Table 6).

Table 1 Proportions of respondents in (i) rounds 3–5 and (ii) round 6 who still reported one or more (or three or more) symptoms 12 weeks after initial symptom onset.


Figure 2 shows the proportion of people with one or multiple symptoms over time since symptom onset. There was a rapid drop-off in symptom reporting by 4 weeks, a further, smaller drop by 12 weeks, but then limited further decline up to ~22 weeks for both men and women, with higher prevalence of symptoms at each time point among women.

**Fig. 2: Persistence of symptoms over time.**

In rounds 3–5, the most prevalent persistent symptom was tiredness at 16.8% (16.5,17.1), whereas in round 6 reporting of tiredness was much lower at 8.0% (7.5,8.6) (Fig. 3, Supplementary Table 2). Smaller declines in prevalence from rounds 3–5 to round 6 were observed for 16 of the other 26 symptoms that were common to all four rounds, while increases were observed for four symptoms (Fig. 3).

**Fig. 3: Symptom prevalence in September 2020–February 2021, and in May 2021.**

Risk factors for persistent symptoms at 12 weeks

Prevalence by sociodemographic and lifestyle factors is shown in Supplementary Tables 3–8. To test the independent effects of these factors on risk of persistent symptoms, we carried out age- and sex-adjusted and mutually adjusted logistic regression, as well as multivariable analysis for variable selection. In rounds 3–5, the persistence of one or more symptoms for 12 weeks or more was associated with female sex, increasing age, being overweight or obese, smoking, vaping, hospitalisation with COVID-19, deprivation, low household income, and being a healthcare or care home worker, with odds ratios ranging from 1.38 (1.32,1.45) for female sex to 3.45 (2.57,4.64) for hospitalisation with COVID-19 (Fig. 4, Supplementary Table 9). Asian ethnicity was associated with lower risk of persistent symptoms compared to people of white ethnicity (OR: 0.84 [0.74,0.96]). In multivariable analysis for variable selection and ranking, the strongest predictors of persistent symptoms, in order, were age, sex, body mass index (BMI), household income, healthcare/care home worker, deprivation, smoking status, prior hospitalisation with COVID-19 and vaping status.

**Fig. 4: Modelling of persistent symptoms as a function of biological and demographic variables.**

In generalised additive models (GAMs) with likelihood of symptom persistence at 12 weeks or more modelled as a smoothed function of sex and age, risk of persistent symptoms increased linearly with age in both men and women with an additional 3.5 percentage points of risk per decade of life. Women had ~8 percentage points higher risk than men at all ages (Fig. 4, Supplementary Fig. 1).

The results of the logistic modelling and variable selection in the replication data set (REACT-2 round 6, from May 2021), were similar (Supplementary Fig. 2), except that smoking, vaping and deprivation were not associated with persistent symptoms in multiple logistic regression, and in multivariable variable selection analysis, healthcare/care home worker status, gross household income and deprivation were not selected, while Asian ethnicity was. Statistical power for these analyses was lower, however, given the smaller sample size in round 6 compared with rounds 3–5.

Clustering analysis

In clustering analysis of the 20,240 participants in rounds 3–5 who were still symptomatic 12 weeks after initial symptom onset, two stable clusters of participants were identified based on symptom profiles at 12 weeks (Fig. 5, Supplementary Fig. 3). In bootstrap stability analysis, the clusters were recovered in 100% of stability bootstraps. There was high prevalence of persistent tiredness in Cluster L1 (n = 15,799), which co-occurred with muscle aches, difficulty sleeping and shortness of breath (Supplementary Fig. 4). Cluster L2 (n = 4441) had high prevalence of respiratory symptoms including shortness of breath and tight chest, as well as chest pain (Figs. 5, 6, Supplementary Fig. 4). The cluster medoids—the representative observations at the centre of each cluster—were a participant with only tiredness at 12 weeks (Cluster L1) and a participant with shortness of breath and tight chest at 12 weeks (Cluster L2) (Fig. 5, Supplementary Fig. 5). A higher proportion of people in the respiratory cluster (Cluster L2) reported severe symptoms at the time of their COVID-19 illness at 43.5% (42.0,44.9) than in Cluster L1 at 27.4% (26.7,28.1). Rates of hospitalisation were nearly three times as high in Cluster L2 (2.9% [2.5,3.5]) as in Cluster L1 (1.1% [0.9,1.3]) (Fig. 5, Supplementary Table 10).

**Fig. 5: Results of clustering on symptom profile at 12 weeks.**

**Fig. 6: Persistence of individual symptoms, by symptom cluster.**

In the replication data from round 6, clustering analysis again identified a subset of respondents (Cluster L2_R6; n = 1582) with high prevalence of shortness of breath, co-occurring with tight chest/chest pain, but also with high prevalence of tiredness, while another cluster (Cluster L1_R6; n = 1263) had high prevalence of loss of sense of smell and taste (Supplementary Fig. 6).

Sensitivity analyses

In sensitivity analyses (rounds 3–5), we assessed the impact of (i) use of a reduced set of 15 symptoms associated with self-reported PCR-positivity (Supplementary Fig. 7), and (ii) restricting the study population to those who self-reported previous COVID-19 and tested positive on a lateral flow immunoassay (LFIA). Using 15 instead of 29 symptoms reduced the prevalence of persistent symptoms at 12 weeks by under five percentage points, from 37.7% to 32.9% (32.6,33.2), and identified the same set of risk factors (Supplementary Fig. 8), while in the LFIA-positive subgroup the prevalence of persistent symptoms at 12 weeks was higher, at 42.4% (41.6,43.2) (Supplementary Tables 3, 4, 11).

In clustering sensitivity analyses on round 3–5 data, the additional clustering methods (PAM using Dice distance and Jaccard distance) identified 5 and 6 clusters, respectively. In each case, two clusters with primarily respiratory symptoms were identified, which contained almost all the observations from the ‘respiratory’ cluster L2 in the main analysis (Supplementary Fig. 9). Across all methods and possible numbers of clusters, silhouette width was maximised when using two clusters with Hamming distance (presented in main analysis).
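As an illustration of the selection criterion mentioned above, the following is a minimal sketch of the average silhouette width computed on binary symptom profiles with Hamming distance. The six toy profiles, the column meanings and the function names are hypothetical and are not taken from the REACT-2 data or analysis code; the sketch assumes at least two clusters.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of symptoms on which two binary profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def average_silhouette_width(profiles: List[Sequence[int]], labels: List[int]) -> float:
    """Mean silhouette width over all observations (requires >= 2 clusters)."""
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(profiles):
        # a: mean distance to the other members of the observation's own cluster
        own = [hamming(p, q) for j, q in enumerate(profiles)
               if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(hamming(p, q) for j, q in enumerate(profiles) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical columns: tiredness, muscle aches, shortness of breath, tight chest
toy_profiles = [
    (1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0),
    (0, 0, 1, 1), (1, 0, 1, 1), (0, 0, 1, 1),
]
coherent = average_silhouette_width(toy_profiles, [0, 0, 0, 1, 1, 1])
scrambled = average_silhouette_width(toy_profiles, [0, 1, 0, 1, 0, 1])
print(coherent, scrambled)
```

A partition that groups similar profiles together scores closer to 1 than one that mixes them, which is why the number of clusters with the largest average width is preferred.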

Latent class analysis identified one smaller class characterised by respiratory symptoms and higher overall symptom prevalence, and one larger class characterised predominantly by tiredness (Supplementary Fig. 10).

Background symptom prevalence

We used REACT-1 data to estimate the background level of symptom reporting in PCR-negative adults (Methods). Among 1,879,842 PCR-negative adults during REACT-1 rounds 2–14 (June 2020–September 2021), average weighted prevalence of any of 26 symptoms lasting 11 or more days was 3.06% (3.04,3.09) (Supplementary Fig. 11).

We also used REACT-1 data to investigate potential differential recall bias by age. The proportion of PCR-positive individuals who reported any symptoms at time of infection was lower in the older age groups, consistent with the REACT-2 findings (Supplementary Table 12).

Discussion

In this large community-based study of symptoms following COVID-19 among adults aged 18 years and above in England, participants reported high prevalence of persistent symptoms lasting 12 weeks or more. Estimates ranged from 5.8% of the adult population experiencing, or having experienced, one or more persistent symptoms post-COVID-19 (corresponding to over 2 million adults in England), to 2.2% for three or more persistent symptoms (just under a million adults in England) in rounds 3–5, and 3.1% and 1.6% for one and three persistent symptoms respectively in round 6.

Our estimates of the proportion of people with persistent COVID-19 symptoms are higher than in some other studies, although previous estimates have varied widely. At the low end, one study found that 2.3% of people with COVID-19 still reported symptoms at 12 weeks⁸; other studies have reported 13.7% of people were symptomatic at 84 days⁹, 14.8% symptomatic at 90 days¹⁰, 27% at 60 days²⁰, 35% at 2 months²¹, 34.7% at 7 months⁶, 46% at 6 months¹¹, and as high as 51–52% at 6 months^16,22. Our estimates, that 37.7% of people with COVID-19 experience one or more symptoms at 12 weeks in autumn/winter 2020–2021, and 21.6% in spring 2021, may partly reflect the large list of symptoms we surveyed, many of which are common and not specific to COVID-19. However, the estimated background prevalence of persistent symptoms for 11+ days in more than 1.8 million PCR-negative REACT-1 respondents was ~3%, which provides an upper bound for non-COVID-19-related prevalence of persistent symptoms at 12 weeks or more. This is in agreement with a study of 26,922 UK residents between April and August 2021 by the Office for National Statistics (ONS), who estimated the point prevalence of any of 12 symptoms at 3.4% in non-COVID-19-positive people, with 0.5% reporting any symptom for 12 weeks or more²³. Our estimate of the prevalence of COVID-19-related persistent symptoms is therefore approximately tenfold the background prevalence.

The overall reduction in persistent symptoms between rounds 3–5 and round 6 was driven by a decline in persistent tiredness of more than half, from 16.8% to 8.0%. There are several potential explanations. The majority (60%) of infections reported in round 6 were from pre-July 2020, so the decline in prevalence may reflect a proportion of people recovering from their illness and not reporting it (recall bias). Seasonality may affect symptom prevalence, although background symptom prevalence was largely consistent across the study period. Studies have found associations between lockdown measures and elevated levels of tiredness²⁴ and stress^25,26 and while rounds 3–5 were conducted predominantly when the UK was under restrictions or lockdown measures, round 6 was conducted during the transition from ‘step two’ of the reopening—when schools, retail and outdoor hospitality were open—to ‘step three’, when the ‘rule of six’ was implemented and indoor venues were allowed to reopen²⁷. Finally, the structure of the survey changed in round 6, and the list of symptoms surveyed was amended. In our analysis of round 6, we focused on the 27 (of 29) symptoms that were in common with rounds 3–5, which may have slightly under-estimated symptom reporting prevalence in round 6 compared with the earlier rounds.

Increasing age, female sex, BMI, hospitalisation and co-morbidities have previously been identified as risk factors for Long COVID^8,28,29. Our finding of a linear association between age and persistent symptoms following COVID-19 contrasts with some other studies that suggest the highest prevalence is found in middle-aged groups⁹. This discrepancy may reflect the fact that older age groups in the community have lower infection rates than younger people³⁰ and are more likely to be asymptomatic^31,32; once these factors were corrected for by conditioning on symptoms post-COVID-19, then the apparently lower prevalence of persistent symptoms at older ages was no longer seen.

Our identification of two stable symptom clusters at 12 weeks in rounds 3–5, with similar patterns identified in sensitivity analyses using different clustering methods, suggests that Long COVID may have distinct subgroups, including one (Cluster L2) characterised by high prevalence of shortness of breath and tight chest/chest pain. These and other related symptoms also had high prevalence in Cluster L2_R6 in the round 6 replication data. Previous studies have taken a similar unsupervised approach to characterising subtypes of Long COVID, albeit at earlier time points: Sudre et al.⁸ identified two symptom clusters at 28 days post-symptom-onset, although these differed from our clusters. Huang et al.²⁰ identified five clusters at 61 days, in two of which there was high prevalence of respiratory symptoms as seen in cluster L2 (and Cluster L2_R6) in our study.

Strengths and limitations

This study included data from a large random community sample with a high response rate (26–29% across rounds 3–5), and use of weighting to provide population prevalence estimates, thus providing more representative information on persistent COVID-19 symptoms in the community. This is in contrast to other studies that have been based on specific patient groups, especially those based on hospitalised cases⁵. We asked about presence of symptoms rather than Long COVID to reduce potential reporting bias. However, it is clear that a wide spectrum of symptoms and clinical presentations post-COVID-19 may be involved; for example, our open free-text question identified a number of symptoms not included in our questionnaire including “brain fog”, “palpitations” and “hair loss”, which were subsequently included in round 6³³. As the study was based on self-reported data and many of the symptoms are common and not specific to COVID-19, we compared our estimates with those obtained in the general population from people testing negative in the REACT-1 study.

Limitations include the retrospective study design, which introduces the possibility of recall bias. In previous analyses, however, we have shown that participant reports of date of onset of their symptoms produce an epidemic curve that very closely tracks the epidemic^31,34,35. In addition, our analysis of REACT-1 data supports the finding of increasing proportions of asymptomatic infection in older age groups and suggests that this is not an artefact of differential recall of symptoms in older participants. Respondents were restricted to reporting a single date of (initial) symptom onset, which does not allow for delayed onset of some symptoms, nor does it allow for the reporting of relapsing symptoms that appear to be a feature of Long COVID⁸. Respondents were also restricted to reporting overall illness severity, rather than symptom-specific severity, and were not asked to report when their symptoms were most severe. A further limitation, despite the high response rate, is the possibility of participation bias as the REACT-2 study included a self-administered LFIA³¹; it is plausible that people with persistent symptoms may have been more likely to participate in order to ascertain their antibody status.

Implications

We have identified a substantial proportion of people who experience persistent symptoms lasting 12 weeks or more post COVID-19. After the initial decline in symptom prevalence between 4 and 12 weeks, the prevalence of persistent symptoms plateaued, indicating that large numbers of people may have chronic symptoms requiring investigation and intervention, including rehabilitation. We show here that economically disadvantaged people and those in deprived areas appear to have a higher burden of persistent symptoms post COVID-19, compounding the excess burden of severe illness and mortality from COVID-19 experienced by these groups^36,37.

We identified two clusters of participants based on their symptoms, including one in which shortness of breath and tight chest/chest pain predominated. Further studies are required to investigate the underlying pathophysiology. Clinicians and other healthcare professionals may benefit from education on the range of presenting symptoms to best support patients towards recovery.

In conclusion, the scale of morbidity identified in this study post COVID-19 presents significant challenges for the affected individuals and their families, and indicates a high potential population health burden. Managing the long-term sequelae of COVID-19 will remain a major challenge for affected individuals and their families and for health services.

Methods

Participants

The REACT-2 programme evaluated community prevalence of SARS-CoV-2 anti-spike protein antibody positivity in England. Random population samples of adults in England were invited to take part every 2–4 months using the National Health Service (NHS) patient list to achieve similar numbers of participants in each of 315 lower-tier local authority (LTLA) areas³⁸. Participants registered via an online portal or by telephone. Those registered were sent a test kit by post that included a self-administered point-of-care LFIA test with instructions and a link to an online video. Participants completed a survey (online/telephone) upon completion of their self-test. Participants provided information on demographics, household composition, comorbidities, and whether or not they thought that they had had COVID-19. Those who reported having had COVID-19 were asked whether or not they had had a PCR test, symptoms related to COVID-19, date of first symptom onset, severity of symptoms, and duration of any of a list of 29 symptoms³⁹. In addition, we asked participants to report any other symptoms in free text. Personalised invitations were sent to between 560,000 and 600,000 individuals aged 18+ years in each of rounds 3–5 of the REACT-2 study, carried out from 15 to 28 September 2020 (round 3), 27 October to 10 November 2020 (round 4) and 25 January to 8 February 2021 (round 5). Registrations closed after ~190,000 people had signed up at each round. A further 384,988 invitations were sent in round 6, carried out from 12 to 25 May 2021, and registration was closed after ~100,000 people had signed up. A booster sample of people aged 55 years and above was also recruited in round 6 but these data are excluded from analyses here for comparability with rounds 3–5.

Our primary study population comprised 76,155 participants from rounds 3–5 who self-reported having had COVID-19 (either suspected or PCR confirmed) with one or more of 29 symptoms 12 weeks or more before the survey date (Supplementary Fig. 1). In addition to the 29 symptoms enquired about on the questionnaire in rounds 3–5, 8370 respondents gave free-text descriptions of other symptoms. Free-text analysis of co-occurring words indicated common additional symptoms which were not in the round 3–5 survey, including brain-fog, hair-loss, blood-pressure, heart-palpitations and severe-joint-pain (Supplementary Fig. 3). Free-text responses informed the additional symptoms that were surveyed in round 6 (35 symptoms in all, of which 27, as 23 symptom groups, were in common with those asked in rounds 3–5).

We repeated our main analyses in an independent data set comprising 13,170 participants from round 6 who reported one or more of an expanded list of 35 symptoms 12 weeks or more before the survey date; 27 (as 23 symptom groups) of these 35 symptoms were in common with rounds 3–5 (see Supplementary Methods). To maintain comparability with symptom reporting in rounds 3–5 we also restricted some analyses in round 6 to the 27 symptoms in common.

In a sensitivity analysis we used a subset of 14,704 participants from rounds 3–5 who had a self-reported COVID-19 infection and tested positive for antibodies on the REACT-2 LFIA test.

To estimate background prevalence of symptoms, we used data from the REACT-1 study, which tracks community infection with PCR tests among independent population samples recruited with an identical sampling frame to REACT-2. REACT-1 also includes children aged 5–17 years, who were excluded from the current analyses. REACT-1 sought history of any of 26 symptoms that persisted for 11 or more days. The REACT-1 data were weighted in a similar fashion to the REACT-2 data to give population estimates that were representative of the adult population of England as a whole (see below).

Data analysis

In rounds 3–5 (September 2020–February 2021) we obtained prevalence estimates for reporting of one or more of 29 symptoms by sex, age and other characteristics at 12 weeks after initial symptom onset. Our main analyses focused on individual symptoms reported as lasting for 12 weeks (84 days) or more, excluding 260 participants with inconsistent or missing data (see Fig. 1). We also obtained prevalence estimates for round 6 (May 2021).

Prevalence estimates were weighted by sex, age, ethnicity, LTLA population and index of multiple deprivation, to take account of the sampling design that gave approximately equal numbers of participants in each LTLA, and differential response rates, to obtain prevalence estimates that were representative of the population of England as a whole.
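The idea behind weighting can be sketched as follows. This is a deliberately simplified, single-variable post-stratification example under stated assumptions: the strata, population shares, outcomes and function names are hypothetical, and the actual REACT-2 weights combine several variables (sex, age, ethnicity, LTLA population and deprivation) in a procedure not reproduced here.

```python
from collections import Counter
from typing import Dict, List


def poststratification_weights(strata: List[str],
                               population_share: Dict[str, float]) -> List[float]:
    """Weight each respondent by population share / sample share of their stratum."""
    n = len(strata)
    counts = Counter(strata)
    return [population_share[s] / (counts[s] / n) for s in strata]


def weighted_prevalence(outcomes: List[int], weights: List[float]) -> float:
    """Weighted proportion of respondents with the outcome (1 = present)."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical sample in which women are over-represented (6 of 10 respondents)
strata = ["F"] * 6 + ["M"] * 4
outcomes = [1, 1, 1, 0, 0, 0] + [1, 0, 0, 0]   # 3/6 women and 1/4 men report symptoms
weights = poststratification_weights(strata, {"F": 0.5, "M": 0.5})

unweighted = sum(outcomes) / len(outcomes)      # 0.4
weighted = weighted_prevalence(outcomes, weights)  # 0.5 * 0.50 + 0.5 * 0.25 = 0.375
print(unweighted, weighted)
```

Down-weighting the over-represented stratum moves the estimate from the sample proportion towards the value expected in a population with the assumed composition.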

We used logistic regression (age-sex and mutually adjusted) to investigate the associations of demographic and lifestyle factors with persistence of symptoms at 12 weeks or more, and gradient boosted tree models⁴⁰ to investigate predictive ability (area under the curve, AUC) changes from adding variables to the model for persistent symptoms at 12 weeks or more. This analysis was repeated in the REACT-2 round 6 data. Modelling approaches are described in detail in the Supplementary Methods.
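For orientation, an odds ratio and its Wald confidence interval can be computed directly from a 2×2 table; this equals the estimate from a logistic regression with a single binary covariate and no adjustment. The sketch below uses hypothetical counts and a hypothetical function name; the adjusted odds ratios reported in this study come from multivariable models and cannot be reproduced this way.

```python
import math
from typing import Tuple


def odds_ratio_wald(exposed_cases: int, exposed_noncases: int,
                    unexposed_cases: int, unexposed_noncases: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval (95% when z = 1.96)."""
    odds_ratio = (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)
    # Standard error of log(OR): square root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_noncases
                          + 1 / unexposed_cases + 1 / unexposed_noncases)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: persistent symptoms (cases) by a binary exposure
or_hat, ci_low, ci_high = odds_ratio_wald(40, 60, 25, 75)
print(f"OR {or_hat:.2f} ({ci_low:.2f},{ci_high:.2f})")
```

With these illustrative counts the odds of persistent symptoms are 40/60 among the exposed and 25/75 among the unexposed, giving an odds ratio of 2.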

To identify a more specific set of persistent symptoms associated with history of COVID-19, in sensitivity analyses, we carried out variable selection in a 30% subset of symptomatic participants in rounds 3–5: in univariable models, we identified a subset of persistent symptoms (12 or more weeks) that were positively associated with a reported prior positive PCR test and estimated the population prevalence of persistence of one or more of these symptoms. We also repeated the logistic and gradient boosted tree modelling with this subset of symptoms as outcome variables.

Generalised additive models (GAMs) were constructed with likelihood of symptom persistence at 12 weeks or more modelled as a smoothed function of age and sex. A default thin plate spline was used and the smoothed functions were plotted to visualise the relationship between risk of persistent symptoms and age.

We used the results from the free-text analysis to identify single and co-occurring words to indicate other symptoms recorded by participants and plotted these in a network.
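A minimal sketch of how word co-occurrence could be counted to build such a network is given below. The tokenisation rule, the stop-word list, the example responses and the function name are assumptions for illustration only and do not reproduce the free-text pipeline used in the study.

```python
import re
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, Tuple


def cooccurrence_counts(responses: Iterable[str],
                        stopwords: FrozenSet[str] = frozenset()) -> Counter:
    """Count, for each unordered word pair, the responses in which both words appear."""
    pair_counts: Counter = Counter()
    for text in responses:
        # Lower-case, keep alphabetic tokens, de-duplicate within a response
        words = sorted(set(re.findall(r"[a-z]+", text.lower())) - stopwords)
        pair_counts.update(combinations(words, 2))
    return pair_counts


# Hypothetical free-text responses
responses = [
    "Brain fog and hair loss",
    "brain fog",
    "Hair loss, heart palpitations",
]
edges = cooccurrence_counts(responses, stopwords=frozenset({"and"}))
print(edges.most_common(3))
```

Each counted pair can be read as a weighted edge between two word nodes, so frequently co-occurring pairs such as "brain" and "fog" stand out as candidate symptom terms.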

To identify symptom clusters segmenting participants in rounds 3–5, a binary matrix was constructed for presence or absence (1 or 0) of each of the 29 surveyed symptoms at 12 weeks after symptom onset, for each participant. Clustering was performed using the CLustering LARge Applications (CLARA) extension of the Partitioning Around Medoids (PAM) algorithm, implemented in the R package fpc⁴¹. Briefly, PAM searches for the most representative data points to become cluster medoids by minimising the sum of dissimilarities between data points and their assigned medoids. CLARA uses a sampling approach to reduce the computational burden for large data sets. We used Hamming distance as a measure of dissimilarity between participants. In rounds 3–5, we determined the optimal number of clusters using the average silhouette width. We used two methods to assess cluster stability. First, we bootstrapped and re-clustered 100 times, then quantified the difference between bootstrapped and non-bootstrapped clusters using the Jaccard coefficient, which can range from 0 (no overlap) to 1 (perfect overlap)⁴². Second, we removed each symptom in turn, re-clustered, then calculated the average proportion of non-overlap (APN) between these and whole-dataset clusters as a proxy for the individual variable importance and contribution to the population segmentation⁴³.
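The medoid-based approach can be illustrated with a small sketch. Note that this is a simplified alternating k-medoids procedure (assign to nearest medoid, then re-pick each cluster's medoid), not the full PAM swap search and not CLARA's sampling scheme; the toy profiles, starting medoids and function names are hypothetical. The Jaccard coefficient used for comparing a cluster with its bootstrap counterpart is included for completeness.

```python
from typing import List, Sequence, Set, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of symptoms on which two binary profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def assign(profiles: List[Sequence[int]], medoids: List[int]) -> List[int]:
    """Label each profile with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda k: hamming(p, profiles[medoids[k]]))
            for p in profiles]


def k_medoids(profiles: List[Sequence[int]], initial_medoids: List[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(profiles, medoids)
        new_medoids = []
        for k, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:          # keep the old medoid if a cluster empties
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total distance to its cluster
            new_medoids.append(min(
                members,
                key=lambda i: sum(hamming(profiles[i], profiles[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(profiles, medoids)


def jaccard_coefficient(a: Set[int], b: Set[int]) -> float:
    """Overlap of two sets of member indices: 0 = none, 1 = identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical columns: tiredness, muscle aches, shortness of breath, tight chest
toy_profiles = [
    (1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0),
    (0, 0, 1, 1), (1, 0, 1, 1), (0, 0, 1, 1),
]
medoids, labels = k_medoids(toy_profiles, initial_medoids=[0, 3])
print([toy_profiles[m] for m in medoids], labels)
```

In a bootstrap stability check, rows would be resampled with replacement, the clustering repeated, and `jaccard_coefficient` applied to the membership of each original cluster and its best-matching bootstrap cluster.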

To visualise symptom patterns in the clusters we created heatmaps showing pairwise symptom co-occurrence at 12 weeks in the clusters separately.

As sensitivity analyses, we also ran PAM clustering using both Jaccard and Dice distance⁴⁴ (which, unlike Hamming distance, do not consider negative co-occurrence), and, further, conducted Latent Class Analysis (LCA) as an entirely different approach to identifying structure in the symptom data. LCA was applied using the poLCA package in R⁴⁵.
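The difference between these dissimilarity measures can be made concrete with a short sketch. The function names and the two example profiles are hypothetical; Hamming distance is shown here in its normalised form (proportion of symptoms that differ), and two all-absent profiles are assigned a Jaccard and Dice distance of 0 by convention.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of symptoms on which the profiles differ (shared absences count as agreement)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - (symptoms present in both) / (symptoms present in either)."""
    both = sum(x and y for x, y in zip(a, b))
    either = sum(x or y for x, y in zip(a, b))
    return 1 - both / either if either else 0.0


def dice_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - 2 * (symptoms present in both) / (total symptoms present across both profiles)."""
    both = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1 - 2 * both / total if total else 0.0


# Two hypothetical participants, first on 4 symptoms, then with 6 further
# symptoms that neither participant reports (shared absences)
p, q = (1, 1, 0, 0), (1, 0, 0, 0)
p_long, q_long = p + (0,) * 6, q + (0,) * 6

print(hamming_distance(p, q), hamming_distance(p_long, q_long))
print(jaccard_distance(p, q), jaccard_distance(p_long, q_long))
print(dice_distance(p, q), dice_distance(p_long, q_long))
```

Adding symptoms that both participants lack shrinks the normalised Hamming distance but leaves the Jaccard and Dice distances unchanged, which is the sense in which the latter two ignore negative co-occurrence.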

All data collection for the REACT-2 study was captured with Questback (Spring 2020 installation)⁴⁶. Analysis was conducted in R version 4.0.5⁴⁷. We obtained research ethics approval from the South Central-Berkshire B Research Ethics Committee (IRAS ID: 283787). The REACT Public Advisory Panel provides regular review of the study processes and results. Participants in the study provided informed consent.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The original datasets generated or analysed, or both, during this study are not publicly available because of governance restrictions and the identifiable nature of the data. Requests for access to raw data should be addressed to the corresponding authors and will be answered within 12 weeks. Summary tabular data are provided here. The study materials and questionnaires used in this study can be found here.

References

Urgent need for more research to understand Long Covid. https://royalsociety.org/news/2020/10/urgent-need-to-understand-long-covid/ (Accessed 31 March 2022).
Del Rio, C., Collins, L. F. & Malani, P. Long-term health consequences of COVID-19. JAMA 324, 1723–1724 (2020).
Living with Covid19—Second Review. https://evidence.nihr.ac.uk/themedreview/living-with-covid19-second-review/, https://doi.org/10.3310/themedreview_45225 (Accessed 31 March 2022).
Overview | COVID-19 rapid guideline: managing the long-term effects of COVID-19 | Guidance (NICE).
Lopez-Leon, S. et al. More than 50 long-term effects of COVID-19: a systematic review and meta-analysis. Sci. Rep. 11. https://doi.org/10.1038/s41598-021-95565-8 (2021).
Augustin, M. et al. Post-COVID syndrome in non-hospitalised patients with COVID-19: a longitudinal prospective cohort study. Lancet Reg. Health Eur. 6, 100122 (2021).
Callard, F. & Perego, E. How and why patients made Long Covid. Soc. Sci. Med. 268, 113426 (2021).
Sudre, C. H. et al. Attributes and predictors of long COVID. Nat. Med. 27, 626–631 (2021).
Ayoubkhani, D. Prevalence of ongoing symptoms following coronavirus (COVID-19) infection in the UK. Office for National Statistics. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/prevalenceofongoingsymptomsfollowingcoronaviruscovid19infectionintheuk/1april2021 (2021).
Cirulli, E. T. et al. Long-term COVID-19 symptoms in a large unselected population. Preprint at bioRxiv https://doi.org/10.1101/2020.10.07.20208702 (2020).
Klein, H. et al. Onset, duration and unresolved symptoms, including smell and taste changes, in mild COVID-19 infection: a cohort study in Israeli patients. Clin. Microbiol. Infect. https://doi.org/10.1016/j.cmi.2021.02.008 (2021).
Yong, S. Long COVID or post-COVID-19 syndrome: putative pathophysiology, risk factors, and treatments. Infectious Diseases 53, 737–754 (2021).
Living with Covid19. https://evidence.nihr.ac.uk/themedreview/living-with-covid19/, https://doi.org/10.3310/themedreview_41169 (Accessed 31 March 2022).
Arnold, D. T. et al. Patient outcomes after hospitalisation with COVID-19 and implications for follow-up: results from a prospective UK cohort. Thorax 76, 399–401 (2021).
Carfì, A., Bernabei, R. & Landi, F., Gemelli Against COVID-19 Post-Acute Care Study Group. Persistent symptoms in patients after acute COVID-19. JAMA 324, 603–605 (2020).
Evans, R. A. et al. Physical, cognitive, and mental health impacts of COVID-19 after hospitalisation (PHOSP-COVID): a UK multicentre, prospective cohort study. Lancet Respir. Med. https://doi.org/10.1016/S2213-2600(21)00383-0 (2021).
Venturelli, S. et al. Surviving COVID-19 in Bergamo province: a post-acute outpatient re-evaluation. Epidemiol. Infect. 149, e32 (2021).
Tomasoni, D. et al. Anxiety and depression symptoms after virological clearance of COVID-19: A cross-sectional study in Milan, Italy. J. Med. Virol. 93, 1175–1179 (2021).
Moreno-Pérez, O. et al. Post-acute COVID-19 syndrome. Incidence and risk factors: a Mediterranean cohort study. J. Infect. 82, 378–383 (2021).
Huang, Y. et al. COVID symptoms, symptom clusters, and predictors for becoming a long-hauler: looking for clarity in the haze of the pandemic. Preprint at medRxiv https://doi.org/10.1101/2021.03.03.21252086 (2021).
Yomogida, K. et al. Post-acute sequelae of SARS-CoV-2 infection among adults aged ≥18 years—Long Beach, California, April 1–December 10, 2020. Morb. Mortal. Wkly. Rep. 70, 1274–1277 (2021).
Blomberg, B. et al. Long COVID in a prospective cohort of home-isolated patients. Nat. Med. https://doi.org/10.1038/s41591-021-01433-3 (2021).
Ayoubkhani, D., Pawelek, P. & Gaughan, C. Technical article: updated estimates of the prevalence of post-acute symptoms among people with coronavirus (COVID-19) in the UK. Office for National Statistics. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/articles/technicalarticleupdatedestimatesoftheprevalenceofpostacutesymptomsamongpeoplewithcoronaviruscovid19intheuk/26april2020to1august2021 (2021).
Ali, A., Siddiqui, A. A., Arshad, M. S., Iqbal, F. & Arif, T. B. Effects of COVID-19 pandemic and lockdown on lifestyle and mental health of students: a retrospective study from Karachi, Pakistan. Ann. Med. Psychol. https://doi.org/10.1016/j.amp.2021.02.004 (2021).
Morgül, E., Kallitsoglou, A. & Essau, C. A. Psychological effects of the COVID-19 lockdown on children and families in the UK. Rev. Psicol. Clín. con Niños Adolesc. 7, 42–48 (2020).
Odriozola-González, P., Planchuelo-Gómez, Á., Irurtia, M. J. & de Luis-García, R. Psychological effects of the COVID-19 outbreak and lockdown among students and workers of a Spanish university. Psychiatry Res. 290, 113108 (2020).
Timeline of UK government coronavirus lockdowns. https://www.instituteforgovernment.org.uk/charts/uk-government-coronavirus-lockdowns (Accessed 31 March 2022).
Sykes, D. L. et al. Post-COVID-19 symptom burden: what is long-COVID and how should we manage it? Lung 199, 113–119 (2021).
Michelen, M. et al. Characterising long COVID: a living systematic review. BMJ Glob. Health 6, e005427 (2021).
Riley, S. et al. Resurgence of SARS-CoV-2: detection by community viral surveillance. Science 372, 990–995 (2021).
Ward, H. et al. SARS-CoV-2 antibody prevalence in England following the first peak of the pandemic. Nat. Commun. 12, 905 (2021).
Elliott, J. et al. Predictive symptoms for COVID-19 in the community: REACT-1 study of over 1 million people. PLoS Med. 18, e1003777 (2021).
Fernández-de-Las-Peñas, C. et al. Long-term post-COVID symptoms and associated risk factors in previously hospitalized patients: a multicenter study. J. Infect. 83, 237–279 (2021).
Ward, H. et al. Prevalence of antibody positivity to SARS-CoV-2 following the first peak of infection in England: serial cross-sectional studies of 365,000 adults. Lancet Reg. Health Eur. 4, 100098 (2021).
Ward, H. et al. REACT-2 Round 5: increasing prevalence of SARS-CoV-2 antibodies demonstrate impact of the second wave and of vaccine roll-out in England. https://doi.org/10.1101/2021.02.26.21252512 (2021).
Davies, B. et al. Community factors and excess mortality in first wave of the COVID-19 pandemic in England. Nat. Commun. 12, 3755 (2021).
Ward, H. et al. Global surveillance, research, and collaboration needed to improve understanding and management of long COVID. Lancet https://doi.org/10.1016/S0140-6736(21)02444-2 (2021).
Riley, S. et al. REal-time Assessment of Community Transmission (REACT) of SARS-CoV-2 virus: study protocol. Wellcome Open Res. 5, 200 (2020).
Imperial College London. Real-time Assessment of Community Transmission (REACT) study. https://www.imperial.ac.uk/medicine/research-and-impact/groups/react-study/ (Accessed 31 March 2022).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. Preprint at arXiv [cs.LG]: 1706.09516 (2017).
Hennig, C. fpc: Flexible Procedures for Clustering. 2020. https://cran.r-project.org/web/packages/fpc/index.html.
Hennig, C. Cluster-wise assessment of cluster stability. Comput. Stat. Data Anal. 52, 258–271 (2007).
Datta, S. & Datta, S. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics 19, 459–466 (2003).
Sørensen, T. J. A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species Content and Its Application to Analyses of the Vegetation on Danish Commons (I kommission hos E. Munksgaard, 1948).
Linzer, D. A. & Lewis, J. B. poLCA: an R package for polytomous variable latent class analysis. J. Stat. Softw. 42, 1–29 (2011).
EFS Survey. Version EFS Spring 2020 (Questback GmbH, 2020).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2018).

Acknowledgements

M.W. and M.C.-H. acknowledge support from the H2020-EXPANSE project (Horizon 2020 grant No. 874627). M.C.-H. acknowledges support from Cancer Research UK, Population Research Committee Project grant ‘Mechanomics’ (grant no 22184 to M.C.-H.). G.C. is supported by an NIHR Professorship. WSB is the Action Medical Research Professor, A.D. is an NIHR senior investigator and D.A. and P.E. are Emeritus NIHR Senior Investigators. S.R. acknowledges support from MRC Centre for Global Infectious Disease Analysis, NIHR Health Protection Research Unit, Wellcome Trust (200861/Z/16/Z, 200187/Z/15/Z), and Centres for Disease Control and Prevention (US, U01CK0005-01-02). H.W. is a National Institute for Health Research (NIHR) Senior Investigator and acknowledges support from NIHR Biomedical Research Centre of Imperial College NHS Trust, NIHR School of Public Health Research, NIHR Applied Research Collaborative North West London, and Wellcome Trust (UNS32973). P.E. is Director of the MRC Centre for Environment and Health (MR/L01341X/1, MR/S019669/1). P.E. acknowledges support from the NIHR Imperial Biomedical Research Centre and the NIHR HPRUs in Chemical and Radiation Threats and Hazards and in Environmental Exposures and Health, the British Heart Foundation Centre for Research Excellence at Imperial College London (RE/18/4/34215), Health Data Research UK (HDR UK) and the UK Dementia Research Institute at Imperial (MC_PC_17114). We thank The Huo Family Foundation for their support of our work on COVID-19. We thank key collaborators on this work—Ipsos MORI: Stephen Finlay, John Kennedy, Kevin Pickering, Duncan Peskett, Sam Clemens and Kelly Beaver; Institute of Global Health Innovation at Imperial College London: Gianluca Fontana; School of Public Health, Imperial College London: Eric Johnson, Rob Elliott, Graham Blakoe; the Imperial Patient Experience Research Centre and the REACT Public Advisory Panel; NHS Digital for access to the NHS Register; Dr. Nisreen Alwan. 
The study was funded by the Department of Health and Social Care in England. Our work on Long COVID is also being supported by grants from NIHR and UK Research and Innovation (UKRI): REACT GE (MR/V030841/1) and REACT Long COVID (REACT-LC) (COV-LT-0040).

Author information

These authors contributed equally: Matthew Whitaker, Joshua Elliott.
These authors jointly supervised this work: Graham Cooke, Helen Ward, Paul Elliott.

Authors and Affiliations

School of Public Health, Imperial College London, London, UK
Matthew Whitaker, Marc Chadeau-Hyam, Steven Riley & Paul Elliott
MRC Centre for Environment and Health, Imperial College London, London, UK
Matthew Whitaker, Marc Chadeau-Hyam & Paul Elliott
Imperial College Healthcare NHS Trust, London, UK
Joshua Elliott, Ara Darzi, Graham Cooke, Helen Ward & Paul Elliott
Department of Infectious Disease, Imperial College London, London, UK
Joshua Elliott & Graham Cooke
MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, UK
Steven Riley & Helen Ward
Abdul Latif Jameel Institute for Disease & Emergency Analytics, Imperial College London, London, UK
Steven Riley
Institute of Global Health Innovation at Imperial College London, London, UK
Ara Darzi
National Institute for Health Research Imperial Biomedical Research Centre, London, UK
Graham Cooke, Helen Ward & Paul Elliott
Health Data Research (HDR) UK London at Imperial College, London, UK
Paul Elliott
UK Dementia Research Institute at Imperial College, London, UK
Paul Elliott

Authors

Matthew Whitaker
Joshua Elliott
Marc Chadeau-Hyam
Steven Riley
Ara Darzi
Graham Cooke
Helen Ward
Paul Elliott

Contributions

M.W.—conceptualisation, formal analysis, visualisation, methodology, writing—original draft, writing—review & editing; J.E.—conceptualisation, formal analysis, methodology, writing—original draft, writing—review & editing; M.C.-H.—supervision, methodology, writing—review & editing; S.R.—supervision, methodology, writing—review & editing; A.D.—funding acquisition, supervision, writing—review & editing; G.C.—conceptualisation, supervision, methodology, writing—review & editing; H.W.—conceptualisation, supervision, methodology, writing—review & editing; P.E.—funding acquisition, conceptualisation, supervision, methodology, writing—original draft, writing—review & editing.

Corresponding author

Correspondence to Paul Elliott.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Whitaker, M., Elliott, J., Chadeau-Hyam, M. et al. Persistent COVID-19 symptoms in a community study of 606,434 people in England. Nat Commun 13, 1957 (2022). https://doi.org/10.1038/s41467-022-29521-z

Received: 09 July 2021
Accepted: 16 March 2022
Published: 12 April 2022
DOI: https://doi.org/10.1038/s41467-022-29521-z
