Abstract
Investigating whether DNA methylation (DNA-M) at an earlier age is associated with lung function at a later age and whether this relationship differs by sex could enable prediction of future lung function deficit.
A training/testing-based technique was used to screen 402 714 cytosine-phosphate-guanine dinucleotide sites (CpGs) to assess the longitudinal association of blood-based DNA-M at ages 10 and 18 years with lung function at 18 and 26 years, respectively, in the Isle of Wight birth cohort (IOWBC). Multivariable linear mixed models were applied to the CpGs that passed screening. To detect differentially methylated regions (DMRs), DMR enrichment analysis was conducted. Findings were further examined in the Avon Longitudinal Study of Parents and Children (ALSPAC). Biological relevance of the identified CpGs was assessed using gene expression data.
DNA-M at eight CpGs (five CpGs with forced expiratory volume in 1 s (FEV1) and three CpGs with FEV1/forced vital capacity (FVC)) at an earlier age was associated with lung function at a later age regardless of sex, while at 13 CpGs (five CpGs with FVC, three with FEV1 and five with FEV1/FVC), the associations were sex-specific (pFDR<0.05) in IOWBC, with consistent directions of association in ALSPAC (IOWBC-ALSPAC consistent CpGs). cg16582803 (WNT10A) and cg14083603 (ZGPAT) were replicated in ALSPAC for main and sex-specific effects, respectively. Among IOWBC-ALSPAC consistent CpGs, DNA-M at cg01376079 (SSH3) and cg07557690 (TGFBR3) was associated with gene expression both longitudinally and cross-sectionally. In total, 57 and 170 DMRs were linked to lung function longitudinally in males and females, respectively.
CpGs showing longitudinal associations with lung function have the potential to serve as candidate markers in future studies on lung function deficit prediction.
Population-based cohort studies show that DNA methylation at specific sites at an earlier age is associated with lung function at a later age, in some cases sex-specifically, and the detected markers could serve as candidates for lung function deficit prediction in future studies https://bit.ly/3av22Dx
Introduction
Lung function is pivotal for the diagnosis of respiratory diseases and predicts future disease development [1]. Lung function, specifically forced expiratory volume in 1 s (FEV1), is inversely correlated with morbidities such as asthma and COPD, and early mortality [2, 3]. The growth of lung function in childhood and adolescence is associated with age and height and the decline in adulthood is related to ageing [3, 4]. In addition, the maximal level of lung function and the age of decline are dependent on sex [3, 5]. Several biological factors determine such sex-dependency including anatomical, immunological, and hormonal factors [5, 6].
The impact of environmental factors on respiratory health and lung function is significant [7]. The importance of the interaction between genetic and environmental factors in determining lung function suggests that other gene regulatory processes [8], such as epigenetic mechanisms, may act as an interface between environmental exposures and genetics [9, 10]. DNA methylation (DNA-M), most commonly the addition of a methyl group onto the 5′ position of the cytosine base at cytosine-phosphate-guanine dinucleotide sites (CpGs), regulates gene expression by recruiting proteins involved in gene repression or by impeding the binding of transcriptional proteins to DNA [11]. Several studies have shown the association of blood-based DNA-M with lung function [12–17] or with related diseases such as asthma [18] and COPD [12–15]. Most existing epigenetic studies on lung function were cross-sectional and focused on older people (>40 years) [12–16]. Cross-sectional designs are subject to reverse causation and create temporal ambiguity. To our knowledge, no existing studies have used repeated measurements of DNA-M, together with longitudinally measured lung function to assess the association of DNA-M with lung function and the stability of these associations over time.
DNA-M changes over time at specific CpGs [19, 20], and such changes have been shown to be sex-specific [20, 21]. Changes in DNA-M can occur in response to biological ageing, but also to environmental exposures [22]. That is, DNA-M at certain CpGs reflects the memory of past exposure as well as significant changes at different stages of life. The association between change in DNA-M and change in lung function has been shown to be different between males and females [23]. However, it is unknown whether DNA-M at an earlier age is associated with lung function at a later age, whether such longitudinal associations are invariant to DNA-M changes over time, and how such associations are different between males and females. A longitudinal design with repeatedly measured DNA-M and lung function data would allow assessment of the stability of time-lagged associations between DNA-M and lung function. As DNA-M has been found to be a potential driver of biological ageing [24], DNA-M biomarkers which have a stable time-lagged association could be useful to predict lung function deficit and detect possible related diseases at an earlier age before the pathology becomes apparent. We hypothesised that DNA-M at specific CpGs in early life is associated with lung function at a later age and such association would be sex-specific. The study was carried out in the Isle of Wight birth cohort (IOWBC) in the UK. To assess generalisability, the findings were further examined in an independent birth cohort, the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort in the UK.
Materials and methods
Study subjects and design
IOWBC: discovery cohort
IOWBC is a prospective population-based birth cohort established in 1989. Longitudinal monitoring of allergic diseases, phenotypic measures, genetics and assessments of environmental exposures were conducted at birth and multiple ages from 1 year to 26 years of age. Forced vital capacity (FVC) and FEV1 were measured at age 10 years (n=980), 18 years (n=838) and 26 years (n=546), and the ratio of FEV1 over FVC (FEV1/FVC) was calculated. Genome-wide DNA-M was measured from peripheral blood samples collected at age 10 years (n=330), 18 years (n=476) and 26 years (n=303) from randomly selected subjects for whom DNA was available using the Infinium HumanMethylation450K or EPIC BeadChips (Illumina, Inc., San Diego, CA, USA). After quality control, preprocessing, and excluding probes with single nucleotide polymorphisms, 402 714 CpGs were included in the statistical analyses. RNA-sequencing gene expression data for subjects at age 26 years was available in IOWBC. A detailed description of IOWBC can be found in the online supplement.
ALSPAC: replication cohort
Findings in the IOWBC were further tested in an independent cohort, ALSPAC [25, 26], where DNA-M data at age 7 and 15 years and lung function measurements at age 15 and 24 years were available for replication analyses. Details of these data along with information on covariates are presented in the online supplement. The study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/).
Statistical analyses
To assess whether subjects examined in the study at ages 18 and 26 years reasonably represented those in the complete IOWBC, continuous variables were evaluated using nonparametric one-sample sign tests and categorical variables using one-sample proportion tests.
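As an illustration of the representativeness check, the one-sample sign test can be sketched as follows. This is a minimal sketch, not the authors' code: the function name `sign_test`, the dropping of ties, and the example values are assumptions made for illustration.

```python
from math import comb


def sign_test(sample, reference_median):
    """Two-sided exact one-sample sign test against a reference median.

    Hypothetical helper for illustration; values equal to the reference
    median are dropped, which is one common convention.
    """
    above = sum(1 for x in sample if x > reference_median)
    below = sum(1 for x in sample if x < reference_median)
    n = above + below
    if n == 0:
        return 1.0
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up subsample values compared with a made-up cohort median.
subsample = [4.1, 4.3, 3.9, 4.6, 4.4, 4.2, 4.8, 4.5]
p_value = sign_test(subsample, reference_median=4.0)
```

A small p-value would suggest that the subsample median differs from the cohort median, as reported for FEV1 at age 18 years.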
Analyses of longitudinal association
Lung function measurements at each age were adjusted for height. DNA-M at each CpG, adjusted for cell types, principal components, and batch effects, was used (see details in the online supplement). In IOWBC, a two-step analytical approach was used to assess the longitudinal association between DNA-M and lung function at two time-lagged periods: period-1 (10–18 years), the association of DNA-M at age 10 years with lung function at 18 years; and period-2 (18–26 years), the association of DNA-M at age 18 years with lung function at 26 years. In the first step, we filtered out CpGs not potentially associated with lung function in either of the two periods using the screening package “ttScreening” in R version 3.3.2 (detailed in the online supplement) [27, 28]. The screening was applied to each lung function parameter and performed for both time periods, stratified by sex.
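The general idea behind training/testing-based screening can be sketched as follows. This is a simplified illustration under stated assumptions, not the ttScreening implementation: the association test is a Pearson correlation with a Fisher z normal approximation rather than a covariate-adjusted regression, and the function names, the number of iterations, the 2/3 split, the significance level, and the 50% pass rate are illustrative choices rather than the settings used in the study.

```python
import random
from math import atanh, sqrt
from statistics import NormalDist


def corr_pvalue(x, y):
    """Approximate two-sided p-value for Pearson r (Fisher z-transform)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if n < 4 or sxx == 0 or syy == 0:
        return 1.0
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def passes_screening(dnam, lung, iterations=100, train_frac=2 / 3,
                     alpha=0.05, min_pass_rate=0.5, seed=0):
    """Keep a CpG if it is significant in both the training and the
    testing split in at least min_pass_rate of the random splits."""
    rng = random.Random(seed)
    idx = list(range(len(dnam)))
    n_train = int(len(idx) * train_frac)
    passed = 0
    for _ in range(iterations):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        p_train = corr_pvalue([dnam[i] for i in train],
                              [lung[i] for i in train])
        if p_train >= alpha:
            continue  # not selected in training, so not tested
        p_test = corr_pvalue([dnam[i] for i in test],
                             [lung[i] for i in test])
        if p_test < alpha:
            passed += 1
    return passed / iterations >= min_pass_rate
```

In a genome-wide setting, a function of this kind would be applied to each CpG in turn, separately for each lung function parameter, period, and sex.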
In the second step, linear mixed models (LMMs) with repeated measures were implemented in period-1 and period-2 in SAS 9.4 (SAS Institute Inc., Cary, NC, USA). Model-1 focused on the main effects of DNA-M. Potential confounders, including birth weight, gestational age, sex, duration of breastfeeding, maternal smoking exposure during pregnancy, recurrent chest infection at ages 1, 2 and 4 years, socioeconomic status, repeated measures of body mass index, smoking status, and paracetamol use at ages 18 and 26 years, were included in model-1. To assess sex-specificity, we further extended model-1 by including a DNA-M×sex interaction in model-2. Multiple testing was accounted for by controlling the false discovery rate (FDR) at 0.05 in both models [29].
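The FDR adjustment can be illustrated with a short sketch. It assumes the Benjamini-Hochberg step-up procedure, which is the most widely used FDR method; the text cites [29] without naming the procedure, so this choice, the function name, and the example p-values are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four CpGs that passed screening.
raw = [0.01, 0.04, 0.03, 0.005]
p_fdr = benjamini_hochberg(raw)
significant = [p < 0.05 for p in p_fdr]
```

A CpG would then be reported when its adjusted value falls below 0.05, which corresponds to the pFDR<0.05 criterion used throughout.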
Analyses of differentially methylated regions (DMRs)
Regional differential methylation signals among the CpGs that passed screening were examined using DMRcate [30] with default settings of including ≥2 significant CpGs that passed screening in a region of ≥1000 nucleotides (pFDR<0.05) [30] (detailed in the online supplement).
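The notion of a region built from neighbouring significant CpGs can be sketched as below. This is not the DMRcate algorithm, which smooths per-CpG test statistics with a kernel before calling regions; the sketch only shows gap-based grouping, assuming that significant CpGs on the same chromosome separated by fewer than 1000 nucleotides belong to one candidate region. The function name and positions are hypothetical.

```python
def group_into_regions(positions, max_gap=1000, min_cpgs=2):
    """Group significant CpG positions on one chromosome into regions.

    Returns (start, end, number_of_cpgs) tuples. A new region starts
    whenever the gap to the previous CpG is max_gap or more.
    """
    regions, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] >= max_gap:
            if len(current) >= min_cpgs:
                regions.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_cpgs:
        regions.append((current[0], current[-1], len(current)))
    return regions


# Made-up genomic coordinates of significant CpGs.
example_regions = group_into_regions([100, 600, 900, 5000, 9000, 9500])
```

In this example the isolated CpG at position 5000 does not form a region because it has no significant neighbour within the gap threshold.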
Replication analysis in ALSPAC
The CpGs identified in IOWBC were further examined in ALSPAC to validate the IOWBC findings. Following a similar approach as that in the IOWBC, i.e. via LMMs with repeated measures, the longitudinal association of DNA-M at age 7 years with lung function at age 15 years, and DNA-M at age 15 years with lung function at age 24 years was examined, controlling the effects of confounders except for two covariates, recurrent chest infection and paracetamol use, which were unavailable.
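How a CpG might be classified when comparing the two cohorts can be sketched as follows. The labels, the function name, and the 0.05 threshold are assumptions for illustration; the coefficients are made up.

```python
def classify_replication(beta_discovery, beta_replication,
                         p_replication, alpha=0.05):
    """Classify a CpG by comparing discovery and replication estimates.

    'consistent direction': both coefficients have the same sign.
    'replicated': same sign and significant in the replication cohort.
    """
    if beta_discovery * beta_replication <= 0:
        return "inconsistent"
    if p_replication < alpha:
        return "replicated"
    return "consistent direction"


# Hypothetical regression coefficients for three CpGs.
labels = [
    classify_replication(-0.8, -0.3, 0.034),
    classify_replication(-0.8, -0.1, 0.40),
    classify_replication(0.5, -0.2, 0.01),
]
```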
Gene expression analysis
To assess potential biological relevance of the identified CpGs in model-1 and -2, we examined the association of DNA-M at those CpGs with the expression of their corresponding genes in blood. Linear regressions were applied to two datasets, DNA-M at age 18 years with gene expression at 26 years (longitudinal associations) and DNA-M at age 26 years with gene expression at the same age (cross-sectional associations).
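The direction of a DNA-M and gene expression association can be read from the slope of a simple linear regression, as in the sketch below. It is an unadjusted ordinary least squares fit with made-up values; whether the authors' regressions included covariates is not stated in this section, and the function name is hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up methylation levels at one CpG and expression of its gene.
methylation = [0.1, 0.2, 0.3, 0.4]
expression = [5.0, 4.0, 3.0, 2.0]
slope, intercept = ols_slope_intercept(methylation, expression)
```

A negative slope, as in this toy example, corresponds to higher methylation being associated with lower expression.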
Results
Results of longitudinal association analysis in IOWBC
In total, 332 participants (172 females) with complete data (both DNA-M and lung function) in at least one of the two periods were included (figure 1). The analysed sub-samples at age 18 (n=315) and 26 years (n=268) were not statistically different from the enrolled sample with lung function (18 years, n=839; 26 years, n=547) for FVC, FEV1, and FEV1/FVC at the corresponding ages, except FEV1 at age 18 years, which was higher in the subsample (table 1). Using ttScreening, in total, 194, 207, and 149 CpGs with DNA-M at ages 10 and 18 years were identified as associated with FVC, FEV1, and FEV1/FVC at 18 and 26 years, respectively. These CpGs were then included in subsequent analyses (figure 2). In model-1 (main effects of DNA-M), DNA-M at 14 CpGs (three CpGs with FVC, six with FEV1, and five with FEV1/FVC) at earlier ages was associated with lung function at later ages longitudinally (pFDR<0.05, table S1) after adjusting for confounders. In model-2 (interaction effects of DNA-M×sex), DNA-M at 26 CpGs showed sex-specific associations with lung function (nine CpGs with FVC, seven with FEV1, and 10 with FEV1/FVC; pFDR<0.05) (table S2, figure 2). cg16582803 in WNT10A was identified by both model-1 and model-2.
Figure 1. Flow chart for final sample determination in the Isle of Wight birth cohort. DNA-M: DNA methylation.
Table 1. Comparison of lung function measurements of enrolled participants and participants included in the analyses
Figure 2. Flow chart of statistical analyses and the number of cytosine-phosphate-guanine dinucleotide sites (CpGs) after each step. IOWBC: Isle of Wight birth cohort; ALSPAC: Avon Longitudinal Study of Parents and Children; GE: gene expression; DNA-M: DNA methylation; DNA-M_ageX: DNA-M at age X years; Lung function_ageX: lung function at age X years; GE_age26: gene expression at age 26 years; FDR: false discovery rate; FVC: forced vital capacity; FEV1: forced expiratory volume in 1 s; LMM: linear mixed models. #: two CpGs are common between the longitudinal and cross-sectional analysis of DNA-M with GE.
Replication in ALSPAC cohort
In total, 1342 participants (610 males) in ALSPAC had complete data (DNA-M and lung function) in at least one period. Among the 14 CpGs identified in model-1 in IOWBC, five for FEV1 and three for FEV1/FVC showed consistent directions of association for the main effects, of which the effect of cg16582803 (WNT10A) was statistically significant (p=0.034) for FEV1 (table 2). Among these eight IOWBC-ALSPAC consistent CpGs, higher DNA-M at five CpGs (three associated with FEV1 and two with FEV1/FVC), mapped to ANKRD9, WNT10A, ZNF727, NRN1, and DNAJB6, at earlier ages was associated with lower lung function at later ages. At the remaining three CpGs, mapped to HINFP, EFNA2 and C16orf87, higher DNA-M at earlier ages was associated with higher lung function at later ages (table 2). In model-2, 13 of the 26 CpGs (five CpGs associated with FVC, three with FEV1, and five with FEV1/FVC) showed directions of association for interaction effects consistent with those in IOWBC (table 3), and among these 13 CpGs, the interaction effect at cg14083603 (ZGPAT) was statistically significant (p=0.0183). In the sex-specific analysis in model-2, higher DNA-M at eight CpGs at earlier ages was associated with lower lung function at later ages in males, while in females higher DNA-M at those CpGs was associated with higher lung function. At the remaining five CpGs, higher DNA-M was associated with higher lung function in males, while in females it was associated with lower lung function (table 3).
Table 2. DNA methylation at cytosine-phosphate-guanine dinucleotide sites (CpGs) at earlier ages that showed consistent direction of associations with lung function at later age between the Isle of Wight birth cohort (IOWBC) and the Avon Longitudinal Study of Parents and Children (ALSPAC)
Table 3. DNA methylation at cytosine-phosphate-guanine dinucleotide sites (CpGs) at earlier age that showed consistent sex-specific association with lung function at later age between the Isle of Wight birth cohort (IOWBC) and the Avon Longitudinal Study of Parents and Children (ALSPAC)
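The opposite directions of association in males and females follow from how an interaction term combines with the main effect. The sketch below assumes a model of the form lung function = b0 + b1·DNA-M + b2·sex + b3·(DNA-M×sex) with sex coded 0 for females and 1 for males; the coding, the function name, and the coefficient values are assumptions, not taken from the study.

```python
def sex_specific_slopes(beta_dnam, beta_interaction):
    """Slopes of lung function on DNA-M implied by an interaction model.

    Assumes sex is coded 0 = female, 1 = male, so the interaction
    coefficient is the difference between the male and female slopes.
    """
    return {
        "female": beta_dnam,
        "male": beta_dnam + beta_interaction,
    }


# Hypothetical coefficients: positive main effect, negative interaction.
slopes = sex_specific_slopes(beta_dnam=0.4, beta_interaction=-0.9)
opposite_directions = slopes["female"] * slopes["male"] < 0
```

With these made-up values, higher DNA-M is associated with higher lung function in females and lower lung function in males, the pattern described for eight of the 13 CpGs.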
Results of gene expression analysis
In a longitudinal assessment of DNA-M at age 18 years with gene expression at 26 years (36 males and 72 females), five identified CpGs in model-1 and 11 in model-2 had the corresponding gene expression data. In longitudinal assessment none of the five CpGs in model-1 were associated with the relevant gene expression. In model-2, amongst the 11 CpGs, DNA-M at cg01376079 (SSH3), cg07557690 (TGFBR3), and cg15981851 (AGAP1) at age 18 years showed significant association with gene expression at age 26 years (table 4). In cross-sectional association of DNA-M at age 26 years with gene expression at 26 years (54 males and 85 females), one CpG in model-1 had corresponding gene expression data but showed no association. In model-2, eight identified CpGs had gene expression data and DNA-M at three CpGs, cg01376079 (SSH3), cg07557690 (TGFBR3), and cg19736286 (MSH6), were shown to be cross-sectionally associated with gene expression, with cg01376079 and cg07557690 also being associated with expression of the corresponding gene in the longitudinal assessment. In both longitudinal and cross-sectional assessment, consistent directions of DNA-M and gene expression associations were found for cg01376079 and cg07557690; higher methylation at cg01376079 was associated with lower expression of SSH3, while higher methylation at cg07557690 was associated with higher expression of TGFBR3 (table 4).
Table 4. Association of DNA methylation (DNA-M) with gene expression in the Isle of Wight birth cohort (IOWBC)
Results of the DMRs analysis
DMR analyses focused on detecting regions showing differential methylation associated with lung function parameters. To potentially improve the power, via ttScreening, in males 486, 518, and 461 CpGs and in females 419, 559, and 842 CpGs were selected based on their association with FVC, FEV1, and FEV1/FVC, respectively, and were included in the DMR analyses. Using repeated measures of DNA-M and lung function, 17, 24, and 16 statistically significant DMRs in males and 57, 66, and 47 DMRs in females were identified for FVC, FEV1, and FEV1/FVC, respectively (pFDR<0.05). The DMRs containing ≥2 CpGs are presented in table 5 and the complete results in table S3. In total, 132 and 382 CpGs were in the 57 and 170 identified DMRs in males and females, respectively. Four genes were common between the mapped genes of the individually identified CpGs and those of DMRs, namely TGFBR3, WNT10A, LY6H, and GMIP.
Table 5. Differentially methylated regions (DMRs; containing ≥2 cytosine-phosphate-guanine dinucleotide sites (CpGs)) of lung function at later age in relation to DNA methylation at earlier age identified by the DMRcate method
Discussion
We examined the longitudinal association of genome-wide DNA-M at ages 10 and 18 years with lung function at 18 and 26 years, respectively, using repeated measures from the pre-adolescence to the post-adolescence period at both individual sites and genomic regions. DNA-M at eight CpGs and 13 CpGs at an earlier age was shown to be associated with lung function at a later age for main effects and sex-specific effects, respectively, in the IOWBC, with consistent findings in ALSPAC. Among IOWBC-ALSPAC consistent CpGs, cg16582803 (WNT10A) and cg14083603 (ZGPAT) were replicated in ALSPAC: the directions of association were consistent, and the main effect and the interaction effect on lung function, respectively, were statistically significant. DNA-M at cg01376079 (SSH3) and cg07557690 (TGFBR3) was associated with gene expression in both the longitudinal and the cross-sectional assessment. In total, 57 and 170 DMRs at earlier age in relation to lung function at later age were identified in males and females, respectively.
In our study, the longitudinal associations were sex-specific at a proportion of the CpGs. One possible explanation for this observation is sex-specific change in DNA-M over time, as we have previously observed [20]. Other studies have also suggested significant sex differences in patterns of blood-based DNA-M at the genome scale [31]. Although the current study focused on the longitudinal association of DNA-M with lung function, the observed sex-specificity is consistent with our previous findings [23, 32]. In previous studies, the associations of changes in DNA-M with lung function changes [23] and of DNA-M with lung function trajectories [32] were found to differ between males and females. Our further analyses indicated that such sex-specificity was time-invariant.
The mapped genes of replicated CpGs, such as cg16582803 on WNT10A and cg14083603 on ZGPAT, have plausible biological relevance to lung function and respiratory diseases. The Wnt/β-catenin pathway is centrally involved in lung development and several lung diseases [33, 34]. In particular, WNT10A plays an important role in pathogenesis of idiopathic pulmonary fibrosis (IPF) via transforming growth factor (TGF)-β activation [34]. Genetic variation in ZGPAT has been shown to be associated with lung function and also the risk of asthma and atopic dermatitis [35–37]. It has been suggested that DNA-M in ZGPAT has a causal effect on FEV1, mediated by changes in the expression of ZGPAT [37].
Longitudinal association of DNA-M at CpGs/DMRs with lung function measures at a later age may provide insight into the pathogenesis of impaired lung function growth. The association of differential methylation at some of these CpGs with gene expression, such as cg15981851 (AGAP1) for time-lagged, cg19736286 (MSH6) for cross-sectional assessment, and cg01376079 (SSH3) and cg07557690 (TGFBR3) for both longitudinal and cross-sectional assessment suggests a functional relevance of these CpGs. cg01376079 (SSH3) and cg07557690 (TGFBR3) manifest stable effects of DNA-M on gene expression. All the CpGs associated with the gene expression are located at promoter regions, except for cg15981851 (AGAP1), which is in the gene body (table 4).
It is important to note the biological relevance of cg07557690, located in the promoter region of the gene TGFBR3 (TGF-β receptor type III). Among the identified CpGs showing associations with gene expression, the association of cg07557690 with expression of TGFBR3 was the strongest in both effect size and statistical significance. Expression of TGFBR3 is essential for optimal TGF-β signalling during embryonic lung development [38]. TGF-β is also a key regulator of extracellular matrix composition and alveolar epithelial cell and fibroblast function in the lung. Prolonged alterations of TGF-β and its receptors result in compromised gas exchange and lung function, a feature of bronchopulmonary dysplasia, lung fibrosis and COPD [38, 39]. In addition, TGFBR3 has been suggested to play key roles in the pathogenesis of asthma [40] and COPD susceptibility [39]. TGFBR3 is also mapped within two lung function associated DMRs in this study. Together these results suggest that cg07557690 has potential utility as a biomarker of lung function development. Future in-depth studies of cg07557690 and how it is related to lung function are warranted.
An important strength of this study is its longitudinal design in which DNA-M measurement always precedes the lung function measurement to avoid temporal ambiguity (reverse causation). With repeated measures, longitudinally designed studies potentially gain a higher power to detect change over time and to identify differences between individuals, compared with cross-sectional studies. Moreover, the inclusion of a validation cohort increased the testing power of the identified CpGs. In addition, CpGs showing agreement between the two cohorts have potential generalisability at least in Caucasians.
There are a few limitations to this study. The median value of FEV1 at age 18 years, the proportion of males and females at age 18 years and smoking status at 26 years differed between the analysed samples and the overall study cohort. At age 26 years, lung function was available for fewer participants than at age 18 years, leading to a smaller sample size in period-2. Participants in both cohorts were Caucasian. Although we believe that using a replication cohort with the same ethnicity as the discovery cohort potentially improved the testing power, this design may limit the generalisability of the findings to other populations. In addition, while methylation of several CpGs was shown to be associated with relevant gene expression, this was in mixed cell populations from whole blood and it is not possible to assess cell-type specificity of the relationship, or the relevance to gene expression in the lung. Nevertheless, the identified CpGs have the potential to serve as candidate CpGs for lung function impairment prediction in future studies. Screening for such CpGs in early life may help to identify children at higher risk of reduced lung function at later ages.
Supplementary material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Supplementary material 00127-2021.SUPPLEMENT
Acknowledgements
The authors gratefully acknowledge the cooperation of the children and parents who participated in this study and appreciate the hard work of the Isle of Wight research team in collecting data. We thank the High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics (funded by Wellcome Trust grant ref. 090532/Z/09/Z and MRC Hub grant G0900747 91070) for the generation of the methylation data. The authors are thankful to the High-Performance Computing facility at the University of Memphis. For the ALSPAC cohort, we are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses.
Footnotes
This article has supplementary material available from openres.ersjournals.com
Author contributions: S.K. Sunny carried out the study, conducted all the statistical analysis, interpreted the data and drafted the manuscript. H. Zhang designed the study, guided the analysis, and was involved in drafting and revision of the manuscript. F. Mzayek contributed to the conception and critical revision of the manuscript. J.W. Holloway and S. Ewart supervised the DNA methylation and RNA-seq measurement in IOWBC, and revised the manuscript. L. Kadalayil was involved in processing of RNA-seq data. S.H. Arshad was involved in data acquisition, DNA-M arraying and study design in IOWBC, and reviewed the manuscript. C.L. Relton and S. Ring were involved in the ALSPAC study design and provided the data. All authors read and approved the final manuscript.
Conflict of interest: S.K. Sunny has nothing to disclose.
Conflict of interest: H. Zhang has nothing to disclose.
Conflict of interest: C.L. Relton has nothing to disclose.
Conflict of interest: S. Ring has nothing to disclose.
Conflict of interest: L. Kadalayil has nothing to disclose.
Conflict of interest: F. Mzayek has nothing to disclose.
Conflict of interest: S. Ewart has nothing to disclose.
Conflict of interest: J.W. Holloway reports grants from the National Institutes of Health (USA) during the conduct of the study.
Conflict of interest: S.H. Arshad has nothing to disclose.
Support statement: The study conveyed in this publication was supported by the National Institute of Allergy and Infectious Diseases under Award Number R01 AI121226 (MPI: H. Zhang and J.W. Holloway). The 10-year follow-up of IOW cohort was funded by National Asthma Campaign, UK (grant number 364) and the 18-year follow-up by a grant from the National Heart, Lung, and Blood Institute (R01 HL082925; principal investigator: S.H. Arshad). The UK Medical Research Council (MRC) and Wellcome (grant ref. 102215/2/13/2), and the University of Bristol provide core support for ALSPAC. A comprehensive list of grant funding is available on the ALSPAC website (www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). Generation of methylation array data was specifically funded by NIH R01AI121226 and R01AI091905, BBSRC BBI025751/1 and BB/I025263/1, and MRC MC_UU_12013/1, MC_UU_12013/2 and MC_UU_12013/8. Lung function measurements were funded by grants from the MRC (G0401540/73080 and MR/M022501/1). Funding information for this article has been deposited with the Crossref Funder Registry.
- Received February 23, 2021.
- Accepted April 16, 2021.
- Copyright ©The authors 2021
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org