Abstract
Introduction Lymphangioleiomyomatosis (LAM) is a rare low-grade metastasising disease characterised by cystic lung destruction. The genetic basis of LAM remains incompletely determined, and the disease cell-of-origin is uncertain. We analysed the possibility of a shared genetic basis between LAM and cancer, and LAM and pulmonary function.
Methods The results of genome-wide association studies of LAM, 17 cancer types and spirometry measures (forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), FEV1/FVC ratio and peak expiratory flow (PEF)) were analysed for genetic correlations, shared genetic variants and causality. Genomic and transcriptomic data were examined, and immunodetection assays were performed to evaluate pleiotropic genes.
Results There were no significant overall genetic correlations between LAM and cancer, but LAM correlated negatively with FVC and PEF, and a trend in the same direction was observed for FEV1. 22 shared genetic variants were uncovered between LAM and pulmonary function, while seven shared variants were identified between LAM and cancer. The LAM-pulmonary function shared genetics identified four pleiotropic genes previously recognised in LAM single-cell transcriptomes: ADAM12, BNC2, NR2F2 and SP5. We had previously associated NR2F2 variants with LAM, and we identified its functional partner NR3C1 as another pleotropic factor. NR3C1 expression was confirmed in LAM lung lesions. Another candidate pleiotropic factor, CNTN2, was found more abundant in plasma of LAM patients than that of healthy women.
Conclusions This study suggests the existence of a common genetic aetiology between LAM and pulmonary function.
Abstract
Pleiotropic molecular factors influencing LAM development and pulmonary function https://bit.ly/3AE3KgJ
Introduction
Lymphangioleiomyomatosis (LAM) is a rare low-grade progressive neoplasm that affects women almost exclusively and is characterised by cystic lung destruction, which can lead to respiratory failure in severe cases [1, 2]. In addition to cystic lung disease, LAM is also strongly associated with renal angiomyolipomas (AML) and lymphatic alterations, for which reason it may be properly considered a systemic disorder [1, 2]. LAM lung lesions are characterised by low-grade proliferation of “LAM cells” that have both smooth muscle cell-like features, and microphthalmia-associated transcription factor (MITF)-driven gene expression; and by cyst formation, which is likely driven by expression of proteases and cathepsins [3–5]. The tissue of origin of LAM cells is uncertain [6–8].
LAM can occur sporadically (S-LAM) or in the presence of tuberous sclerosis complex (TSC-LAM), an autosomal-dominant multisystem disorder caused by heterozygous germline or mosaic loss-of-function mutations in the tumour suppressor genes TSC1 and TSC2 [9, 10]. In S-LAM, somatic inactivation of TSC2, or much less commonly TSC1, in an unknown cell type(s) appears to be sufficient for disease development [11]. Germline or somatic mutations in TSC1/TSC2 lead to hyperactivation of the mechanistic target of rapamycin complex 1 (mTORC1), and mTOR allosteric inhibitors (rapamycin/sirolimus and its derivates, rapalogs) are the main therapeutic approach for LAM [12]. Owing to its central role in metabolism, mTORC1 activity is abnormally enhanced in many cancer types, and generally linked to stem cell-like features, which are also present in LAM cells [13–16]. Indeed, LAM shows several fundamental hallmarks of cancer, including continued cell proliferation and resistance to cell death, expression of factors promoting tissue invasion and metastasis, and immune evasion [6, 17–19].
A recent genome-wide association study (GWAS) we performed identified several common genetic variants on chromosome 15q26.2 for which an allele was associated with risk of S-LAM [20]. Although not certain, these alleles appear to affect LAM development through effects on the nuclear receptor subfamily 2 group F member 2 (NR2F2) gene, also known as chicken ovalbumin upstream promoter transcription factor II (COUP-TFII). This transcription factor is widely expressed during embryogenesis and has a role in various endocrine conditions and cancers [21, 22]. Here we sought to explore in further detail the genetic basis of LAM risk, considering the hypothesis that there is a shared genetic basis between LAM and cancer and/or pulmonary function.
Materials and methods
GWAS data
The GWAS summary statistics of LAM [20], 17 cancer types [23] and pulmonary function tests (including of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), FEV1/FVC ratio and peak expiratory flow (PEF)) [24] were obtained from the corresponding data sources (supplementary Table S1 provides details of the GWASs, including sample sizes and relevant references). This study did not require individual data. The original LAM GWAS included 426 S-LAM subjects that were analysed in comparison with 852 females from the COPDGene study in a matched case–control design [20].
Imputation and data preprocessing
The GWAS summary statistics for LAM were imputed using the SSimp software [25, 26] and European individuals from the 1000 Genomes Project phase 3 reference panel, filtering out single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) ≤0.01. The imputation increased the number of SNPs from 681 894 to 9 325 933, but those with poor imputation quality (r2<0.3) were removed, providing a set of 7 809 072 SNPs for subsequent analyses. For each of the variant-summary statistics, standard quality controls were applied: removal of SNPs without reference identifier (rs ID), duplicated, poorly imputed (info score <0.9), with MAF≤0.01, with strand ambiguous alleles and/or with sample size 5 standard deviations away from the mean. In addition, SNPs in the extended major histocompatibility complex (hg 19 coordinates, chr6: 25 119 106–33 854 733) and 8p23.1 region (chr8: 7 200 000–12 500 000) were excluded given that these regions have complex linkage disequilibrium that could bias pleiotropy analyses [27].
Shared genetic architecture
The estimation of heritability of all phenotypes and computation of genetic correlation between GWASs of LAM and each of the other selected diseases or traits were performed using the high-definition likelihood (HDL) inference method [28]. Genetic correlations estimated using HDL fully account for linkage disequilibrium across the genome, increasing the power to detect connections between complex traits using precomputed linkage disequilibrium matrices for 335 265 Genomic British individuals in the UK Biobank and HapMap3 SNPs [28]. For detection of shared genetic risk factors, we used the pleiotropy-informed conditional false discovery rate approach (FDR) [29] applying pleioFDR software (https://github.com/precimed/pleiofdr/) and computing conjunctional false discovery rate (conjFDR) statistics. The conjFDR is given by the maximum between the conditional FDRs (condFDR) for two given conditions; the condFDR method was shown to improve statistical power relative to the conventional approach of using p-value thresholds for detection of shared genetics and was not affected by the direction of the allele effects [30, 31]. A pairwise analysis was performed between LAM GWAS and each other condition. In order to make the results comparable, we analysed a common set of 5 684 891 SNPs present in all summary statistics. Shared genetic variants were defined by conjFDR<0.05. We then performed linkage disequilibrium clumping to define independent significant SNPs (PLINK software, p1=0.05; p2=1, linkage disequilibrium threshold r2=0.6, and physical distance threshold for clumping 1000 kb) and lead SNPs (PLINK software, p1=0.05, p2=1, r2=0.1, and distance 1000 kb). Genomic risk loci were found by merging lead SNPs if they were closer than 250 kb. Candidate SNPs were mapped to independent significant SNPs using this clumping strategy. Stratified Q-Q plots were obtained using pleioFDR to visualise shared genetic architecture. In these representations, the p-values of the primary phenotype were plotted against the null distribution. In the same plots, we represented subsets of SNPs of the primary phenotype conditioned by the significance of their association with the secondary phenotype; p-value of the secondary phenotype <0.1, 0.01 and 0.001. A leftwards deflection of the lines implies the presence of shared genetic architecture.
Mendelian randomisation
A two-sample Mendelian randomisation (MR) approach was applied to evaluate causality using GWAS summary statistics of cancer risk and spirometry measures as exposure or outcome, and LAM as the outcome or exposure, respectively. A valid instrumental variable should fulfil three core assumptions: the variant is associated with the exposure; the variant is independent of all confounders of the exposure–outcome association; and the variant is only associated with the outcome through its effect on the exposure [32]. Independent genetic variants (linkage disequilibrium r2>0.001; genome-wide significance, p<5×10−8, for spirometry measures; and at a suggestive level, p<1×10−5 for cancer and LAM risk) were used as instrumental variables. The three-step MR-Egger method [33] was applied: 1) a test for directional pleiotropy; 2) a test for a causal effect; and 3) estimation of the causal effect. To assess the robustness of the results, causal estimates and p-values were compared using random-effects inverse-variance-weighted and robust adjusted profile score (MR-RAPS) [34] methods. Heterogeneity was calculated from Cochran's Q value [35]. The MR Steiger directionality test, which compares the variance explained by the SNPs in the exposure and outcome, was applied to elucidate the direction of causality [36]. This test estimates directionality leveraging the fact that in most settings the genetic variants will explain more variance of the trait located upstream in the causal chain. Informative scatter and forest plots were generated to examine the results. The analyses were performed using the TwoSampleMR R package [37].
Gene candidate prioritisation
Information of positional gene candidates (up to 1 megabase from the given variant) was integrated with data of expression quantitative trait loci (eQTL) in cis identified in non-diseased human tissue (Genotype-Tissue Expression (GTEx) project [38]), and with data of chromatin interactions identified in lung tissue as potential cis-regulatory elements [39]. The GTEx evidence was obtained from pan-tissue analyses as well as from tissue expected to be biologically related to LAM disease: blood vessel, lung, kidney and uterus. Information for physical protein interactions and complex membership was taken from the Human Reference Protein Interactome [40] and Biogrid [41] databases. For the evaluation of publicly available single-cell RNA-sequencing (RNA-seq) profiles from LAM1-4 diseased lungs (Gene Expression Omnibus reference GSE135851), the samples were pre-processed and analysed as originally described [8], using the Seurat R package [42]. LAM cells (excluding lung mesenchymal cells) were identified using the LAMcore gene expression signature originally described by the authors as the reference [8]. RNA-seq data of kidney AMLs corresponded to 14 published cases from two studies (n=5 [20]; and n=9 [43]) and 14 unpublished cases (internal cohort, unpublished data). RNA-seq data of other cancer types corresponded to The Cancer Genome Atlas (TCGA) [44] and were downloaded from the cBioPortal for Cancer Genomics [45]. The single-cell RNA-seq data of the Human Lung Cell Atlas [46] was downloaded from the Synapse portal (https://www.synapse.org/#!Synapse:syn21041850/wiki/600865). For all candidate pleiotropic genes, and for each lung cell type, a percentage of cells with a given gene expression was calculated using the ComplexHeatmap R package with standard parameters [47].
LAM lung samples, immunohistochemistry
LAM patients were recruited and lung tissue samples collected by the participating centres (International LAM Clinic, University Hospital Vall d'Hebron; University Hospital La Princesa; University Hospital Clínica Puerta del Hierro; University Hospital Clínic Barcelona; University Hospital Virgen del Rocío; and University Hospital of Bellvitge) and with the support of the Spanish LAM Association (AELAM). The patients provided written informed consent, and the study was approved by the Research Ethics Committee of the IDIBELL. The immunohistochemical assays were performed on serial paraffin sections using an EnVision kit (Dako) and antigen retrieved with citrate pH 6.0 buffer. Endogenous peroxidase was blocked by pre-incubation in a solution of 3% H2O2, performed in 1× phosphate-buffered saline with 10% goat serum. Slides were incubated overnight at 4°C with primary antibodies (anti-NR3C1, dilution 1:100, D6H2L, Cell Signalling Technology, Danvers, MA, USA; and anti-NTN4, dilution 1:30, HPA049832, Sigma-Aldrich, St. Louis, MO, USA) in blocking solution. Secondary peroxidase-conjugated antibody (Envision+ system-HRP, Dako) was used. Sections were counterstained with haematoxylin and examined under a Nikon Eclipse 80i microscope. For immunohistofluorescence, the slides were incubated with a mixture of the two primary antibodies (anti-αSMA, dilution 1:1000, A2547, Sigma-Aldrich; and anti-NR3C1). The secondary antibodies used were goat anti-mouse Alexa-546 and goat anti-rabbit Alexa-488 (dilution 1:100; Thermo Fisher, Waltham, MA, USA). Sudan black staining was performed to avoid paraffin autofluorescence. The sections were mounted with coverslips in Vectashield containing DAPI and visualised under a Nikon 80i epifluorescence microscope equipped with a DS-Ri1 camera.
Plasma samples and ELISA
LAM blood samples were collected and immediately processed during the 2017 and 2018 annual AELAM patient conferences, so the time between undertaking pulmonary function tests and sample acquisition varied, making it impossible to assess the former relative to the biomarkers. The data collected consisted of age at diagnosis, age at sample extraction, diagnosis of AML, chylothorax, pneumothorax, TSC and therapy used at the time of sample extraction. All patients provided written informed consent and the study was approved by the ethics committees of IDIBELL and the Instituto de Investigación Sanitaria La Princesa, Hospital de Henares. Control samples were obtained from healthy premenopausal women from a similar age distribution to that of the LAM patients. Lung disease-related patients corresponded to cases diagnosed with emphysema (excluding COPD; University Hospital of Bellvitge, IDIBELL), Langerhans cell histiocytosis, Sjögren syndrome and systemic lupus erythematosus. Blood samples of the former conditions were collected by the ILD Centre of Excellence, St. Antonius Hospital Biobank (Nieuwegein, The Netherlands). This study was approved by the institutional ethics committee (reference R05-08A), and all participants provided written informed consent. CNTN2 was quantified in blood plasma using the Human Contactin-2/TAG1 DuoSet enzyme-linked immunosorbent assay (ELISA; DY1714-05, R&D Systems, Minneapolis, MN, USA) following the manufacturer's instructions. Vascular endothelial growth factor-D (VEGF-D) levels were measured using a commercially available ELISA kit according to the manufacturer's protocol (DVED00, R&D Systems).
Results
Identification of shared genetics of LAM and cancers
As a low-grade neoplasm, LAM has a biological similarity to cancer [6, 17–19]. This connection might be extended to disease susceptibility [48, 49] and could indicate the existence of a shared genetic basis of the two conditions. To assess this hypothesis, we analysed summary statistic results from the original S-LAM GWAS [20] and from studies of 17 cancer types, including those of breast, colon, glioma, kidney, lung, neuroblastoma, ovary, pancreas, prostate, stomach and skin (supplementary Table S1). After SNP imputation, data preprocessing and quality control analyses, the summary statistics of 5 684 891 SNPs were evaluated between LAM and cancers. The overall pairwise genetic correlation between LAM and cancer risk did not reach significance for any cancer; the strongest correlation was negative and for cervical cancer risk (r=−0.32, p=0.083; supplementary Table S2). In addition, when conditioning LAM on any cancer type, there were no substantial signs of shared genetic architecture in the stratified Q-Q plots (supplementary Figure S1). However, for some cancer analyses, there was not enough statistical power to estimate genetic correlations accurately (supplementary Table S2).
Despite the non-significant overall LAM-cancer genetic correlation, seven shared genetic associations were identified with conjFDR<0.05, including variants between LAM and gastric, kidney or prostate cancer risk (table 1). These variants were mapped to six genomic loci, of which three were linked to gastric cancer, two to kidney cancer and one to prostate cancer risk. Four of the associated loci (including gastric, kidney and prostate cancer) were predicted to function as agonists; that is, the corresponding minor alleles showed the same direction of effect for LAM and cancer risk; in turn, two were predicted to act as antagonists (including gastric and kidney cancer; table 1).
Shared genetic variants between lymphangioleiomyomatosis (LAM) and cancer risk (ordered by cancer type)
The chromosome 6q24 rs3861451 variant shared between LAM and gastric cancer risk was relatively linked (Caucasian D´=0.74 and r2=0.52) with a previously noted SNP association for this cancer type, rs618688 [23]. Next, we inspected the seven identified LAM-cancer shared variants (table 1) relative to the GWAS catalogue for human traits and diseases [50, 51]. This examination identified the chromosome 2q31 variant rs4668267 that connects LAM and prostate cancer risk as a genome-wide association signal of earlobe attachment [52].
Identification of shared genetics of LAM and pulmonary function
In parallel with cancer, we analysed shared genetic factors between LAM and pulmonary function, determined by spirometry measures of FEV1, FVC, FEV1/FVC ratio and PEF in the general population using summary statistic GWAS results from the UK Biobank and SpiroMeta consortium [24]. The overall pairwise genetic correlations between LAM and FVC, and PEF, were nominally significant and negative: r=−0.074, 95% CI −0.04– −0.10, p=0.013; and r=−0.10, 95% CI −0.05– −0.15, p=0.029, respectively (supplementary Table S2). The LAM correlation with FEV1 was also negative but did not reach significance: r=−0.061, p=0.097. By conditioning LAM on spirometry measures, the stratified Q-Q plots showed a leftwards deflection with increasingly strong associations with the depicted measures, such as PEF (figure 1; Q-Q plots for the rest of measures are shown in supplementary Figure S2).
Stratified Q-Q plots including lymphangioleiomyomatosis (LAM) and peak expiratory flow (PEF) genome-wide association study (GWAS) results. a) Q-Q plot (nominal versus empirical −log10 p-values, corrected for inflation) conditioning LAM on PEF; and b) Q-Q plot conditioning PEF on LAM. Leftwards deflection from the null distribution of the observed p-value as the thresholds become more stringent indicates genetic overlap between the two traits. SNP: single nucleotide polymorphism.
22 variants in 11 loci were common to LAM and pulmonary function: 11 shared with FEV1, seven with FVC, four with FEV1/FVC ratio and six with PEF (table 2). Six of the identified variants mapped to chromosome 15q26.2, relatively close (≤35 kb) to the primary S-LAM associations targeting NR2F2 [20]; the six identified variants were in relative linkage disequilibrium (Caucasian D’>0.90, r2>0.25) with the original genome-wide significant associations detected by rs2006950 and rs4544201 [20]. Inspection of the UK Biobank and SpiroMeta results showed nominal associations between these two original SNPs and pulmonary function measures, with p-values between 0.05 and 0.0002. Interestingly, all six variants identified in 15q26.2 had opposite effects in pulmonary function compared with LAM risk: that is, their minor alleles were associated with relatively superior pulmonary function, but with lower LAM risk (table 2).
Shared genetic variants between lymphangioleiomyomatosis (LAM) and pulmonary function (ordered by chromosome)
There was no overlap between the shared variants influencing LAM and cancer, and LAM and pulmonary function. We further inspected the 22 LAM-pulmonary function shared variants relative to cancer risk associations [50, 51] in the corresponding genomic regions. The chromosome 5q31.3 rs7701443 variant is located at 37 kb (Caucasian D’=0.90, r2=0.15) of a breast cancer association identified by the Breast Cancer Association Consortium, rs2963155 [53], and the chromosome 9p22.3 variant rs4961722 is located at <1 kb (Caucasian D’=1.0, r2=0.03) from a skin cancer association, rs10962474 [54]. In addition, the chromosome 8q23.3 region comprising five LAM-pulmonary function shared variants (table 2) was relatively strongly linked (Caucasian linkage disequilibrium D’=0.25–1; maximum distance between variants ≤45 kb) to association signals of colorectal, gastric and rectal cancer risk: rs6469654, rs6469656, rs16892766, rs76316943, rs117079142 and rs200235517 [55].
Next, we inspected the identified LAM-pulmonary function loci for associations with lung-related traits. The LAM-FEV1 chromosome 2p21 rs13410076 variant is located at 230 kb from a rare variant (Caucasian MAF <0.001) previously associated at the genome-wide level with oxygenated haemoglobin levels in Tibetan women living at high altitude, rs372272284, which may influence the expression of endothelial PAS domain-containing protein 1 (EPAS1/HIF2A) gene [56]. Evaluation of other traits identified the LAM-FEV1/FVC chromosome 1q32.1 rs16937 and rs11240341 variants as genome-wide association signals for mean platelet volume [57] and schizophrenia [58].
Prioritisation of gene candidates
To evaluate potential pleiotropic genes, we examined eQTL data from non-diseased human tissue [38], chromatin interactions identified in lung tissue [39], and protein physical and complex membership interactions [40, 41]. Of the identified loci, the 5q31.3 LAM-pulmonary function shared variant rs7701443 maps in the first intron of the nuclear receptor subfamily 3 group C member 1 (NR3C1), and it is an eQTL for this gene (supplementary Table S3). NR3C1, also known as glucocorticoid receptor, physically interacts with NR2F2, and this relationship influences the transcriptional regulation of defined gene targets [59]. In addition, NR3C1 positively regulates the expression of another potential pleiotropic factor, EPAS1, in breast cancer cells under hypoxic conditions [60]. Two other gene candidates, insulin-like growth factor-binding protein 3 (IGFBP3), defined by LAM-FEV1 shared variant rs9648555, and disintegrin and metalloproteinase domain-containing protein 12 (ADAM12), defined by LAM-gastric cancer shared variant rs10901587, code for proteins that physically interact: ADAM12 cleaves IGFBP3, and this biochemical modification regulates insulin-like growth factor (IGF) activity in regenerating and developing tissue, including cancer and pregnancy settings [61]. The variants shared by LAM and pulmonary function detected in chromosome 8q23.3 might target the eukaryotic translation initiation factor 3 subunit H (EIF3H) gene, as proposed for the investigated associations with colorectal cancer risk [62].
Expression of candidate genes in LAM cells
A recent landmark study has depicted LAM single-cell gene expression profiles from four patients [8]. This work defined a LAMcore expression signature that included NR2F2 [8], as well as three of the candidates identified in our study: ADAM12, basonuclin 2 (BNC2) specificity protein 5 transcription factor (SP5) (supplementary Table S3). The identification of four LAMcore genes among 48 pleiotropic gene candidates (supplementary Table S3) was a higher proportion than expected by chance (hypergeometric test of overlap p=0.039). Examination of a second LAM single-cell study that analysed one tissue sample expanded the list of candidates to IGFBP3 [7]. Then, differential expression analysis between LAM and non-LAM cells using the first dataset [8] identified several gene candidates frequently detected and overexpressed in LAM cells. In addition to NR2F2, this analysis identified EIF3H, IGFBP3 and NR3C1 as being linked to LAM cell profiles (supplementary Table S4). In turn, EPAS1 was predicted to be underexpressed in LAM cells (supplementary Table S4).
Among the potential pleiotropic genes, we first evaluated NR3C1 because its product has an established functional relationship with the primary LAM GWAS candidate, NR2F2 [59], and because of the key role played by glucocorticoids in breast cancer development and metastasis [63–65]. Using RNA-seq data of 28 kidney AMLs [20, 43], a similar tumour entity to LAM, and of a large collection of 27 human cancer types [44], the expression level of NR3C1 ranked relatively high in AMLs, just behind acute myeloid leukaemia and kidney renal clear cell carcinoma (figure 2a). In addition, we found a significant (Fishers’ exact test p=5.3×10−10) gene overlap between an experimentally defined NR3C1-activity signature [66] and genes differentially expressed in LAM single cells [8] (figure 2b). Then, immunohistochemistry assays in lung tissue from seven S-LAM patients confirmed substantial NR3C1 expression in all of them, with nuclear positivity in epithelioid and spindle-like diseased cells (figure 2c). Co-staining with the LAM marker α-smooth muscle actin (αSMA) showed cellular colocalisation with NR3C1 expression in lung nodules, although non-αSMA lung cells also presented nuclear expression of NR3C1 (figure 2d).
NR3C1 gene expression in angiomyolipoma (AML) tumours and NR3C1 protein expression in lymphangioleiomyomatosis (LAM) lung lesions. a) Comparison of NR3C1 expression in kidney AMLs (red font) with other neoplasms (TCGA data: 2463 tumours of 27 histological types). Cancer abbreviations: LAML: acute myeloid leukaemia; KIRC: kidney renal clear cell carcinoma; SARC: sarcoma; LUAD: lung adenocarcinoma; LGG: low-grade glioma; LIHC: liver hepatocellular carcinoma; HNSC: head and neck squamous cell carcinoma; GBM: glioblastoma multiforme; LUSC: lung squamous cell carcinoma; PAAD: pancreatic adenocarcinoma; THCA: thyroid carcinoma; PCPG: phaeochromocytoma and paraganglioma; DLBC: lymphoid neoplasm diffuse large B-cell lymphoma; ACC: adrenocortical carcinoma; KIRP: kidney renal papillary cell carcinoma; SKCM: skin cutaneous melanoma; CESC: cervical squamous cell carcinoma and endocervical adenocarcinoma; KICH: kidney chromophobe; PRAD: prostate adenocarcinoma; BRCA: breast invasive carcinoma; MESO: mesothelioma; OV: ovarian serous cystadenocarcinoma; UCS: uterine carcinosarcoma; BLCA: bladder urothelial carcinoma; COAD: colon adenocarcinoma; UCEC: uterine corpus endometrial carcinoma; READ: rectum adenocarcinoma. The numbers in parentheses are the sample sizes of the indicated cancer types. The median, interquartile range and 95% range are shown for each setting, with outliers indicated by circles. Gene expression is shown as RSEM (RNA sequencing by expectation maximisation) values. b) Venn diagram showing the identity overlap (n=27) between genes identified in the NR3C1-activity signature and genes differentially expressed in LAM single cells. c) Representative images from immunohistochemistry assays for the detection of NR3C1 expression in LAM lung lesions of three patients (LAM #1–3). The arrows indicate the magnified lesion areas in the insets: in magnified lung nodules, epithelioid and spindle-like diseased cells are apparent from the observed nuclear shapes of positive NR3C1 staining. The positive control results from colon tissue are also shown, as well as those from normal lung tissue showing positivity in the alveolar epithelium, and luminal and basal layers of the bronchioles. d) Representative images of immunohistofluorescence detection and colocalisation of NR3C1 and αSMA in LAM lung lesions; nuclei stained blue with DAPI (merged). Lung nodules of three LAM patients are shown.
Other candidate genes emerged from the exploration of eQTLs and genomic regulatory signals (supplementary Table S3), in combination with the suggestive evidence from other neoplasms. The netrin-4 (NTN4) gene—chromosome 12q22 LAM-pulmonary function shared associations (table 2)—was identified as being the target of a breast cancer risk association [67], and its product influences metastatic potential and angiogenesis [68, 69]. Immunohistochemistry assays did not reveal NTN4 expression in LAM lung lesions, although the normal alveolar layer was positive (figure 3a). Of the other candidates (supplementary Table S3), the contactin-2 (CNTN2) gene was prioritised based on criteria similar to those for NTN4 (supplementary Table S3), and its product is linked to inflammatory conditions [70] and interacts with oestrogen receptor α [71]. Since CNTN2 can be detected in body fluids [72], we measured its levels in the plasma of LAM patients and compared the results with those of healthy women and patients with related pulmonary diseases. Using ELISA assays, we identified significant overabundance of CNTN2 in LAM plasma relative to healthy women, and to Langerhans cell histiocytosis and Sjögren syndrome patients; however, we found no differences with respect to emphysema and systemic lupus erythematosus patients (figure 3b). A comparison between LAM patients receiving and not receiving rapamycin treatment, and with low and high VEGF-D plasma levels (threshold of 800 pg·mL−1) revealed no significant differences (figure 3c). Subsequent examination of single-cell transcriptomic data of the human lung [46] identified CNTN2 expression exclusively in vascular smooth muscle cells (supplementary Figure S3). These cells also featured relatively high levels of expression of NR2F2 and NTN4; in contrast, NR3C1 was expressed in a wider range of lung cell types (supplementary Figure S3), consistent with the previous immunohistochemistry and immunohistofluorescence results (figure 2c,d).
Evaluation of additional pleiotropic factors. a) Representative images from immunohistochemistry assays for detecting NTN4 expression in lymphangioleiomyomatosis (LAM) lung lesions. Lung nodules appear negative, while the alveolar layer is positive. The positive control of kidney tissue is shown. b) Overabundance of CNTN2 in LAM plasma relative to healthy women and two related pulmonary diseases, as indicated. The number of samples analysed in each setting (n) is indicated. Asterisks indicate significant differences based on two-sided Mann–Whitney tests (*p<0.05, ***p<0.001 and ****p<0.0001). Mean values are indicated with horizontal black lines. c) Absence of significant differences (n.s.: not significant) of CNTN2 plasma levels between LAM patients receiving and not receiving rapamycin treatment (left panel), and between LAM patients with high and low vascular endothelial growth factor-D (VEGF-D) plasma levels (right panel).
Evaluation of LAM causality
Following on the identified shared genetics and pleiotropic factors, we used MR methods [73] to evaluate causality. Analysis of LAM as outcome did not show significant associations with cancer but suggested an association with FEV1; this was supported by 245 genetic variants that depicted a LAM-FEV1 negative correlation, as observed above, with no significant heterogeneity (supplementary Table S5, and Figures S4 and S5). However, a directionality test of causality showed inconsistent results with respect to the assumption that variance explained by the exposure should be greater than that of the outcome [36] (supplementary Table S5). Then, evaluation of LAM as exposure suggested negative correlations with bladder and endometrial cancer risk, and a positive correlation with FEV1; in these comparisons, the directionality assumption was fulfilled, but it was based on eight to nine variants (supplementary Table S5).
Discussion
This study identifies genetic variants that may concurrently influence LAM and cancer risk, or LAM and pulmonary function. Interestingly, two loci on chromosomes 4 and 9 are shared between kidney cancer risk and LAM risk, and the kidney is one potential tissue of origin for LAM [6, 8, 19, 48, 74–77]. However, the gene candidates of these loci do not emerge as being linked to LAM single-cell transcriptome profiles [7, 8]. In contrast, the genetic connection between LAM and pulmonary function proved to be more relevant in several analyses. There were indications of substantial shared genetic architecture in LAM and FVC, and PEF measures, and a trend in the same direction was observed for FEV1. In all comparisons, the LAM-pulmonary function correlation was negative, which is consistent with the expectation that a given genetic variant that associates with greater pulmonary function would have a corresponding lower risk of LAM, and vice versa. It is of note that the data of pulmonary function came from a population-based study [24]. In addition, the four pleiotropic gene candidates previously identified in a LAM single-cell signature (ADAM12, BNC2, NR2F2 and SP5) [8] represented a higher proportion than would be expected by chance. However, the results of MR were not conclusive of a specific causal direction, likely because of the limited power and relative high variance of the LAM GWAS. These were also restrictions when examining genetic correlations, even if the HDL method [28] was used. There was an indication of LAM being causal of reduced FEV1, which was expected, but a causal effect in the opposite direction remains unclear.
The biological function of some of the identified pleiotropic candidates appears consistent with LAM pathogenesis. NR3C1 (glucocorticoid receptor) can stimulate the transcriptional activity of NR2F2, which is the expected target of the primary LAM GWAS signal [20]. The activity of both NR3C1 and NR2F2 interacts with hormone signalling in health and disease [78, 79], and this might be connected to the role of oestrogens in LAM pathogenesis [80]. In addition, NR3C1 has been associated with breast cancer development and metastasis [63, 65], and may influence epithelial-to-mesenchymal transition, cell adhesion and tissue inflammation [53, 63, 81, 82]. The confirmation of NR3C1 expression in LAM lung lesions and diseased cells is further evidence of a role for this factor in disease development, as well as in pulmonary function. The indication of increased NR3C1 activity in single-cell LAM transcriptome profiles might anticipate a therapeutic benefit from specific NR3C1 antagonists, as proposed for other hormonal cancers [83]. It is of particular note that mifepristone, an antagonist of progesterone and glucocorticoid receptors yielded a preclinical benefit by inhibiting LAM tumorigenesis [84].
Other gene candidates were evaluated based on eQTL and genomic regulatory evidence. In a similar way to NR3C1, the identification of another gene linked to breast cancer risk, NTN4 [67], is particularly intriguing. These observations might in turn be connected to our previous results of a potential higher incidence of breast cancer diagnoses among LAM patients [48, 49]. We did not detect NTN4 expression in LAM-diseased cells, although a role for NTN4 influencing the tissue micro-environment cannot be excluded. Examining the other candidates, we found an overabundance of CNTN2 in LAM plasma, which, in addition to its potential for revealing information about disease risk, raises the possibility of establishing an independent biomarker of VEGF-D. However, further analyses using samples collected at different times during the disease history are required to confirm these indications. Other pleiotropic gene candidates identified in this study might also be linked to LAM pathogenesis: IGF signalling and IGFBP2 function have been associated with LAM progression [85], though the expression of IGFBP3 in lung lesions is unclear [86]; and LAM lung lesions show expression of matrix metallopeptidases [87], which also promote disease progression [88, 89]. The LAM-FEV1/FVC rs16937 and rs11240341 shared variants – which may target the transmembrane protein 81 (TMEM81) and/or retinoblastoma binding protein 5 (RBBP5) genes [90] – were previously associated with platelet measures in the general population [57] and alteration of this blood component is associated with several inflammatory lung diseases [91].
Collectively, this study proposes a common aetiology between LAM and pulmonary function. This connection may be due to genes whose function is particularly relevant in the cell(s) of origin of LAM as well as lung tissue development and/or may indicate a cell origin of LAM that resides in the lung cell populations. The latter hypothesis appears to be consistent with the recent demonstration that Tsc2 loss in the lung mesenchymal lineage causes LAM-like disease in mice [7]. Thus, LAM might be an extreme phenotype of reduced lung function due to abnormal mTORC1-driven proliferation of mesenchymal-like cells. Particularly, NR3C1-glucocorticoid signalling regulates differentiation of proliferative mesenchymal progenitors into matrix fibroblasts [92], and these cells endorse synthesis of extracellular matrix and collagen during early lung development. The concept of altered mesenchymal cell differentiation and subsequent accumulation of extracellular matrix components would be consistent with the evolution of LAM lung pathology [93] and with identified pathway expression correlations with LAMcore [94]. Intriguingly, pleural mesothelioma cell lines have transcriptome profiles relatively similar to those of LAM cells [94]. In addition, the regulatory function of NR3C1 and NR2F2 may be coordinated during lung development [92]. Interestingly, GWAS signals of lung function were strongly predictive of COPD [24], which also shows extracellular matrix alterations and shares lymphatic vascular remodelling with LAM [95]. Several of the lung function signals were also found to be associated with other respiratory traits, including asthma [24], which is frequently diagnosed in LAM women, and with an inflammatory multisystem disease predominantly affecting women, systemic lupus erythematosus, which can be presented with cystic lung disease [1, 2]. Moreover, of the original 279 GWAS lung function signals [24], there were predicted 16 gene candidates that code for interactors of NR3C1, including the co-repressor NCOR1 [96] (supplementary Table S6).
The results of this study identify genetic factors and their molecular targets that may influence LAM development. However, as noted above, our study was limited by the relatively small sample size of the LAM GWAS, whose imputed summary statistics could also add noise to the results. Genetic studies of larger LAM cohorts are necessary to corroborate the findings and conclusively determine pleiotropy or causality with respect to pulmonary function and/or cancers. Likewise, studies of pleiotropic gene candidates may be warranted to better comprehend LAM aetiology and origin.
Supplementary material
Supplementary Material
Please note: supplementary material is not edited by the Editorial Office, and is uploaded as it has been supplied by the author.
Figure S1 00375-2021.figureS1
Figure S2 00375-2021.figureS2
Figure S3 00375-2021.figureS3
Figure S4 00375-2021.figureS4
Figure S5 00375-2021.figureS5
Table S1 00375-2021.tableS1
Table S2 00375-2021.tableS2
Table S3 00375-2021.tableS3
Table S4 00375-2021.tableS4
Table S5 00375-2021.tableS5
Table S6 00375-2021.tableS6
Acknowledgements
We wish to thank all the participants of the lymphangioleiomyomatosis (LAM) genome-wide association study (GWAS) for their contribution. We also thank all the investigators who were involved in the other GWASs analysed in this study for making their results available to us.
Footnotes
Provenance: Submitted article, peer reviewed.
This article has supplementary material available from openres.ersjournals.com
Conflict of interest: X. Farré has nothing to disclose.
Conflict of interest: R. Espín has nothing to disclose.
Conflict of interest: A. Baiges has nothing to disclose.
Conflict of interest: E. Blommaert has nothing to disclose.
Conflict of interest: W. Kim has nothing to disclose.
Conflict of interest: K. Giannikou has nothing to disclose.
Conflict of interest: C. Herranz has nothing to disclose.
Conflict of interest: A. Román has nothing to disclose.
Conflict of interest: B. Sáez has nothing to disclose.
Conflict of interest: Á. Casanova has nothing to disclose.
Conflict of interest: J. Ancochea has nothing to disclose.
Conflict of interest: C. Valenzuela has nothing to disclose.
Conflict of interest: P. Ussetti has nothing to disclose.
Conflict of interest: R. Laporta has nothing to disclose.
Conflict of interest: J.A. Rodríguez-Portal has nothing to disclose.
Conflict of interest: C.H.M. van Moorsel has nothing to disclose.
Conflict of interest: J.J. van der Vis has nothing to disclose.
Conflict of interest: M.J.R. Quanjel has nothing to disclose.
Conflict of interest: M. Tena-Garitaonaindia has nothing to disclose.
Conflict of interest: F. Sánchez de Medina has nothing to disclose.
Conflict of interest: F. Mateo has nothing to disclose.
Conflict of interest: M. Molina-Molina has nothing to disclose.
Conflict of interest: S. Won has nothing to disclose.
Conflict of interest: D.J. Kwiatkowski has nothing to disclose.
Conflict of interest: R. de Cid has nothing to disclose.
Conflict of interest: M.A. Pujana has nothing to disclose.
Support statement: This research was supported by Asociación Española de LAM; The LAM Foundation Seed Grant 2019; Carlos III Institute of Health grants PI18/01029, PI21/01306 and ICI19/00047 (co-funded by European Regional Development Fund (ERDF), “A way to build Europe”); Ministry of Economy and Competitivity grant SAF2017-88457-R; the Generalitat de Catalunya SGR 2017-449 and 2017-529; PERIS PFI-Salut SLT017-20-000076; and the CERCA Program to IDIBELL and Institut Germans Trias i Pujol. X. Farré is supported by the VEIS project (001-P-001647, ERDF Operational Programme of Catalonia 2014–2020; co-funded by ERDF, “A way to build Europe”). Funding information for this article has been deposited with the Crossref Funder Registry.
- Received June 2, 2021.
- Accepted September 17, 2021.
- Copyright ©The authors 2022
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions{at}ersnet.org