Abstract
More DEGs are detected by RNA-Seq than by microarrays in COPD lung biopsies, and these are associated with immunological pathways. Bulk tissue cell-type deconvolution of microarray lung samples using the SVR method reflects RNA-Seq results. https://bit.ly/2N8sY3s
To the Editor:
In the era of “big data”, microarray technology has provided researchers with the ability to measure the expression of thousands of genes in a single experiment [1]. However, array technology is limited, as it can only measure transcripts present in medium to high abundance and can only quantify genes for which oligonucleotide probes are specifically designed. RNA-Seq, the direct sequencing of RNA, is rapidly becoming more popular for analysing gene expression. RNA-Seq performs better at detecting low-abundance transcripts, identifying genetic variants and detecting more differentially expressed genes with higher fold-changes [2, 3]. Bulk tissue cell-type deconvolution is a recently developed computational method that estimates the proportions of cell types in a sample using cell-type-specific gene expression references [4]. This method has mainly been applied to RNA-Seq data; little has been done to determine whether it can also be used with microarray data. We sought to investigate whether gene expression profiling of COPD bronchial biopsies using RNA-Seq provides additional insight into the transcriptional effects of inhaled corticosteroids (ICS), compared to microarrays. Furthermore, we aimed to determine whether cellular deconvolution can be conducted on microarray data by applying two current methods, non-negative least squares (NNLS) and support vector regression (SVR), which fits the regression while tolerating errors within a certain threshold, and comparing the results with those obtained from RNA-Seq data. To this end, we analysed the steroid response before and after 6 months of ICS treatment in participants with COPD, using gene expression data from bronchial biopsies that were measured with both microarray (Affymetrix Hugene_ST1.0 array) and RNA-Seq (Illumina HiSeq 2500 platform).
The bronchial biopsies were obtained from the Groningen and Leiden Universities Study of Corticosteroids in Obstructive Lung Disease (GLUCOLD) [5]. The methods of microarray profiling in GLUCOLD have been described previously [6]. With respect to RNA-Seq, the RiboZero GOLD libraries were sequenced using 50 bp single-read sequencing. The FastQC programme (version 0.11.5; https://github.com/s-andrews/FastQC) was used to perform quality control checks on the raw sequence data; the sequences were then trimmed using the Java programme Trimmomatic (version 0.33) [7]. The RNA-Seq reads were mapped using Spliced Transcripts Alignment to a Reference (STAR) version 2.5.3a [8]. Principal component analysis was performed (using R) to detect extreme outliers. After these quality checks, all samples were found to be of sufficient quality.
In 21 GLUCOLD participants, both microarray and RNA-Seq data from bronchial biopsies were available before and after 6 months of treatment with fluticasone (ICS), with or without added salmeterol. Differential expression and cell-type composition analyses were performed to compare individual gene expression as well as single-cell (sc)RNA-Seq expression signatures. The differential expression analysis was conducted in R using the “limma” package (version 3.30.13) for both the microarray and RNA-Seq datasets, while correcting for age and smoking status [9]. Differentially expressed genes (DEGs) were defined as having an absolute fold-change (FC) >1.5 and a false discovery rate (FDR)-adjusted p-value <0.05 [10]. scRNA-Seq signatures for basal, rare, ciliated and mucus-secretory cells (club and goblet cells) were taken from our previously published data to determine differences in cell-type composition, using mRNA expression levels. Genes representing the unique profile of each cell type were selected from bronchial biopsy scRNA-Seq data, as explained previously [11]. Due to their similar expression profiles, the club cell and goblet cell scRNA-Seq signatures were combined to generate a uniform scRNA-Seq signature of mucus-secretory cells. For deconvolution, we first ran AutoGeneS to select informative genes and then used two different regression methods to infer cell-type proportions: NNLS and SVR [12].
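The NNLS regression step can be illustrated with a minimal sketch. It assumes NumPy and SciPy are available and uses a made-up signature matrix: the marker-gene values, the simulated bulk sample and the function name deconvolve_nnls are hypothetical and are not taken from GLUCOLD or from AutoGeneS output. The SVR variant is not shown.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical signature matrix: rows are marker genes, columns are cell types.
# The values are illustrative only, not real scRNA-Seq signatures.
cell_types = ["basal", "ciliated", "secretory"]
signature = np.array([
    [10.0, 1.0, 1.0],
    [8.0, 0.0, 2.0],
    [1.0, 12.0, 0.0],
    [0.0, 9.0, 1.0],
    [2.0, 1.0, 11.0],
    [1.0, 0.0, 7.0],
])


def deconvolve_nnls(signature, bulk):
    """Estimate cell-type proportions by non-negative least squares."""
    # Solve min ||signature @ x - bulk|| subject to x >= 0.
    coefficients, _residual_norm = nnls(signature, bulk)
    total = coefficients.sum()
    if total == 0:
        raise ValueError("all coefficients are zero; cannot normalise")
    # Rescale so that the estimated proportions sum to 1.
    return coefficients / total


# Simulate a noise-free bulk sample with known (hypothetical) proportions.
true_proportions = np.array([0.5, 0.3, 0.2])
bulk_sample = signature @ true_proportions

estimated = deconvolve_nnls(signature, bulk_sample)
for name, value in zip(cell_types, estimated):
    print(f"{name}: {value:.3f}")
```

In this noise-free toy case the estimated proportions should match the simulated ones; real bulk samples contain noise and platform-specific effects, which is where the choice of regression method starts to matter.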
When genome-wide gene expression profiles were compared between the RNA-Seq and microarray datasets, the differential expression analysis showed a stronger signal (more significant genes and higher fold-changes) in the RNA-Seq dataset (figure 1a). Our analysis of the RNA-Seq data identified four DEGs that were increased after 6 months of ICS treatment, while 56 DEGs were decreased (figure 1c). In contrast, the microarray analysis only identified one DEG increased by ICS treatment, while seven DEGs were decreased. An overlap of these two analyses showed that 87.5% of microarray DEGs (seven of eight) were also identified with RNA-Seq (figure 1b).
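The overlap figure follows directly from the reported counts, as the short sketch below shows. The DEG counts are those given in the text; the gene identifiers are placeholders constructed so that seven of the eight microarray DEGs are shared, and the function name overlap_fraction is hypothetical.

```python
def overlap_fraction(reference_degs, other_degs):
    """Return the fraction of reference_degs that also appear in other_degs."""
    reference = set(reference_degs)
    if not reference:
        raise ValueError("reference DEG list is empty")
    return len(reference & set(other_degs)) / len(reference)


# DEG counts reported in the text.
rnaseq_up, rnaseq_down = 4, 56
array_up, array_down = 1, 7

# Placeholder identifiers (hypothetical): gene_0..gene_7 for the microarray
# and gene_1..gene_60 for RNA-Seq, so that exactly seven genes are shared.
array_degs = [f"gene_{i}" for i in range(array_up + array_down)]
rnaseq_degs = [f"gene_{i}" for i in range(1, 1 + rnaseq_up + rnaseq_down)]

shared = overlap_fraction(array_degs, rnaseq_degs)
print(f"{shared:.1%} of microarray DEGs were also found by RNA-Seq")
```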
Fold-changes between the two datasets (figure 1d), using genes measured with both techniques, showed a high level of correlation (Pearson's r=0.6615, p-value <2.2×10^−16). Importantly, the magnitude of fold-change was overall higher in the RNA-Seq dataset than in the microarray dataset. As an example, the gene RGS13, which encodes a regulator of G-protein signalling, was found to be downregulated after ICS treatment in the RNA-Seq dataset (logFC −1.01, FDR 0.017), but not in the microarray dataset (logFC −0.34, FDR 0.08) [13]. Subsequently, we used g:profiler to perform functional profiling on the top 50 most significantly decreased DEGs uniquely identified in RNA-Seq [14]. Several pathways that were enriched among the most downregulated DEGs belonged to immune system pathways, such as immune response, lymphocyte activation or regulation of leukocyte activation. This indicates that RNA-Seq captures differences in transcriptional biological processes, measured in bronchial biopsies from COPD participants before and after 6 months of ICS treatment, which are missed by microarrays.
With cellular deconvolution using SVR, cell-type proportions estimated from microarray and RNA-Seq data showed a significant Pearson correlation for three cell types: secretory (goblet and club), basal and ciliated (p<0.05; figure 1e). However, this was not found for rare cells, which cellular deconvolution techniques usually struggle to estimate. Interestingly, no correlation was observed for NNLS, indicating that this method gave different results depending on the platform used. The NNLS result is probably due to the way this method deals with zero values, which are not present in microarray data. Benchmarking of the two methods has been reported previously [12, 15]. Spearman correlations were then calculated to determine the relationship between cellular deconvolution conducted on microarray and RNA-Seq data.
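As an illustration of how agreement between platforms can be quantified, the sketch below computes Pearson and Spearman correlations for one cell type using only the Python standard library. The proportion values are invented for five hypothetical biopsies and are not GLUCOLD estimates; the helper names are likewise assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ciliated-cell proportions for the same five biopsies.
microarray_est = [0.10, 0.20, 0.25, 0.40, 0.55]
rnaseq_est = [0.12, 0.18, 0.30, 0.35, 0.80]

r_pearson = pearson(microarray_est, rnaseq_est)
r_spearman = spearman(microarray_est, rnaseq_est)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```

Pearson correlation is sensitive to the actual magnitudes of the estimates, whereas Spearman correlation only asks whether the two platforms rank the samples in the same order.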
In conclusion, the SVR method allows cellular deconvolution to be conducted in microarray samples, with results that reflect those from RNA-Seq. With respect to differential expression analysis, RNA-Seq detected more DEGs than microarrays, with greater fold-changes, and these DEGs were associated with immunological pathways. The fold-change of 1.5 or 2 traditionally used for microarray cut-offs may have been too stringent; therefore, re-sequencing samples previously measured by microarray may provide valuable new insights that may otherwise be overlooked.
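The effect of the cut-off can be checked against the RGS13 values reported above. The sketch below assumes that the reported logFC values are log2 fold-changes (the convention used by limma); the function name passes_cutoffs and the dictionary layout are hypothetical.

```python
import math


def passes_cutoffs(log2_fc, fdr, fc_cutoff=1.5, fdr_cutoff=0.05):
    """True if the absolute fold-change exceeds fc_cutoff and FDR < fdr_cutoff."""
    return abs(log2_fc) > math.log2(fc_cutoff) and fdr < fdr_cutoff


# RGS13 values reported in the text: (logFC, FDR); logFC assumed to be log2.
rgs13 = {"RNA-Seq": (-1.01, 0.017), "microarray": (-0.34, 0.08)}

results = {}
for platform, (log2_fc, fdr) in rgs13.items():
    fold = 2 ** abs(log2_fc)  # magnitude of the change on the linear scale
    for cutoff in (1.5, 2.0):
        results[(platform, cutoff)] = passes_cutoffs(log2_fc, fdr, fc_cutoff=cutoff)
        print(f"{platform}: {fold:.2f}-fold decrease, "
              f"cut-off {cutoff}: {results[(platform, cutoff)]}")
```

Under this assumption, the RNA-Seq measurement corresponds to roughly a two-fold decrease, whereas the microarray measurement falls below both fold-change cut-offs and also misses the FDR threshold.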
Acknowledgements
OMNI Biomarker Development Genentech (Margaret Neighbors, Michele A. Grimbaldeston and Gaik W. Tew) and the NHLBI LungMAP Consortium (Hananeh Aliee, Fabian J. Theis and M.C. Nawijn).
Footnotes
Conflict of interest: B. Ditz has nothing to disclose.
Conflict of interest: J.G. Boekhoudt has nothing to disclose.
Conflict of interest: H. Aliee has nothing to disclose.
Conflict of interest: F.J. Theis has nothing to disclose.
Conflict of interest: M. Nawijn reports grants from the European Commission (EU H2020 programme), GSK Ltd and Lung Foundation Netherlands during the conduct of the study.
Conflict of interest: C-A. Brandsma has nothing to disclose.
Conflict of interest: P.S. Hiemstra has nothing to disclose.
Conflict of interest: W. Timens reports personal fees from Roche Diagnostics/Ventana, Merck Sharp Dohme, Bristol-Myers-Squibb and Diaceutics outside the submitted work.
Conflict of interest: G.W. Tew is an employee of Genentech Inc., a member of the Roche Group.
Conflict of interest: M.A. Grimbaldeston is an employee of Genentech Inc., a member of the Roche Group.
Conflict of interest: M. Neighbors is a full-time employee of Genentech Inc., and holds stock and options in the Roche Group.
Conflict of interest: V. Guryev has nothing to disclose.
Conflict of interest: M. Van Den Berge has nothing to disclose.
Conflict of interest: A. Faiz has nothing to disclose.
Support statement: This study was supported by Longfonds grant 4.2.16.132JO and the Ministerie van Economische Zaken en Klimaat. Funding information for this article has been deposited with the Crossref Funder Registry.
- Received February 11, 2021.
- Accepted March 2, 2021.
- Copyright ©The authors 2021
This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0. For commercial reproduction rights and permissions contact permissions@ersnet.org