Chronic obstructive pulmonary disease (COPD) is a type of progressive lung disease, featured by airflow obstruction. Recently, a comprehensive analysis of the transcriptome in lung tissue of COPD patients was performed, but the heterogeneity of the sample was not seriously considered in characterizing the mechanistic dysregulation of COPD. Here, we established a new transcriptome analysis pipeline using a deconvolution process to reduce the heterogeneity and clearly identified that these transcriptome data originated from the mild or moderate stage of COPD patients. Differentially expressed or co-expressed genes in the protein interaction subnetworks were linked with mitochondrial dysfunction and the immune response, as expected. Computational protein localization prediction revealed that 19 proteins showing changes in subcellular localization were mostly related to mitochondria, suggesting that mislocalization of mitochondria-targeting proteins plays an important role in COPD pathology. Our extensive evaluation of COPD transcriptome data could provide guidelines for analyzing heterogeneous gene expression profiles and classifying potential candidate genes that are responsible for the pathogenesis of COPD.
COPD, or chronic obstructive pulmonary disease, is a type of obstructive lung disease characterized by long-term poor airflow [
Currently, more than 70% of COPD patients suffer from limited physical activity, and 50% among them can not lead a normal life [
Smoking causes about 80% to 90% of all deaths from COPD [
The prevalence of COPD is well documented. The diagnostic assessment of COPD, as proposed by the Global Initiative for Chronic Obstructive Lung Disease (GOLD), is based on 4 multiple factors, such as the patient’s level of symptoms, the extent of airflow obstruction, spirometric abnormality, and the identification of comorbidities [
Recently, major biological and clinical discoveries have been allowed by great technical advances in next-generation sequencing techniques. Kim
Here, we established a new transcriptome analysis pipeline to remove heterogeneity and find suitable markers to clearly separate COPD from normal tissue. The removal of heterogeneity enabled us to detect emergent gene expression changes and protein interaction subnetworks that were missed in the previous study. Especially, the importance of mitochondrial proteins was revitalized through our analysis regarding co-expression relationships and changes in the subcellular localization of proteins. The analysis pipeline used in this study could be used to classify heterogeneous gene expression profiles and predict potential candidates for COPD pathogenesis.
Raw RNA-seq data from 98 male COPD and 91 normal samples were downloaded from the Gene Expression Omnibus database (GSE57148,
Unmapped reads were collapsed, such that repeatedly appearing reads were regarded as a single read. The read count of each sequence was sorted in descending order, and the top 10,000 reads were selected from individual samples. The reads corresponding to V, D, J regions of the B cell receptor (BCR) and T cell receptor (TCR) loci were selected by an immunoglobulin variable domain sequence analysis tool, called IgBlast (
To remove transcriptome heterogeneity, DeMix [
PhenomeExpress [
Protein subcellular localization was examined and predicted using the analysis scheme suggested by Liu and Hu [
In a previous study by Kim
To identify the status of COPD samples, the average expression levels of known COPD marker genes were examined (
By analyzing RNA-seq data, it was possible to measure recombination events in BCR and TCR loci. VJ recombination occurs in the primary lymphoid organs and involves the joining of the variable (V) and joining (J) chains, resulting in the variation of amino acid sequences in the antigen-binding regions of BCRs and TCRs. By using IgBlast [
A workflow, including the prediction of estimates and the deconvolution process, was set up to resolve the issue of complexity (
In order to confirm the effect of the deconvolution, expression levels of known marker genes were re-evaluated (
DEGs—80 up-regulated genes and 757 down-regulated genes in COPD—could be 10 identified by applying the following conditions: genes with absolute fold-change over 1.25 and p-values less than 0.01 (
To detect biological functions or pathways closely related to specific genes, we performed enrichment assays with DEGs (
The identification of protein interaction subnetworks using the transcriptome can provide useful information on interaction modules for specific functions. Reliable subnetworks were constructed by PhenomeExpress in combination with gene expression profiles and disease-related phenotypes [
One emergent subnetwork associated with vesicle-mediated transport might be linked to the possibility that changes in protein subcellular localization play an important role in COPD development. Protein localization was predicted, based on gene expression profiles (
Two genes were regarded as DCGPs if the absolute MIC changes between COPD and normal subjects was greater than 0.4. Under this condition, 139 up-regulated pairs and 303 down-regulated pairs in COPD could be identified (
The prediction of protein subcellular localization is exemplified in
The predicted probabilities of subcellular locations of the mitochondria-related proteins were examined (
COPD is a complex and heterogeneous disease, and thus, it is not easy to investigate the pathogenesis and diagnosis of COPD [
We confirmed that our COPD data were in the mild stage, based on the expression levels of known marker genes (
Our analysis pipeline to identify DEGs was performed considering two aspects: heterogeneity and confidence (
By reducing the heterogeneity, gene expression profiles of COPD samples could become consistent with known expression patterns of marker genes. The DEGs that were identified in our pipeline were better in distinguishing COPD from normal subjects than previously defined DEGs (
Up-regulated genes in COPD were related with to functions, such as smooth muscle cell proliferation, protein autophosphorylation, and wound healing (
Identification of protein interaction subnetworks shed light on specific functions of interaction modules related to the typical phenotypes of COPD (
One attractive subnetwork associated with vesicle-mediated transport raised the question of whether protein subcellular localization plays some role in COPD. A group of proteins with subcellular localization changes in COPD were predicted by measuring co-expression scores using information on protein interaction and subcellular localization (
Here, we used public gene expression profiles generated from COPD and normal subjects and re-evaluated the differential transcriptomes by removing sample heterogeneity. The overall data analysis revealed a group of gene expression changes that were missed in previous research. Co-expression relationships between conditions could be inferred from gene expression profiles and might be useful in classifying samples and predicting protein subcellular localization. In conclusion, COPD is a complex and heterogeneous disease. The newly identified DEGs in this study and DCGPs could partially explain COPD pathogenesis in the mild stage. We expect that our strategy of analyzing heterogeneous samples will be applicable to other systems.
Conceptualization: YMO, TYR. Data curation: SH, YMO, TYR. Formal analysis: SH, TYR. Funding acquisition: SH, TYR. Methodology: SH, TYR. Writing - original draft: SH, TYR. Writing - review & editing: SH, TYR.
No potential conflict of interest relevant to this article was reported.
This work was supported by a grant (NRF-2014M3C9A3064548 and NRF2017M3C9A6047625 to T.-Y. R.), funded by the National Research Foundation of Korea (NRF). S.H. was supported by the BK21 PLUS fellowship program (10Z20130012243), funded by the Ministry of Education, Korea.
Supplementary data including two tables and three figures can be found with this article online at
Alpha diversity of VJ combinations in IGL (A), IGH (B), TCRA (C), and TCR (D). Chronic obstructive pulmonary disease (COPD) samples exhibit slightly increased combinatorial diversity compared with normal samples. p-values were calculated after 1,000 permutations.
Beta diversity of VJ combinations in IGL (A), IGH (B), TCRA (C), and TCR (D). Lower level of similarity is observed between chronic obstructive pulmonary disease (COPD) samples compared with normal samples. Normal samples are more similar to each other than to COPD samples. p-values were calculated after 1,000 permutations.
The largest biologically relevant subnetworks determined by certain disease phenotypes and gene expression changes, including subnetworks related to biological functions, such as translational elongation, regulation of apoptosis, and immune system processes.
List of differentially expressed genes after deconvolution
List of differentially expressed pairs determined by maximal information coefficient
Heterogeneous chronic obstructive pulmonary disease (COPD) samples in the mild stage. (A) Principal component analysis plot depicting relative similarities between COPD samples (red) and normal (NOR) samples (blue) using previously identified differentially expressed genes. (B) Expression levels of COPD marker genes. Recep., receptor; S., surfactant; NOR. normal. *p < 0.01, **p < 0.0001 by student’s t-test. (C) Alpha diversity of VJ combinations in IGK. (D) Beta diversity showing an inverse relation with the compositional similarity between samples in terms of VJ combinations in IGK. p-values were calculated after 1,000 per bmutation.
Deconvolution of chronic obstructive pulmonary disease (COPD) samples increases the difference between COPD and normal (NOR) tissue. (A) Deconvolution process to identify differentially expressed genes. (B) Expression levels of COPD marker genes after deconvolution. Recep., receptor; S., surfactant; NOR. normal. *p < 0.01, **p < 0.0001 by student’s t-test.
Differentially expressed genes (DEGs) between chronic obstructive pulmonary disease (COPD) and normal tissue. (A) Scatterplot of gene expression levels. Red and blue dots represent up-regulated and down-regulated genes in COPD compared with normal tissue, respectively. (B) Principal component analysis plot depicting relative similarities between COPD samples (red) and normal samples (blue) using DEGs. (C, D) Biological functions (C) and pathways (D) highly enriched in up-regulated (red) and down-regulated (blue) genes. Individual bars demonstrate fold-changes relative to background, and lines display their p-values.
Biologically relevant subnetworks in consideration of certain disease phenotypes and gene expression changes. (A–G) Letters at the top are the most highly enriched biological functions in individual subnetworks.
Differentially expressed pairs (DEPs) between chronic obstructive pulmonary disease (COPD) and normal (NOR) samples. (A) Pipeline to predict protein sublocalization from gene expression profiles. (B) Maximal information coefficient (MIC) scores for describing coexpression changes between two genes. Red and blue dots represent up-regulated and downregulated pairs in COPD compared with NOR samples, respectively. (C) Principal component analysis plot depicting relative similarities between COPD (red) and NOR samples (blue) using DEPs. (D) Venn diagram showing overlap between differentially expressed genes (DEGs) and DEPs. DCGP, differentially co-expressed gene pairs. p-values were calculated by hypergeometric test.
Prediction of protein subcellular localization changes between chronic obstructive pulmonary disease (COPD) and normal (NOR) samples. (A) The correlation scores between NDUFA12 and other mitochondrial proteins in NOR and COPD samples. The thickness and color of the edges were determined by maximal information coefficient (MIC). (B) Genes with significant subcellular localization changes between COPD and NOR samples. (C) Heatmap demonstrating probability of changes in mitochondria-related proteins.