Analysis of H3K4me3-ChIP-Seq and RNA-Seq data to understand the putative role of miRNAs and their target genes in breast cancer cell lines

Breast cancer is one of the leading causes of cancer in women all over the world and accounts for ~25% of newly observed cancers in women. Epigenetic modifications influence differential expression of genes through non-coding RNA and play a crucial role in cancer regulation. In the present study, epigenetic regulation of gene expression by in-silico analysis of histone modifications using chromatin immunoprecipitation sequencing (ChIP-Seq) has been carried out. Histone modification data of H3K4me3 from one normal-like and four breast cancer cell lines were used to predict miRNA expression at the promoter level. Predicted miRNA promoters (based on ChIP-Seq) were used as a probe to identify gene targets. Five triple-negative breast cancer (TNBC)‒specific miRNAs (miR153-1, miR4767, miR4487, miR6720, and miR-LET7I) were identified and corresponding 13 gene targets were predicted. Eight miRNA promoter peaks were predicted to be differentially expressed in at least three breast cancer cell lines (miR4512, miR6791, miR330, miR3180-3, miR6080, miR5787, miR6733, and miR3613). A total of 44 gene targets were identified based on the 3′-untranslated regions of downregulated mRNA genes that contain putative binding targets to these eight miRNAs. These include 17 and 15 genes in luminal-A type and TNBC respectively, that have been reported to be associated with breast cancer regulation. Of the remaining 12 genes, seven (A4GALT, C2ORF74, HRCT1, ZC4H2, ZNF512, ZNF655, and ZNF608) show similar relative expression profiles in large patient samples and other breast cancer cell lines thereby giving insight into predicted role of H3K4me3 mediated gene regulation via the miRNA-mRNA axis.


Introduction
Breast cancer is one of the leading causes of death in women all over the world [1]. There are many subtypes in breast cancer identified based on the origin, hormone receptors expression, and response to treatment. There are four basic subtypes namely luminal-A, luminal-B, human epidermal growth factor receptor-2 (HER2) and triple-negative breast cancer (TNBC) [2]. Among all the subtypes, the TNBC is very aggressive and has poor prognosis compared to other subtypes and very few systemic treatment options are available other than chemotherapy [3]. Luminal-A (estrogen receptor [ER]+ and/or progesterone receptor [PR]+, HER2−) because of ER expression, has better prognosis compared to TNBC [2]. Epigenetic regulation via histone modification of microRNA (miRNA) pro-Breast cancer is one of the leading causes of cancer in women all over the world and accounts for ~25% of newly observed cancers in women. Epigenetic modifications influence differential expression of genes through non-coding RNA and play a crucial role in cancer regulation. In the present study, epigenetic regulation of gene expression by in-silico analysis of histone modifications using chromatin immunoprecipitation sequencing (ChIP-Seq) has been carried out. Histone modification data of H3K4me3 from one normal-like and four breast cancer cell lines were used to predict miRNA expression at the promoter level. Predicted miRNA promoters (based on ChIP-Seq) were used as a probe to identify gene targets. Five triple-negative breast cancer (TNBC)-specific miRNAs (miR153-1, miR4767, miR4487, miR6720, and miR-LET7I) were identified and corresponding 13 gene targets were predicted. Eight miRNA promoter peaks were predicted to be differentially expressed in at least three breast cancer cell lines (miR4512, miR6791, miR330, miR3180-3, miR6080, miR5787, miR6733, and miR3613). A total of 44 gene targets were identified based on the 3′-untranslated regions of downregulated mRNA genes that contain putative binding targets to these eight miRNAs. These include 17 and 15 genes in luminal-A type and TNBC respectively, that have been reported to be associated with breast cancer regulation. Of the remaining 12 genes, seven (A4GALT, C2ORF74, HRCT1, ZC4H2, ZNF512, ZNF655, and ZNF608) show similar relative expression profiles in large patient samples and other breast cancer cell lines thereby giving insight into predicted role of H3K4me3 mediated gene regulation via the miRNA-mRNA axis.
Keywords: breast neoplasms, ChIP-Seq, luminal-A/triple-negative, miRNA, RNA-Seq moters is known to play a crucial role in breast cancer regulation [4]. Histone modifications wrapped around genes play an important role in gene regulation by providing access to transcription factors, RNA-polymerases, and other regulatory mechanisms [5]. There are several histone post-translational modifications identified earlier with a unique regulatory function for each of them. Using chromatin immunoprecipitation followed by sequencing (ChIP-Seq), one can identify the position of targeted protein binding (transcription factors, histone modifications) regions in the genome [6].
Epigenetic gene regulation occurs in three major ways namely DNA methylation of CpG islands, histone modifications, and non-coding RNA mediated [7]. Each regulatory level has a crucial role in normal cell development and diseases such as cancer and other non-communicable diseases. Gene level histone modifications can be used to predict the status of a gene whether it is active or inactive. H3K4me3 modification at the promoter level provides information about active genes, whereas H3K27me3 and H3K9me3 modifications provide the inactive status of the gene [8]. Previous studies provided evidence for correlation of histone modifications (H3K36me3, H3K9ac, H3K27ac, and H3K4me1) with gene expression. Combinations of two or more histone modifications at gene promoter and gene body level provides more resolution to predict gene activity [8].
Non-coding (nc) RNAs (LncRNAs, long intervening/intergenic noncoding RNA, miRNA, and small interfering RNA) play a major role in gene regulation of different biological processes such as cell cycle and proliferation along with developmental and metabolic processes [9]. The most widely studied ncRNAs are miRNAs which are small ncRNA ~22nt in length, evolutionarily conserved, and have a wide regulatory role in development and diseases. They play an important role in gene regulation by targeting complementary binding sites in untranslated region (UTR) regions of gene transcripts or by targeting promoters or other miRNA or LncRNA [10]. Interestingly, miRNAs are also involved in the upregulation of specific genes via binding to their promoters in the nucleus and thereby controlling gene expression [11]. An actively transcribed miRNA is able to regulate ~100-1,000 genes by complementary binding to their targeted genes. Based on conserved base-pairing homology, it has been predicted that ~60% of human genes are targeted by miRNAs [12]. In many cancers dysregulation of miRNA causes cancer progression and drug resistance. The role of miRNA deregulation in breast cancer was first reported in 2005 wherein the role of miR548 as an oncogenic regulator in breast cancer was elaborated [13]. There are other miRNAs such as let-7, miR145, miR200, and miR497 with a definitive role in breast cancer [14].
The present study is limited to normal-like, luminal-A and TN-BC-claudin subtypes of breast cancer based on hormonal receptor expression. One each of normal subtype, ER positive subtype (Luminal-A) and ER negative (TNBC-claudin) were chosen for analysis. In the current study, ChIP-Seq data corresponding to histone modification H3K4me3 for one normal-like (MCF10A) and four breast cancer cell lines (luminal-A [MCF7, ZR751], TNBC [MB231, MB436]) were chosen to understand role of miR-NA-gene promoter regulation of miRNA-mRNA axis. An attempt has been made to map the epigenetic expression patterns (histone H3K4me3) with RNA-sequencing (RNA-Seq) expression of miR-NA targeted genes. Analysis of such combined data promises to provide insights into understanding epigenetic gene regulation (chromatin) as well as gene expression [15].

Methods
ChIP-Seq data were downloaded from Gene Expression Omnibus (GEO) for breast cancer pertaining to six cell lines, viz., normal-like (MCF10A and 76NF2V), luminal-A subtype (MCF7 and ZR751), and TNBC subtype (MB231 and MB436), each with one activation histone modification H3K4me3 and two replicates (sequenced as Illumina single-end reads) (Supplementary Tables 1, 2) [16]. RNA-Seq data for the above-mentioned cell lines with four replicates were also downloaded from GEO (Supplementary Table 3) [16]. The raw reads were checked for quality using FastQC (version 0.11.7) [17]. BWA-MEM (Burrows-Wheeler alignment-Maximal Exact Matches) version 0.7.17 was used for alignment with reference genome build hg38 [18]. Samtools (version 1.8) was used to manage replicates and for sam to bam conversion [19]. ChIP-Seq analysis was done using model-based analysis of ChIP-Seq (MACS2, version 2.1.1.20160309) [20]. Peak calling was done using narrowpeak as H3K4me3 generates narrow histone marks [21]. p-value thresholds for peak calling were set to 0.001 for all samples and all replicates (Fig. 1A) [22]. Raw ChIP-Seq data of both H3K4me3 and corresponding input sequence was used for peak calling. To identify the reproducibility within the biological replicates IDR2.0.3 tool was used with a threshold of 0.05 to obtain statistically significant peaks [23,24]. Pseudo replicate analysis was carried out to identify low reproducible replicates which satisfy the criteria of N1/N2 ≥ 2 and Np/Nt ≥ 2 (where N1 represents the number of replicate1 self-consistent peaks, and N2 represents the number of replicate two self-consistent peaks; Np represents the number of peaks consistent between pooled pseudoreplicates, and, Nt represents the number of peaks consistent between true replicates) [24]. Peaks were annotated using HOMER (v3.12) tool [25]. Peaks corresponding to miRNA promoter genes were extract- ed using in-house scripts. Cis-regulatory Element Annotation System (CEAS -0.9.9.7) was used to get statistics on ChIP enrichment for genomic features such as chromosomes, promoters, gene bodies, or exons, to infer genes that are most likely to be regulated by a binding factor [26]. For RNA-Seq analysis HISAT2 (v2.1.0) was used for alignment of raw reads [27] and feature counts (v1.5.0p1) tool was used to count reads mapped on to each gene [28]. DEseq2 (v1.24) was used for differential gene expression of subtypes [29]. Downregulated and upregulated genes were selected based on log-2fold change > 2 as per guidelines for analysis of multi-omics data [30]. Differential expression of genes for normal-like (MCF10A) vs. luminal-A (MCF7 and ZR751) and normal-like (MCF10A) vs. TNBC (MB231 and MB436) was carried out (Supplementary Fig.  1). miRNA sequences were extracted and checked for complementarity with 3′-UTRs of downregulated genes. RNAhybrid server (https:// bibiserv.cebitec.uni-bielefeld.de/rnahybrid) was used to predict miRNA-mRNA interactions with seedmatch (2-8 bp) using the helix constraint "from" and "to" parameter along with cutoff of ≤ -25 kcal/mol [31]. In the current analysis, we used stringent minimum free energy ( ≤ -25 kcal/mol, except for miR3613 ( ≤ -18 kcal/mol) due to low GC content) to predict strong putative targets with high miRNA-mRNA duplex binding stability [31]. These energy cutoffs were used based on previous studies of miR1306-ADAM10 duplex that have been experimentally validated [32]. The workflow for ChIP-Seq and RNA-Seq data integration is depicted in Fig. 1B.
Relative expressions of the gene targets identified were verified using the UALCAN (http://ualcan.path.uab.edu) and CCLE (Broad Institute Cancer Cell Line Encyclopedia, https://portals. broadinstitute.org/ccle) databases. ACTB gene was used as a con-trol for relative expression analysis using the CCLE database. The UALCAN hosts the relative expression of genes across normal versus different cancer types from the TCGA (The Cancer Genome Atlas) cancer resource associated with clinicopathological data [33]. The breast cancer cell line data (60) available in CCLE database was also incorporated into the study for validation. Kaplan-Meier (KM) plots from Human Protein Atlas were used for survival analysis (https:// www.proteinatlas.org).

ChIP-Seq analysis
All the ChIP-Seq datasets passed the quality check ( Supplementary  Fig. 2) and > 86% of reads were mapped to the reference genome for all replicates of H3K4me3 (for each cell line) used in the study. The number of peaks in the biological replicates varied from 27875 to 64652 for different cell lines (Fig. 2). Reproducibility analysis of peaks (obtained for the replicates) enabled identification of statistically significant peaks (threshold 0.05) (  3,4) which are common between biological replicates [21]. In the normal cell line (MCF10A) and luminal-A subtype (cell lines MCF7 and ZR751), 16,601 peaks (59.7%), 13,008 peaks (53.4%) and 10,158 peaks (48.1%) passed the reproducibility threshold respectively. In the triple-negative subtype (cell lines MB231 and MB436), 14,339 peaks (65.7%) and 16,549 peaks (61.2%) passed the threshold. The highest percentage of overlapping peaks between the replicates was observed in cell line MB231 (65.7%) whereas the least percentage of peaks that passed the threshold was seen in cell line ZR751 (48.1%). All cell lines except 76NF2V cell line generated reproducibility ≥ 2, which indicated that the peaks were reproducible and statistically significant. Hence, cell line 76NF2V was not used for further analysis because of the low reproducibility of replicates, N1/N2 was 3.370 (Supplementary Table 4). Chromosomal-level distribution of ChIP-peaks is available in Supplementary Fig. 5. miRNA promoter prediction analysis miRNA promoter regions were identified for each cell-line ( Table  2, Supplementary Table 5). Peaks corresponding to miRNA-gene promoters that are common and unique between normal versus cancerous cell lines were identified (normal vs. TNBC, normal vs. luminal-A, and TNBC vs. luminal-A) (Tables 3, 4, Fig. 3). The majority of the miRNAs predicted have been reported to have a role in breast cancer (Supplementary Table 6).
Cell line-specific miRNAs obtained in this study have been listed in Table 3. Few of these miRNAs have been validated previously [34]. It is interesting to note that there are no common miRNAs between both the luminal-A cell lines used in this study; however, five TNBC-specific miRNAs viz., miR153-1, miR4767, miR4487, miR6720, and miR-LET7I were exclusively found in both the TNBC cell lines. Identification of target genes belonging to TNBC-specific miRNAs was carried out (Supplementary Tables 7,8). It is to be mentioned that with the cutoff criteria for target-gene identification used in this study (refer to Methods section), no targets were found for miR153-1. Of the five miRNA promoters found to be upregulated in the TNBC cell lines, 3 miRNAs, viz., miR-153-1, miR-6720, and miR-LET7I were found to have similar relative expression in TCGA data samples. Of these, miR-LET7I was found to have higher expression in TNBC   ( Supplementary Fig. 6). Eight miRNAs obtained are found to be common across two cancer subtypes (Table 4). miR4512 was observed in all cancerous cell lines, both luminal-A (MCF7 and ZR751) and TNBC (MB231 and MB436) subtypes. miR3180-3 was observed in luminal-A (MCF7 and ZR751) and TNBC subtypes (MB231 and MB436). miR6791 and miR330 were observed to be common in three cancer cell lines, two luminal-A (MCF7 and ZR751) and one TNBC (MB231) subtypes. miR5787, miR6733, and miR3613 were observed in three cancer cell lines, two TNBC (MB231 and MB436) and luminal-A (ZR751) subtypes. miR6080 was observed to be present in three cancer cell lines, two TNBC (MB231 and MB436) and luminal-A (MCF7) subtypes. All the eight miRNAs listed above have been used for further downstream analysis to identify their putative gene targets based on mRNA expression data. Of the eight miRNA promoters found to be upregulated across breast cancer cell lines, the relative expression of three miRNAs, viz., miR-( Supplementary Fig. 7). The relative expression of the other five miRNAs in this resource was found to be insufficient to draw any conclusion.

RNA-Seq analysis
All the RNA-Seq datasets passed the quality check ( > 28) and hence were retained for further analyses (Supplementary Fig. 8). About 96% reads mapped for cell lines MCF10A, MCF7, and MB231 whereas, for cell lines ZR751 and MB436 > 93% mapping was observed (Supplementary Table 9). In normal-like vs. luminal-A type, a total of 1,189 genes were upregulated (Supplementary Table 10) and 687 genes were downregulated (Supplementary Table 11). In normal-like vs. TNBC type, a total of 954 genes were upregulated (Supplementary Table 12) and 167 genes were downregulated (Supplementary Tables 13, 14, Supplementary Fig. 9). Five miRNAs specific to the TNBC cell lines were further studied to identify their binding to downregulated 3′-UTR gene targets (Supplementary Tables 8, 15).
330, miR-3613, and miR-6733 were found to be complementary in studies reported in TCGA data samples using UALCAN webserver Genes STC2, CPA4, and NUPR1 were found to have a relatively higher expression amongst the 13 target genes of TNBC cell lines.

Discussion
The interplay between epigenetic gene regulation through histone modifications and other regulatory mechanisms like ncRNA is of great interest in cancer biology. In the present analysis, the role of H3K4me3 in miRNA expression based on promoter level peaks has been studied using ChIP-Seq and RNA-seq data integration. To achieve the same, a novel approach of mapping data derived from ChIP-Seq (miRNA promoter peaks) and RNA-Seq (targets of 3′-UTRs of genes binding to miRNA) was used to understand epigenetic regulation that may aid in the identification of subtype and cell line specific miRNAs [15,16].
In normal-like cell line MCF10A, of the nine unique miRNAs identified, miR4530 was found to have a role in the suppression of cell proliferation, promote angiogenesis and induce apoptosis by targeting gene VASH1 (Vasohibin 1) in breast carcinoma [67]. Hence, promoter-level epigenetic regulation of miR4530 by H3K4me3 may have a protective role in normal-like subtypes. miR34B was observed to be present only in cell line MB436 (TNBC subtype). miR34B has high expression in TNBC tumors compared to normal types. Expression of miR34B highly correlates with clinical outcome of patients. Notch2 (notch receptor 2) gene that has a role controlling cell differentiation, is a direct target for miR34B [68]. miR6875 was observed in TNBC cell line MB436. According to previous reports, a high expression of miR6875 was observed in early breast cancer patients [69]. miR574-5p attenuates proliferation, migration, and epithelial mesenchymal transition (EMT) in TNBC cells by targeting genes BCL11A (BAF chromatin remodeling complex subunit) and SOX2 (SRY-Box transcription factor 2) to inhibit the SKIL (SKI like proto-oncogene)/TAZ (Tafazzin)/CTGF (connective tissue growth factor) axis [70].
Of the five TNBC subtype-specific miRNAs, mir153, miR6720, and miR-LET7I were found to be upregulated in larger breast cancer datasets belonging to TCGA. miR153 has been reported to have a tumor suppressor role and has been suggested as a prognostic marker for TNBC [34].
The majority of the predicted gene targets (total 44) overlap with previous experimental studies and include 32 gene targets (Figs. 4, 5) of eight miRNAs (miR4512, miR6791, miR330, miR3180-3, miR6080, miR5787, miR6733, and miR3613) which are identified in more than three breast cancer cell lines and absent in normal-like cell lines. Overexpression of miR330-3p in breast cancer cell lines has been reported earlier, which results in greater invasiveness in-vi-tro, and miR330-3p-overexpressing cells also metastasize more aggressively ex-ovo [71]. Gene CCBE1 (collagen and calcium binding EGF domains 1) is a direct target of miR330-3p, and knockout of CCBE1 results in a greater invasive capacity [71]. Exosomal expression of miR3613-3p promotes breast cancer cell proliferation and metastasis. It has been previously reported that miR3613-3p levels were negatively correlated to SOCS2 (suppressor of cytokine signaling 2) expression in breast cancer tissues [72]. Few genes were observed to be targeted by multiple miRNAs (like A4GALT and FOXL2 targeted by three miRNAs each) as it is known that miR-NAs can regulate multiple targets based on seed match and sequence similarity between miRNA-mRNA [10].
Of the remaining 12 gene targets, relative gene expression of genes A4GALT, C2ORF74, HRCT1, ZC4H2, ZNF512, ZNF655, and ZNF608 agree with the proposed hypothesis of H3K4me3 regulated miRNA-mRNA axis in large patient data (TCGA samples) along with their relative expression in other breast cancer cell lines (CCLE database). These genes were associated with poor survival based on KM plots (Human Protein Atlas). The proposed methodology of miRNA-mRNA regulation when analyzed in the context of other histone modifications like H3K27me3, H3K4me1, H3K9me3 will enable better insights into the underlying mechanism of breast cancer regulation. work is funded by the National Supercomputing Mission (NSM) of the Government of India. The authors thank anonymous reviewers for their valuable suggestions and constructive criticism in improving the manuscript.