Genome-Wide Identification and Classification of MicroRNAs Derived from Repetitive Elements
Abstract
MicroRNAs (miRNAs) are known for their role in mRNA silencing via interference pathways. Repetitive elements (REs) share several characteristics with endogenous precursor miRNAs. In this study, 406 previously identified and 1,494 novel RE-derived miRNAs were sorted from the GENCODE v.19 database using the RepeatMasker program. They were divided into six major types, based on their genomic structure. Novel RE-derived miRNAs outnumbered previously identified RE-derived miRNAs. In conclusion, many miRNAs have not yet been identified, and most of them are derived from REs.
Introduction
MicroRNAs (miRNAs) are small non-coding RNAs of ~22 nucleotides (nt) in length and are single-stranded in their mature form. Primary miRNAs are transcribed from genomic regions and processed by Drosha to generate precursor miRNAs [1]. Precursor miRNAs have a hairpin structure; therefore, their sources (that is, the genomic loci from which they originate) have a palindromic structure [2]. The precursor miRNAs are exported from the nucleus to the cytoplasm and processed by Dicer into a duplex, one strand of which is preferentially loaded into Argonaute (AGO) [1, 3]. Mature miRNAs function via the RNA-induced silencing complex and AGO protein-mediated binding to the target mRNA by complementary base pairing to the 3' untranslated region [4].
Repetitive elements (REs) are interspersed throughout the genome, and they increase genomic instability through various mechanisms. REs consist of transposable elements (TEs) and tandem repeats (e.g., satellite DNA, simple repeat DNA). REs can directly affect coding sequences or other functional sequences in the host genome in several ways. They can affect transcription by acting as alternative promoters [5, 6, 7], by forming structural isoforms through alternative exons, and by providing polyadenylation signal sites important in transcriptional termination [8]. REs also contribute to the inhibition of gene expression at the post-transcriptional level by producing miRNA sequences [2, 9]. REs have given rise to both paralogous miRNA gene families and species-specific miRNA gene families [10]. Some miRNAs originate from unique genomic sequences, and others originate from REs. Recently, several studies established an association between REs and miRNAs by demonstrating connections between miRNAs and TEs [9, 11, 12]. These studies suggest that REs are important for miRNA origin, expression, and regulatory network formation [2, 9, 12, 13, 14, 15, 16]. In particular, some REs have a palindromic structure, and these sequences have great potential to form precursor miRNAs. In previous studies, miniature inverted-repeat TE (MITE)-derived miRNAs were identified in the human genome [2]. Medium reiterated sequences (MERs), another class of REs, have a palindromic structure in the mammalian genome [17]. MERs were also predicted to produce miRNAs [10], and MER-derived miRNAs were confirmed experimentally in human cell lines [11]. REs are ubiquitous and abundant throughout the host genome; therefore, RE families have the potential to generate paralogous miRNAs. A MITE-derived miRNA, miR-548, has many homologous gene families [18], and a MER-derived miRNA, miR-1302, also has many homologous miRNAs in the human genome [10].
Likewise, long interspersed elements (LINEs) can form an miRNA precursor in a "tail to tail" arrangement [12]. In the case of hsa-miR-28, two oppositely oriented LINE elements together form one miRNA precursor [15]. Based on these results, we distinguished miRNAs formed by two REs from miRNAs formed by a single palindromic RE.
REs evolve rapidly compared to other genomic sequences; therefore, RE-derived miRNAs tend to be lineage-specific [9]. For example, Alu-derived miRNAs are primate-specific, and MITE-derived miR-548 was mainly discovered in primates [18]. Genomic duplication events, such as segmental duplications or tandem duplications, also create REs and RE-derived miRNAs in animals [19, 20]. Accordingly, many RE-derived miRNAs have been identified in the human, rhesus, and mouse genomes [20]. In plant genomes, TE insertions can produce both siRNAs and miRNAs, and MITEs have an important role in the creation and evolution of novel miRNAs [13]. In this respect, analyzing the patterns of TE overlap with miRNAs in the human genome, and the TEs that overlap them most often, can provide evolutionary clues for further studies.
In the miRBase database, 55 experimentally validated human miRNA genes derived from TEs are described, and 85 novel miRNAs are predicted from the potential conserved secondary structures of 587 human TEs [9]. However, these studies concentrated exclusively on the identification of miRNAs containing REs and did not analyze the patterns of overlap between REs and miRNAs. Moreover, newly identified miRNAs and small transcripts with the potential to form miRNAs have not been considered. Therefore, we analyzed TEs that overlapped with both previously identified and novel miRNAs and examined six patterns of overlap that occur. Our results suggest that REs contribute to the production of human miRNA genes by a number of mechanisms.
Methods
Computational analysis
We used miRNAs annotated as small non-coding RNA genes defined by the GENCODE database, v.19 (http://www.gencodegenes.org) [21]. RepeatMasker outputs (hg19 assembly, RM v.330, repbase libraries 20120124) were obtained from the University of California, Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/, hg19). To analyze the intersection between miRNA and REs, we used the intersectBed command (with options -wa and -wb) in BEDTools [22].
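The intersection step can be illustrated with a minimal Python sketch. It is not the BEDTools implementation; it only mimics the pairing that the -wa and -wb options report (each miRNA record alongside each repeat record that shares at least one base with it) on BED-style half-open intervals. The `Interval` type, the `intersect_pairs` function, and all names and coordinates below are hypothetical and chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    name: str


def intersect_pairs(
    mirnas: List[Interval], repeats: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Return every (miRNA, repeat) pair that shares at least one base."""
    pairs = []
    for m in mirnas:
        for r in repeats:
            # Two half-open intervals overlap when each starts before the other ends.
            if m.chrom == r.chrom and m.start < r.end and r.start < m.end:
                pairs.append((m, r))
    return pairs


# Toy records with made-up coordinates, for illustration only.
mirnas = [
    Interval("chr1", 100, 180, "mir_a"),
    Interval("chr2", 500, 570, "mir_b"),
]
repeats = [
    Interval("chr1", 90, 120, "L1_example"),
    Interval("chr1", 150, 300, "Alu_example"),
    Interval("chr2", 570, 600, "MER_example"),  # adjacent to mir_b, no shared base
]

pairs = intersect_pairs(mirnas, repeats)
for m, r in pairs:
    print(m.name, r.name)
```

The nested loop is quadratic and is only meant to show the overlap test; for genome-scale annotation files, a dedicated tool such as BEDTools remains the practical choice.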
Classification of miRNAs into six types
The RE-matched precursor miRNA sequences obtained were divided into six major types. The six types were as follows: miRNAs that overlapped with two or more REs (type 1), miRNAs related to the TcMar-Mariner family (type 2) and the MER family (type 3), and miRNAs sorted according to the matching scheme between REs and miRNAs (type 4 to type 6). The classification scheme is described in a flow chart (Supplementary Fig. 1). miRNA sorting and deletion of duplications were performed using Microsoft Excel (Supplementary Table 1).
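As a rough illustration of how such a classification could be automated, the following Python sketch assigns a type to one precursor from its overlapping repeats. It is not the flow chart of Supplementary Fig. 1, which is not reproduced here. The order of the checks, the use of the RepeatMasker label "DNA/TcMar-Mariner", the "MER" name prefix, and the geometric rules for types 4 to 6 (RE inside the precursor, partial overlap, precursor inside the RE) are all assumptions made for this example, and the `Repeat` type and `classify` function are hypothetical.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    start: int    # 0-based, inclusive
    end: int      # exclusive
    family: str   # class/family label, e.g. "DNA/TcMar-Mariner" (assumed format)
    name: str     # repeat name, e.g. "MADE1" or "MER53"


def classify(mir_start: int, mir_end: int, overlaps: List[Repeat]) -> int:
    """Return an assumed type (1-6) for one precursor, or 0 if no RE overlaps it."""
    if not overlaps:
        return 0
    if len(overlaps) >= 2:
        return 1  # two or more REs overlap the precursor
    rep = overlaps[0]
    if rep.family == "DNA/TcMar-Mariner":
        return 2  # MITE-related (e.g. MADE1)
    if rep.name.startswith("MER"):
        return 3  # MER-related; the prefix match is an assumption
    if rep.start <= mir_start and mir_end <= rep.end:
        return 6  # assumed: precursor lies entirely within a single RE
    if mir_start <= rep.start and rep.end <= mir_end:
        return 4  # assumed: RE lies entirely within the precursor
    return 5      # assumed: RE covers one part, flanking sequence the rest


# Toy example with made-up coordinates: a short repeat inside a precursor.
example_type = classify(100, 180, [Repeat(120, 150, "Simple_repeat", "(TG)n")])
print(example_type)
```

Checking the repeat count first mirrors the cases reported in the Results, where precursors that contain MADE1 or MER sequences together with another RE are assigned to type 1.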
Results
MiRNAs originating from two or more REs in the human genome
In total, 1,900 miRNAs were confirmed as RE-derived, including 406 previously identified miRNAs and 1,494 novel miRNAs (Table 1). We identified 452 type 1 miRNAs (23.79% of the total), whose precursors overlap two or more REs (Fig. 1A). Of these, only 72 (15.93%) were previously identified miRNAs, and 380 were novel (Fig. 2). Most previously identified type 1 miRNAs overlapped with two REs, and three (miR-325, miR-649, and miR-5692b) overlapped with three REs. Interestingly, miR-649 overlapped with three different RE families: a LINE, a short interspersed element (SINE), and a DNA transposon. Precursors overlapping three or more REs were more likely to be novel miRNAs. Three novel miRNAs (AC079412.1, AL158077.1, and AL356865.1) overlapped with five REs.
Palindromic structure of RE-derived miRNAs
Precursor miRNAs form palindromic structures. Therefore, TE families with a palindromic sequence structure, including MITEs and MERs, have the potential to form mature miRNAs [2, 10, 11]. Both of these RE families may be able to form miRNA sequences themselves. Therefore, we assigned MITE-derived miRNAs and MER-derived miRNAs into the type 2 and type 3 categories, respectively.
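The notion of a palindromic sequence can be made concrete with a small Python sketch that compares the 5' arm of a sequence with the reverse complement of its 3' arm. This is only a crude proxy under simplifying assumptions: it counts Watson-Crick matches at fixed positions, ignores G:U wobble pairs and bulges, and does not predict secondary structure. The function names and the toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def arm_identity(seq: str, arm_len: int) -> float:
    """Fraction of the 5' arm that matches the reverse complement of the 3' arm."""
    if arm_len <= 0 or 2 * arm_len > len(seq):
        raise ValueError("arm_len must be positive and at most half the sequence length")
    five_arm = seq[:arm_len].upper()
    three_arm_rc = reverse_complement(seq[-arm_len:]).upper()
    matches = sum(a == b for a, b in zip(five_arm, three_arm_rc))
    return matches / arm_len


# Toy sequences (not real repeats): two 9-nt arms around a 6-nt spacer.
perfect = "GATTACAGG" + "TTTTTT" + "CCTGTAATC"
one_mismatch = "GATTACAGG" + "TTTTTT" + "CCTGTAATG"

print(arm_identity(perfect, 9))
print(arm_identity(one_mismatch, 9))
```

A sequence whose arms score close to 1 can fold back on itself, which is the property that makes inverted-repeat elements candidate sources of hairpin precursors.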
Type 2 precursor miRNAs were distinguished by the presence of MITEs, specifically MADE1, which consists of two 37-bp terminal inverted repeats flanking 6 bp of internal sequence; 390 regions were included in this category (20.53%) (Fig. 1B). For most type 2 miRNAs, the MADE1 sequences constituted more than 90% of the total miRNA sequence. These miRNA sequences may or may not have additional RE-derived sequences at their termini, and miRNA precursors containing such additional RE-derived sequences were classified as type 1. The palindromic sequence structure of MADE1 has the potential to form precursor miRNAs, and several studies identified mature MADE1-derived miRNAs. In previous studies, MADE1-derived miRNAs were identified as a part of the miR-548 gene family [2]. Seed shifting events in the miR-548 gene family were detected by evolutionary analysis [18]. According to our criteria, most genes in the miR-548 family were type 2 miRNAs, including seven miRNAs that were previously identified [2]. However, miR-548a-2 and miR-548a-3 were classified as type 1, because the MADE1 sequences were inserted into other RE sequences, and together, these RE sequences produce the miRNA sequences.
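The share of a precursor that is covered by a repeat, such as the 90% figure mentioned above, can be computed from interval coordinates alone. The sketch below assumes half-open coordinates on the same chromosome; the function name and the example coordinates are hypothetical and do not correspond to any real locus.

```python
def covered_fraction(mir_start: int, mir_end: int, rep_start: int, rep_end: int) -> float:
    """Fraction of the precursor interval [mir_start, mir_end) covered by the repeat."""
    length = mir_end - mir_start
    if length <= 0:
        raise ValueError("precursor interval must have positive length")
    # Overlap is the distance between the later start and the earlier end, floored at 0.
    overlap = max(0, min(mir_end, rep_end) - max(mir_start, rep_start))
    return overlap / length


# Made-up example: an 80-nt precursor of which 76 nt fall inside an 80-bp repeat.
fraction = covered_fraction(100, 180, 104, 184)
print(f"{fraction:.2%}")
```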
Interestingly, MADE1-derived miRNAs are inserted into the specific (hot-spot) sequence TA-TAT or repetitive sequences, such as LINEs, long terminal repeat (LTR) elements, and other DNA elements. Some MADE1-derived miRNAs harbored hot-spot sequences (TA-TAT) in their miRNA gene sequences (Fig. 3). Therefore, MADE1-derived miRNAs likely formed the miR-548 family, known to be primate-specific [2, 18].
Type 3 precursor miRNAs were identified by the presence of MER sequences (14.05%, n = 267) (Fig. 1C). Most MER-derived miRNA precursor sequences overlapped with MER sequences, because MER palindrome sequences are similar to miRNA precursor sequences and may be able to form miRNA sequences themselves [11]. Notably, miR-1302-5 was classified as a type 1 precursor miRNA, because it combined two RE families (MER53 and AluSx).
Patterns of RE-overlap with miRNAs
Type 4 precursor miRNAs harbor one RE sequence (2.53%, n = 48) (Fig. 1D). Precursor miRNAs are approximately 60-80 nt long [23], so few REs are short enough to lie entirely within a precursor miRNA sequence. Most type 4 miRNAs contained Low_complexity or Simple_repeat elements, because these repeats tend to be relatively short. In the Simple_repeat family, a short repeat sequence helps to form miRNA precursor sequences by base pairing with regions of complementary short repeat sequence. For example, the miR-574 precursor contains a (TG)n repeat (Supplementary Fig. 2A), and the (TG)n sequence contains the miR-574-5p sequence (Supplementary Fig. 2B).
Type 5 precursor miRNAs were those formed from flanking sequences and one RE (9.63%, n = 183) (Fig. 1E). This category had the highest ratio of previously identified miRNAs to total miRNAs. Naming a novel miRNA requires cloning or expression evidence that is described in a manuscript accepted for publication [24, 25]. Consequently, identified miRNAs tend to be abundantly and ubiquitously expressed in the host. Intriguingly, two RE families, SINE/mammalian-wide interspersed repeat (MIR) and LINE/L2, were commonly detected in type 5 miRNA precursor sequences. These two families were abundant in conserved segments and were commonly detected in murine intergenic regions of human orthologs [26]. These results indicate that the L2 and MIR TE families are highly conserved and suggest that these RE-derived miRNAs have important functional roles in the host. Taken together, type 5 miRNAs are relatively abundant and evolutionarily conserved.
By contrast, type 6 precursor miRNAs, or those formed from a single RE, represented 29.47% of the sample (n = 560) (Fig. 1F). The REs contained in type 6 precursor miRNAs have the potential to produce miRNA sequences themselves. In type 6 miRNAs, SINE/Alu elements are the most common.
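The proportions reported for the six types can be cross-checked from the counts given in this section. The short Python sketch below recomputes them against the total of 1,900 RE-derived miRNAs; the variable names are illustrative.

```python
TOTAL = 1900  # all RE-derived miRNAs reported in this study

# Counts per type as reported in the Results.
type_counts = {1: 452, 2: 390, 3: 267, 4: 48, 5: 183, 6: 560}

# The six types partition the full set.
assert sum(type_counts.values()) == TOTAL

percentages = {t: round(100 * n / TOTAL, 2) for t, n in type_counts.items()}
for t, pct in percentages.items():
    print(f"type {t}: n = {type_counts[t]}, {pct:.2f}%")

# Share of previously identified miRNAs within type 1 (72 of 452).
identified_in_type1 = round(100 * 72 / type_counts[1], 2)
print(f"identified within type 1: {identified_in_type1:.2f}%")
```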
Discussion
In this study, we classified previously identified and novel miRNAs based on their patterns of overlap with REs. In some cases, two or more REs were juxtaposed "tail to tail" and together formed one miRNA precursor. We classified these cases as type 1, in which LINE/L1-derived miRNAs were abundant (Table 2). A previous study showed that miR-558 is derived only from MLT1C in the LTR family [9], but we found two repeat families (LTRs and simple repeats) in the locus. We also determined that miR-619 and miR-1302-5 were derived from the combination of two adjacent repeat families. Recently, an updated human genome assembly (GRCh38/hg38) was released to the public, and novel REs have also been identified. Our classification of RE-overlapping miRNAs can provide criteria for explaining the origins, evolution, and family expansion of miRNAs in other human genome assemblies or in other species.
In type 1 precursor miRNAs, some precursors overlapped with two TEs of the same family, which makes it possible to form an miRNA precursor in a "tail to tail" arrangement. In a previous study, two oppositely oriented LINE elements were predicted to form a precursor, as in hsa-miR-28 [12], and LINE was also the most abundant TE family in the type 1 miRNAs in our study. This result can help identify new miRNAs derived from two or more REs. MADE1s are broadly distributed among eukaryotes and function as regulatory RNAs in many genomes [27]. They are expressed with the gene sequences in which they are inserted. This co-expression provides opportunities for MADE1 hairpins to enter the RNA interference pathway and take part in gene regulation [28]. We identified miRNAs that consisted of more than 90% MADE1 sequences and determined the specific mechanisms underlying the formation of MADE1-derived miRNA palindrome sequences.
Previous reports found 103 orthologs of the miR-1302 family in placental mammals. Moreover, the family has undergone multiple duplication events, and some of the duplicated genes have diverged functionally (e.g., RNA-based TE defense mechanisms), whereas others have become pseudogenes or have been eliminated from the genome [10]. Therefore, it has been suggested that the miRNA gene family evolved according to a birth-and-death model [10, 29, 30, 31].
Alu elements and miRNAs are related. Alu elements, together with duplication events, have given rise to new miRNAs, in particular an miRNA cluster on chromosome 19 (C19MC) [32]. C19MC shows a primate-specific imprinted expression pattern in the placenta and may be an example of co-evolution between Alu elements and miRNAs [33, 34]. Alu elements are abundant on human chromosome 19, where miRNAs, including C19MC, are also abundant [32]. Most SINE elements, such as Alu, as well as tRNAs and 5S rRNAs, are transcribed by RNA polymerase III (pol III), and several miRNAs are also transcribed by pol III. Some miRNAs are expressed together with Alu elements by pol III [35]. These data also support a strong relationship between miRNAs and Alu elements.
From an evolutionary perspective, primate-specific miRNAs have been found to contain an Alu element in their precursor. Other small REs, such as tandem duplications, occupy the middle region of miRNA precursors. Accordingly, type 4 miRNAs are rich in Simple_repeat families (Table 2). These data suggest that miRNAs have arisen through genomic evolutionary events.
In conclusion, we determined that 1,900 RE-derived miRNAs can be divided into six major types. Of them, 406 previously identified miRNAs and 1,494 novel miRNAs were confirmed using the GENCODE database, and their RE patterns were sorted using the RepeatMasker program. The results suggest that RE sequences are interspersed throughout the genome and form miRNA precursor sequences that play important roles in the host genome. These regions may contribute to the evolution of biological complexity.
Acknowledgments
This research was supported by awards from the AGENDA project (Project No. PJ009254) in the National Institute of Animal Science, Rural Development Administration (RDA).
Notes
This article received the 2014 KOGO Best Paper Award.
References
Supplementary materials
Supplementary data including one table and two figures can be found with this article online at http://www.genominfo.org/src/sm/gni-12-261-s001.pdf.