Introduction
According to the theory of molecular evolution, mutations in coding regions that change amino acids, called nonsynonymous mutations, are mostly deleterious to organisms and are subject to strong purifying selection [1]. Amino acid changes in protein sequences, particularly changes from the original or ancestral amino acid to one with very different physicochemical properties, can severely impair protein structure and function [2]. In contrast, mutations in coding regions that do not change amino acids are called synonymous mutations and are generally considered functionally neutral [3–5]. Indeed, most recent research on identifying disease-causing variants in genetic diseases, including Mendelian diseases and common complex diseases such as diabetes and cancers, has focused on nonsynonymous variants [6–12]. In most of these studies, synonymous variants have been ignored or filtered out before functional validation. A further difficulty in studying synonymous variants is that there are no established tests for identifying functional synonymous codons.
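The distinction between synonymous and nonsynonymous changes can be made concrete with a small sketch. The Python snippet below is an illustration only, not a method taken from the studies cited here. It assumes the standard nuclear genetic code, uppercase DNA codons, and substitutions confined to a single codon; the function name classify_substitution is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are enumerated with bases in the order T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change as 'synonymous' or 'nonsynonymous'.

    Simplification for this sketch: changes to or from a stop codon ('*')
    are lumped in with nonsynonymous changes.
    """
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


# GAA and GAG both encode glutamate; GAC encodes aspartate.
print(classify_substitution("GAA", "GAG"))
print(classify_substitution("GAA", "GAC"))
```

The classification looks only at the encoded amino acid, which is exactly why it cannot reveal whether a synonymous change has a functional effect.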
One clue to the non-neutrality of synonymous codons comes from studies of synonymous codon usage bias (SCUB) [13–15]. SCUB means that synonymous codons encoding the same amino acid are not used at equal frequencies, and that the pattern depends on the protein and on the species carrying it. In other words, different proteins in different species prefer different synonymous codons for the same amino acid [15, 16]. No such preference is expected if synonymous codons are neutral. Researchers have long recognized this intriguing phenomenon and have tried to determine which factors lead to SCUB [17].
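One common way to quantify the unequal use of synonymous codons described above is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is a minimal illustration under stated assumptions, namely the standard genetic code and an in-frame coding sequence; RSCU is not a measure named in this text, and the function name rscu is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons into synonymous families, one family per amino acid.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in an in-frame CDS.

    RSCU = observed count * (number of synonymous codons) / (family total).
    A value of 1.0 means the codon is used exactly as often as expected
    under equal usage; stop codons are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    values = {}
    for aa, family in SYNONYMS.items():
        if aa == "*":
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not present in this sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Four leucine codons: CTG three times, TTA once.
example = rscu("CTGCTGCTGTTA")
print(example["CTG"], example["TTA"], example["CTC"])
```

Under neutrality with uniform base composition, every RSCU value would be close to 1; single-codon amino acids such as methionine and tryptophan always score exactly 1 and carry no information about bias.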
Two explanations have been proposed for the cause of SCUB. The first is based on uneven nucleotide composition across genomes [18]. Proponents of this compositional explanation argue that genes located in G/C-rich genomic regions tend to prefer codons ending in G or C [19]. Consistent with this view, the G/C content of introns and exons corresponds in mammals; that is, genes whose introns have higher G/C content are likely to have exons with higher G/C content. The second explanation posits that SCUB is a consequence of natural selection acting on the roles of synonymous codons in regulating gene expression [20–22]. Its proponents argue that highly expressed genes are under stronger evolutionary constraint than lowly expressed genes to use codons that improve translational efficiency and accuracy [23, 24]. Indeed, numerous studies have shown that the codon usage of highly expressed genes is more strongly biased toward optimal codons than that of lowly expressed genes and that this bias is linked to enhanced translational speed and accuracy [25, 26]. Relatedly, cognate tRNAs are more abundant for optimal codons than for non-optimal codons.
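The compositional explanation above concerns codons ending in G or C, which is often summarized as the G/C content at third codon positions (GC3). The following sketch shows one way such a statistic could be computed. It assumes an in-frame coding sequence; the function name gc3 is hypothetical and the statistic is offered as an illustration rather than as the measure used in the cited studies.

```python
def gc3(cds):
    """Fraction of complete codons whose third base is G or C.

    Assumes the sequence starts in frame; a trailing partial codon
    contributes nothing because it has no third position.
    """
    cds = cds.upper().replace("U", "T")
    third_positions = cds[2::3]  # bases at indices 2, 5, 8, ...
    if not third_positions:
        raise ValueError("sequence contains no complete codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# ATG, GCC and AAA end in G, C and A, so two of three codons count.
print(gc3("ATGGCCAAA"))
```

Comparing GC3 of a gene with the G/C content of its introns or flanking regions is one way to ask whether codon choice simply tracks local base composition.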
An evolutionary explanation of SCUB based on natural selection has been investigated under a hypothesis called Hill-Robertson (HR) interference. The HR hypothesis posits that the efficiency of natural selection at one site is weakened when the site is linked to adjacent sites and does not segregate independently, and that recombination can relieve this interference. There is some agreement among researchers that SCUB is positively correlated with recombination rates, although conflicting observations have been reported on the effect of HR interference [27–29].
In the present work, we therefore summarize other facets of the functions that synonymous codons might have, beyond those related to translational efficiency and accuracy in the regulation of gene expression (Fig. 1).
Results
Function of nonoptimal codons
SCUB is considered a general phenomenon observed in the genomes of almost all species [13–15]. However, SCUB patterns differ among the genomes of different species. Some genomes show a strong preference for optimal codons, while others do not [13]. The proportion of optimal codons also varies among genes [13]. SCUB patterns are more similar between closely related species than between distantly related species [30]. Under the natural selection scenario, a low level of codon usage bias in a gene is generally interpreted as the result of a lack of selection for optimal codons.
However, some researchers have offered an opposing explanation for why some genes avoid optimal codons. They propose that genes harbor codons that are rarely used in translation because rare codons are beneficial for controlling the step just before ribosomes start translation or the protein folding step that precedes secretion of the nascent protein product in the endoplasmic reticulum [31, 32]. In addition, Zhou et al. (2013) [33] reported in Neurospora that the use of non-optimal codons in the Frq gene is essential for regulating circadian rhythms.
Splicing regulation
Intron-exon boundary regions, delimited according to the GU-AG rule, are important for splicing [34]. In addition, exonic splicing enhancers (ESEs) and splicing silencers residing near intron-exon boundaries, i.e., sequence motifs within an exon, are known to enhance or suppress splicing. Some synonymous codons can be constituents of these motifs. Therefore, a synonymous mutation in a gene can affect the splicing of that gene without changing its amino acid sequence. Indeed, Takahashi (2009) [35] showed in Drosophila that translationally optimal codons tend to be avoided within ESE motifs, comparing the codon usage bias of exons with that of ESE regions in the Down syndrome cell adhesion molecule (Dscam) gene using a codon bias index (CBI) [35, 36]. Dscam is known as one of the genes with the largest number of alternatively spliced exons [35]. Furthermore, another study showed that almost none of the synonymous codons residing in ESEs were optimal codons for translational efficiency [37]. The same study showed that this conflict between the roles of synonymous codons in translational regulation and in splicing control was larger in highly expressed genes [37].
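The point that a synonymous change can still hit a splicing motif can be illustrated with a short sketch. The code below scans a coding sequence for single-base changes that leave the encoded amino acid unchanged yet remove at least one occurrence of a given motif. Everything specific here is an assumption made for illustration: the standard genetic code, an uppercase in-frame DNA sequence, a placeholder hexamer that is not a validated ESE, and the hypothetical function name motif_disrupting_synonymous_changes.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    return sum(seq.startswith(motif, i) for i in range(len(seq)))


def motif_disrupting_synonymous_changes(cds, motif):
    """List (position, ref_base, alt_base) for synonymous single-base
    changes that reduce the number of motif occurrences in the CDS.

    Positions are 0-based; a trailing partial codon is ignored.
    """
    baseline = count_motif(cds, motif)
    hits = []
    for pos in range(len(cds) - len(cds) % 3):
        start, offset = pos - pos % 3, pos % 3
        codon = cds[start:start + 3]
        for alt in BASES:
            if alt == cds[pos]:
                continue
            new_codon = codon[:offset] + alt + codon[offset + 1:]
            if CODON_TABLE[new_codon] != CODON_TABLE[codon]:
                continue  # amino acid changes: not synonymous
            mutated = cds[:pos] + alt + cds[pos + 1:]
            if count_motif(mutated, motif) < baseline:
                hits.append((pos, cds[pos], alt))
    return hits


# Three glutamate codons; "GAAGAA" is a placeholder motif for illustration.
print(motif_disrupting_synonymous_changes("GAAGAAGAA", "GAAGAA"))
```

In this toy input every synonymous GAA-to-GAG change removes a copy of the placeholder motif, which mirrors the conflict described above between codon choice and motif preservation.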
Regulation in transcription
It is quite unexpected that synonymous codons are linked to the regulation of transcription. Transcription is the process of copying genic sequences into mRNAs, and it requires specific and delicate control, usually exerted by the combined action of cis-acting DNA elements and trans-acting regulatory factors called transcription factors (TFs). Promoters are well-established cis-acting control elements and are generally located upstream of transcription start sites, in regions thought to be separate from the coding regions of genes.
Recently, Stergachis et al. (2013) [38] reported the surprising observation, based on DNase footprint analysis in more than 80 different human cell types, that approximately 15% of coding regions serve both as coding sequences that specify amino acids and as TF binding sites. According to the same paper, synonymous codon sites with dual functions are evolutionarily conserved, the binding sites of TFs recognizing stop codons are selectively depleted, and single-nucleotide variants within dual-function synonymous codons can alter TF binding [38]. It is therefore plausible that mutations at synonymous sites can disrupt normal transcription even though they cause no amino acid change.
Regulation of RNA secondary structure
The secondary structure of mRNA is important for controlling translational speed and timing, on which synonymous codon mutations might have crucial effects. For instance, removing rare synonymous codons from an expression construct decreases the enzyme activity of chloramphenicol acetyltransferase [15], an effect attributed to alteration of the mRNA secondary structure by the removal of rare codons. The secondary structure of mRNA is also known to be associated with ribosome pausing, which in yeast is important for correct protein folding for insertion into the lipid bilayer [39].
A similar observation has been made in Escherichia coli: the translational rate is influenced by protein folding efficiency, and folding efficiency is associated with the abundance of codons whose cognate tRNAs are present at low concentrations. Consistently, Zhou et al. [40] found that optimal codons for high expression efficiency are located at structurally sensitive sites in proteins. Zhang et al. [41] also demonstrated that synonymous codon mutations significantly perturb folding efficiency. Saunders and Deane [42] observed that synonymous codon usage is related to the local secondary structure of mRNA. Together, these studies suggest that synonymous codons might be involved in the formation of mRNA secondary structures that ultimately control translational speed and timing.
Conclusion
Researchers have long searched for disease-associated variants, mainly by investigating functional perturbations caused by nonsynonymous mutations. Synonymous mutations, by contrast, have not been considered responsible for disease phenotypes. They are often filtered out and not even considered, because every candidate variant has to be functionally validated for its potential to cause diseases or phenotypes. In that sense, it is striking that synonymous codons carry out various functions in molecular processes. Synonymous mutations do not change the amino acid sequence of proteins but can disrupt the formation of correct mRNA secondary structures, reduce translational accuracy and speed, and even alter the initiation of transcription. We think that advancing our understanding of the functions of synonymous codons will contribute to a more complete identification of disease-associated genes and mutations.