Introduction
Modern humans are a very young species compared to their cousins, having evolved only about 200,000 years ago (ya), a fraction of the 6 million years since the divergence of the human and chimpanzee lineages [1]. Fossils suggest that modern humans first emerged in East Africa and spread fairly quickly all over the world in the next 185,000 years or so (reviewed in Lu et al. [2]). After the divergence of humans and chimps, the major landmark in human history is the emergence of bipedalism about 4 million years ago (mya), which freed the hands for other uses. Many species evolved afterwards until the evolution of Homo erectus, who, for the first time, migrated out of Southern Africa and initiated the spread of humans all around the globe. The migrated population of Homo erectus in East Africa eventually gave rise to modern humans about 200,000 ya and to Homo neanderthalensis, or Neanderthals, about 400,000 ya [3, 4]. Neanderthals survived until 28,000 ya, while modern humans survive to this day [5]. During the latter part of their existence, Neanderthals lived in Europe, as well as in Western Asia and the Middle East [6, 7]. Various lines of evidence suggest that modern humans started to migrate from East Africa to Europe and other parts of the world 100,000 ya, and the fossil evidence of humans and Neanderthals indicates that these species might have come into contact as early as 80,000 ya and co-habited for up to 10,000 years at certain geographic locations [6].
In the field of evolutionary biology, one of the most sought-after questions has been what made modern humans superior to other related species, i.e., which genomic features are unique to humans. The whole-genome sequencing of chimps, rhesus macaque, and other primates has given a considerable boost to this field, as the sequences of these primates opened up the possibility of conducting comprehensive comparative studies down to the single-nucleotide level [8, 9]. Many attempts have been made to identify the genetic reasons why modern humans developed more complex biological features than other primates, including the larger brain-to-body ratio, bipedalism, morphological changes, and significant development of communication skills and cognitive behavior. Recent studies have used various statistical methods to compare the sequences of these primates with that of humans in an attempt to find human-specific genes and gene regulatory sequences, eventually showing unexpectedly rapid evolution in the human lineage after the divergence from the ancestral primates [10-15]. The results from these analyses give a good overview of the human-specific genomic elements, but they cannot distinguish which of these elements are specific to modern humans only. Since there was no complete genome sequence of any archaic human until recently, such sequence comparisons have been made only between the modern human genome and other primates, bypassing archaic humans. This has resulted in an overwhelming number of differences and the inability to identify which sequence changes are unique to modern humans and which are shared by all Homo species. Therefore, a comparative analysis between modern humans and archaic humans is expected to be more valuable, as it should be more effective in identifying the critical genes and/or regulatory elements that may be fully or partially responsible for the evolution of modern humans relative to other humans.
In order to identify such sequence changes, the modern human genome sequence must be compared with that of archaic humans. Neanderthals have always been a desired target for this purpose for multiple reasons. They are the closest cousins of modern humans, both anatomically and in terms of intelligence. Their slightly larger brain and wider body structure are the primary anatomical differences from modern humans [16]. Fossil evidence suggests that Neanderthals used stone tools, were hunter-gatherers, and had a social life, indicating that their intelligence was similar to that of modern humans until about 50,000 ya, when a "creative explosion" occurred in modern humans [17, 18]. Another critical reason to target Neanderthals is that they are the latest archaic humans to go extinct, and their remains have been found in sufficiently good condition for analysis down to the molecular level [19-21]. To make a comprehensive genomic comparison between modern and archaic humans, the whole genome of archaic humans has to be sequenced. The availability and widespread use of massively parallel high-throughput sequencing have now made it possible to sequence archaic genomes, which seemed impossible even a decade ago. The idea of sequencing an ancient genome was first implemented on the cave bear [22] and, after its success, on the mammoth genome [23]. The initial sequencing attempts on Neanderthals included sequencing of the mitochondrial genome, which was successful to a certain extent [24-26]. These successes eventually led paleogeneticists to attempt sequencing Neanderthal nuclear genomes. Almost all of the attempts have been made by amplifying the genome fragments via PCR followed by parallel sequencing, while some involved the use of a metagenomic approach.
The early success of such attempts eventually led to the establishment of the Neanderthal Genome Project in 2005, which first announced the complete genome sequence of Neanderthals in 2010 [27] and recently released a cleaner and higher-coverage version (http://www.eva.mpg.de/neandertal/). Comprehensive comparisons have been made by the group on certain genetic entities with some very interesting inferences, based on the initial low-coverage sequencing data, but improvements in many aspects can be made by utilizing the recent, better version of the sequence and considering more types of genetic variances in the study. Here, we will discuss the inferences made from the comparison and what can be done next to answer some interesting questions regarding the evolution of modern humans.
Why Whole Genome Sequence of Neanderthals?
Many lines of archaeological evidence indicate that humans and Neanderthals may have coexisted in certain geographic locations. This gives rise to the most debatable question regarding the recent history of humans: did modern humans and Neanderthals interbreed? If they did, was it to an extent where meaningful exchange of genetic information may have occurred? Do we still carry any genetic elements of Neanderthals? Comparing the genome sequences is probably the best way to answer all these questions. All other features, including the level of intelligence in Neanderthals, have been speculated from bones, settlements, or artifacts found, and there is no way to be certain about the practicality or validity of the inferences made from these remains. The hypothesis that Neanderthals were able to practice complex behavior has already been disputed [28-30]. There has also been a significant amount of debate about the admixture of humans and Neanderthals. Morphological analyses have provided strong arguments both for and against the genetic exchange between these two species [31, 32], as have the comparative analyses of DNA sequences of these species [33-35]. The genome sequence itself can not validate many of these inferences, either, but it can answer the question of admixture and articulate the genetic elements that are unique to Neanderthals or to humans or those that are shared by both. Studies on the expression or protein functions in association with these unique elements, despite being plausible only for those found in the human genome, can eventually facilitate analysis of complex biological phenomena, such as reasoning, language, or other qualitative or behavioral traits, at the molecular level.
In the past, a candidate gene approach has been successfully implemented to identify the presence or variance of certain genes that were believed to be modern human-specific. Using this approach, a number of inferences about Neanderthals could be made, including their skin and hair color. It was also discovered that Neanderthals had the same version of the FOXP2 gene, previously linked to language ability, as modern humans [36]. This approach of fishing for particular genes proved efficient, but the whole genome contains far more than known genes, and much of it may play a critical regulatory role in gene expression and thus development. For instance, almost half of the human genome consists of transposable elements; these elements can affect gene expression by activating or deactivating functional genetic elements and can alter protein coding by introducing alternative splicing or creating new chimeric genes ([37-39], reviewed in [40, 41]). Transposable elements are also polymorphic among different populations of modern humans, and their association with phenotypic traits, including diseases, has been extensively studied, albeit with a lot more to be learned [42-44]. The whole-genome sequence is necessary for identifying transposable element insertions that may have taken place only in modern humans and subsequently assessing their functional impact.
Comparative analysis
The only part of the Neanderthal genome that was sequenced completely from multiple specimens until recently was the mitochondrial DNA (mtDNA). The major analysis that was made with the mtDNA sequence was to seek an answer to the question of interbreeding. The Neanderthal mtDNA sequence consistently falls outside the spectrum of variations observed in the modern human mtDNA sequence, indicating no interbreeding between the species [24-26, 45, 46]. Previously, besides detecting specific mutations in MCPH1 and FOXP2, the candidate gene approach also detected the presence of fragments of the MC1R gene that may indicate the red hair and pale skin of Neanderthals [47], segments of the ABO blood group locus [48], and a taste receptor gene [49]. Despite many critical technical challenges in sequencing ancient genomes ([26]; reviewed in [50]), the whole genome of Neanderthals has been sequenced recently by Green et al. [27], and it allowed this research group to perform a genome-wide sequence comparison with the modern human genome sequence, and the significant results from the initial analyses are discussed briefly below.
Substitutions and indels
Green et al. [27] inferred that 10,535,445 substitutions and 479,863 indels have occurred in the modern human genome after the divergence from chimps. The vast majority of these occurred before Neanderthals and modern humans diverged. However, 78 non-synonymous nucleotide substitutions that are fixed for a derived state in modern humans are different in the Neanderthal counterpart, as Neanderthals carry the ancestral state of these nucleotides. Only five genes were identified that have more than one fixed substitution in their coding regions, and one of them has an altered start or stop codon. It is particularly interesting that three of these genes are expressed in skin, including RPTN, which encodes the protein repetin, and TRPM1, which encodes melastatin. This might be indicative of a change in selection on skin physiology in the modern human lineage. When looking into differences in regulatory elements, a total of 132 substitutions and 36 indels were identified in the untranslated regions. One microRNA of unknown function, hsa-mir-1304, which was identified by parallel sequencing of human embryonic stem cells [51], was found to have one fixed substitution and one single-nucleotide insertion. Since the substitution occurs in the seed region, it is likely that this microRNA has different targets in the two species.
Selective sweeps in modern humans
There are 212 positively selected regions identified in modern humans that occurred early during the history in conjunction with or shortly after their divergence from Neanderthals. The largest of the positively selected regions contains the gene THADA, single-nucleotide polymorphisms (SNPs) in the vicinity of which have been linked with type II diabetes [52]. This change may have affected the energy metabolism of early modern humans. A number of other genes that lie on these selectively swept regions have been associated with other genetic disorders, such as autism, schizophrenia, and Down syndrome. Since both autism and schizophrenia are related to cognitive development, it could be assumed that multiple genes involved in cognitive development in humans were positively selected early in the history of modern humans.
Admixture
Even though previous studies with Neanderthal mtDNA [26, 33] and initial sequencing of nuclear DNA showed no evidence of interbreeding between Neanderthals and humans [21, 53], the most striking revelation from the comparative analyses between the whole-genome sequence of Neanderthals and the genome sequences of multiple modern human individuals was the demonstration of admixture between these two species. Green et al. [27] compared the Neanderthal genome with eight modern-day human genomes of European American, East Asian, and West African ancestry. Surprisingly, the Neanderthal genome appeared more similar to all non-African genomes than to African ones. Neanderthals share significantly more derived alleles (alleles that differ from the chimp state) with non-African populations than with African populations, and when compared with European and Asian individual genomes, Neanderthals are found to be equally close to both populations. This and some other analyses made by the group indicate an exchange of DNA between Neanderthals and the non-African population only. With further comparative analysis, the same group also inferred that gene flow occurred unidirectionally from Neanderthals to the modern non-African human population.
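The excess sharing of derived alleles described above is commonly summarized with an ABBA-BABA count, often reported as a D statistic. The following is a minimal sketch of that idea on toy data, not a reproduction of the published analysis: the function name `d_statistic`, the tuple layout (African, non-African, Neanderthal, chimp), and the example sites are all assumptions made for illustration, and sign conventions differ between publications.

```python
def d_statistic(sites):
    """Return (n_ABBA - n_BABA) / (n_ABBA + n_BABA) for toy allele data.

    Each site is a tuple (h1, h2, neanderthal, chimp) of single-base alleles.
    In this sketch h1 stands for an African genome and h2 for a non-African
    genome (an illustrative assumption). The chimp allele is treated as
    ancestral.
    """
    abba = 0  # h2 shares the derived allele with the Neanderthal, h1 does not
    baba = 0  # h1 shares the derived allele with the Neanderthal, h2 does not
    for h1, h2, nea, chimp in sites:
        # Only biallelic sites where the Neanderthal is derived are informative.
        if nea == chimp or len({h1, h2, nea, chimp}) != 2:
            continue
        if h1 == chimp and h2 == nea:
            abba += 1
        elif h1 == nea and h2 == chimp:
            baba += 1
    total = abba + baba
    return (abba - baba) / total if total else 0.0


# Hypothetical toy sites, for illustration only.
toy_sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("A", "G", "G", "A"),  # ABBA
    ("G", "A", "G", "A"),  # BABA
    ("A", "A", "G", "A"),  # uninformative: both humans ancestral
    ("G", "G", "G", "A"),  # uninformative: both humans derived
    ("A", "G", "A", "A"),  # skipped: Neanderthal carries the ancestral allele
]
d_value = d_statistic(toy_sites)
```

With no gene flow, the two site patterns are expected to occur equally often, so a value near zero is the null expectation; under the ordering assumed here, a consistently positive value would correspond to the Neanderthal sharing more derived alleles with the non-African genome. A real analysis would also need a significance test, which this sketch omits.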
A couple of other findings from different experiments, along with a genome-wide comparison, provide strong evidence for exchange of genetic information. The first finding involves the MAPT locus in chromosome 17, which has two distinct haplotypes, H1 and H2. H1 is abundant in almost all populations of modern humans, while H2 is found only among Europeans and found to have entered into the Homo lineage approximately only 10,000 to 30,000 ya. However, a comparison between H1 and H2 in chimp suggests that the common founder of H1 and H2 is far older than 30,000 years. Even though the Neanderthal genome sequence has been found to carry the H1 haplotype, coinhabitation of the H2 chromosome carriers during the time period when modern humans coexisted with archaic humans can not be ruled out because of the scarcity of archaic genome data. Thus, one can still argue that the H2 haplotype found in modern humans could possibly be a result of horizontal gene transfer between modern humans and Neanderthals and remained in modern humans under selective pressure, possibly because the H1 haplotype has a role in neurodegenerative diseases [54]. In a similar scenario, haplotype D of the microcephalin gene is found to have originated 1.1 mya in a lineage other than modern humans but integrated into the modern human genome only about 37,000 ya. It has thus been speculated that this haplotype was horizontally transferred into modern humans from archaic humans, most likely Neanderthals [34]. However, in a more recent study, the microcephalin locus from a Neanderthal individual in Italy was sequenced and found to be homozygous for the ancestral non-D haplotype [55]. The whole-genome study by Green et al. [27] does not support these speculations, either, since the Neanderthals they analyzed do not carry these D-haplotypes.
One striking revelation from the whole-genome comparison by Green et al. [27] is the equal level of similarity of the Neanderthal genome with Papuan and Chinese and French, although fossil records show the existence of Neanderthals only in Europe and western Asia. The group explained this anomaly by arguing that the interbreeding between the species occurred earlier than previously expected, before the divergence of Europeans, East Asians, and Papuans. Archaeological evidence suggests that modern humans appeared in the Middle East before 100,000 ya, where Neanderthals were already present, and probably remained until 50,000 ya [56]; this makes the prediction by Green et al. [27] probable.
Future Directions
Increasing coverage, sequencing more Neanderthals, and the Y chromosome
For a more comprehensive comparison of whole-genome sequences between Neanderthals and modern humans, the sequence coverage of Neanderthals has to be increased. The three attempts made so far to sequence the Neanderthal genome have resulted in the sequencing of only 65,000 bases [21], 1 million bases [19], and finally, the draft genome sequenced recently [27], consisting of only two-thirds of the whole genome with a mere 1.3× coverage. With such low coverage, it is hard to form meaningful contigs, and a number of important genetic entities will remain unnoticed. Even though it was beyond imagination to sequence a Pleistocene specimen a decade ago, the progress that has been made in the last 5 years is good enough to expect that more such specimens will be sequenced in the coming years. The more specimens from various geographic locations that are sequenced, the more feasible it will be to construct a reference genome sequence for Neanderthals. As the human genome sequence varies considerably among different populations, it is expected that Neanderthals also had variation in their genomic sequences among different populations from different locations. Such variations can only be identified by sequencing a wide range of specimens, and these variations may again change insights into the Neanderthal-modern human relationship.
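To illustrate why low coverage leaves gaps, the sketch below uses the idealized Poisson (Lander-Waterman style) model, in which reads are assumed to fall uniformly at random along the genome. This is an assumption made for illustration only; ancient DNA departs from it through fragmentation, contamination, and mapping bias, so the numbers are rough expectations rather than properties of the published draft. The function names are hypothetical.

```python
import math


def expected_fraction_covered(mean_coverage):
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-mean_coverage)


def prob_depth_at_least(mean_coverage, k):
    """Probability that a base is covered by at least k reads (Poisson model)."""
    below = sum(
        math.exp(-mean_coverage) * mean_coverage ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


# Mean coverage quoted in the text for the draft genome.
draft = 1.3
covered_once = expected_fraction_covered(draft)
covered_twice = prob_depth_at_least(draft, 2)
```

Under this idealization, a mean depth of 1.3 leaves roughly a quarter of positions with no read at all, and well under half of positions are seen twice or more, which is why single-read errors and damage are hard to separate from true variants at this depth.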
The sequencing of Neanderthals first started with their mitochondrial DNA in 1997 [25]. Comparisons between Neanderthal and modern human mitochondrial DNA have been made extensively, but these comparisons only reveal the maternally inherited differences between the species, as mitochondrial DNA is transmitted maternally. The complete genomic sequence of Neanderthals published recently is also from a female specimen. Thus, the Y chromosome of Neanderthals, and with it paternal inheritance, has yet to be examined. Comparisons of the Y chromosome sequence of Neanderthals with currently established Y haplogroups for modern humans should provide some insights into the admixture hypothesis. With respect to the recent finding of admixture of Neanderthals with non-African populations, the Neanderthal Y chromosome should not match the Y haplogroups A or B, as these haplogroups are the oldest of the clades and almost restricted to Africans and their descendants [57, 58]. Since haplogroup E is found in Africa, the Middle East, Southern Europe, and Asia [59-61], the Neanderthal Y chromosome may match this haplogroup, but it should not match the haplogroups E1b1a*, E2b1, or B2a1a, as they are specifically treated as Bantu expansion markers, while Neanderthals interbred only with non-Africans [62, 63].
Retrotransposon insertion polymorphism
Almost half of the human genome comprises retrotransposons. Although they were overlooked for a significant period of time in genetic studies, their importance in chromosome structure, gene regulation, and disease predisposition has now been well established. Retroelements are broadly divided into two categories: those with long terminal repeats (LTRs) and those without LTRs. Short and long interspersed repeat elements (SINEs and LINEs, respectively) are the two major classes of non-LTR retroelements, with SINEs being the more abundant class. Among SINEs, Alu is the predominant type of retroelement. Among all Alus found in the entire human genome, only about 0.5% are present in the human genome but absent from the orthologous regions of other primates and are thus identified as human-specific. This 'young' group consists of only about 5,000 Alu elements that are believed to have integrated into the human genome after the divergence of humans and great apes [64-68]. Studying the retrotransposon insertion loci in Neanderthals will identify truly modern human-specific retrotransposon insertion polymorphisms. A similar comparative analysis would reveal other transposable elements, such as L1, SVAs, and HERVs, that are specific to modern humans only, as well as those that are specific to Neanderthals. Retroelements are particularly important in population genetics. It is extremely rare that a newly inserted transposable element is completely excised; thus, they act as genetic fossils that are homoplasy-free. This identical-by-descent nature of retroelements makes them better markers for population and evolutionary studies than SNPs, in the sense that SNPs can, though rarely, mutate back to the previous state. SNPs are also very hard to call reliably in ancient genomes due to base modifications such as deamination [69], whereas retrotransposon insertion polymorphisms (RIPs) are scored simply as the presence or absence of a retrotransposon at a locus.
Once a retrotransposon is inserted at a new location in an individual, it is subject to genetic drift and may start spreading into the population. Depending on when a retroelement integrated at a certain locus, it will be shared by different species or, if it integrated recently enough, by different populations of the same species. Thus, RIPs occurring before the divergence of chimps and humans are shared by humans and chimps, but those occurring after are present only in humans. RIPs that are even more recent are specific to certain human populations only [70, 71]. For instance, some RIPs are found only in Africans, some in Han Chinese, and so on. The detailed information about all polymorphic retroelements and their frequency in different populations is extensively cataloged in the dbRIP database [43]. The identical-by-descent and homoplasy-free nature of RIPs makes them useful genetic markers in population and evolutionary genetics. The specificity of RIPs can play a significant role in answering the question of admixture of Neanderthals and modern humans. Finding RIPs that are shared between Neanderthals and non-African populations but not present in African populations can be considered solid support for the proposed admixture between Neanderthals and non-African populations. In an ongoing study in our laboratory, over 500 RIPs were identified to be present in Khoisan and Bantu individuals, who represent the oldest lineage of modern humans from Southern Africa, but not in the reference human genome (unpublished). These oldest African lineage-specific RIPs theoretically should also be absent from Neanderthals.
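The reasoning above amounts to sorting insertion loci by which groups carry them. The sketch below shows one way such a presence/absence table could be partitioned; the group labels, category names, locus identifiers, and calls are hypothetical and are not drawn from dbRIP or from any real data set. It also treats every absence as a true absence, which would not hold for a low-coverage Neanderthal sequence where a locus may simply be unsequenced.

```python
def classify_rip(present_in):
    """Classify one insertion locus from the set of groups that carry it.

    present_in is a set drawn from {"neanderthal", "african", "non_african"}
    (labels assumed for this sketch).
    """
    nea = "neanderthal" in present_in
    afr = "african" in present_in
    non = "non_african" in present_in
    if nea and non and not afr:
        return "candidate_admixture"    # pattern discussed as support for gene flow
    if nea and afr and non:
        return "predates_split"         # inserted before the lineages diverged
    if afr and non and not nea:
        return "modern_human_specific"
    if afr and not non and not nea:
        return "african_specific"
    if non and not afr and not nea:
        return "non_african_specific"
    if nea and not afr and not non:
        return "neanderthal_specific"
    return "other"


def group_by_category(calls):
    """Map each category to a sorted list of locus names."""
    grouped = {}
    for locus, present_in in calls.items():
        grouped.setdefault(classify_rip(present_in), []).append(locus)
    return {category: sorted(loci) for category, loci in grouped.items()}


# Hypothetical loci and calls, for illustration only.
toy_calls = {
    "locus_1": {"neanderthal", "non_african"},
    "locus_2": {"neanderthal", "african", "non_african"},
    "locus_3": {"african", "non_african"},
    "locus_4": {"african"},
    "locus_5": {"neanderthal"},
}
toy_groups = group_by_category(toy_calls)
```

In this toy table, only `locus_1` shows the pattern that the text describes as support for admixture, while `locus_4` shows the pattern expected of the oldest African lineage-specific insertions.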
Human-specific unprocessed pseudogenes
Pseudogenization has always been an interesting topic that has not been explored much. Many pseudogenes have been identified lately that lost their functional capacity in the human lineage after the divergence of humans and chimps, particularly genes related to immunological functions [72]. The "less is more" hypothesis states that gene loss may direct evolutionary changes, as these pseudogenized genes have impacts on the adaptation of the species through evolution [73]. Even though pseudogenization does not initiate under selective pressure, the gene loss is retained and subsequently allows adaptation. Pseudogenes are found to be particularly important in humans, and some of them have even been identified as being responsible for certain human-specific phenotypes. For example, the sarcomeric myosin gene was lost at the time of emergence of the genus Homo and is thought to be responsible for a marked reduction of hominin masticatory muscles, leading to expansion of brain size [74]. Analyzing pseudogenes in Neanderthals would help identify genes that were lost after the divergence of Neanderthals and modern humans and determine what biological impact they may have.
Human accelerated conserved noncoding sequences (HACNSs)
HACNSs are sequences in the human genome that were conserved throughout vertebrate history but changed significantly after the divergence of humans and chimps. Human genomes have a number of such sequences that accumulated surprisingly more mutations after the emergence of humans than before. These regions are typically rich in cis-regulatory transcriptional enhancers that confer specific expression patterns on genes involved in development [75-77]. Identification of these cis-regulatory elements in human or other large genomes is done mostly by cross-species sequence comparison, primarily because the functional cis-regulatory elements are generally unique within the genome, which makes paralogy-based identification of such sequences nearly impossible [78]. Since it has long been proposed that phenotypic variation between humans and chimps is brought about more by regulatory elements than by coding sequences [79], it would be particularly interesting to make a detailed comparison of the differences in conserved regions between modern humans and Neanderthals. Such a comparison between the human genome and the initial Neanderthal draft genome sequence [27], involving a total of 2,613 human accelerated regions, revealed that the Neanderthal sequence carried 3,259 human-specific changes in these regions. The comparison also revealed that 51 positions in 45 regions were different between these two species; at these positions, Neanderthals carried the ancestral form, while all modern humans carry the human-specific variant. In a recent study, it has been found that the Neanderthal genome retains the ancestral state of a polymorphic site in a conserved noncoding microRNA, which is involved in regulating two genes that are important for teeth formation [80]. This may explain the dental differences between modern humans and Neanderthals.
These findings are interesting enough to initiate further studies to analyze the probable impact of the variations in conserved noncoding sequences (CNSs) that exist between these two species. Furthermore, in three earlier studies, the number of HACNSs was found to be between 202 and 1,175 [10, 11, 15]. These studies all used different methods to identify CNSs and included more species in the comparison. Their data can also be included in future comparisons of CNSs between modern humans and Neanderthals. The most rapidly evolving HACNS identified so far, named HACNS1, has accumulated 16 human-specific changes out of its 546 bases since the divergence of humans and chimps [12]. HACNS1 functions as a transcriptional enhancer in multiple structures, including the developing anterior limb, early in the development of the mouse embryo [12]. However, this function is missing for the orthologous enhancers in chimpanzee and rhesus, suggesting that HACNS1 has a different function in humans. When the chimpanzee enhancers were humanized by introducing 13 of the 16 human-specific substitutions, gene expression was observed in the limb, indicating that the substitutions that were identified by comparative analysis were directly responsible for the functional modification. In a recent study, 16 human-specific mutations in HACNS1 were also found in Neanderthals, suggesting that the phenotypic function that is related to this region was also expressed in Neanderthals [81]. The comparison can be extended to other human accelerated CNS regions to identify any substitutions that occurred after the divergence of humans from Neanderthals.
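The comparison proposed here reduces to a per-site question: at a position where humans differ from chimps, does the Neanderthal carry the human state or the ancestral one? A minimal sketch of that classification follows. The function names, category labels, and toy alleles are assumptions for illustration; the sketch also takes the chimp base as the ancestral state and ignores sequencing error and ancient DNA damage, both of which matter in practice.

```python
from collections import Counter


def classify_site(chimp, human, neanderthal):
    """Classify one aligned position in an accelerated region.

    Each argument is a single base; "N" or None marks a missing
    Neanderthal call. The chimp base is treated as ancestral.
    """
    if human == chimp:
        return "not_human_derived"      # no change on the human lineage
    if neanderthal in (None, "N"):
        return "missing"                # no usable Neanderthal base
    if neanderthal == human:
        return "shared_derived"         # change predates the split
    if neanderthal == chimp:
        return "modern_human_specific"  # Neanderthal keeps the ancestral base
    return "other"                      # a third state, e.g. error or damage


def tally(sites):
    """Count categories over (chimp, human, neanderthal) tuples."""
    return Counter(classify_site(c, h, n) for c, h, n in sites)


# Hypothetical aligned positions, for illustration only.
toy_sites = [
    ("A", "G", "G"),
    ("A", "G", "A"),
    ("C", "T", "N"),
    ("C", "C", "C"),
    ("G", "A", "T"),
]
counts = tally(toy_sites)
```

Positions falling in the `modern_human_specific` category correspond to the kind of site the text singles out, where Neanderthals carry the ancestral form and modern humans carry the human-specific variant.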
Conserved regions have also been presumed to play important roles in developing certain functions in humans-for instance, noncoding regions of 150 presynaptic genes in humans are highly conserved and may have critical regulatory roles in the expression of these genes [82]. Comparative analyses of these regions between different populations of modern humans and Neanderthals may give directions to a better understanding of neurodevelopmental and psychiatric disorders. Involvement of noncoding sequences in the developing brain was also revealed by another experiment involving 49 human accelerated regions (HARs); the most accelerated region in the human genome, HAR1, is part of a noncoding RNA sequence expressed in the developing brain [83]. A study by Burbano et al. [83] identified that 8.3% of HAR substitutions are not shared between modern humans and Neanderthals, but the study did not focus on HAR1. A complete Neanderthal genomic sequence should provide an understanding of the evolutionary origin of these regulatory RNAs.
Conclusion
Despite many technical challenges, the whole-genome sequence of a Homo species other than modern humans is now available for the first time, just five years after the establishment of the project. Whether more genomes can be sequenced from archaic remains is no longer in question, and more such achievements will only strengthen our understanding of human evolution. All of our knowledge of human species-specific genetic elements thus far is based on comparison with non-human primates, which tells us only how humans differ from other primates, such as chimpanzee, and not how modern humans surpassed archaic humans. Modern humans are considerably superior to their predecessors, and the availability of high-quality Neanderthal sequences can shed some light on the genetic basis of this phenomenon. An initial comparison between the genome sequences of these two species has given us some valuable insight, and more studies are being conducted to identify further variations between Neanderthals and humans. However, a lot remains to be done, as has been discussed, albeit briefly, in this article. Even though the sequence comparison itself cannot provide much confidence about the biological differences between the Homo species, reverse genetics can be applied from this point to hypothesize and validate the possible biological effects of such sequence differences. Once the molecular functions of such differences are identified in an in vitro analysis, they can be introduced into mouse models to observe the phenotypic results of these changes. One such experiment has already been carried out by introducing a human version of FOXP2 into the mouse genome to observe its effects [84]. Many behavioral and qualitative traits of humans can be understood at the molecular level using similar approaches. There has also been hype in the media over resurrecting Neanderthals by modifying the chimpanzee genome to be more Neanderthal-like [85]. However, it is still an impossible task to regenerate a species just from the genomic sequence, even if the ethical issues can be resolved [86]. Neanderthals do not have to be resurrected to provide us with an enormous opportunity in the field of human genetics and evolution, as their genome is already offering a lot to better our understanding of ourselves.