†All authors and their affiliations are listed in the supplementary document.
Asian populations contain a variety of ethnic groups that have ethnically specific genetic differences. Ethnic variants may be highly relevant in disease and human differentiation studies. Here, we identified ethnically specific variants and then investigated their distribution across Asian ethnic groups. We obtained 58,960 Pan-Asian single nucleotide polymorphisms of 1,953 individuals from 72 ethnic groups of 11 Asian countries. We selected 9,306 ethnic variant single nucleotide polymorphisms (ESNPs) and 5,167 ethnic variant copy number polymorphisms (ECNPs) using the nearest shrunken centroid method. We analyzed ESNPs and ECNPs in 3 hierarchical levels: superpopulation, subpopulation, and ethnic population. We also identified ESNP- and ECNP-related genes and their features. This study represents the first attempt to identify Asian ESNP and ECNP markers, which can be used to identify genetic differences and predict disease susceptibility and drug effectiveness in Asian ethnic populations.
There has been an explosion of data describing genetic variants in humans. Structural genetic variations, such as single nucleotide polymorphisms (SNPs) and copy number variations (CNVs), have given rise to myriad differences in human populations [
SNPs are being used in studies of human migration and evolution, as well as those of human health. The Human Genome Organization (HUGO) Pan-Asian SNP Consortium reported a large-scale survey of autosomal variations from a broad geographic sample of 72 Asian human populations [
We obtained a genome-wide 58,960-SNP dataset (Affymetrix GeneChip Human Mapping 50K Xba chip; Affymetrix Inc., Santa Clara, CA, USA) from the HUGO Pan-Asian SNP Consortium website (
For comparison with other ethnic variants, we obtained SNPs of 209 HapMap individuals representing 4 populations (China Han [CHB]; Japan Japanese [JPT]; USA European [CEU]; Africa Yoruba [YRI]) (
Phasing is needed to determine which variants are inherited together by an individual at each locus and to determine more accurately the relationships between unrelated individuals in large population datasets. Using phased population data, we can identify haplotypes, which are essentially segments of DNA that are common to a particular ethnic group. The fastPHASE program version 1.1.4 [
SNP genotyping arrays have recently been used for CNP detection and analysis, because the arrays can serve a dual role in SNP- and CNV-based association studies. To detect CNP markers from the SNPs, we used the Affymetrix Genotyping Analysis Software and Copy Number Analysis Tool (CNAT version 3.0), which was downloaded from the Affymetrix website (
To investigate the distribution of ESNP and ECNP markers in Pan-Asian ethnic groups, we performed the following steps, as shown in
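The marker selection relies on the nearest shrunken centroid method. The following is a minimal sketch of how that method can flag class-specific features, not the authors' pipeline: it uses one common formulation of the statistic, treats ethnic groups as classes and genotype codes (0, 1, 2) as features, and the function name, toy data, and threshold `delta` are all assumptions made for illustration.

```python
import math
from statistics import median


def shrunken_centroid_features(X, y, delta):
    """Return indices of features whose standardized class-centroid
    difference survives soft thresholding by `delta` in at least one class."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    K = len(classes)
    rows = {k: [x for x, lab in zip(X, y) if lab == k] for k in classes}

    # Overall centroid and per-class centroids for every feature.
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    cent = {k: [sum(x[i] for x in r) / len(r) for i in range(p)]
            for k, r in rows.items()}

    # Pooled within-class standard deviation per feature.
    s = []
    for i in range(p):
        ss = sum((x[i] - cent[k][i]) ** 2 for k, r in rows.items() for x in r)
        s.append(math.sqrt(ss / (n - K)))
    s0 = median(s)  # stabilizing constant added to every s[i]

    selected = []
    for i in range(p):
        for k, r in rows.items():
            m_k = math.sqrt(1 / len(r) - 1 / n)
            scale = m_k * (s[i] + s0)
            if scale == 0:
                continue  # constant feature: nothing to standardize
            d = (cent[k][i] - overall[i]) / scale
            if abs(d) > delta:  # nonzero after soft thresholding
                selected.append(i)
                break
    return selected


# Hypothetical toy data: 6 individuals, 2 SNPs, 2 groups ("A" and "B").
X = [[0, 1], [0, 2], [1, 0], [2, 1], [2, 0], [1, 2]]
y = ["A", "A", "A", "B", "B", "B"]
print(shrunken_centroid_features(X, y, delta=0.5))
```

In this toy input only the first SNP differs between the two groups, so it is the only one retained. Raising `delta` shrinks more centroid differences to zero and yields a smaller marker set.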
To investigate the characteristics of the ECNP- and ESNP-related genes, we selected the ECNP and ESNP markers ranked in the top 1% of the ethnically specific variations. We mapped ESNP and ECNP markers to gene structure, based on Entrez gene-centered information at NCBI [
We obtained 58,960 Pan-Asian SNPs from 72 Pan-Asian populations listed in the supplementary documents (
We devised an approach to identify ethnic differences from the genotype profiles of the populations. Pan-Asian SNPs were filtered in several steps, as described in Materials and Methods. With the filtered SNPs, we examined the inter-SNP distances, the allele frequency, and the heterozygosity distribution to identify the genotypic features of each ethnic group. The average inter-SNP distance was 52 kb. More than 41% of inter-SNP distances were below 10 kb, and 14% were over 100 kb (
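The inter-SNP distance summary reported here (mean distance, share below 10 kb, share above 100 kb) can be illustrated with a short sketch. The input layout of `(chromosome, position)` pairs, the function names, and the toy positions are assumptions for illustration and are not taken from the study's data.

```python
from collections import defaultdict


def inter_snp_distances(snps):
    """Distances (bp) between adjacent SNPs on the same chromosome.

    `snps` is an iterable of (chromosome, position_bp) pairs.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    distances = []
    for positions in by_chrom.values():
        positions.sort()
        distances.extend(b - a for a, b in zip(positions, positions[1:]))
    return distances


def summarize(distances, short_bp=10_000, long_bp=100_000):
    """Mean distance in kb and percent of distances below/above two cutoffs."""
    n = len(distances)
    if n == 0:
        raise ValueError("need at least two SNPs on one chromosome")
    return {
        "mean_kb": sum(distances) / n / 1000,
        "pct_below_short": 100 * sum(d < short_bp for d in distances) / n,
        "pct_above_long": 100 * sum(d > long_bp for d in distances) / n,
    }


# Hypothetical toy positions on two chromosomes.
toy = [("1", 1_000), ("1", 6_000), ("1", 206_000), ("2", 50_000), ("2", 90_000)]
stats = summarize(inter_snp_distances(toy))
print(stats)
```

Distances are computed only within a chromosome, so the gap between the last SNP of one chromosome and the first SNP of the next is never counted.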
We investigated minor allele frequency and heterozygosity across the ethnic groups (
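As an illustration of the two quantities examined here, the sketch below computes minor allele frequency and observed heterozygosity for one SNP within one group and then bins values into ranges. The genotype coding (0, 1, or 2 copies of one allele, `None` for a missing call), the function names, and the bin edges are assumptions made for this example.

```python
def maf_and_het(genotypes):
    """Return (minor allele frequency, observed heterozygosity) for one SNP.

    `genotypes` holds 0, 1, or 2 (copies of one allele) or None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    n = len(called)
    allele_freq = sum(called) / (2 * n)      # frequency of the counted allele
    maf = min(allele_freq, 1 - allele_freq)  # the rarer of the two alleles
    het = sum(g == 1 for g in called) / n    # fraction of heterozygous calls
    return maf, het


def percent_per_bin(values, edges):
    """Percent of values in each bin [edges[i], edges[i+1]); the last bin
    also includes its upper edge so that a MAF of exactly 0.5 is counted."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return [100 * c / len(values) for c in counts]


# Hypothetical genotype calls for one SNP in one ethnic group.
maf, het = maf_and_het([0, 1, 1, 2, None, 0])
# Hypothetical per-SNP MAF values and range boundaries.
shares = percent_per_bin([0.05, 0.2, 0.5, 0.4], [0, 0.1, 0.3, 0.5])
print(maf, het, shares)
```

Running `maf_and_het` per SNP and per group, then passing the results to `percent_per_bin`, gives the kind of per-range SNP proportions that are plotted across ethnic groups.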
We confirmed the high proportion of African (YRI)-specific SNPs from 3 races: Asian, Caucasian, and African (
We obtained genes associated with ESNPs and ECNPs based on the NCBI Gene database [
Our analysis identified ethnically variable SNPs associated with phenotypic changes. We selected 9,306 ESNPs and 5,167 ECNPs in 72 Pan-Asian populations. We found that the representative ethnic groups with specific ESNPs are recently branched-out subpopulations, whereas the representative ethnic groups with ethnically specific CNPs are early fixed subpopulations, as shown in
This research was supported by a grant from the KRIBB Research Initiative Program and by the Korean Ministry of Science, ICT & Future Planning (MSIP) under grant number 2013036118 (NRF-2011-0019745). The authors thank Ms. Kyeyoung Kim for editing the figures.
Supplementary data including two tables and six figures can be found with this article online at
Identification of Ethnically Specific Genetic Variations in Pan-Asian Ethnos
List of Pan-Asian ethnic groups (each group is labeled with a two-letter country code followed by a two-letter ethnic group code) and HapMap populations
List of ESNP- and ECNP-related genes
Distribution of Pan-Asian copy number values from Pan-Asian genotype profiling. It shows the discrete distribution of Pan-Asian single nucleotide polymorphism samples, with a lower boundary of 1.5065 and an upper boundary of 2.7765.
Process for selecting ethnically specific single nucleotide polymorphisms (SNPs). NSCM, nearest shrunken centroid method.
Inter-single nucleotide polymorphism (SNP) distance distribution. The X-axis represents the distance (kilobase pairs) between SNPs, and the Y-axis represents the proportion (%) of Pan-Asian SNPs.
Minor allele frequency (MAF) and heterozygosity (HET) distribution of single nucleotide polymorphisms (SNPs) in the Pan-Asian SNP data. The X-axis shows the Pan-Asian ethnic groups, and the Y-axis shows the SNP proportion (%) in each MAF range. We examined MAF and HET across the Pan-Asian ethnic groups and the 4 HapMap groups together. The SNP proportion (%) in each MAF and HET range for the Pan-Asian ethnic groups and the 4 HapMap groups is shown in (A) and (B).
The distribution of ethnic variant single nucleotide polymorphisms (ESNPs) across Pan-Asian and HapMap ethnic groups. We analyzed the ethnicity-specific single-nucleotide polymorphisms, including the four HapMap groups (CEU, CHB, JPT, and YRI), for the following groups: super-populations (Asians, Caucasoids, American Indians, and outliers [IN-NI, IN-TB, and CN-UG]); 12 populations; and 76 ethnic groups (Pan-Asian and four HapMap ethnic groups). AX-AI, Karitiana; AX-ME, Ami; CEU, European; PI-AE, Ayta; PI-AG, Ayta; PI-MW, Mamanwa; TH-MA, Mlabri; YRI, Yoruba; CHB, Han; JPT, Japanese; IN-NI, Mongoloid features; IN-TB, Mongoloid features; CN-UG, Uyghur.
Representative ethnic groups having ethnically specific single nucleotide polymorphisms and copy number polymorphisms, shown on the population structure. Yellow rows indicate the Pan-Asian ethnic groups with a high proportion of ethnic variant single nucleotide polymorphisms, and red rows indicate the Pan-Asian ethnic groups with a high proportion of ethnic variant copy number polymorphisms. We marked the Pan-Asian ethnic groups based on a maximum-likelihood tree of populations. Abbreviations are explained in
Distribution of ethnic variant single nucleotide polymorphisms (A) and ethnic variant copy number polymorphisms (B) across ethnic groups. PG8 consists of Indo-European and Dravidian Southwest Asians. IN-NI, Mongoloid features; IN-TB, Mongoloid features; TH-MA, Mlabri; MY-MN, Malay; ID-ML, Malay; ID-LE, Lembata; PI-UI, Filipino; ID-DY, Dayak; PI-UN, Filipino; ID-SB, Kambera; IN-NL, Caucasoids; CN-UG, Uyghur; IN-EL, Caucasoids; IN-IL, Caucasoids; IN-SP, Caucasoids; JP-RK, Ryukyuan.
ESNP- and ECNP-related gene set summary
ESNP-related gene set: Name | ESNP-related gene set: p-value | ECNP-related gene set: Name | ECNP-related gene set: p-value |
---|---|---|---|
Disease and disorders | |||
Cardiovascular disease | 5.84E-05–4.01E-02 | Cardiovascular disease | 8.05E-05–4.70E-02 |
Developmental disorder | 5.84E-05–3.35E-02 | Neurological disease | 1.25E-04–4.98E-02 |
Connective tissue disorders | 1.25E-03–2.69E-02 | Psychological disorders | 4.17E-04–3.77E-02 |
Dermatological diseases and conditions | 1.25E-03–2.69E-02 | Connective tissue disorders | 8.93E-04–7.41E-03 |
Hereditary disorder | 1.25E-03–1.35E-02 | Developmental disorder | 8.93E-04–4.13E-02 |
Molecular and cellular functions | |||
Cellular assembly and organization | 1.94E-05–4.01E-02 | Cell-to-cell signaling and interaction | 3.58E-05–4.60E-02 |
Cellular function and maintenance | 1.94E-05–4.01E-02 | Molecular transport | 3.58E-05–3.65E-02 |
Cellular movement | 2.45E-04–3.45E-02 | Small molecule biochemistry | 3.58E-05–4.84E-02 |
Cell death and survival | 4.52E-04–4.01E-02 | Drug metabolism | 2.66E-04–3.65E-02 |
Lipid metabolism | 5.79E-04–4.01E-02 | Cellular assembly and organization | 3.89E-04–4.84E-02 |
Physiological system development and function | |||
Connective tissue development and function | 4.02E-04–2.74E-02 | Tissue morphology | 1.25E-04–4.60E-02 |
Tissue morphology | 4.28E-04–4.18E-02 | Connective tissue development and function | 2.48E-03–4.37E-02 |
Cardiovascular system development and function | 6.74E-04–4.16E-02 | Hematological system development and function | 2.48E-03–4.37E-02 |
Organ morphology | 6.74E-04–4.18E-02 | Humoral immune response | 2.48E-03–7.41E-03 |
Renal and urological system development and function | 9.40E-04–3.91E-02 | Immune cell trafficking | 2.48E-03–4.13E-02 |
ESNP, ethnic variant single nucleotide polymorphism; ECNP, ethnic variant copy number polymorphism.