Allelic Frequencies of 20 Visible Phenotype Variants in the Korean Population
Abstract
The prediction of externally visible characteristics from DNA has been studied for forensic genetics over the last few years. Externally visible characteristics include hair, skin, and eye color, height, and facial morphology, which have high heritability. Recent studies using genome-wide association analysis have identified genes and variations that correlate with human visible phenotypes and developed phenotype prediction programs. However, most prediction models were constructed and validated based on genotype and phenotype information on Europeans. Therefore, we need to validate prediction models in diverse ethnic populations. In this study, we selected potentially useful variations for forensic science that are associated with hair and eye color, iris pattern, and facial morphology, based on previous studies, and analyzed their frequencies in 1,920 Koreans. Among 20 single nucleotide polymorphisms (SNPs), 10 SNPs were polymorphic, 6 SNPs were very rare (minor allele frequency < 0.005), and 4 SNPs were monomorphic in the Korean population. Even though the usability of these SNPs should be verified by an association study in Koreans, this study provides 10 potential SNP markers for forensic science for externally visible characteristics in the Korean population.
Introduction
DNA genotyping has been used in forensic science for over 30 years for human identification by using short tandem repeat (STR) markers, which are highly polymorphic variants [1]. Recently, single nucleotide polymorphisms (SNPs) have also been used as forensic markers, because they may provide better profiling from degraded DNA samples than STRs and do not involve repetitive sequences that can produce stutter artifacts, which complicate the interpretation of STR profiles [2, 3]. SNPs for forensic analyses can be used for human identification, kinship analysis, inferring biogeographic ancestry, and estimating appearance traits [4].
In general, forensic DNA profiling identifies a person by comparing a sample against the profiles of known suspects or missing persons held in a DNA database. However, when the database holds no information on the felon or missing person, it is helpful to predict the appearance of the unknown person from DNA markers [5]. The prediction of externally visible characteristics (EVCs) from DNA has been studied for forensic genetics over the last few years [5-7]. The EVCs include hair, skin, and eye color, height, and facial morphology, which have high heritability [8-10].
Genome-wide association studies (GWASs), linkage analyses, and candidate gene studies have been used to identify genetic variants influencing such EVCs. Variations in the MC1R gene have been associated with red hair [11]. A red hair prediction method based on a combination of non-synonymous SNPs in MC1R was developed for forensic science more than 10 years ago [12], and its accuracy was 84% in the prediction of red-haired individuals. The SNPs in the OCA2 locus and the adjacent HERC2 gene in Europeans showed a strong correlation with blue and brown eye color [13, 14]. Other genes were also identified as contributing to hair, eye, and skin color variations, such as IRF4, KITLG, SLC24A4, SLC45A2, TYRP1, TYR, and ASIP [15].
Recent studies using genome-wide association analysis have identified genes and variations correlating with human visible phenotypes and developed phenotype prediction programs [16-19]. HIrisPlex is capable of predicting hair and eye color from DNA with 24 SNP variations [19]. Six of the 24 SNPs are used for eye color prediction in the HIrisPlex program: rs12913832 (HERC2), rs1800407 (OCA2), rs12896399 (SLC24A4), rs16891982 (SLC45A2 [MATP]), rs1393350 (TYR), and rs12203592 (IRF4). Four of these SNPs, those within HERC2, OCA2, SLC45A2, and IRF4, are also used for hair color prediction. To predict hair color and hair color shade, HIrisPlex uses 22 SNPs as follows: N29insA, rs11547464, rs885479, rs1805008, rs1805005, rs1805006, rs1805007, rs1805009, Y152OCH, rs2228479, and rs1110400 from the MC1R gene, rs28777 (SLC45A2 [MATP]), rs16891982 (SLC45A2 [MATP]), rs12821256 (KITLG), rs4959270 (EXOC2), rs12203592 (IRF4), rs1042602 (TYR), rs1800407 (OCA2), rs2402130 (SLC24A4), rs12913832 (HERC2), rs2378249 (PIGU/ASIP), and rs683 (TYRP1). When the minor allele genotypes of these 24 SNPs are entered into HIrisPlex, the program assigns the individual to the most probable of 3 eye color categories (brown, blue, or intermediate) and 4 hair color categories (blond, brown, red, and black). On average, the prediction accuracy was 69.5% for blond hair, 78.5% for brown hair, 80% for red hair, and 87.5% for black hair, and over 90% for blue and brown eyes.
Few GWASs on color-related phenotypes have been carried out in Asian populations [20, 21]. As a result, most prediction models for EVCs have been constructed and validated based on genotype and phenotype information on Europeans, and these models must be validated in diverse ethnic populations [16-19]. In this study, we selected potentially useful variations for forensic science that are associated with hair and eye color, iris pattern, and facial morphology, based on previous studies, and analyzed their frequencies in the Korean population.
Methods
Population and DNA extraction
The subjects in this study were drawn from the Korea Association Resource (KARE) study, which has been described in detail previously [22]. Briefly, KARE consists of 2 community-based cohorts in Korea, Ansan (urban community) and Ansung (rural community), and includes 10,038 participants aged 40 to 69 years. Genomic DNA of 1,920 male subjects randomly selected from the KARE study was extracted from cell lines immortalized with Epstein-Barr virus.
SNP selection
Using the NHGRI GWAS catalog (http://www.genome.gov/gwastudies) [23], 9 articles were identified with the query terms "hair," "eye," "iris," and "facial morphology," and 26 SNPs with association p-values < 5 × 10⁻⁸ (GWAS p-value) were selected from the articles [10, 24-31]. Where SNPs were in pairwise linkage disequilibrium (LD) (r² > 0.8), only the most significant SNP was retained.
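The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run for the study: the `Candidate` class, the `select_snps` function, and the demo rsIDs, p-values, and r² value are all hypothetical placeholders. Only the two thresholds come from the text.

```python
from dataclasses import dataclass

GWAS_P_THRESHOLD = 5e-8  # genome-wide significance threshold stated in the text
LD_R2_THRESHOLD = 0.8    # pairwise LD cutoff stated in the text


@dataclass(frozen=True)
class Candidate:
    rsid: str
    p_value: float


def select_snps(candidates, ld_r2):
    """Keep significant SNPs, then keep only the strongest SNP of each high-LD pair.

    candidates: list of Candidate
    ld_r2: dict mapping frozenset({rsid_a, rsid_b}) to the pairwise r^2
    """
    significant = [c for c in candidates if c.p_value < GWAS_P_THRESHOLD]
    # Visit SNPs from most to least significant so the stronger SNP is kept first.
    significant.sort(key=lambda c: c.p_value)
    kept = []
    for cand in significant:
        in_ld_with_kept = any(
            ld_r2.get(frozenset((cand.rsid, k.rsid)), 0.0) > LD_R2_THRESHOLD
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(cand)
    return kept


# Hypothetical inputs: these rsIDs and values are placeholders, not study data.
demo = [
    Candidate("rsA", 1e-20),
    Candidate("rsB", 3e-12),  # in high LD with rsA, so it is dropped
    Candidate("rsC", 2e-9),
    Candidate("rsD", 1e-5),   # not genome-wide significant
]
demo_ld = {frozenset(("rsA", "rsB")): 0.93}
selected = [c.rsid for c in select_snps(demo, demo_ld)]
print(selected)
```

Sorting by p-value before the greedy pass is one simple way to guarantee that the more significant member of a linked pair is the one retained.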
Genotyping and quality control
Genotyping was performed at a multiplex level using the Illumina GoldenGate genotyping system [32]. The genotype quality score for retaining data was set to 0.1. SNPs that did not satisfy the following criteria were excluded: 1) a minimum call rate of 90%, 2) no duplication error, and 3) a Hardy-Weinberg equilibrium test p-value > 0.001. All 20 SNPs were successfully genotyped.
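As an illustration of the call rate and Hardy-Weinberg criteria, the following Python sketch applies them to genotype counts for a single SNP. The text does not say which Hardy-Weinberg test was used, so the 1-degree-of-freedom chi-square test here is an assumption, and the function names and demo counts are hypothetical. The duplication-error check is omitted because it requires replicate samples.

```python
import math

MIN_CALL_RATE = 0.90  # criterion 1 in the text
MIN_HWE_P = 0.001     # criterion 3 in the text


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test is used; the study does not name its test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_samples):
    """Apply the call rate and Hardy-Weinberg filters to one SNP."""
    call_rate = (n_aa + n_ab + n_bb) / n_samples
    return call_rate >= MIN_CALL_RATE and hwe_chi2_p(n_aa, n_ab, n_bb) > MIN_HWE_P


# Hypothetical genotype counts, not study data.
print(passes_qc(250, 500, 250, 1000))  # counts in exact HWE, full call rate
print(passes_qc(500, 0, 500, 1000))    # no heterozygotes: strong HWE deviation
print(passes_qc(200, 400, 200, 1000))  # in HWE, but only 80% of samples called
```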
Results and Discussion
Using the GWAS catalog, we extracted 26 candidate SNPs that might be useful for forensic science in terms of EVCs. Two SNPs in MC1R and HERC2 were already genotyped in subjects of the KARE study. Among the 26 SNPs, 2 pairs of SNPs, rs1847134 and rs1393350 within TYR and rs4900109 and rs12896399 within SLC24A4, were under pairwise LD (r² > 0.8), and 1 SNP in each pair was selected. Two SNPs for which probes could not be designed reliably were excluded during the probe selection process. Finally, 20 out of 26 SNPs were selected for genotyping in Koreans.
One SNP, rs7559271 (PAX3), was associated with facial morphology [31]; 2 SNPs, rs3739070 (TRAF3IP1) and rs10235789 (SEMA3A), were associated with the iris patterns furrows and crypts, respectively [29]; and the remaining 17 SNPs were associated with hair and/or eye color [10, 24-28, 30]. Among these variants, 8 SNPs were included in the HIrisPlex program to determine hair and iris color.
Allele frequencies and other genetic parameters for the 20 SNPs are provided in Table 1. All analyzed SNPs were in Hardy-Weinberg equilibrium in the Korean population (p > 0.001). Three SNPs that were associated with facial morphology or iris characteristics in Europeans [29, 31] were polymorphic in the Korean population: rs7559271 (PAX3), rs3739070 (TRAF3IP1), and rs10235789 (SEMA3A). Their minor allele frequencies (MAFs) were 0.303, 0.029, and 0.103, respectively. Among the 17 SNPs related to hair or eye color, 4 SNPs were monomorphic and 6 SNPs were very rare (MAF < 0.005) in Koreans. The 4 monomorphic SNPs in Koreans were rs12203592 (IRF4), rs12821256 (KITLG), rs12913832 (HERC2), and rs8033165 (intergenic). Their allele frequencies in Europeans (CEU) in the HapMap database were 0.167, 0.146, 0.792, and 0.447, respectively. The 6 rare SNPs in Koreans were rs16891982 (SLC45A2), rs1408799 (TYRP1), rs35264875 (TPCN2), rs1393350 (TYR), rs1805007 (MC1R), and rs12931267 (MC1R). The allele frequencies of the first 5 in Europeans were 0.983, 0.695, 0.175, 0.226, and 0.121, respectively; the HapMap database did not include the allele frequency of rs12931267 (MC1R).
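The three categories used above (monomorphic, very rare, polymorphic) follow directly from the MAF. A minimal Python sketch of that classification is given below. The function names and the genotype counts in the second half are hypothetical; the three MAF values and the 0.005 cutoff are those reported in the text.

```python
RARE_MAF_CUTOFF = 0.005  # "very rare" threshold used in the text


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB) at a biallelic SNP."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def classify(maf):
    """Assign a SNP to one of the three categories used in the text."""
    if maf == 0.0:
        return "monomorphic"
    if maf < RARE_MAF_CUTOFF:
        return "very rare"
    return "polymorphic"


# MAFs reported in the text for the facial morphology and iris pattern SNPs.
reported = {"rs7559271": 0.303, "rs3739070": 0.029, "rs10235789": 0.103}
for rsid, maf in reported.items():
    print(rsid, classify(maf))

# Hypothetical genotype counts for 1,920 individuals, not study data.
print(classify(minor_allele_frequency(1910, 10, 0)))  # 10 heterozygous carriers
print(classify(minor_allele_frequency(1920, 0, 0)))   # a single allele observed
```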
Since frequency information alone does not explain phenotype variability, an association analysis is required to confirm the genetic effect of these SNPs on appearance phenotypes. Stokowski et al. [20] reported that SNPs in TYR, SLC45A2, and SLC24A5 were associated with skin pigmentation and largely accounted for differences between those with the darkest and lightest skin in a South Asian sample. A previous study analyzed skin and hair color in Koreans, categorizing hair color into 3 types and skin color into 4 types [33]. The 7 polymorphic SNPs among the 17 hair or eye color SNPs are likely to contribute to color variation within the Korean population. In addition, the 4 monomorphic SNPs could be used for distinguishing East Asians from Europeans.
In this study, we provided the allele frequencies of 20 EVC-related SNPs by genotyping a large number of Koreans (1,920 individuals), compared with the 45 Chinese and 45 Japanese recruited for the International HapMap Project. Even though the usability of these SNPs should be verified by an association study in Koreans, this study might provide 10 potential SNP markers for forensic science for EVCs in the Korean population.
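To see why the sample size matters for frequency estimates, the sketch below compares a sample of 1,920 individuals with a panel of 45. It assumes simple binomial sampling of 2N independent chromosomes, which is an idealization introduced here and not an analysis reported in the study. The function names are hypothetical, and the allele at a frequency of exactly 0.005 is a hypothetical example placed at the rare cutoff.

```python
import math


def allele_freq_se(p, n_individuals):
    """Binomial standard error of an allele frequency estimated from 2N chromosomes."""
    return math.sqrt(p * (1.0 - p) / (2 * n_individuals))


def prob_allele_unobserved(p, n_individuals):
    """Chance that an allele of frequency p appears on none of the 2N chromosomes."""
    return (1.0 - p) ** (2 * n_individuals)


N_KOREAN = 1920      # sample size reported in this study
N_HAPMAP_PANEL = 45  # size of each HapMap panel mentioned in the text
maf = 0.103          # MAF reported in the text for rs10235789 (SEMA3A)

se_large = allele_freq_se(maf, N_KOREAN)
se_small = allele_freq_se(maf, N_HAPMAP_PANEL)
print(f"SE with N={N_KOREAN}: {se_large:.4f}")
print(f"SE with N={N_HAPMAP_PANEL}: {se_small:.4f}")

# Hypothetical allele sitting exactly at the 0.005 "very rare" cutoff.
miss_small = prob_allele_unobserved(0.005, N_HAPMAP_PANEL)
miss_large = prob_allele_unobserved(0.005, N_KOREAN)
print(f"P(unobserved) with N={N_HAPMAP_PANEL}: {miss_small:.2f}")
print(f"P(unobserved) with N={N_KOREAN}: {miss_large:.2e}")
```

Under these assumptions the standard error shrinks by a factor of sqrt(1920/45), about 6.5, and an allele at a frequency of 0.005 would go unobserved in a 45-person panel roughly 64% of the time, so a small panel cannot reliably separate very rare SNPs from monomorphic ones.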
Acknowledgments
This work was supported by a 2012 forensic science research project of the Supreme Prosecutors' Office, Republic of Korea.