Introduction
Microsatellite (MS) markers have been a popular choice for parentage verification for two decades. For cattle, the International Society for Animal Genetics (ISAG) recommends a standard set of nine MS markers, the “international marker set”, which must be included in parentage testing panels to facilitate the exchange of records between laboratories. MS markers have been implemented successfully in cattle and other livestock species. However, as the cost of single nucleotide polymorphism (SNP) genotyping has decreased, more and more new animals are being genotyped with SNP chip panels. Parentage testing requires all animals to be genotyped with the same type of marker, so either new animals must also be typed with MS markers or older animals, which have usually been typed with MS markers, must be re-typed with SNPs. Either option incurs additional cost. To ease the shift from MS markers to SNPs, McClure et al. [1,2] suggested imputing MS markers from SNP genotypes, which could provide a cost-effective and accurate way to replace markers for parentage verification. Depending on the relationships among the sampled individuals, generally 2‒3 SNPs per MS are needed to obtain accuracy good enough for genetic identification and parentage assessment [3]. Fernandez et al. [3] found that a set of 24 SNPs was equivalent to the ISAG-recommended set of international MS markers.
The term “genotype imputation” refers to the prediction of missing genotypes, i.e., genotypes that were not directly assayed in the sampled individuals [4]. Imputation requires a reference population in which all markers have been genotyped for all samples. This reference population is then used to predict genotypes in the target population, which contains missing genotypes or missing markers. Genotype imputation has become common practice in genome-wide association studies, fine mapping of QTLs, genomic prediction, and whole-genome diversity studies. Because denser chips are known to perform better in downstream analyses, many laboratories use imputation to move from low-density to high-density SNP panels [5]. Imputation accuracy also depends on the density of the SNP chip: the denser the chip, the better the predictions [6]. Ogawa et al. [6] found that imputation accuracy in Japanese Black cattle increased from 90% to 97% when 10,000 SNPs were used instead of 3,000. Other factors that affect imputation accuracy include the minor allele frequency of the SNPs in the reference population, the size of the reference population, the genetic relationship between the reference and test populations [7], and linkage disequilibrium between the imputed SNP and the SNPs in the target data [8].
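To make the reference-based idea concrete, the Python sketch below imputes an MS allele onto a target haplotype by finding the reference haplotypes whose flanking SNP alleles are closest in Hamming distance and taking their majority MS allele. This nearest-haplotype rule is a deliberately simplified stand-in, not the Li and Stephens hidden Markov model used by Beagle; the function name `impute_ms_allele`, the haplotypes, and the allele sizes are all hypothetical.

```python
from collections import Counter


def impute_ms_allele(target_snps, reference):
    """Predict the MS allele carried on one target haplotype (toy nearest-haplotype rule).

    target_snps: tuple of 0/1 SNP alleles flanking the MS locus.
    reference: list of (snp_tuple, ms_allele) pairs from phased reference haplotypes.
    Returns the majority MS allele among the reference haplotypes with the smallest
    Hamming distance to the target; ties keep the first-seen allele.
    """
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    best = min(hamming(target_snps, snps) for snps, _ in reference)
    nearest = [ms for snps, ms in reference if hamming(target_snps, snps) == best]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical phased reference haplotypes: four SNP alleles plus an MS allele size (bp).
reference = [
    ((0, 1, 1, 0), 178),
    ((0, 1, 1, 1), 178),
    ((1, 0, 0, 1), 182),
    ((1, 0, 1, 1), 184),
]
print(impute_ms_allele((0, 1, 1, 0), reference))  # 178 (exact SNP match)
print(impute_ms_allele((1, 0, 0, 0), reference))  # 182 (one mismatch)
```

Real imputation software weighs all reference haplotypes probabilistically and allows switching between them along the chromosome, which is why it copes better with recombination than this single-closest-match rule.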
Imputed data can give accurate results in downstream analyses only if imputation accuracy is high. In this study, we report the accuracy of imputing MS markers from the BovineSNP50 and BovineHD BeadChips in Hanwoo cattle of Korea. The two SNP panels were compared to identify the panel and SNP subset that give the best accuracy, so that the overall cost of genotyping could be reduced without compromising prediction accuracy.
Methods
Ethics statement
For sampling individuals in this study, the standard operating procedures were reviewed and approved by the National Institute of Animal Science's Institutional Animal Care and Use Committee (Permit Number: NIAS2015-774).
Animals, genotyping, and quality control
Blood samples for genotyping were obtained from 1,482 Hanwoo individuals reared at the Hanwoo Genetic Improvement Center of the Nonghyup Agribusiness Group Inc. (Seosan, Korea). Genomic DNA was extracted from the blood samples using the DNeasy 96 Blood and Tissue Kit (Qiagen, Valencia, CA, USA). DNA quantification was performed using a NanoDrop 1000 (Thermo Fisher Scientific Inc., Wilmington, DE, USA). DNA samples were submitted for genotyping with total DNA of 900 ng, a 260/280 ratio >1.8, and a DNA concentration of 20 ng/μL. SNP genotyping was performed using the BovineSNP50 BeadChip version 2 (Illumina, San Diego, CA, USA). These animals were then imputed to BovineHD density (777K SNP chip) using another set of Hanwoo animals as the reference. MS marker genotypes for the same animals were obtained from the Hanwoo Improvement Center of the National Agricultural Cooperative Federation (Seosan, Korea). Eight MS markers from the ISAG-recommended list were included in the study (Supplementary Table 1). Markers on the sex chromosomes were excluded. PLINK version 1.9 (http://www.cog-genomics.org/plink/1.9/) [9] was used for quality control of the raw genotype data. Quality control was performed on the BovineSNP50 and BovineHD BeadChip data for minor allele frequency (0.05), missingness (0.05), Hardy-Weinberg equilibrium (HWE; 0.0001), and genotyping quality (0.05). In total, 2,794 SNPs were removed based on missingness, 14,190 based on minor allele frequency, and 2,395 based on HWE. After quality control, the reference genotype dataset contained 1,482 animals and 37,235 SNPs. The data were split by chromosome, and SNPs within 500 kb on either side of each MS marker (a 1,000 kb window in total) were extracted; only these SNPs were used for imputation. No family data were available for inclusion in the study.
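The per-SNP filters and the ±500 kb window described above can be sketched in Python as follows. This is a minimal illustration, not the PLINK implementation: the function names `passes_qc` and `snps_in_window`, the genotype coding (0/1/2 allele dosages with `None` for missing calls), and all example data are hypothetical, the HWE test is a 1-df chi-square approximation rather than the exact test PLINK uses, and the “genotyping quality” filter is not modeled.

```python
import math


def passes_qc(genotypes, max_missing=0.05, min_maf=0.05, min_hwe_p=1e-4):
    """Return True if one SNP passes missingness, MAF and HWE filters.

    genotypes: allele dosages (0, 1, 2) for one SNP, with None for missing calls.
    HWE uses a 1-df chi-square approximation (an assumption; PLINK uses an exact test).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    if 1 - len(called) / len(genotypes) > max_missing:
        return False
    n = len(called)
    n0, n1, n2 = called.count(0), called.count(1), called.count(2)
    p = (2 * n2 + n1) / (2 * n)  # frequency of the dosage-counted allele
    if min(p, 1 - p) < min_maf:
        return False
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n0, n1, n2), expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return hwe_p >= min_hwe_p


def snps_in_window(snp_map, ms_chrom, ms_pos, flank=500_000):
    """IDs of SNPs on ms_chrom within +/- flank bp of the MS position (1,000 kb total)."""
    return [snp for snp, (chrom, pos) in snp_map.items()
            if chrom == ms_chrom and abs(pos - ms_pos) <= flank]


good = [0] * 49 + [1] * 42 + [2] * 9        # p = 0.3, exactly at HWE proportions
rare = [0] * 98 + [1] * 2                   # MAF = 0.01
print(passes_qc(good), passes_qc(rare))     # True False

# Hypothetical SNP positions (chromosome, bp); the MS position is also hypothetical.
snp_map = {"snp_a": ("16", 1_000_000), "snp_b": ("16", 1_400_000),
           "snp_c": ("16", 1_600_001), "snp_d": ("18", 1_100_000)}
print(snps_in_window(snp_map, "16", 1_100_000))  # ['snp_a', 'snp_b']
```

Applying `passes_qc` before `snps_in_window` mirrors the order in the text: filter the whole panel first, then keep only the surviving SNPs near each MS marker.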
Genotype imputation and estimation of imputation accuracy
The locations of the 8 MS markers on the UMD3.1 reference genome were identified from the University of California, Santa Cruz (UCSC) Genome Browser. The SNP data were merged with the MS data and used as the reference for imputation. In each validation round, 20% of the animals were used for validation and the remaining animals served as the reference. The Beagle program [10] was used for phasing and for imputing the missing markers. Beagle uses the Li and Stephens haplotype frequency model to perform imputation into phased haplotypes, and its imputation method is both computationally and memory efficient [11]. Beagle was chosen because it can handle both bi-allelic and multi-allelic markers. MS and SNP genotypes were first phased independently; the two datasets were then merged and phased again, and this phased data was used as the reference for imputation. Fivefold cross-validation was performed to assess imputation accuracy. Accuracy was measured as the genotype concordance rate, and the correlation between the true and predicted genotypes was also calculated. Accuracies were averaged over all five cross-validation sets (Table 1). The allelic concordance, i.e., the proportion of genotypes in which at least one allele was identified correctly, was also calculated. In addition, we tested whether the number of iterations had any effect on imputation accuracy. Imputation accuracies were compared between the two SNP panels.
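The validation scheme and accuracy measures above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names and example allele sizes are hypothetical, MS genotypes are treated as unordered allele pairs, allelic concordance follows the definition given in the text (at least one allele correct), and the correlation codes multi-allelic genotypes as per-allele dosages, which is an assumption because the coding used in the study is not described.

```python
import math
import random


def five_fold_splits(animal_ids, k=5, seed=1):
    """Yield (reference, validation) ID lists; each animal is validated exactly once."""
    ids = list(animal_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        reference = [a for j, fold in enumerate(folds) if j != i for a in fold]
        yield reference, folds[i]


def genotype_concordance(true, pred):
    """Share of animals whose imputed MS genotype (unordered allele pair) is exactly right."""
    return sum(sorted(t) == sorted(p) for t, p in zip(true, pred)) / len(true)


def allelic_concordance(true, pred):
    """Share of animals with at least one allele imputed correctly (definition in the text)."""
    return sum(bool(set(t) & set(p)) for t, p in zip(true, pred)) / len(true)


def dosage_correlation(true, pred):
    """Pearson correlation of per-allele dosages (0, 1, 2) stacked over all alleles.

    Coding multi-allelic genotypes as allele dosages is an assumption of this sketch.
    """
    alleles = sorted({a for g in list(true) + list(pred) for a in g})
    x = [g.count(a) for g in true for a in alleles]
    y = [g.count(a) for g in pred for a in alleles]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Hypothetical true and imputed genotypes (allele sizes in bp) for four validation animals.
true = [(178, 182), (178, 178), (184, 186), (182, 190)]
pred = [(182, 178), (178, 184), (186, 188), (174, 176)]
print(genotype_concordance(true, pred))  # 0.25
print(allelic_concordance(true, pred))   # 0.75
print(round(dosage_correlation(true, pred), 3))
```

In a full run, each of the five folds would mask the MS genotypes of its validation animals, impute them from the reference animals, and the three measures would then be averaged over the folds.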
Results and Discussion
The number of SNPs used to impute the eight MS markers ranged from 9 to 24 (average 15) for BovineSNP50 and from 75 to 296 (average 212) for BovineHD; excluding TGLA53, the BovineHD range was 151 to 296 (average 232). The number of alleles per MS marker ranged from 7 for BM1824 to 24 for TGLA227. The effective number of alleles varied from 3.4 in BM1824 to 7.6 in TGLA53. The observed heterozygosity varied from 0.7 in BM1824 to 1.0 in TGLA227 (Table 2).
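The diversity statistics in Table 2 can be illustrated with a short Python sketch. It assumes the standard definitions: the effective number of alleles is Ne = 1 / Σ p_i², where p_i is the frequency of allele i, and observed heterozygosity (Ho) is the share of typed animals carrying two different alleles. The software used to produce Table 2 is not stated, and the function name `ms_diversity` and the example genotypes are hypothetical.

```python
from collections import Counter


def ms_diversity(genotypes):
    """Return (Na, Ne, Ho) for one MS locus.

    genotypes: list of (allele1, allele2) tuples, with None for animals not typed.
    Ne uses the standard 1 / sum(p_i^2); Ho is the share of typed animals that are heterozygous.
    """
    called = [g for g in genotypes if g is not None]
    counts = Counter(a for g in called for a in g)
    total = sum(counts.values())
    na = len(counts)
    ne = 1 / sum((c / total) ** 2 for c in counts.values())
    ho = sum(g[0] != g[1] for g in called) / len(called)
    return na, ne, ho


# Hypothetical genotypes (allele sizes in bp) for five animals; one animal is untyped.
example = [(178, 182), (178, 178), (182, 184), None, (184, 184)]
na, ne, ho = ms_diversity(example)
print(na, round(ne, 3), ho)  # 3 2.909 0.5
```

Because Ne weights alleles by frequency, it is always at most Na, which is why a marker with many rare alleles (such as TGLA227, with 24 alleles) can still have a moderate Ne.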
With BovineSNP50, the highest accuracy (50%) was recorded for TGLA122 and TGLA227, whereas with BovineHD most markers reached an accuracy of about 50%. The lowest imputation accuracy (1%) was observed for TGLA53 with both SNP panels. TGLA53 had ~40% missing genotypes, which may have contributed to its low accuracy and to the reduction in average accuracy. The genotype concordance rate averaged over all loci was 40% for BovineSNP50 and 43% for BovineHD (Table 1).
Overall accuracy was limited by marker TGLA53: with BovineHD, accuracy increased to ~50% when TGLA53 was removed from the analysis. The allelic concordance in the validation samples was 30% with BovineSNP50 and 15% with BovineHD, and the average correlation between predicted and true genotypes was 31% and 38%, respectively (Table 1). With BovineSNP50, the highest correlation was seen for TGLA227 and the lowest for TGLA53; with BovineHD, the highest was seen for BM1824 and the lowest for TGLA53. Imputation accuracy is known to increase with the size of the reference population and with the inclusion of family genotype data in the reference population. Including genotypes from related individuals in the reference population allows Beagle to infer haplotypes correctly and thus make better predictions for ungenotyped markers.
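The effect of TGLA53 on the BovineHD average can be checked directly against Table 1. The short Python sketch below recomputes the mean genotype concordance with and without TGLA53 from the per-marker values as printed in Table 1; because those values are already rounded, small differences from the reported averages are possible.

```python
# BovineHD genotype concordance per MS marker, copied from Table 1.
hd_concordance = {
    "BM1824": 0.5, "BM2113": 0.4, "ETH10": 0.5, "ETH225": 0.55,
    "TGLA53": 0.01, "TGLA227": 0.5, "TGLA126": 0.5, "TGLA122": 0.51,
}

all_markers = sum(hd_concordance.values()) / len(hd_concordance)
without_tgla53 = [v for k, v in hd_concordance.items() if k != "TGLA53"]

print(round(all_markers, 2))                                     # 0.43
print(round(sum(without_tgla53) / len(without_tgla53), 2))       # 0.49
```

A single marker at near-zero accuracy pulls an eight-marker mean down by roughly one eighth of the gap, which accounts for the move from about 43% to about 50%.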
Marker density is known to affect imputation accuracy; Hayes et al. [12] showed that imputation accuracy increases with marker density. Although we did observe an increase in accuracy with the HD SNP panel, it was not high enough for routine use. McClure et al. [2] observed higher accuracies than in our study; their validation animals were derived from the reference population, whereas our samples lacked such a design. Finally, increasing the number of iterations did not increase the number of correctly imputed genotypes (Table 3).
For the reference population to predict MS alleles with higher accuracy, genotypes from multiple generations of ancestors are needed along with pedigree information. For imputing MS markers from SNP data, we suggest using related animals. Such approaches need further optimization before they can be used in routine practice.
Acknowledgments
This work was supported by Agenda (PJ01134902) of the National Institute of Animal Science. We acknowledge the support of the institutions and personnel who helped with cattle sampling (Hanwoo Genetic Improvement Center of the Nonghyup Agribusiness Group Inc.) in Seosan, Chungnam province, Korea, and thank the cattle keepers for their assistance and permission to sample their herds. The funders had no role in the design of the study, sampling, analysis, or writing of the manuscript.
Table 1.
Accuracy of imputation of MS markers from BovineSNP50 (50K) BeadChip and BovineHD (777K) SNP chip data in Hanwoo cattle, averaged over five cross-validation sets
Marker | Chromosome | 50K: No. of SNPsᵃ | 50K: Genotype concordance | 50K: Allelic concordanceᵇ | 50K: Correlationᶜ | 777K: No. of SNPs | 777K: Genotype concordance | 777K: Allelic concordanceᵇ | 777K: Correlationᶜ
BM1824 | Chr1 | 14 | 0.4 | 0.34 | 0.4 | 151 | 0.5 | 0.19 | 0.52
BM2113 | Chr2 | 24 | 0.4 | 0.3 | 0.32 | 248 | 0.4 | 0.27 | 0.36
ETH10 | Chr5 | 16 | 0.4 | 0.24 | 0.4 | 256 | 0.5 | 0.12 | 0.42
ETH225 | Chr9 | 9 | 0.4 | 0.4 | 0.2 | 159 | 0.55 | 0.12 | 0.39
TGLA53 | Chr16 | 9 | 0.01 | 0.12 | 0.04 | 75 | 0.01 | 0.11 | 0.02
TGLA227 | Chr18 | 17 | 0.5 | 0.12 | 0.5 | 296 | 0.5 | 0.09 | 0.47
TGLA126 | Chr20 | 12 | 0.4 | 0.4 | 0.22 | 243 | 0.5 | 0.21 | 0.32
TGLA122 | Chr21 | 16 | 0.5 | 0.11 | 0.43 | 268 | 0.51 | 0.07 | 0.49
Average | | 15 | 0.40 | 0.30 | 0.31 | 212 | 0.43 | 0.15 | 0.38
Table 2.
Details of the microsatellite markers for all 1,482 animals
Locus | Na | Ne | Ho
BM1824 | 7 | 3.432 | 0.699
BM2113 | 12 | 3.652 | 0.727
ETH10 | 10 | 4.835 | 0.892
ETH225 | 11 | 6.563 | 0.99
TGLA53 | 15 | 7.608 | 0.955
TGLA227 | 24 | 4.979 | 0.999
TGLA126 | 19 | 5.098 | 0.838
TGLA122 | 11 | 3.732 | 0.811
Na, number of alleles; Ne, effective number of alleles; Ho, observed heterozygosity.
Table 3.
Effect of the number of iterations on genotype imputation accuracy based on the BovineHD SNP panel
Iterations | Average | Max | Min
100 | 0.40 | 0.50 | 0.02
200 | 0.40 | 0.50 | 0.02
300 | 0.40 | 0.50 | 0.02
400 | 0.40 | 0.50 | 0.02
500 | 0.40 | 0.50 | 0.02
References
2. McClure MC, Sonstegard TS, Wiggans GR, Van Eenennaam AL, Weber KL, Penedo CT, et al. Imputation of microsatellite alleles from dense SNP genotypes for parentage verification across multiple Bos taurus and Bos indicus breeds. Front Genet 2013;4:176.
6. Ogawa S, Matsuda H, Taniguchi Y, Watanabe T, Takasuga A, Sugimoto Y, et al. Accuracy of imputation of single nucleotide polymorphism marker genotypes from low-density panels in Japanese Black cattle. Anim Sci J 2016;87:3–12.
7. Uemoto Y, Sasaki S, Sugimoto Y, Watanabe T. Accuracy of high-density genotype imputation in Japanese Black cattle. Anim Genet 2015;46:388–394.
8. Calus MP, Bouwman AC, Hickey JM, Veerkamp RF, Mulder HA. Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications. Animal 2014;8:1743–1753.
12. Hayes BJ, Bowman PJ, Daetwyler HD, Kijas JW, van der Werf JH. Accuracy of genotype imputation in sheep breeds. Anim Genet 2012;43:72–80.