Best linear unbiased prediction (BLUP) has been used to estimate the fixed effects and random effects of complex traits. Traditionally, genomic relationship matrix (GRM)-based and random marker-based BLUP analyses have been widely used to estimate the genetic values of complex traits. We used three methods: GRM-based prediction (G-BLUP), random marker-based prediction using an identity matrix (so-called single-nucleotide polymorphism [SNP]-BLUP), and random marker-based prediction using an SNP-SNP variance-covariance matrix (so-called SNP-GBLUP). We used 35,675 SNPs and the R package "rrBLUP" for the BLUP analysis. The SNP-SNP relationship matrix was calculated using the GRM and the Sherman-Morrison-Woodbury lemma. The SNP-GBLUP result was very similar to that of G-BLUP in the prediction of genetic values. However, there were many discrepancies between SNP-BLUP and the other two BLUPs. SNP-GBLUP has the advantage of predicting genetic values through SNP effects.
Many important human traits are moderately to highly heritable [
We used BLUP to estimate numerical values for the genetic factors encoded in an individual's DNA. The expression of diverse and complex genetic factors gives rise to quantitative phenotypic values, so the contribution of those factors can be expressed numerically. SNPs can serve as representatives of genetic factors. Therefore, SNP effects, which are not directly observable but are real, can also be expressed as numerical values.
BLUP, including best linear unbiased estimation (BLUE), is a standard method for predicting the random effects and fixed effects of a mixed model [
Here,
We used three methods: GRM-based BLUP (G-BLUP), SNP-BLUP, and SNP-GBLUP. G-BLUP is a GRM-based method. SNP-BLUP assumes that SNP effects are independent and identically distributed (IID) [
Height in humans is a classical quantitative trait, is easy to measure, and has high heritability. The heritability of height has been estimated to be ~0.8 [
We used the Ansan-Anseong cohort dataset in Korea. This dataset was established for a Korean chronic diseases study, with Ansan and Anseong representing urban and rural areas of Korea, respectively. The subjects were men between 40 and 69 years of age who had been residents of the region for at least 6 months. The basic survey was conducted from 2000 to 2001, and our study was based on the 3rd Ansan-Anseong cohort dataset version 2.1. We chose height as the phenotype and sex as a fixed effect. The SNPs of the cohort dataset were genotyped using the Affymetrix Genome-wide Human SNP Array 5.0 (Affymetrix, Santa Clara, CA, USA). The mean call rate was 99.01%, and the genetic analysis result, verified by SNPstream UHT 12-plex, was 99.934%. The total number of genotyped SNPs was 352,228. SNPs were removed if they had a minor allele frequency (MAF) <0.01, a Hardy-Weinberg equilibrium test p-value <0.0001, or a missing genotype rate >0.2. The SNPs were pruned using PLINK (
The GRM was calculated using the R package "rrBLUP" with the Expectation-Maximization (EM) imputation option. Then, using the restricted maximum likelihood (REML) method of the same package, we calculated the SNP effects, genetic values, error variance, and genetic value variance. The EM imputation algorithm was used for the GRM because we dealt with high-density SNPs. REML was used instead of maximum likelihood (ML) because it is preferred when the sample size is rather small.
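As a rough illustration of what a genomic relationship matrix contains, the following Python sketch builds one from a genotype matrix using a VanRaden-style formula. It is not the "rrBLUP" implementation: the function name `vanraden_grm`, the 0/1/2 allele-count coding, the simulated toy genotypes, and the absence of any missing-data imputation are all assumptions made for this example.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Return an n x n genomic relationship matrix (VanRaden-style sketch).

    genotypes: (n individuals x m SNPs) array of allele counts coded 0/1/2,
    with no missing values (imputation is outside the scope of this sketch).
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0             # sample allele frequency of each SNP
    W = M - 2.0 * p                      # center each SNP column at its mean
    scale = 2.0 * np.sum(p * (1.0 - p))  # sum of expected SNP variances
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic")
    return W @ W.T / scale


# Hypothetical toy data: 20 individuals, 50 SNPs (not the cohort data).
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=50)
toy_genotypes = rng.binomial(2, freqs, size=(20, 50))
G = vanraden_grm(toy_genotypes)
```

Because each SNP column is centered at its sample mean, every row of `G` sums to zero, and `G` is symmetric and positive semi-definite.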
From the regression Eq. (1), Henderson [
Then, we used
From Eqs. (3) and (4),
Eqs. (5) and (6) represent the probability density function (pdf) of the BLUP model (Eq. 1). Eqs. (5) and (6) are equivalent, and we can easily see that
We used the "rrBLUP" package to calculate the G matrix and to predict the genetic values of height, a complex trait, in men. In practice, "rrBLUP" uses a generalized least squares (GLS) solution for the BLUP. For G-BLUP, we used the G matrix calculated by the R "rrBLUP" package from the information on 35,675 SNPs. To estimate random effects such as SNP effects, we used two methods. One was SNP-BLUP, which assumes that SNP effects are IID, and the other was SNP-GBLUP, which uses the statistical SNP-SNP variance-covariance matrix. We will call this matrix the SNP-SNP relationship matrix.
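The GLS solution mentioned above can be sketched in a few lines of Python for the model y = Xb + g + e, with g ~ N(0, K·var_g) and e ~ N(0, I·var_e). This is a minimal sketch, not the "rrBLUP" code: the function name `gls_blup`, the toy data, and the variance components (treated here as already known rather than estimated by REML) are assumptions made for illustration.

```python
import numpy as np


def gls_blup(y, X, K, var_g, var_e):
    """BLUE of fixed effects and BLUP of genetic values for y = X b + g + e.

    Assumes g ~ N(0, K * var_g) and e ~ N(0, I * var_e), with both variance
    components already known; estimating them (e.g. by REML) is not shown.
    """
    n = len(y)
    V = var_g * K + var_e * np.eye(n)                 # covariance matrix of y
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    b = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)   # GLS estimate (BLUE)
    g = var_g * K @ np.linalg.solve(V, y - X @ b)     # BLUP of genetic values
    return b, g


# Hypothetical toy data: 15 individuals, 40 markers, intercept + 0/1 covariate.
rng = np.random.default_rng(1)
n, m = 15, 40
Z = rng.standard_normal((n, m))
K = Z @ Z.T / m                                       # toy relationship matrix
X = np.column_stack([np.ones(n), np.arange(n) % 2])
y = 165.0 + 10.0 * X[:, 1] + rng.standard_normal(n)
b, g = gls_blup(y, X, K, var_g=0.5, var_e=1.0)
```

The same `b` and `g` are obtained by solving Henderson's mixed model equations with the ratio var_e/var_g, which is a convenient check on the sketch.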
From the relationship
Calculating the SNP-SNP relationship matrix
The result of the genetic value prediction is shown in
We estimated the narrow-sense heritability (h²) using simple regression between the genetic values and the height phenotypic values. The heritability was 0.24 for G-BLUP and SNP-GBLUP and 0.20 for SNP-BLUP. According to the article by Yang et al. [
G-BLUP uses the GRM, which contains information on the relationships between individuals. In contrast, SNP-BLUP and SNP-GBLUP use information between SNPs directly. SNP-BLUP assumes that SNP effects are IID. This assumption is convenient but not accurate, because it ignores the interaction terms between SNPs. SNP-GBLUP instead uses the covariance structure between SNPs directly and is therefore closer to reality than SNP-BLUP.
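The difference between the IID assumption and a general covariance structure can be illustrated with the standard BLUP formula for marker effects, u = D Z'(Z D Z' + lam·I)^(-1) y, where D is proportional to the covariance matrix of the SNP effects. The Python sketch below is only an illustration under stated assumptions: the function name `snp_effects_blup`, the toy data, the variance ratio `lam`, and in particular the matrix `D_cov` are hypothetical. The SNP-SNP relationship matrix in this study was derived from the GRM, whereas `D_cov` here is a made-up banded correlation matrix, and the phenotype is assumed to be already corrected for fixed effects.

```python
import numpy as np


def snp_effects_blup(y_centered, Z, D, lam):
    """BLUP of SNP effects u for y = Z u + e with Var(u) proportional to D.

    lam is the ratio of residual variance to SNP-effect variance.
    D = identity corresponds to the IID assumption; any other positive
    definite D is a hypothetical SNP-SNP covariance structure.
    """
    n = Z.shape[0]
    V = Z @ D @ Z.T + lam * np.eye(n)
    return D @ Z.T @ np.linalg.solve(V, y_centered)


# Hypothetical toy data: 12 individuals, 30 SNPs.
rng = np.random.default_rng(2)
n, m = 12, 30
Z = rng.standard_normal((n, m))
y_centered = rng.standard_normal(n)
lam = 2.0

D_iid = np.eye(m)                                    # IID assumption
idx = np.arange(m)
D_cov = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # made-up covariance

u_iid = snp_effects_blup(y_centered, Z, D_iid, lam)
u_cov = snp_effects_blup(y_centered, Z, D_cov, lam)
g_iid = Z @ u_iid                                    # genetic values under IID
g_cov = Z @ u_cov                                    # genetic values under D_cov
```

With `D_iid`, the result coincides with ordinary ridge regression, which is one way to see that the IID assumption is a special case of the general covariance formulation.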
The covariance terms of the SNP-SNP relationship matrix can be used as interaction terms in phenotype-related analysis, and they can provide clues about SNP-SNP interactions. These covariance terms are independent of the phenotypic values, because they are derived from the Z matrix and the GRM. However, after multiplication by a proper constant, as in BLUP, they may be interpreted as interaction terms between SNPs.
The SNP effects obtained from SNP-GBLUP, rather than those from SNP-BLUP, could probably be used in Bayesian BLUP. Bayesian BLUP excludes SNPs with low or zero effects. The accurately calculated SNP effects from SNP-GBLUP can be classified by absolute value into low-effect SNPs and high-effect SNPs. Low-effect SNPs can then be excluded, so that Bayesian BLUP exploits only high-effect SNPs [
Narrow-sense heritability (h²) reflects the additive effects of QTL. The heritability estimated from our data was smaller than the generally accepted heritability. This is because causal variants were not in complete LD with the genotyped SNPs. Incomplete LD might occur if causal variants have a lower MAF than the genotyped SNPs. The effects of the SNPs are treated statistically as random, and each SNP has a small effect on the trait [
The present study aimed to predict genetic values using the covariance structures between SNPs via the GRM [
Genetic value estimation combines the performance and kinship information, which is based on a known pedigree [
We thank Dong-Hyun Shin and Min-Su Park for their help and Professor Hee-Seok Oh for helpful advice. The work was supported by the Project (PJ009260) of the National Livestock Institute of the Rural Development Administration, Republic of Korea. This research was also supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Ministry of Science, ICT & Future Planning (M3A9D1054622). We are also grateful to the provider of the KARE3 Ansan-Ansung cohort data.
Supplementary data including two tables can be found with this article at
This table shows the predicted genetic values of the 1st-500th human individuals according to the BLUP methods
The table indicates the SNP-SNP relationship matrix of the 1st-8th SNPs using the relationship of
Histograms of the genetic value variances of G-BLUP (A), SNP-BLUP (B), and SNP-GBLUP (C). The shapes of the histograms of G-BLUP and SNP-GBLUP were nearly identical. However, the shape of the histogram of SNP-BLUP differed from those of the other two BLUPs. G-BLUP, genomic relationship matrix-based best linear unbiased prediction; SNP-BLUP, single-nucleotide polymorphism (SNP)-best linear unbiased prediction; SNP-GBLUP, SNP-genomic best linear unbiased prediction.
Histograms of the SNP effects of SNP-BLUP (A) and SNP-GBLUP (B). Both were approximately normally distributed. However, the predicted genetic values were very dissimilar. SNP-BLUP, single-nucleotide polymorphism (SNP)-best linear unbiased prediction; SNP-GBLUP, SNP-genomic best linear unbiased prediction.
Histograms of the genetic values of G-BLUP (A), SNP-BLUP (B), and SNP-GBLUP (C). The distribution of the genetic values was similar to a normal distribution. G-BLUP, genomic relationship matrix-based best linear unbiased prediction; SNP-BLUP, single-nucleotide polymorphism (SNP)-best linear unbiased prediction; SNP-GBLUP, SNP-genomic best linear unbiased prediction.