Meta-analysis using trajectory model
To discover SNPs associated with the overall trend and trajectory of phenotypes, we prepared two different longitudinal outcomes (the average and trajectory), which were obtained from longitudinal BMI, HIP, WST, and WHR measurements (baseline to sixth follow-up). To assess the average outcome, we calculated the average BMI, HIP, WST, and WHR measurements over the six time points and then used the average values as the phenotype for GWAS analysis. To assess the trajectory outcome, we utilized the methodology in Gouveia et al. [
13] designed to study cognitive trajectories. The LMM function implemented in the R package lme4 [
21] was used to obtain the per-individual trajectory for all obesity-related traits. Specifically, we fitted the following mixed model for subject
i at time
j:
yij=(β0+u0i)+(β1+u1i)x1ij+∑h=2qβhxhij+eij (i=1,…,n,j=1,…,6) where
yij is a phenotypic value;
u0i is a random effect allowing variation around the intercept
β0;;
u1i is a random effect allowing variation around the slope for age covariate,
β1;
xhij (
h = 1, …,
q) are
q covariates including age (
h = 1) with fixed effects
βh; and
eij is a random error. We assumed that the random intercept (
u0i) and the random slope (
u1i) were independent of each other and normally distributed. The models were adjusted for age squared, sex, drinking, and smoking status. We then used the predicted random slope (
u1i) from the model as the phenotype for a GWAS analysis of trajectory outcomes. All GWAS results for average and trajectory outcomes were obtained using BOLT-LMM software. In the GWAS for trajectory outcomes, the models were adjusted for only 10 genotype PCs to avoid over-adjustment.
Grid-based Bayesian mixed model
To further discover SNPs associated with traits of interest and SNPs interacting with time/environmental factors, we considered the following Bayesian mixed model:
where yi = (yi1, …, yini)T is an ni × 1 phenotype vector of individual i; μi = μ1ni is an ni × 1 overall mean vector;
xi=(xit,xig,xigg,xigt) is the design matrix corresponding to q time/environmental covariates, p SNPs, two-way interactions among p SNPs (resulting in total of p(p–1)/2 terms), and pq SNP-time/SNP-environment interactions; β = (βtT, βgT, βggT, βgtT)T is a vector of genetic effects, time/environmental effects, epistasis effects, and SNP-time/environment interactions; ei is an ni × 1 vector of random errors with ei ~ N (0, σ2Ini). For a given trait, we had n individuals, where individual i has phenotypic values measured at ni time points (i = 1, …, n), and the total number of observations was
N=∑i=1nni. To model the correlation among the repeated measurements of the same individual, we partitioned the observed time interval by k pre-specified grid points, t = (t1, …, tk)T, and defined νi as a k × 1 vector of random effects at the grid time points with νi ~ N(0, D), where D is a k × k covariance matrix. We also defined the incidence matrix pi as
pi=(pi1T,…,piniT)T. If all subjects had k observations measured exactly on the k grid time points, then pi became an identity matrix. We applied a linear interpolation procedure to any observation that did not fall on any one of k grid time points. When the jth measurement of individual i fell at time t, which was in between the grid points tr and tr+1 (tr ≤ t ≤ tr+1), we set
pij=(0(r-1)T,tr+1-ttr+1-tr,t-trtr+1-tr,0(k-r-1)T). When t = tr, we get
pij=(0(r-1)T,1,0(k-r)T).
For Bayesian estimation of the mixed-effects model (
1), we applied the modified Cholesky decomposition of Chen and Dunson [
25] to the
k ×
k covariance matrix
D, resulting in the decomposition,
D =
ΔΨΨTΔ, where
Δ is a nonnegative
k ×
k diagonal matrix and
Ψ is a
k ×
k lower triangular with 1’s in the diagonal elements. We re-parameterized model (
1) as
where bi = (bi1, … bik)T such that bij ~ N(0,1) and
bij⊥bij′ (j≠j′), j = 1, …, k.
As priors for Δ and Ψ, we defined two vectors δ = (δl : l = 1, …, k)T and ψ = (ψml : m = 2, …, k; l = 1, …, m − 1)T. The prior distribution for δ is
P(δ)=∏l=1kP(δl)=∏l=1kN+(δl∣ml0,sl02) where
N+(δl∣ml0,sl02) was the density of a half normal distribution that was a
N(δl∣ml0,sl02) density truncated below by zero. The prior distribution for ψ was
P(ψ)=dN(ψ0,R0) where ψ0 and R0 are pre-specified hyperparameters. The joint prior distribution of
b=(b1T,…,bnT)T was
P(b)=dN(0,Ink). The prior for the ath genetic effect was a normal distribution,
P(βa∣γa,σβ2)=dN(0,γaσβ2) and the prior for the variance
σβ2 was a scaled inverse χ2 distribution. The prior for the overall mean μ was given by
P(ψ)=dN(η0,τ02). The prior for the residual variance σ2 was chosen as a scaled inverse χ2 distribution,
P(σ2)=dinv-χ2(νσ,sσ2). The joint posterior distribution was proportional to the product of the likelihood and the prior distributions of all unknown parameters. We utilized Metropolis-Hastings and Gibbs sampling algorithms, and alternately updated each unknown parameter or set of unknown parameters conditional on all the other parameters and the observed data.
The posterior samples can be used to approximate the posterior distribution of the parameters. The posterior inclusion probability of each SNP was calculated using its inclusion proportion in the Markov chain Monte Carlo samples. The Bayes factor (BF) can be calculated to quantify the evidence for the inclusion of a specific SNP against the exclusion of that SNP. Jeffreys [
26] and Yandell et al. [
27] suggested the following criteria for judging the significance of each SNP: weak support if the BF falls between 3 and 10; moderate support if the BF falls between 10 and 30; and strong support if the BF is larger than 30 (i.e., log
2(BF) > 6.8). A critical issue with the proposed Bayesian model was how to choose an optimal number of grid points,
k. We achieved the goal by evaluating the goodness of the predictive distributions of our Bayesian models. Spiegelhalter et al. [
11] proposed the DIC and Ando [
12] developed the BPIC. We select the optimal number of grid points for our model by minimizing the DIC or simplified BPIC scores. The proposed methods were implemented in an R package named Gridbayes [
10], which is available for download at
https://github.com/wonilchung/GridBayes.