### Introduction

*p* is much greater than the sample size *n*. A classical GWAS conducts a large number of association tests, so-called multiple testing, each examining the individual effect of one SNP on a clinical outcome. The classical GWAS has two major limitations. First, GWAS has a multiple testing issue, which requires adequate control of false positives. Typically, the significance level of each test is adjusted by a Bonferroni correction. The widely accepted significance level for declaring a "genome-wide significant" association is 5 × 10^{–8} [1,2], a strict threshold that makes genome-wide significance difficult to achieve. Second, GWAS does not account for the intricate group structure among SNPs, such as genes or linkage disequilibrium (LD) blocks. LD reflects how much an allele of one genetic variant is associated or co-inherited with an allele of another nearby genetic variant within the same population [3]. Incorporating this group information into the GWAS would increase statistical power by aggregating the small effects of SNPs within a group.
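The genome-wide threshold itself is just a Bonferroni correction at a family-wise level of 0.05 over roughly one million independent common variants; a minimal illustration (the 0.05 level and the one-million count are the usual heuristic behind 5 × 10^{–8}, not values stated in this paper):

```python
def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise
    error rate at `alpha` across `n_tests` independent tests."""
    return alpha / n_tests

# ~1e6 independent common variants yields the conventional 5e-8 cutoff.
print(bonferroni_level(0.05, 1_000_000))
```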

### Methods

### Accelerated failure time model

For the *i*-th subject, *Y*_{i} is the survival time and **X**_{i} = (1, *x*_{i1}, *x*_{i2}, ⋯, *x*_{ip})' is a covariate vector. The first element of 1 allows estimation of the *y*-intercept. Subsequently, the AFT model is given by log *Y*_{i} = **X**_{i}′**β** + *ϵ*_{i}, where **β** = (*β*_{0}, *β*_{1}, ⋯, *β*_{p}) is a vector of *p* + 1 unknown regression coefficients including the *y*-intercept *β*_{0}, and *ϵ*_{i} = log *v*_{i} is an error term. Generally, the error term is assumed to follow a parametric distribution, such as the normal distribution. The parametric AFT model has been discussed extensively in the frequentist framework [23-26], and Bayesian approaches have also been developed for it [27-29]. In this paper, we consider a parametric Bayesian approach that models the error term *ϵ*_{i} with a normal distribution.

### BBVS in accelerated failure time model

*G* blocks, we can write our model as follows. For the *g*-th group of variables, **β** = (**β**_{1}, ⋯, **β**_{G}) and **γ** = (*γ*_{1}, ⋯, *γ*_{G}), where *γ*_{g} is an indicator variable taking the value 0 or 1. When *γ*_{g} = 1, the *g*-th set of variables is included in the model; if *γ*_{g} = 0, the *g*-th group is removed from the model construction. The error terms *ϵ*_{i} are assumed to be independently distributed as *N*(0, *σ*^{2}); hence, the failure time *Y*_{i} follows a log-normal distribution. When *y*_{i} is possibly right censored, we only observe *t*_{i} = min(*Y*_{i}, *c*_{i}) and *ν*_{i} = *I*{*y*_{i} < *c*_{i}}, where *c*_{i} is the censoring time. Here *w*_{i} = log(*y*_{i}) can be considered as augmented data. In the first step, we identify important groups of variables by updating only the group inclusion vector **γ** and the censored times *y*_{i} from their posterior distributions. In the second step, the model (1) is reduced to the *Q* groups selected in the first step, and **θ**_{g}, *g* = 1, 2, ⋯, *Q*, are the corresponding regression coefficient vectors. The censored time to event *y*_{i} is imputed by the mean of the posterior samples of *w*_{i} collected in the first step. This converts the AFT model to a usual log-linear regression problem.

_{i}*to enable the element-wise variable selection within*

**θ**_{g}

**β**_{0},

*and the standard deviation*

**β***σ*of the error term are not of interests, the computational burden in the first step can be reduced by integrating out the irrelevant parameters,

**β**_{0},

*,*

**β***σ*from the full posterior distribution. This kind of strategy has been employed in Sha et al. [20], although their variable selection has been conducted only in an element-wise fashion.

*The first step: group-wise variable selection*

*k*_{g} ≤ *n* and *k*_{g} > *n* for the *g*-th group with size *k*_{g}. The prior on **β**_{g} is the Information Matrix (IM) or Information Matrix Ridge prior proposed by Gupta and Ibrahim [30]. It is a generalization of Zellner's g-prior [31]; the IM prior coincides with Zellner's g-prior in the Gaussian linear regression setting. The full posterior distribution of (**β**_{0}, **β**, **γ**, *σ*^{2}) follows from this prior specification. Integrating out **β**_{0}, **β**, and *σ*^{2}, we can obtain the posterior distribution of **γ**.

Given **γ**^{(g)} = (*γ*_{1}, ⋯, *γ*_{g-1}, *γ*_{g+1}, ⋯, *γ*_{G}), the posterior distribution of *γ*_{g} is a Bernoulli distribution whose success probability involves *f*_{t}(∙|*ν*, *σ*^{2}), the probability density function of the t-distribution with degrees of freedom *ν* and scale parameter *σ*^{2}. Then, *p*_{g} is updated from its posterior distribution Beta(*a* + *γ*_{g}, *b* + 1 − *γ*_{g}). The marginal likelihood of the augmented data *w* can be derived as an *n*-dimensional multivariate t-distribution with truncation given by (2).
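The two conjugate updates in this step can be sketched as follows. The quantities `log_ml_in` and `log_ml_out` stand in for the integrated multivariate-t marginal likelihoods with the *g*-th group included or excluded, which are not reproduced here; the hyperparameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 10.0, 190.0  # Beta hyperparameters; prior inclusion ~ a/(a+b) = 5%

def update_gamma_g(log_ml_in: float, log_ml_out: float, p_g: float) -> int:
    """Draw gamma_g ~ Bernoulli, combining the prior odds p_g/(1-p_g)
    with the marginal likelihood ratio of including vs. excluding the
    g-th group (log_ml_* are placeholders for the integrated marginals)."""
    log_odds = np.log(p_g) - np.log1p(-p_g) + log_ml_in - log_ml_out
    prob = 1.0 / (1.0 + np.exp(-log_odds))
    return int(rng.random() < prob)

def update_p_g(gamma_g: int) -> float:
    """Conjugate update: p_g | gamma_g ~ Beta(a + gamma_g, b + 1 - gamma_g)."""
    return rng.beta(a + gamma_g, b + 1 - gamma_g)

gamma = update_gamma_g(log_ml_in=-100.0, log_ml_out=-103.0, p_g=0.05)
p_g = update_p_g(gamma)
print(gamma, round(p_g, 3))
```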

Since *w*_{i} is unobserved for a censored case (*ν*_{i} = 0), the censored time *w*_{i} can be imputed by its posterior mean. Let *h*_{i,j} denote the scalar element in the *i*-th row and *j*-th column of *H*_{γ}, let *H*_{(i,j)} be the matrix *H*_{γ} without its *i*-th row and *j*-th column, and let *h*_{(i)} be the *i*-th row of *H*_{γ} without its *i*-th element. Similarly, let **w**^{(i)} be the vector **w** without its *i*-th element. When *w*_{i} is censored, its full conditional distribution can be written as a truncated *t* location-scale distribution (4), where *μ*_{wi}, *s*_{wi}, and *n* + *ν*_{0} − 1 are, respectively, the location, scale, and degrees-of-freedom parameters; the location and scale are obtained from the conditional distribution of the multivariate t. Each censored *w*_{i} is updated from (4) at every iteration and imputed as its posterior mean in the element-wise selection step.
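Sampling from a truncated t location-scale full conditional such as (4) can be sketched with a simple rejection sampler; the location, scale, degrees of freedom, and truncation point below are illustrative, not quantities computed from the model:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_truncated_t(mu: float, s: float, df: float, lower: float,
                       size: int = 1) -> np.ndarray:
    """Draw from a t location-scale distribution truncated to (lower, inf)
    by rejection: sample the untruncated t and keep draws above `lower`.
    Works well unless P(W > lower) is tiny. For a censored subject,
    `lower` is the observed log censoring time."""
    out = np.empty(0)
    while out.size < size:
        draws = mu + s * rng.standard_t(df, size=4 * size)
        out = np.concatenate([out, draws[draws > lower]])
    return out[:size]

# Impute a censored log-time by the mean of truncated-t draws.
samples = sample_truncated_t(mu=1.0, s=0.4, df=30, lower=1.5, size=5000)
print(round(samples.mean(), 2))
```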

_{i}*M*iterations, posterior inclusion probability can be calculated from the posterior sample of

*γ*as their posterior mean,

*g*-th group is "decided" to be included in the model. To select important groups, for some threshold

*p*

^{*}, we consider that any group with

*p*

^{*}to control the average Bayesian FDR by using the method proposed by Morris et al. [34].
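A common way to implement such an average-Bayesian-FDR threshold is to sort the posterior inclusion probabilities and take the largest cutoff whose selected set keeps the average of (1 − probability) below the target level; the sketch below follows that reading of the rule, with illustrative probabilities:

```python
import numpy as np

def bayes_fdr_threshold(incl_prob: np.ndarray, alpha: float = 0.05) -> float:
    """Smallest inclusion-probability cutoff whose selected set has
    average (1 - probability), i.e. average Bayesian FDR, at most alpha.
    Returns 1.0 if no cutoff qualifies (select nothing)."""
    p = np.sort(incl_prob)[::-1]                   # descending
    fdr = np.cumsum(1.0 - p) / np.arange(1, p.size + 1)
    ok = np.where(fdr <= alpha)[0]
    return p[ok[-1]] if ok.size else 1.0

probs = np.array([0.99, 0.98, 0.97, 0.60, 0.30, 0.05])
threshold = bayes_fdr_threshold(probs, alpha=0.05)
print(threshold)  # groups with probability >= threshold are selected
```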

*The second step: element-wise variable selection*

We keep only the *Q* groups selected in the first step and assume shrinkage priors on the regression parameters **θ**_{1}, ⋯, **θ**_{Q} to achieve further sparsity at the element-wise level in the reduced model (3). As the shrinkage prior, the DL prior is assumed and extended to incorporate grouping information. The DL prior was proposed in [22] as a novel form of shrinkage prior. Under the normal means setting, *θ*_{i} has a DL prior with a hierarchical structure involving **ϕ** = (*ϕ*_{1}, *ϕ*_{2}, ⋯, *ϕ*_{p}). To efficiently control the global shrinkage, the authors introduced global (*τ*) and local (**ϕ**) scales, where the local scales have a joint structure such that they lie in the (*p* − 1)-dimensional simplex. Under a setting of moderate-sized coefficients with sparse signals, their simulation study showed that the DL prior outperforms the least absolute shrinkage and selection operator (Lasso), the Bayesian Lasso, the empirical Bayes median, and the point-mass prior, while its performance is similar to that of the horseshoe prior.
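Prior draws make the DL behavior concrete. The sketch below uses one common parameterization of the DL hierarchy, θ_j | ψ_j, ϕ_j, τ ~ N(0, ψ_j ϕ_j² τ²) with ψ_j ~ Exp(1/2), ϕ ~ Dirichlet(a, …, a), and τ ~ Gamma(pa, 1/2) (rate-parameterized); the dimension and concentration value are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
p, a = 1000, 0.1  # dimension and Dirichlet concentration (illustrative)

psi = rng.exponential(scale=2.0, size=p)   # Exp(rate 1/2) => scale 2
phi = rng.dirichlet(np.full(p, a))         # local scales on the simplex
tau = rng.gamma(shape=p * a, scale=2.0)    # Gamma(p*a, rate 1/2)

# theta_j | psi_j, phi_j, tau ~ N(0, psi_j * phi_j^2 * tau^2)
theta = rng.normal(scale=np.sqrt(psi) * phi * tau)

# Most coordinates are shrunk toward zero; a few remain large.
print(float(np.median(np.abs(theta))), float(np.abs(theta).max()))
```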

We allow *ψ*_{j}, *ϕ*_{j}, and *τ* in (5) to be group-index (*g*) dependent. In the selected group *g* there are *q*_{g} variables, and the total number of selected variables in the model (3) is the sum of the *q*_{g}. Replacing the augmented data *w* by its posterior mean, for *g* = 1, 2, ⋯, *Q*, the priors are set to be

##### (6)

where *IG*(*α*, *b*) denotes the inverse gamma distribution with shape parameter *α* and rate parameter *b*, and **τ** = (*τ*_{1}, ⋯, *τ*_{Q}). The design matrix is formed from the selected variables of each group *g*, and *Σ*^{*} is a block diagonal matrix whose element matrices are determined by the group structure.

##### (7)

where *B*(*ϕ*_{g}) denotes a multivariate Beta function. We propose a Gibbs sampler for posterior computation, which enables parameter estimation and variable selection simultaneously. The Gibbs sampler is computationally efficient and mixes rapidly. We first specify the hyperparameters *h*_{0}, *σ*_{0}, *ν*_{0}, *a*_{1}, ⋯, *a*_{Q} at appropriate values. Starting from the initialization step, the Gibbs sampler for the models (3) and (7) proceeds as follows:

_{g}*β*

_{0}according to its full conditional distribution

*θ*from its full conditional distribution

_{g}*N*=

*n*+

*q*+

*p*

_{0}+

*ν*

_{0}and

*σ*

^{2}from

*ψ*from its full conditional distribution

_{gj}*τ*from its full conditional distribution, the generalized inverse Gaussian distribution (giG), such as

_{g}*ϕ*, where

_{gj}*ϕ*=

_{gj}*T*/

_{gj}*T*such that

_{g}*a*from MN(

_{g}*θ*|'s can exist, where (a) one cluster has nearly zero coefficient values while (b) another cluster has relatively bigger absolute coefficients away from zero. The clusters (a) and (b) can be considered as noise and signal, respectively. We cluster |

_{j}*θ*|'s at each Markov chain Monte Carlo iteration using k-means with

_{j}*k*= 2 clusters. At each

*i*-th iteration, the number of important variables

*h*is set to be the smaller cluster size out of the two clusters. Subsequently, the number of important variables is finally estimated by taking the mode from the whole Markov chain Monte Carlo (MCMC) iterations, i.e.,

_{i}*H*= mode{

*h*}. The

_{i}*H*largest elements of the absolute values of posterior medians |

*θ*| are identified as the important variables.
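The k-means step is one-dimensional, so a small Lloyd's-algorithm implementation suffices; the posterior draws below are synthetic (8 noise coefficients, 2 signals), not output of the sampler:

```python
import numpy as np
from collections import Counter

def two_means_small_cluster(x: np.ndarray, iters: int = 50) -> int:
    """Cluster 1-D values into k = 2 groups (Lloyd's algorithm) and
    return the size of the smaller cluster (the sparse 'signal' cluster)."""
    c = np.array([x.min(), x.max()], dtype=float)   # initial centers
    for _ in range(iters):
        assign = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                c[k] = x[assign == k].mean()
    sizes = np.bincount(assign, minlength=2)
    return int(sizes.min())

rng = np.random.default_rng(4)
h = []
for _ in range(200):  # one |theta| vector per MCMC iteration
    abs_theta = np.abs(np.concatenate([rng.normal(0.0, 0.05, 8),
                                       rng.normal(2.0, 0.20, 2)]))
    h.append(two_means_small_cluster(abs_theta))

# H = mode of the per-iteration smaller-cluster sizes.
H = Counter(h).most_common(1)[0][0]
print(H)
```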

### ADNI-1 data

^{-6}. The remaining missing genotype values were imputed as the modal value. After the QC procedures, 347 subjects and 494,564 SNPs remained in the current study. The above procedures were carried out in PLINK version 1.9. We also calculated the LD blocks to form the SNP-sets and removed SNP-sets containing a single SNP. Eventually, 421,823 SNPs grouped into 16,084 SNP-sets were left in our analysis.

### Simulation data

*N*(-1, 0.5), which mimics the situation wherein a single copy of the minor allele decreases the time to event relative to the major allele. In the censored case, the censoring times were independently generated from a uniform distribution on (0, *c*^{*}), with *c*^{*} set to achieve the desired censoring rate. We replicated the simulation 50 times under the same setting. We assumed a Beta(10, 190) prior on the inclusion probability of the indicator *γ*_{g}, which gives an average inclusion probability of 5% and reflects the prior information that important signals are sparse in GWAS.
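Choosing *c*^{*} to hit a target censoring rate can be done by bisection on the empirical censoring fraction: for fixed uniforms *u*_{i}, the fraction of subjects with *C*_{i} = *u*_{i}·*c*^{*} below *Y*_{i} is non-increasing in *c*^{*}. The event-time distribution and the 30% target below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative log-normal event times and fixed uniforms (C = u * c_star).
y = np.exp(rng.normal(loc=1.0, scale=0.8, size=2000))
u = rng.uniform(size=y.size)

def censoring_rate(c_star: float) -> float:
    """Fraction censored when C_i ~ Uniform(0, c_star): censored iff C_i < Y_i."""
    return float(np.mean(u * c_star < y))

def calibrate_cstar(target: float, lo: float = 1e-3, hi: float = 1e3,
                    iters: int = 60) -> float:
    """Bisection on c*: the censoring rate decreases as c* grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if censoring_rate(mid) > target:
            lo = mid          # too much censoring -> raise c*
        else:
            hi = mid
    return 0.5 * (lo + hi)

c_star = calibrate_cstar(target=0.30)
print(round(censoring_rate(c_star), 2))
```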

### Results and Discussion

### ADNI-1 data analysis

We choose the threshold to control the average Bayesian FDR at level *α* [34] and consider any group whose posterior inclusion probability is greater than the threshold. In the ADNI data, the threshold is calculated as 0.941 (Fig. 1). In total, 19 SNP-sets were detected as important groups, and 106 SNPs were identified by the element-wise selection. Fig. 2 shows the estimated coefficient values for the 795 SNPs included in the 19 SNP-sets; the 106 SNPs selected at the element level are colored red. The trace plots of *θ*_{1,1}, *θ*_{1,2}, and *θ*_{1,3} of the first selected SNP-set over 5,000 iterations of the MCMC algorithm show fast convergence, indicating its good mixing properties.

^{-5} suggestive significance level; none of them had been reported in previous GWASs. Supplementary Table 1 shows the p-values of the 106 SNPs selected by the element-wise variable selection of the proposed method. None were significant at the suggestive significance level. For the kernel-machine method, we considered three kernel functions: the linear kernel, the identical-by-state (IBS) kernel, and the quadratic kernel. Table 2 shows the SNP-sets selected by the kernel-machine method at the 5 × 10^{-8} significance level. The selected SNP-sets vary with the type of kernel: the linear, IBS, and quadratic kernels selected 5, 8, and 6 SNP-sets, respectively. Two genes, calmodulin-binding transcription activator 1 (*CAMTA1*) and *RBFOX1*, were related to Alzheimer's disease (AD). The linear kernel selected an SNP-set located within *CAMTA1*. Huentelman et al. [45] identified SNPs within the *CAMTA1* gene that were significantly related to memory performance and memory-related regions of the human brain, which could be considered potential biomarkers of AD. *RBFOX1* was identified under the quadratic kernel function. Hooli et al. [46] reported that this gene co-segregates with disease status in early-onset familial AD and early- or mixed-onset AD families. There were no overlapping SNPs among the three methods.

### Simulation study

*TPR* or sensitivity), true negative rate (*TNR* or specificity), positive predictive value (*PPV*), and negative predictive value (*NPV*). They are defined as *TPR* = *TP*/(*TP* + *FN*), *TNR* = *TN*/(*TN* + *FP*), *PPV* = *TP*/(*TP* + *FP*), and *NPV* = *TN*/(*TN* + *FN*), where *TP* and *TN* are the numbers of correctly identified significant variables and correctly rejected non-significant variables, respectively, and *FP* and *FN* denote the numbers of incorrectly identified non-significant variables and incorrectly rejected significant variables. Under the true model, *TP* = 10, *TN* = 1990, and *FP* = *FN* = 0, which implies that all four rates are equal to one.
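With these counts, the four rates reduce to simple ratios; a small helper using the standard definitions:

```python
def classification_rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard variable-selection metrics."""
    return {
        "TPR": tp / (tp + fn),   # sensitivity
        "TNR": tn / (tn + fp),   # specificity
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
    }

# Under the true model: TP = 10, TN = 1990, FP = FN = 0.
rates = classification_rates(10, 1990, 0, 0)
print(rates)
```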

*TPR*, *TNR*, *NPV*, and *PPV* compared with the other group-penalty methods, removing irrelevant groups consistently and selecting important groups very well. Because the penalties with only group-level selection, especially grSCAD and grLasso, tend to select groups generously, they select the important groups perfectly, but at the cost of far more false positives than the other methods. The bi-level selection penalties gBridge and gel show performance comparable to our proposed method. In terms of element-wise variable selection, our method yields the highest values of all the criteria (*TPR*, *TNR*, *NPV*, and *PPV*) compared with the other group-penalty methods enabling bi-level selection. As the important signals are sparse, all the bi-level methods perform very well in removing irrelevant signals.

*TPR*, *TNR*, *PPV*, and *NPV* for the group-level selection over the 50 simulation replicates are 0.970, 1.000, 1.000, and 1.000, with corresponding Monte Carlo standard errors of 0.009, 0.000, 0.000, and 0.000. The average values of *TPR*, *TNR*, *PPV*, and *NPV* for the element-level selection are 0.634, 0.999, 0.589, and 1.000, with Monte Carlo standard errors of 0.012, 0.000, 0.011, and 0.000. Compared with the non-censored cases, the performance of BBVS is satisfactory in the censored case as well.