### Introduction

$n$ times, and it is commonly used when the standard distributional assumptions are violated. Although this random permutation approach is attractive for genome-wide association studies (GWAS), there is little guidance on achieving computational efficiency when adjusting p-values via $N$ permutations on parallel computing architectures.

### Methods

$T$ is determined as

$p_k$ indicates the original p-value of Fisher's exact test for the $k$-th locus. In general, the expected scale of $G$ (i.e., the number of SNPs) is ~$10^{4}$–$10^{n}$. When the total number of permutations is $N$, MPI-GWAS shuffles the matrix $G$ and $T$ to generate background distributions of genetic diversity across $N/b$ MPI ranks, where $b$ is the number of shuffling subtasks per MPI rank. Using these permuted matrices, each subtask on an MPI rank calculates $f$ while looping over its subtasks, where $f$ is defined as 1 if $p'_k < p_k$ and 0 otherwise. The master MPI rank then yields the adjusted p-value of the $k$-th locus ($p''_k$) by reducing the results of the other MPI tasks with the sum function and dividing by $N$ (Fig. 1A, red pseudo-code). Specifically, the condition $p'_k < p_k$ marks the permutations in which the random p-value of the $k$-th locus is less (i.e., more significant) than the p-value calculated from the real observations, so the sum of $f$ counts these permutations. Therefore, the adjusted p-value represents the probability of a type 1 error for the $k$-th locus under the $N$-permutation distribution.
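
Written as a formula, the definitions above give the following estimator. This is a reconstruction from the prose, not necessarily the equation printed in the original figure; the permutation index $i$ and the rank/subtask indices $r$ and $j$ (with $i = (r-1)b + j$) are notation introduced here.

$$
f\!\left(p'^{(i)}_k,\, p_k\right) =
\begin{cases}
1, & \text{if } p'^{(i)}_k < p_k,\\
0, & \text{otherwise,}
\end{cases}
\qquad
p''_k = \frac{1}{N}\sum_{r=1}^{N/b}\sum_{j=1}^{b} f\!\left(p'^{(r,j)}_k,\, p_k\right)
      = \frac{1}{N}\sum_{i=1}^{N} f\!\left(p'^{(i)}_k,\, p_k\right)
$$

The inner sum over $j$ is the local count computed by MPI rank $r$; the outer sum over $r$ is the sum reduction performed on the master rank.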

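The permutation procedure described in the Methods can be sketched in plain Python. This is a minimal, serial sketch under stated assumptions, not the MPI-GWAS implementation: it assumes a binary (carrier/non-carrier) genotype per locus and a binary case/control phenotype, so Fisher's exact test reduces to a 2×2 table; it shuffles only the phenotype labels against a fixed genotype; and it simulates the $N/b$ MPI ranks with an ordinary loop instead of real MPI calls. All helper names (`fisher_exact_two_sided`, `contingency`, `rank_local_sum`, `adjusted_p_value`) are hypothetical.

```python
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table no more probable than the observed one; the small
    # relative tolerance lets floating-point ties count as "as extreme".
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def contingency(genotype, phenotype):
    """Hypothetical 2x2 table: carrier (1)/non-carrier (0) vs case (1)/control (0)."""
    a = b = c = d = 0
    for g, t in zip(genotype, phenotype):
        if t == 1 and g == 1:
            a += 1
        elif t == 1:
            b += 1
        elif g == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rank_local_sum(genotype, phenotype, p_obs, b, seed):
    """Work of one simulated MPI rank: b shuffling subtasks, returning the sum of f."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    local = 0
    for _ in range(b):
        rng.shuffle(shuffled)  # permute T against the fixed genotype of locus k
        p_perm = fisher_exact_two_sided(*contingency(genotype, shuffled))
        local += 1 if p_perm < p_obs else 0  # f = 1 if p'_k < p_k, otherwise 0
    return local


def adjusted_p_value(genotype, phenotype, n_perm, b, seed=0):
    """p''_k = (sum of f over all N permutations) / N, split across N/b 'ranks'."""
    if n_perm % b != 0:
        raise ValueError("n_perm must be a multiple of b")
    p_obs = fisher_exact_two_sided(*contingency(genotype, phenotype))
    # Serial stand-in for N/b MPI ranks: each entry is one rank's local count;
    # summing the list plays the role of the master rank's sum reduction.
    local_sums = [
        rank_local_sum(genotype, phenotype, p_obs, b, seed + r)
        for r in range(n_perm // b)
    ]
    return sum(local_sums) / n_perm


if __name__ == "__main__":
    G_k = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 15  # toy genotypes, 40 samples
    T = [1] * 20 + [0] * 20                        # first 20 samples are cases
    print("p_k   =", fisher_exact_two_sided(*contingency(G_k, T)))
    print("p''_k =", adjusted_p_value(G_k, T, n_perm=1000, b=100))
```

In a real MPI deployment, each rank would run the equivalent of `rank_local_sum` on its own share of $b$ permutations, and the master rank would combine the local counts with a sum reduction (e.g., `MPI_Reduce` with `MPI_SUM`) before dividing by $N$. Keeping $f$ as an integer count until that final division keeps the reduction exact.
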
### Results

$^{7}$ permutations of one locus can be calculated in 600 s using 2,720 CPU cores, which is 7.2 times faster than with 272 cores. The weak-scaling result (Fig. 1C) indicates that MPI-GWAS continues to perform well when the number of permutations per locus is increased in proportion to the number of computation nodes.

Two cohorts of actual data were used to verify the performance of MPI-GWAS: the Korean Genome and Epidemiology Study (KoGES) [4] and the UK Biobank (UKBB) [5]. Repeated observations of longitudinal traits, such as changes in blood pressure across assessments traced over decades, are a representative example of phenotypes that violate the normality assumption. Thus, we used the traced phenotype of type 2 diabetes mellitus (T2DM) in the KoGES and the UKBB, respectively. In the KoGES, the T2DM phenotype was measured seven times at 2-year intervals. Likewise, UKBB participants have been assessed for T2DM up to three times over 10 years.

The adjusted p-values obtained via $10^{7}$ permutations for the KoGES and UKBB are displayed in Fig. 1D. For the KoGES, covering 31,437 loci per assessment, the total computing time with 171,360 CPU cores was ~4 days using 2,500 nodes (25% of Nurion). For the UKBB data, covering 52,858 loci per assessment, the total elapsed time was similar. The SNPs for the KoGES were selected from the loci traced with the genotype array. To achieve a validation of similar scale, we used a subset of loci from the UKBB data, selecting the 52,858 loci through linkage disequilibrium pruning with PLINK. As depicted in Fig. 1D, the type 1 errors of the p-values were adjusted via large-scale $N$ permutations.

In conclusion, MPI-GWAS makes it feasible to compute permutation-based GWAS within a reasonable time by harnessing supercomputing resources.
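
As a quick check on the strong-scaling figures reported at the start of this section (600 s on 2,720 CPU cores, 7.2 times faster than on 272 cores), the implied parallel efficiency and single-configuration runtime work out as follows; $E$ and $T_{c}$ (runtime on $c$ cores) are notation introduced here.

$$
\frac{T_{272}}{T_{2720}} = 7.2,
\qquad
E = \frac{T_{272}/T_{2720}}{2720/272} = \frac{7.2}{10} = 0.72,
\qquad
T_{272} \approx 7.2 \times 600\ \text{s} = 4320\ \text{s}.
$$

That is, a tenfold increase in cores delivers about 72% of ideal linear speedup.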