### Introduction

### Methods

### Multi-block dataset

number of variables ($k = 1, \cdots, K$). We can express the $k$-th block $X_k$ as $X_k = [x_{k1}, \cdots, x_{kp_k}]$. The total dataset $X$ can be presented as $[X_1, \cdots, X_k, \cdots, X_K]$.
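As a small illustration of this layout, the sketch below stores each block as its own array and forms the total dataset by placing the blocks side by side. The sizes `n` and `p` and the random values are toy assumptions for illustration only, and the sketch assumes that all blocks are measured on the same `n` observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6              # observations shared by all blocks (toy value, assumed)
p = [4, 3, 2]      # p_k: number of variables in block k (toy values, assumed)

# One (n x p_k) array per block X_k.
blocks = [rng.normal(size=(n, p_k)) for p_k in p]

# The total dataset X = [X_1, ..., X_K] places the blocks side by side.
X = np.hstack(blocks)

print([b.shape for b in blocks])
print(X.shape)
```

Keeping the blocks in a list preserves the block membership of each column, which is lost once only the concatenated matrix is stored.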

### Generalized canonical correlation analysis

Given two sets $X = (x_1, \cdots, x_n)$ and $Y = (y_1, \cdots, y_m)$ of random variables with correlations among the variables, CCA finds the linear combinations of the $x_i$ and the $y_j$ that maximize their correlation with each other; these combinations are termed canonical variables [10].

For the $k$-th variable block, we can denote $a_k = (a_{k1} \; a_{k2} \; \cdots \; a_{kp_k})'$ as the coefficients for each variable in the $X_k$ block. Therefore, the canonical variables, $y_k$ ($k = 1, \cdots, K$), are expressed as:

$$y_k = X_k a_k$$

The goal is to find the coefficient vectors $a_1, a_2, \cdots, a_K$ that would maximize the weighted summation of the covariance of the two components. The $c_{ij}$ in the equation indicates the relationship between variable blocks $X_i$ and $X_j$. If they have a relationship, we could assign $c_{ij} = 1$; otherwise, we could assign $c_{ij} = 0$. The function $g()$ can be various functions, such as horst ($g(x) = x$), centroid ($g(x) = |x|$), and factorial ($g(x) = x^2$). Among these, we applied the horst scheme. A design matrix $C = (c_{jk})$ is pre-specified by the user to express the relationships between blocks. The element $c_{jk}$ is equal to 1 if block $j$ and block $k$ are connected and 0 otherwise [12, 13].
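To make the roles of the design matrix and of $g()$ concrete, the following sketch evaluates the criterion for a given set of coefficient vectors. It assumes the criterion has the form $\sum_{j \neq k} c_{jk} \, g(\mathrm{cov}(X_j a_j, X_k a_k))$, which is our reading of the description above. The normalization constraints on the $a_k$ are not stated in the text and are left out, so this evaluates the criterion only and does not optimize it. The function names and the toy data are hypothetical.

```python
import numpy as np

def horst(x):
    return x

def centroid(x):
    return abs(x)

def factorial(x):
    return x ** 2

def gcca_objective(blocks, weights, C, g=horst):
    """Sum of c_jk * g(cov(X_j a_j, X_k a_k)) over pairs j != k (assumed form)."""
    # Canonical variable of each block: y_k = X_k a_k.
    comps = [X @ a for X, a in zip(blocks, weights)]
    total = 0.0
    K = len(comps)
    for j in range(K):
        for k in range(K):
            if j != k and C[j, k] != 0:
                cov_jk = np.cov(comps[j], comps[k])[0, 1]  # sample covariance
                total += C[j, k] * g(cov_jk)
    return total

# Toy example: three blocks, all pairs connected (assumed design).
rng = np.random.default_rng(0)
toy_blocks = [rng.normal(size=(100, p_k)) for p_k in (4, 3, 2)]
toy_weights = [np.ones(b.shape[1]) / np.sqrt(b.shape[1]) for b in toy_blocks]
C_full = np.ones((3, 3)) - np.eye(3)

print(gcca_objective(toy_blocks, toy_weights, C_full, g=horst))
print(gcca_objective(toy_blocks, toy_weights, C_full, g=factorial))
```

Setting an entry of `C_full` to 0 drops the corresponding pair of blocks from the sum, which is how the design matrix encodes unconnected blocks.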

### Data description: KARE

The first variable block, $X_1$, is a block of SNP variables; the second variable block, $X_2$, is a phenotype block that has five phenotype variables related to obesity. The last variable block, $X_3$, is a disease block that records the observed status of diabetes and hypertension.

The first variable block, $X_1$, has information on 35 SNP variables, each recorded as 0, 1, or 2 according to genotype. We extracted these 35 SNP variables following the steps described in Fig. 2. The original KARE dataset has 311,779 variables, of which we took as our main interest the 324 SNP variables identified in the Multi-QMDR analysis literature. These 324 SNP variables showed strong marginal effects in univariate linear regression models [15]. From the 324 variables, we selected 47 that showed a significant relationship with the phenotype variables in the phenotype block. Lastly, we removed extremely similar SNP variables, those with a correlation of more than 0.98 with each other, in order to see the correlation between variables clearly, leaving the 35 SNP variables used in the analysis.
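The last filtering step can be sketched as follows. The text does not say which member of a highly correlated pair was dropped or whether the correlation was taken in absolute value; the sketch assumes a greedy pass that keeps the first variable encountered and uses the absolute Pearson correlation. The function name and the toy genotype matrix are hypothetical.

```python
import numpy as np

def prune_correlated(G, threshold=0.98):
    """Return indices of columns kept after greedy correlation pruning.

    A column is kept only if its absolute correlation with every
    previously kept column is at most `threshold` (assumed rule).
    """
    corr = np.abs(np.corrcoef(G, rowvar=False))
    kept = []
    for j in range(G.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept

# Toy genotype matrix coded 0/1/2, with the last column duplicating the first.
rng = np.random.default_rng(0)
base = rng.integers(0, 3, size=(200, 4))
G = np.column_stack([base, base[:, 0]])

print(prune_correlated(G))
```

A greedy pass depends on column order, so a different ordering of the SNP variables could retain a different representative from each highly correlated group.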

The second variable block, $X_2$, is a block of phenotype variables that have been shown to be related to obesity. Five phenotype variables were selected: suprailiac skinfold, subscapular skinfold, body mass index (BMI), waist-hip ratio, and waist.

The third variable block, $X_3$, is a block of diseases. Two disease variables were derived from the participants’ clinical traits. Participants whose “fasting blood glucose” was higher than 126, whose “blood glucose/oral glucose tolerance after 120 minutes” was higher than 200, or “who had medication of diabetes” were considered as having diabetes. Participants whose “subscapular skinfold” was over 140, whose “suprailiac skinfold” was over 90, or “who had medication of hypertension” were considered as having hypertension. Table 1 shows the composition of each group under our disease definitions. After excluding individuals with missing values in any of the variables used in this process, the final sample size was 7,389.
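The diabetes rule above is a logical OR of three conditions and can be sketched as below. The function and argument names are hypothetical, the thresholds are those stated in the text, and the sketch assumes that individuals with missing values have already been excluded.

```python
import numpy as np

def has_diabetes(fasting_glucose, glucose_120min, on_medication):
    """Flag diabetes if any of the three stated criteria is met."""
    fasting_glucose = np.asarray(fasting_glucose, dtype=float)
    glucose_120min = np.asarray(glucose_120min, dtype=float)
    on_medication = np.asarray(on_medication, dtype=bool)
    return (fasting_glucose > 126) | (glucose_120min > 200) | on_medication

# Four toy participants (values invented for illustration).
flags = has_diabetes(
    fasting_glucose=[100, 130, 110, 90],
    glucose_120min=[150, 150, 210, 100],
    on_medication=[False, False, False, True],
)
print(flags.tolist())  # [False, True, True, True]
```

The hypertension variable follows the same pattern, with its own two thresholds and medication indicator.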

### Results

The SNP block’s first canonical variable was $U_1 = a_1 \mathit{SNP}_1 + \cdots + a_{35} \mathit{SNP}_{35}$. The phenotype block’s first canonical variable was $V_1 = b_1 \mathit{SUP} + b_2 \mathit{SUB} + b_3 \mathit{BMI} + b_4 \mathit{WHR} + b_5 \mathit{WAIST}$. The disease block’s first canonical variable was $W_1 = c_1 \mathit{Diabetes} + c_2 \mathit{Hyper}$. The terms $a_i$, $b_i$, and $c_i$ represent the coefficients of the $i$-th variable in the respective block. By plotting the first canonical variables $U_1$, $V_1$, $W_1$ together with the second canonical variables $U_2$, $V_2$, $W_2$, we can interpret how the variables of each block relate to the other blocks. In this paper, we first explain the relationship between blocks and then discuss the relationship within the blocks.