### Introduction

### Methods

### Genetic variants as instrumental variables

### Basic model of MR and the first-order approximation of variance

$\beta_X \neq 0$ because of IV1, and $\beta_Y = \beta_X \times \beta$ because of IV2 and IV3. That is, $G$ (Fig. 1) affects $Y$ (outcome) only through $X$ (exposure). The error terms $\varepsilon_X$ and $\varepsilon_Y$ are assumed to follow normal distributions and, in 2SMR with two disjoint samples, to be independent. Even in the case of two non-overlapping samples, a report has stated the sample correlation between

$\beta \neq 0$, it is essential to obtain the variance estimate of
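The model and the ratio estimator can be stated compactly as follows. This sketch assumes the conventional Wald-ratio setup and the usual MR meaning of the first-order approximation, which treats $\hat\beta_X$ as fixed. The hat notation and the symbols $\sigma_X$ and $\sigma_Y$ for the standard errors of the estimated associations are our own and may differ from the authors' notation:

$$
\begin{aligned}
X &= \beta_{X0} + \beta_X G + \varepsilon_X, \qquad Y = \beta_{Y0} + \beta_Y G + \varepsilon_Y, \qquad \beta_Y = \beta_X \beta,\\
\hat\beta &= \frac{\hat\beta_Y}{\hat\beta_X}, \qquad
\operatorname{Var}_{1}(\hat\beta) \approx \frac{\sigma_Y^2}{\hat\beta_X^2},
\qquad \sigma_X^2 = \operatorname{Var}(\hat\beta_X),\quad \sigma_Y^2 = \operatorname{Var}(\hat\beta_Y).
\end{aligned}
$$

Because this expression ignores the sampling uncertainty in $\hat\beta_X$, it can understate the variance of $\hat\beta$.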

### The second-order approximation method of variance of estimated causal effects
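The following minimal Python sketch implements both approximations. It assumes the conventional MR forms. The first-order variance $\sigma_Y^2/\hat\beta_X^2$ ignores the sampling variance of $\hat\beta_X$. The second-order (delta-method) variance adds $\hat\beta_Y^2\sigma_X^2/\hat\beta_X^4$ and, if the two estimates are correlated, a covariance term. Here $\sigma_X$ and $\sigma_Y$ are the standard errors of the estimated associations. The function names are hypothetical, and the covariance defaults to 0, as for two independent samples.

```python
import math


def wald_ratio(beta_x_hat: float, beta_y_hat: float) -> float:
    """Ratio (Wald) estimate of the causal effect: beta_Y / beta_X."""
    return beta_y_hat / beta_x_hat


def se_first_order(beta_x_hat: float, se_y: float) -> float:
    """First-order SE: treats the exposure association beta_X as known."""
    return se_y / abs(beta_x_hat)


def se_second_order(beta_x_hat: float, beta_y_hat: float,
                    se_x: float, se_y: float, cov_xy: float = 0.0) -> float:
    """Second-order (delta-method) SE of beta_Y / beta_X.

    cov_xy is the covariance between the two estimated associations;
    it is assumed to be 0 for two disjoint samples.
    """
    var = (se_y ** 2 / beta_x_hat ** 2
           + beta_y_hat ** 2 * se_x ** 2 / beta_x_hat ** 4  # uncertainty in beta_X
           - 2.0 * beta_y_hat * cov_xy / beta_x_hat ** 3)   # sample overlap term
    return math.sqrt(var)
```

For $\hat\beta_X = 0.2$, $\hat\beta_Y = 0.1$ and $\sigma_X = \sigma_Y = 0.01$, the first-order SE is 0.05 and the second-order SE is $\sqrt{0.003125} \approx 0.0559$.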

### Simulation design

$\beta$ and $\beta_X$, which also gave us the true value of $\beta_Y = \beta \times \beta_X$. We assumed the intercepts $\beta_{X0} = 0.03$ and $\beta_{Y0} = 0.03$, and the errors

$G_X$ and $G_Y$, which take the values 0, 1, and 2 and were drawn from Binomial(2, MAF), where MAF denotes the minor allele frequency. We generated $(X \mid G_X,\ Y \mid G_Y)$ by adding noise with mean 0 and variances $(\mathrm{Var}(\varepsilon_X), \mathrm{Var}(\varepsilon_Y))$ to $(\beta_{X0} + \beta_X G_X,\ \beta_{Y0} + \beta_Y G_Y)$. Then we obtained

$N_x$ is the size of the reference dataset used in 2SMR, and $N_y$ is the size of the target sample.
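The data-generating steps above can be sketched as follows. This is an illustration, not the authors' code. It assumes normal noise and ordinary least-squares (OLS) estimation of each genetic association, and it uses only the Python standard library. The function names and the default noise variances of 1 are hypothetical choices.

```python
import math
import random


def simulate_genotypes(n: int, maf: float, rng: random.Random) -> list[int]:
    """Draw n genotypes (0, 1 or 2 minor alleles) from Binomial(2, MAF)."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def ols_slope_and_se(g: list[int], z: list[float]) -> tuple[float, float]:
    """Simple linear regression of z on g; returns (slope, standard error)."""
    n = len(g)
    g_mean = sum(g) / n
    z_mean = sum(z) / n
    sxx = sum((gi - g_mean) ** 2 for gi in g)
    sxy = sum((gi - g_mean) * (zi - z_mean) for gi, zi in zip(g, z))
    slope = sxy / sxx
    intercept = z_mean - slope * g_mean
    rss = sum((zi - intercept - slope * gi) ** 2 for gi, zi in zip(g, z))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def simulate_two_sample(n_x, n_y, beta, beta_x, maf,
                        beta_x0=0.03, beta_y0=0.03,
                        var_eps_x=1.0, var_eps_y=1.0, seed=None):
    """One two-sample MR replicate from two disjoint samples.

    The exposure association is estimated in a reference sample of size n_x,
    the outcome association in a separate target sample of size n_y.
    Returns (beta_x_hat, se_x, beta_y_hat, se_y).
    """
    rng = random.Random(seed)
    beta_y = beta * beta_x  # true outcome association
    g_x = simulate_genotypes(n_x, maf, rng)
    x = [beta_x0 + beta_x * g + rng.gauss(0.0, math.sqrt(var_eps_x)) for g in g_x]
    g_y = simulate_genotypes(n_y, maf, rng)
    y = [beta_y0 + beta_y * g + rng.gauss(0.0, math.sqrt(var_eps_y)) for g in g_y]
    bx, se_x = ols_slope_and_se(g_x, x)
    by, se_y = ols_slope_and_se(g_y, y)
    return bx, se_x, by, se_y
```

The ratio estimate is then `by / bx`, and its first- and second-order SEs follow from `se_x` and `se_y`. Pure Python keeps the sketch dependency-free but is slow at sample sizes such as 200,000. A vectorized implementation would be preferable in practice.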

### Results

$N_x/N_y$), we also varied $\beta$ (the magnitude of the causal effect) and the MAF. Fig. 2 shows that the analytical approximation including the variance up to the second-order term was almost as accurate as the empirical estimate. In contrast, the first-order approximation was often substantially inaccurate, depending on the setting.
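The comparison with the empirical estimate can be sketched at the level of summary statistics. As a simplifying assumption, $\hat\beta_X$ and $\hat\beta_Y$ are drawn directly from normal distributions around their true values with given standard errors, rather than from individual-level data. Each approximation's accuracy is summarized as its mean approximate SE divided by the empirical standard deviation of the ratio estimates across replicates; this is our reading of the "ratio" reported in this section. The function name is hypothetical.

```python
import math
import random
import statistics


def compare_se_approximations(beta, beta_x, se_x, se_y, n_rep=20000, seed=0):
    """Monte Carlo check of first- and second-order SEs of the ratio estimate.

    Returns (first-order ratio, second-order ratio), each defined as the mean
    approximate SE divided by the empirical SD of the ratio estimates.
    """
    rng = random.Random(seed)
    beta_y = beta * beta_x
    estimates, se1, se2 = [], [], []
    for _ in range(n_rep):
        bx = rng.gauss(beta_x, se_x)  # estimated exposure association
        by = rng.gauss(beta_y, se_y)  # estimated outcome association
        estimates.append(by / bx)
        se1.append(se_y / abs(bx))
        se2.append(math.sqrt(se_y ** 2 / bx ** 2 + by ** 2 * se_x ** 2 / bx ** 4))
    empirical_sd = statistics.stdev(estimates)
    return statistics.fmean(se1) / empirical_sd, statistics.fmean(se2) / empirical_sd
```

With $\beta = 1$, $\beta_X = 0.2$ and $\sigma_X = \sigma_Y = 0.01$, the omitted second-order term equals the retained first-order term. The first-order ratio should therefore come out near $1/\sqrt{2} \approx 0.71$ and the second-order ratio near 1.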

$N_y$) decreased from 200,000 to 2,000 (as the N-ratio increased from 1 to 100). The ratio was 0.84 when $N_y$ was 100,000, which is equal to $N_x/2$ (N-ratio = 2). The ratio rose to 0.99 when $N_y$ was 2,000 (N-ratio = 100). The mean of the ratios was 0.98, which translates to a reduced SE(

$\beta$) between the exposure and outcome increased from 0.01 to 1. Therefore, if the causal effect of the exposure on the outcome is not strong, the error from the first-order approximation should be small. The mean of the ratios of the first-order approximation was 0.93. Interestingly, Fig. 2C shows that the ratio appeared to be independent of the MAF of the variant. In this simulation, the mean of the first-order ratios was also 0.93.
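The dependence on the N-ratio, $\beta$ and the MAF can be explored with large-sample standard errors. This sketch makes three assumptions: each association is estimated by OLS, so that $\sigma_X^2 \approx \mathrm{Var}(\varepsilon_X)/(2\,\mathrm{MAF}(1-\mathrm{MAF})N_x)$ and similarly for $\sigma_Y^2$; the conventional first- and second-order variance forms apply; and the second-order SE is close to the empirical SE. The effect sizes and noise variances are hypothetical, so the printed values are not meant to reproduce Fig. 2.

```python
import math


def association_se(n: int, maf: float, var_eps: float) -> float:
    """Large-sample OLS standard error of a per-allele association.

    Uses Var(G) = 2 * MAF * (1 - MAF) for a Binomial(2, MAF) genotype.
    """
    return math.sqrt(var_eps / (n * 2.0 * maf * (1.0 - maf)))


def first_to_second_order_ratio(beta, beta_x, n_x, n_y, maf,
                                var_eps_x=1.0, var_eps_y=1.0):
    """Ratio of the first-order SE to the second-order SE at the true values."""
    se_x = association_se(n_x, maf, var_eps_x)
    se_y = association_se(n_y, maf, var_eps_y)
    beta_y = beta * beta_x
    var1 = se_y ** 2 / beta_x ** 2
    var2 = var1 + beta_y ** 2 * se_x ** 2 / beta_x ** 4
    return math.sqrt(var1 / var2)


# N_x fixed at 200,000 while N_y shrinks (N-ratio = N_x / N_y from 1 to 100).
for n_y in (200_000, 100_000, 20_000, 2_000):
    r = first_to_second_order_ratio(beta=0.5, beta_x=0.1, n_x=200_000, n_y=n_y, maf=0.3)
    print(f"N-ratio={200_000 // n_y:>3}  SE1/SE2={r:.3f}")
```

Changing `maf` leaves the output unchanged, because the genotype variance enters both standard errors in the same way.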

$N_y$ = 100,000 and $N_x$ = 200,000), the FPR of the first-order approximation method increased to 0.071 (the large dark-red dot in Fig. 3A). The FPR of the second-order approximation method was 0.049 (the large dark-blue dot in Fig. 3A), approximately 0.7 times that of the first-order method. On average, the FPR was 0.049 for the second-order approximation and 0.052 for the first-order approximation. These findings indicate that the second-order approximation can help prevent inflation of the FPR.
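The FPR comparison can be sketched as the proportion of replicates in which a two-sided Wald test at $\alpha = 0.05$ rejects a null hypothesis that is true. This excerpt does not state which null the authors tested. The sketch assumes $H_0: \beta = \beta_{\text{true}}$, so that every rejection is a false positive, and it uses summary-level draws rather than individual-level data. The function name and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def empirical_fpr(beta, beta_x, se_x, se_y, alpha=0.05, n_rep=20000, seed=0):
    """Share of replicates rejecting the true H0: beta = beta_true.

    Returns (FPR with first-order SE, FPR with second-order SE).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    beta_y = beta * beta_x
    reject1 = reject2 = 0
    for _ in range(n_rep):
        bx = rng.gauss(beta_x, se_x)
        by = rng.gauss(beta_y, se_y)
        est = by / bx
        se1 = se_y / abs(bx)
        se2 = math.sqrt(se_y ** 2 / bx ** 2 + by ** 2 * se_x ** 2 / bx ** 4)
        reject1 += abs(est - beta) / se1 > z_crit
        reject2 += abs(est - beta) / se2 > z_crit
    return reject1 / n_rep, reject2 / n_rep
```

Because the first-order SE is too small whenever the omitted term matters, its test rejects a true null more often than the nominal rate. For the values reported above, 0.049 / 0.071 ≈ 0.69.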

$\beta = 0.6$, which denotes the causal effect of the exposure on the outcome. Under this setting, the power of the first-order approximation was similar to that of the second-order approximation (on average, 1.01 times as large).
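The near-identical power can be illustrated with the usual normal approximation to the power of a two-sided Wald test of $H_0: \beta = 0$. Under independent samples, the first-order SE is never larger than the second-order SE, so its nominal power is at least as high. When the two SEs nearly coincide, the power values nearly coincide too. This is a minimal sketch; the SE values are hypothetical.

```python
from statistics import NormalDist


def wald_power(beta: float, se: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test of H0: beta = 0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(beta) / se
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical SEs for beta = 0.6; the second-order SE is slightly larger.
power_first = wald_power(0.6, se=0.25)
power_second = wald_power(0.6, se=0.255)
print(f"first-order: {power_first:.3f}, second-order: {power_second:.3f}")
```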

### Discussion

$\beta$-ratio. When the sample size of the target study increased while that of the external dataset used for the exposure association was held fixed, the errors became larger. This suggests that, as study sizes grow, the error from the first-order approximation method may also increase. Furthermore, the errors increased with the true causal effect. Interestingly, the errors appeared to be independent of the MAF.
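These three trends are consistent with a closed form for the ratio of the two approximate SEs. This sketch assumes OLS association estimates from disjoint samples, residual variances $\mathrm{Var}(\varepsilon_X)$ and $\mathrm{Var}(\varepsilon_Y)$, and the conventional MR first-order ($\sigma_Y^2/\beta_X^2$) and second-order (additionally $\beta_Y^2\sigma_X^2/\beta_X^4$) variances. Here $\sigma_X$ and $\sigma_Y$ denote the standard errors of the estimated associations (our notation), and $\beta_Y = \beta_X \beta$. The result is a derivation under these assumptions, not one reported by the authors:

$$
\begin{aligned}
\frac{\mathrm{SE}_1}{\mathrm{SE}_2}
&= \left(1 + \frac{\beta_Y^2\,\sigma_X^2/\beta_X^4}{\sigma_Y^2/\beta_X^2}\right)^{-1/2}
 = \left(1 + \beta^2\,\frac{\sigma_X^2}{\sigma_Y^2}\right)^{-1/2},\\
\sigma_X^2 &\approx \frac{\mathrm{Var}(\varepsilon_X)}{2\,\mathrm{MAF}(1-\mathrm{MAF})\,N_x},\qquad
\sigma_Y^2 \approx \frac{\mathrm{Var}(\varepsilon_Y)}{2\,\mathrm{MAF}(1-\mathrm{MAF})\,N_y},\\
\frac{\mathrm{SE}_1}{\mathrm{SE}_2}
&\approx \left(1 + \beta^2\,\frac{\mathrm{Var}(\varepsilon_X)}{\mathrm{Var}(\varepsilon_Y)}\cdot\frac{N_y}{N_x}\right)^{-1/2}.
\end{aligned}
$$

In this expression the MAF cancels. The ratio approaches 1 as $N_y$ shrinks relative to $N_x$, and it decreases as $\beta$ grows.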

$\beta$) via the inverse-variance weighted method over a large number of variants. In this extended multi-variant model, we expect the variance of the final estimate to be affected by the errors of the first-order approximation as well, because the ratio is affected for every variant regardless of its MAF. Then, the standard error of the causal effect,

$\beta$ is very small), we observed no significant difference between the first- and second-order approximations. Therefore, we expect that whether the second-order approximation is needed to avoid an inflated FPR will depend on many factors, including the actual range of $\beta$.
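The inverse-variance weighted extension mentioned in this section can be sketched with the standard fixed-effect IVW estimator. It weights each variant's ratio estimate by the inverse of its squared SE, and the pooled SE is $1/\sqrt{\sum_j w_j}$. The sketch assumes independent variants and two disjoint samples. It lets the caller choose first- or second-order per-variant SEs so the two can be compared. The function name and inputs are hypothetical.

```python
import math


def ivw_estimate(beta_x_hats, beta_y_hats, se_xs, se_ys, second_order=True):
    """Fixed-effect IVW combination of per-variant ratio estimates.

    Returns (pooled causal effect, its standard error).
    """
    weights, ratios = [], []
    for bx, by, sx, sy in zip(beta_x_hats, beta_y_hats, se_xs, se_ys):
        var = sy ** 2 / bx ** 2  # first-order variance of by / bx
        if second_order:
            var += by ** 2 * sx ** 2 / bx ** 4  # add uncertainty in bx
        weights.append(1.0 / var)
        ratios.append(by / bx)
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)
```

Each variant's second-order variance is at least its first-order variance. The pooled second-order SE is therefore never smaller than the first-order one, so the first-order errors carry over to the multi-variant estimate.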