### Introduction

### Methods

### Inverse variance-weighted average method

The observed effect size *X<sub>i</sub>* from study *i* can be shown as

X<sub>i</sub> = µ + e<sub>i</sub>,

where µ is the true effect size, e<sub>i</sub> (the deviation of X<sub>i</sub> from µ) is the error in the observation, and i = 1, 2, …, C. In order to integrate multiple observed effect sizes X<sub>1</sub>, …, X<sub>C</sub> from multiple studies, the weighted mean approach has been suggested [22]:

X̄ = (Σ<sub>i</sub> W<sub>i</sub> X<sub>i</sub>) / (Σ<sub>i</sub> W<sub>i</sub>).

The optimal choice of the weights W<sub>i</sub> is not immediately evident, but several attempts were made to identify the optimal weight of the methods based on empirical evidence [17,20,24]. Ideally, one needs to put more weight on the studies with more precision than on studies with lower precision [3,25,26]. When the sample size of each study is sufficiently large, we can assume that X<sub>i</sub> approximately follows a normal distribution, based on the central limit theorem. This applies even to situations where the data themselves are not normal (e.g., binary), in which case the test statistic still follows a normal distribution as long as the sample size is large. In GWASs, this assumption holds easily, because the sample size is typically in the thousands. Note that all derivations in this paper are based on this normality assumption. Let SE(X<sub>i</sub>) be the estimated standard error of X<sub>i</sub>, and V<sub>i</sub> = SE(X<sub>i</sub>)<sup>2</sup>. It is common practice to consider the estimated variance V<sub>i</sub> as the true variance. The inverse variance-weighted average effect size estimator is the weighted mean of X<sub>i</sub> with the weights [22]

W<sub>i</sub> = 1 / V<sub>i</sub>,

so that

X̄<sub>IVW</sub> = (Σ<sub>i</sub> X<sub>i</sub>/V<sub>i</sub>) / (Σ<sub>i</sub> 1/V<sub>i</sub>),  SE(X̄<sub>IVW</sub>) = (Σ<sub>i</sub> 1/V<sub>i</sub>)<sup>−1/2</sup>,

and the z-score Z<sub>IVW</sub> = X̄<sub>IVW</sub> / SE(X̄<sub>IVW</sub>) follows N(0,1) under the null hypothesis of no effects.
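As a concrete illustration, the estimator above can be sketched in a few lines of Python; the study effect sizes and standard errors used here are hypothetical numbers, not data from the paper:

```python
import math

def ivw_meta(effects, std_errs):
    """Inverse variance-weighted meta-analysis.

    effects  -- observed effect sizes X_i from each study
    std_errs -- their estimated standard errors SE(X_i)
    Returns the combined estimate, its standard error, and the z-score.
    """
    weights = [1.0 / se**2 for se in std_errs]               # W_i = 1 / V_i
    est = sum(w * x for w, x in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))                       # (sum_i 1/V_i)^(-1/2)
    return est, se, est / se

# Three hypothetical studies (illustrative numbers only)
est, se, z = ivw_meta([0.25, 0.10, 0.30], [0.10, 0.15, 0.20])
```

Note how the combined standard error is smaller than any single study's, which is the point of pooling.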

### Results

### Optimality of IVW

*IVW maximizes likelihood function*

A method is called *optimal* if the method achieves a specific goal more effectively than any other method. We show that IVW is optimal in two different aspects: (1) the summary estimator gives a greater likelihood than any other estimator, and (2) the summary estimator's variance is smaller than the variance of any other estimator. First, we show that IVW is optimal in the sense that the IVW estimator maximizes the likelihood function. Suppose that we have a series of n studies with observed effect sizes X<sub>i</sub>, i = 1, 2, …, n. Under the fixed-effects assumption, there exists a true effect size µ, and each observation X<sub>i</sub> comes from a normal distribution with mean µ and a standard deviation σ<sub>i</sub>. The probability density function of each observation is given by

f(X<sub>i</sub> | µ, σ<sub>i</sub><sup>2</sup>) = (2πσ<sub>i</sub><sup>2</sup>)<sup>−1/2</sup> exp(−(X<sub>i</sub> − µ)<sup>2</sup> / (2σ<sub>i</sub><sup>2</sup>)).

The log-likelihood ln L(µ, σ<sub>1</sub><sup>2</sup>, …, σ<sub>n</sub><sup>2</sup> | X<sub>1</sub>, …, X<sub>n</sub>) is maximized with respect to µ (equivalently, −ln L is minimized) at the weighted mean of the X<sub>i</sub> with weights W<sub>i</sub> = (σ<sub>i</sub><sup>2</sup>)<sup>−1</sup> [21,22].
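This claim is easy to check numerically. The sketch below, using hypothetical observations and standard deviations, evaluates the fixed-effects log-likelihood on a fine grid around the inverse variance-weighted mean and confirms that no other candidate value of µ does better:

```python
import math

def log_lik(mu, xs, sigmas):
    """Fixed-effects log-likelihood: each X_i ~ N(mu, sigma_i^2)."""
    return sum(-0.5 * math.log(2.0 * math.pi * s**2) - (x - mu)**2 / (2.0 * s**2)
               for x, s in zip(xs, sigmas))

# Hypothetical observations and (known) standard deviations
xs, sigmas = [0.25, 0.10, 0.30], [0.10, 0.15, 0.20]

w = [1.0 / s**2 for s in sigmas]                       # W_i = (sigma_i^2)^(-1)
mu_hat = sum(wi * x for wi, x in zip(w, xs)) / sum(w)  # inverse variance-weighted mean

# No candidate value of mu on a fine grid around mu_hat beats the IVW estimate
grid = [mu_hat + d / 1000.0 for d in range(-500, 501)]
assert all(log_lik(mu_hat, xs, sigmas) >= log_lik(m, xs, sigmas) for m in grid)
```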

*IVW achieves minimum variance*

### Optimality of SZ

*SZ maximizes the non-centrality parameter*

In SZ, we are not directly interested in estimating the combined effect size from the X<sub>i</sub>. Rather, we are interested in the statistical significance of the combined information. Thus, the goal of SZ is to maximize how much the z-score will be shifted from 0 on average, which is often called the *non-centrality parameter*. By maximizing the non-centrality parameter, we can maximize the statistical power of the test. Among all possible weights that can construct a weighted SZ, we want to find the weights that will maximize the non-centrality parameter.
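The weights that maximize the non-centrality parameter are w<sub>i</sub> ∝ 1/√V<sub>i</sub> (by the Cauchy–Schwarz inequality). A quick numeric sanity check, using hypothetical variances, compares these weights against random alternatives:

```python
import math
import random

def ncp(weights, variances, mu=1.0):
    """Non-centrality parameter of the weighted sum of z-scores:
    lambda = mu * sum_i(w_i / sqrt(V_i)) / sqrt(sum_i(w_i^2))."""
    shift = sum(w / math.sqrt(v) for w, v in zip(weights, variances))
    return mu * shift / math.sqrt(sum(w**2 for w in weights))

variances = [0.01, 0.0225, 0.04]               # hypothetical V_i
opt = [1.0 / math.sqrt(v) for v in variances]  # w_i proportional to SE(X_i)^(-1)

# By Cauchy-Schwarz, no other weighting beats the optimal one
random.seed(0)
for _ in range(1000):
    rand = [random.uniform(0.01, 1.0) for _ in variances]
    assert ncp(opt, variances) >= ncp(rand, variances) - 1e-12
```

At the optimum, λ collapses to µ√(Σ<sub>i</sub> 1/V<sub>i</sub>), the same non-centrality IVW achieves.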

Since E[X<sub>i</sub>] = µ, the weighted sum of z-scores Z<sub>SZ</sub> = (Σ<sub>i</sub> w<sub>i</sub> Z<sub>i</sub>) / √(Σ<sub>i</sub> w<sub>i</sub><sup>2</sup>) follows a normal distribution Z<sub>SZ</sub> ~ N(λ, 1), where λ is the non-centrality parameter:

λ = µ (Σ<sub>i</sub> w<sub>i</sub>/√V<sub>i</sub>) / √(Σ<sub>i</sub> w<sub>i</sub><sup>2</sup>).

It is common to assume that V<sub>i</sub> is inversely proportional to the sample size N<sub>i</sub>. However, we would like to note that in some applications, the variance can be a function of not only N<sub>i</sub> but also other properties of the data. For example, in genetic association studies, when we test an association of a single-nucleotide polymorphism (SNP) to a phenotype, the variance is typically inversely proportional to N<sub>i</sub> p<sub>i</sub>(1 − p<sub>i</sub>), where p<sub>i</sub> denotes the allele frequency of the risk allele. This suggests that if the datasets that we want to combine have different allele frequencies, weighting the z-scores only by sample size can be suboptimal. Below, we will show by simulations that we can have some power loss by using just the sample size as the weight, instead of accounting for frequency differences. However, the approximation of this weight …

### Equality of IVW and SZ under certain assumptions

*Analytical derivation*

SZ becomes analytically equivalent to IVW when we use SE(X<sub>i</sub>)<sup>−1</sup> as weights for z-scores, rather than only using sample sizes.
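Using the definitions above (V<sub>i</sub> = SE(X<sub>i</sub>)<sup>2</sup> and the weighted sum-of-z-scores form), the omitted algebra can be reconstructed as a short sketch:

```latex
% z-scores and weights, with V_i = SE(X_i)^2 and Z_i = X_i / SE(X_i)
Z_i = \frac{X_i}{\sqrt{V_i}}, \qquad w_i = SE(X_i)^{-1} = V_i^{-1/2}

% the weighted sum of z-scores collapses to the IVW z-score
Z_{\mathrm{SZ}}
  = \frac{\sum_i w_i Z_i}{\sqrt{\sum_i w_i^2}}
  = \frac{\sum_i X_i / V_i}{\sqrt{\sum_i 1/V_i}}
  = \frac{\sum_i X_i / V_i}{\sum_i 1/V_i} \cdot \sqrt{\sum_i \frac{1}{V_i}}
  = \frac{\bar{X}_{\mathrm{IVW}}}{SE(\bar{X}_{\mathrm{IVW}})}
  = Z_{\mathrm{IVW}}
```

Each step only rearranges the sums, so the equivalence holds for any data, not just asymptotically.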

*Empirical simulation*

We also confirmed the equivalence empirically. We assumed five studies, a fixed relative risk γ, and a rare disease (prevalence F ≈ 0). Given these assumptions and parameters, we can calculate the expected MAF in cases and in controls. Specifically, given MAF p and relative risk γ, the case MAF becomes γp/((γ − 1)p + 1), whereas the control MAF becomes approximately p, given F ≈ 0. Given the expected MAF in cases and controls, we could randomly sample genotype data, assuming 500 cases and 500 controls for each of the five studies. To assess the statistical significance of the sampled data, we used the log odds ratio as a statistic, which follows an asymptotic normal distribution. We repeated the procedure to generate 100,000 simulated meta-analysis sets. Given the significance level α = 0.05, the power was the proportion of sample sets whose meta-analysis p-value was ≤ α.

We compared IVW with two versions of SZ: SZ_N, which uses the square root of the sample size as weights, and SZ_SE, which uses SE(X<sub>i</sub>)<sup>−1</sup> as weights.

In the setting where the two weighting schemes differed (w<sub>IVW,i</sub> ≠ w<sub>SZ_N,i</sub>), SZ_N showed a slight power loss relative to the other two methods (Fig. 1B). This result demonstrates that using only sample size as the weight can be suboptimal if there are other factors that can cause variance differences between studies, such as allele frequencies. Nevertheless, the power drop from using only sample size as the weight was quite small (i.e., at γ = 1.15, the power of IVW and SZ_SE was 58.24%, whereas the power of SZ_N was 57.23%, only a 1.01% power loss).

### Situations in which IVW and SZ can give distinct results

As shown above, SZ whose weights are given as SE(X<sub>i</sub>)<sup>−1</sup> (SZ_SE) is analytically equivalent to IVW. However, SZ whose weights are given as the square root of the sample size (SZ_N) can give slightly different results, if the expected relationship