Many packages for the meta-analysis of genome-wide association studies (GWAS) have been developed to discover genetic variants. Although variation across studies must be considered, few currently accessible packages estimate between-study heterogeneity. Thus, we propose a Python-based application called Beta-Meta, which can easily perform a meta-analysis by automatically selecting between a fixed effects and a random effects model based on heterogeneity. Beta-Meta implements flexible input data manipulation to allow multiple meta-analyses of different genotype-phenotype associations in a single process. It provides a step-by-step meta-analysis of GWAS for each association in the following order: a heterogeneity test, one of two calculations of the effect size and p-value chosen according to heterogeneity, and the Benjamini-Hochberg (BH) p-value adjustment. These methods enable users to validate the results of individual studies with greater statistical power and better estimation precision. We elaborate on these and illustrate them with examples from several studies of infertility-related disorders.

Genome-wide association studies (GWAS) of diseases and traits have increasingly been used to identify single nucleotide polymorphisms (SNPs). Although GWAS have tested hundreds of thousands of genetic variants to discover genotype-phenotype associations, they have a few limitations. Variants discovered in individual GWAS explain only a small proportion of heritability, and their genetic effect sizes are mostly small and require a substantial sample size to identify [

As meta-analysis has become a popular tool for aggregating data from multiple sources, several studies have revised analytical strategies from previous well-known studies [

Since it is crucial to increase statistical power in order to identify significant variants, especially in studies with small sample sizes, we demonstrate Beta-Meta using studies of diseases related to infertility, most of which have relatively small sample sizes [

Meta-analysis can improve signal detection when we account for not only between-study heterogeneity but also differences in linkage disequilibrium (LD) between ethnicities [

After surveying the studies of interest (infertility-related disorders in this paper), we created a table for input data in Excel (

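When the OR and its confidence interval are used as input data, they are converted into the beta coefficient and the standard error, respectively. The normalized effect of the $i^{\text{th}}$ study, $\beta_i$, is the natural logarithm of its odds ratio:

$$\beta_i = \ln(\mathrm{OR}_i)$$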

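The standard error $se_i$ is obtained from the lower and upper limits of the 95% confidence interval of the odds ratio, $L_i$ and $U_i$:

$$se_i = \frac{\ln(U_i) - \ln(L_i)}{2 \times 1.96}$$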
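The conversion can be sketched in a few lines of Python. This is a minimal illustration rather than Beta-Meta's own code: it assumes a 95% confidence interval (normal quantile 1.96), and the names `or_to_beta_se` and `Z_95` are hypothetical.

```python
import math

Z_95 = 1.96  # assumed: normal quantile of a two-sided 95% confidence interval


def or_to_beta_se(odds_ratio, ci_lower, ci_upper):
    """Convert an odds ratio and its 95% CI to (beta, standard error) on the log scale."""
    beta = math.log(odds_ratio)
    # The CI spans 2 * 1.96 standard errors on the log-odds scale.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    return beta, se


# Endometriosis, rs10965235, PMID 20601957: OR 1.44 (1.3-1.59)
beta, se = or_to_beta_se(1.44, 1.30, 1.59)
print(f"beta = {beta:.3f}, se = {se:.3f}")  # beta = 0.365, se = 0.051
```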

When synthesizing datasets for meta-analysis, it is important to ensure uniformity in allele labels and hence in the direction of the effect because alleles are typically called on only one of the two DNA strands in sequencing experiments [
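As an illustration of aligning effect directions, the sketch below reverses the sign of the beta coefficient when a study reports the effect and non-effect alleles in the opposite order to a chosen reference. It is an assumption-laden example, not Beta-Meta's implementation: the function name `align_beta` is hypothetical, and strand flips (complementary allele labels) are deliberately left unhandled.

```python
import math


def align_beta(beta, effect_allele, other_allele, ref_effect, ref_other):
    """Express beta relative to the reference effect allele.

    Only swapped effect/non-effect labels are handled here; strand flips
    (for example A/G reported as T/C) would need an extra complement step.
    """
    if (effect_allele, other_allele) == (ref_effect, ref_other):
        return beta  # already in the reference orientation
    if (effect_allele, other_allele) == (ref_other, ref_effect):
        return -beta  # swapped labels: reverse the direction of the effect
    raise ValueError("alleles do not match the reference pair")


# rs13405728 reported with EA=G, NEA=A and OR=0.723 (PMID 30182769);
# the reference orientation is taken to be EA=A, NEA=G.
aligned = align_beta(math.log(0.723), "G", "A", "A", "G")
print(f"aligned beta = {aligned:.3f}")
```

The standard error is unchanged by this operation, because swapping the alleles only mirrors the effect around zero.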

In meta-analysis, datasets generated by multiple groups using different methods are likely to show some degree of variability, also known as heterogeneity. Heterogeneity indicates that the observed effects in the datasets differ from each other more than would be expected from random error alone [

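Then, we calculate Cochran's Q statistic, $Q$, and Higgins' heterogeneity metric, $I^2$:

$$Q = \sum_{i=1}^{k} w_i \left(\beta_i - \beta_F\right)^2, \qquad I^2 = \max\left(0, \frac{Q - (k - 1)}{Q}\right) \times 100$$

where $k$ is the number of studies, $w_i = 1/se_i^2$ is the inverse-variance weight of the $i^{\text{th}}$ study, and $\beta_F = \sum_i w_i \beta_i / \sum_i w_i$ is the inverse-variance weighted mean effect.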
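The heterogeneity test can be sketched as follows, assuming the conventional inverse-variance definitions of Cochran's Q and Higgins' I² (expressed as a percentage and truncated at zero). The function name `cochran_q_i2` is hypothetical, and the input values are the beta coefficients and standard errors of the two rs13405728 studies after conversion from their odds ratios, rounded to four decimals.

```python
def cochran_q_i2(betas, ses):
    """Return (Q, I2 in percent) for per-study effects and standard errors."""
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    beta_fixed = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - beta_fixed) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2 is the share of variation beyond chance; it is set to 0 when Q <= df.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


q, i2 = cochran_q_i2([0.4383, 0.3243], [0.0543, 0.0268])
print(f"Q = {q:.2f}, I2 = {i2:.1f}")
```

With these rounded inputs the result is close to the Q of about 3.5 and I² of about 72 reported for rs13405728 in the output table, so the association would be routed to the random effects model.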

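For $0 \le I^2 < 50$, we use the fixed effects model, in which each study is weighted by the inverse of its variance, $w_i = 1/se_i^2$:

$$\beta_F = \frac{\sum_i w_i \beta_i}{\sum_i w_i}, \qquad se(\beta_F) = \sqrt{\frac{1}{\sum_i w_i}}$$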
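A minimal Python sketch of a fixed effects combination follows, assuming the standard inverse-variance weighted estimator. The function name `fixed_effect` is hypothetical; the inputs are the two endometriosis studies of rs10965235 from the input table, converted from odds ratios under the assumption of 95% confidence intervals.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta_f = sum(w * b for w, b in zip(weights, betas)) / total
    se_f = math.sqrt(1.0 / total)
    return beta_f, se_f


# (OR, lower CI limit, upper CI limit) for rs10965235 in endometriosis
studies = [(1.489, 1.213, 1.827), (1.44, 1.30, 1.59)]
betas = [math.log(odds) for odds, _, _ in studies]
ses = [(math.log(up) - math.log(low)) / (2 * 1.96) for _, low, up in studies]

beta_f, se_f = fixed_effect(betas, ses)
print(f"{beta_f:.3f} {se_f:.3f}")  # 0.371 0.046
```

These values agree with the effect size and standard error listed for rs10965235 in the output table.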

For $50 \le I^2 \le 100$, we use the random effects model [

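The weights for the random effects model, $w_i^R$, add the between-study variance $\tau^2$ to the variance of each study:

$$w_i^R = \frac{1}{se_i^2 + \tau^2}, \qquad \tau^2 = \max\left(0, \frac{Q - (k - 1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}\right)$$

$$\beta_R = \frac{\sum_i w_i^R \beta_i}{\sum_i w_i^R}, \qquad se(\beta_R) = \sqrt{\frac{1}{\sum_i w_i^R}}$$

where $w_i = 1/se_i^2$ are the fixed effects weights, $Q$ is Cochran's Q statistic, and $k$ is the number of studies.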

The integrated p-value through meta-analysis can be obtained as follows [

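$$p = 2\left(1 - \Phi\left(\left|\frac{\beta_{\text{integrated}}}{se_{\text{integrated}}}\right|\right)\right)$$

where Φ is the cumulative distribution function of the standard normal distribution, and $\beta_{\text{integrated}}$ and $se_{\text{integrated}}$ are the integrated effect size and its standard error under the selected model.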
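A two-sided p-value of this kind can be computed with the Python standard library alone. The sketch assumes a two-sided normal (Wald) test and uses the identity 2(1 − Φ(z)) = erfc(z/√2), which stays accurate for the very small p-values typical of GWAS; the function name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(beta, se):
    """Two-sided p-value of a normal (Wald) test for an effect and its standard error."""
    z = abs(beta / se)
    # erfc avoids the loss of precision of computing 1 - Phi(z) for large z.
    return math.erfc(z / math.sqrt(2.0))


# Rounded integrated values for rs10965235 taken from the output table
p = two_sided_p(0.371, 0.046)
print(f"p = {p:.2e}")
```

Because the inputs are rounded to three decimals, the printed value differs slightly from the tabulated p-value, but it remains far below the genome-wide significance level.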

Finally, to reduce false-positive results, the integrated p-values are corrected by the BH adjustment method. When $p_{(1)}, p_{(2)}, \cdots, p_{(m)}$ are the p-values of the SNPs sorted in ascending order ($p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$), the adjusted p-values obtained through the BH procedure are as follows [

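$$p_{(i)}^{\text{adj}} = \min\left(1, \min_{j \ge i} \frac{m}{j}\, p_{(j)}\right)$$

where $m$ is the number of p-values being adjusted and $i$ is the rank of the p-value in ascending order.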
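The BH adjustment can be sketched without external dependencies as follows. This is an illustrative implementation of the standard step-up procedure, not Beta-Meta's own code, and the function name `bh_adjust` is hypothetical. Each sorted p-value is scaled by m divided by its rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
print([round(a, 4) for a in adjusted])  # [0.02, 0.04, 0.04, 0.02]
```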

Using Beta-Meta, we performed a sample test of integrating multiple studies of infertility and obtained a table containing all of the above calculated summary statistics values (^{-8} was used to identify significant SNP markers. Of the total 26 SNP-phenotype associations from the 23 studies we investigated (^{-12} (

In order to check the accuracy of Beta-Meta, we compared the meta-analysis results of Beta-Meta (

The Beta-Meta application can be used as an effortless meta-analysis tool by researchers with limited statistical backgrounds. It allows them to easily manipulate and analyze their own datasets on a personal computer, as it is written in Python and can be run as an executable file on MS Windows.

As shown above, Beta-Meta increases the power to detect weak signals, identifying significant variants that were not significantly associated in single studies. Furthermore, it calculates the effect sizes and the p-values accurately by selecting the appropriate model based on heterogeneity and applying the BH adjustment. These features can contribute to time-efficient handling of the recent growth in aggregated GWAS, especially for those working in the field of genetic testing. Because it is difficult to obtain a large number of datasets and validate genotype-phenotype associations experimentally within a limited budget, meta-analysis is still in demand for discovering SNP markers for genetic testing.

In conclusion, the application presented here provides a conventional and yet convenient way to conduct a meta-analysis of GWAS. Beta-Meta is expected to facilitate various research projects, such as the discovery of novel SNP markers, the calculation of polygenic risk scores, and the acquisition of biological insights into complex diseases and traits.

Conceptualization: WL. Data curation: GK. Formal analysis: GK. Methodology: WL, GK. Software: GK. Supervision: DK. Visualization: JHP, GK. Writing – original draft: YL, GK, WL. Writing – review & editing: DK.

No potential conflict of interest relevant to this article was reported.

Beta-Meta is written in python 3.9.7, and is available at

This work was supported by National IT Industry Promotion Agency (NIPA) grant funded by the Korea government (MSIT) (No. S0252-21-1001, Development of AI Precision Medical Solution (Doctor Answer 2.0)).

Supplementary data can be found with this article online at

Input data: summary statistics of the individual GWAS of infertility

Output data: summary statistics after meta-analysis

Output data: summary statistics after meta-analysis using METAL

Forest plot of the combined effect sizes. Forest plot of 95% confidence intervals of the combined effect sizes after meta-analysis.

Overview of Beta-Meta pipeline.

Example of input data: summary statistics of the individual GWAS of infertility

Phenotype | SNP | EA | NEA | OR (95% CI) | p-value | PMID |
---|---|---|---|---|---|---|
Endometriosis | rs10965235 | C | A | 1.489 (1.213–1.827) | 1.30E-4 | 25154675 |
Endometriosis | rs10965235 | C | A | 1.44 (1.3–1.59) | 5.57E-12 | 20601957 |
Polycystic ovary syndrome | rs13405728 | A | G | 1.55 (1.39–1.72) | 1.00E-03 | 34403018 |
Polycystic ovary syndrome | rs13405728 | G | A | 0.723 (0.686–0.762) | 1.00E-03 | 30182769 |
Folic acid metabolism-related male infertility | rs1801133 | T | C | 1.33 (1.06–1.66) | 1.40E-02 | 16247718 |
Folic acid metabolism-related male infertility | rs1801133 | C | T | 0.7 (0.66–0.75) | 1.00E-05 | 30813130 |
Non-obstructive azoospermia | rs10842262 | G | C | 1.335 (1.1081–1.6083) | 2.30E-03 | 24648396 |
Non-obstructive azoospermia | rs10842262 | G | C | 1.23 (1.16–1.3) | 0.001 | 30863997 |

GWAS, genome-wide association studies; SNP, single nucleotide polymorphism; EA, effect allele; NEA, non-effect allele; OR, odds ratio; CI, confidence interval.

Example of output data presenting only the significantly associated SNPs after meta-analysis

Phenotype | SNP | EA | NEA | β | se(β) | p-value | adjusted p-value | I^{2} | Q |
---|---|---|---|---|---|---|---|---|---|
Endometriosis | rs10965235 | C | A | 0.371 | 0.046 | 8.2E-16 | 8.2E-16 | 0 | 0.083 |
Polycystic ovary syndrome | rs13405728^{a} | A | G | 0.371 | 0.056 | 3.55E-11 | 7.11E-11 | 71.70 | 3.534 |
Folic acid metabolism-related male infertility | rs1801133 | C | T | -0.351 | 0.031 | 4E-29 | 8E-29 | 0 | 0.361 |
Non-obstructive azoospermia | rs10842262 | G | C | 0.214 | 0.028 | 1.36E-14 | 1.36E-14 | 0 | 0.679 |

SNP, single nucleotide polymorphism; EA, effect allele; NEA, non-effect allele.

^{a}For rs13405728, the integrated effect size and p-value were calculated under the random effects model because its I^{2} was greater than 50.