Web-Based Database and Viewer of East Asian Copy Number Variations

Article information

Genomics Inform. 2012;10(1):65-67
Publication date (electronic) : 2012 March 31
doi : https://doi.org/10.5808/GI.2012.10.1.65
Department of Microbiology, Integrated Research Center for Genome Polymorphism, The Catholic University of Korea School of Medicine, Seoul 137-701, Korea.
Corresponding author: yejun@catholic.ac.kr, Tel +82-2-2258-7343, Fax +82-2-596-8969
†These authors contribute equally to this work.
Received 2012 January 31; Revised 2012 February 16; Accepted 2012 February 18.

Abstract

We have discovered copy number variations (CNVs) in 3,578 Korean individuals with the Affymetrix Genome-Wide SNP array 5.0, and 4,003 copy number variation regions (CNVRs) were defined in a previous study. To explore the details of the variants easily in related studies, we built a database, cataloging the CNVs and related information. This system helps researchers browsing these variants with gene and structure variant annotations. Users can easily find specific regions with search options and verify them from system-integrated genome browsers with annotations.

Introduction

Copy number variation (CNV) is a common type of structural variation in the human genome. They have been suggested to be related to disease susceptibility or human phenotype diversity [1-3]. Current genome-wide association studies of CNVs are attempting to find how they are related to disease susceptibility or phenotypic diversity. Due to the fact that frequencies of CNVs show ethnic differences [4-7], a finding of disease or phenotype association of a specific CNV in one population is hard to generalize to other populations. However, by comparing the frequencies of target CNVs among different ethnic groups, we could assume the population-specific disease susceptibility or phenotype difference.

In this regard, CNV frequency information of various populations has become a major concern of population or disease association studies.

With increasing interest in CNVs, many CNV projects have been announced recently. Many of them have already established large databases of CNVs from many different ethnic groups (Database of Genomic Variants [DGV], http://projects.tcag.ca; The Copy Number Variants Projects, http://www.sanger.ac.uk/research/areas/humangenetics/cnv/) [8]. There are also several databases for supporting population-specific studies, including the CNV Control Database for Japanese (http://gwas.lifesciencedb.jp), the Singapore Human Mutation and Polymorphism Database (shmed.bii.a-star.edu.sg) [9], and the Thailand Mutation and Variation Database (www4a.biotec.or.th) [10]. The accumulation of CNV information seems promising, in that we can gradually extend the knowledge of CNVs everywhere in the world. However, it will take a long time to fill the gap of CNV databases covering the whole population.

As a step to fill this gap, we discovered a Korean CNV with Affymetrix Genome-Wide SNP array 5.0 (Affymetrix, Santa Clara, CA, USA) in a previous study [11]. In this study, we built a Korean database based on the findings of 4,003 copy number variation regions (CNVRs). To build a tool that is easily accessible via the web for Korean-specific CNV studies, we built a viewer for browsing these CNVs.

Features and Results

System requisites

Java Runtime Environment of Sun Microsystems 1.6.0 (Oracle, Redwood City, CA, USA) or equivalent is required, since the system is written in Java language. We used MySQL database for storing and retrieving CNV data and GBrowse2 [12] for drawing regional information. Both are freely available from their distribution websites.

Since the main search pages are written in Java Server Page (JSP) language, Apache Tomcat is needed for the application server and Apache HTTP server is needed for Gbrowse2 viewer pages (Fig. 1).

Fig. 1

System Structures of East Asian Copy Number Variation Database (EACDB). JSP: PHP (Hypertext Preprocessor).

Data

CNVs of our previous study were retrieved from the Affymetrix Genome-Wide Human SNP array 5.0 of 3,578 Korean individuals. A total of 4,003 CNVRs were defined, and 2,077 CNVRs (51.9%) were potentially novel. The annotation data for genes were collected based on the Human Mar. 2006 NCBI36/hg18 build, and reference structure variants were retrieved from DGV (hg18.v8.aut.2009).

Database and viewer

Our web-based database viewer can display previously discovered CNVs by their positions (Fig. 2). Users can also filter out CNVs based on the DGV overlapped regions, CNV type (Gain/Loss/Complex), or their frequencies. Each selected region could be diagnosed in detail by clicking on it. DGV and OMIM ID columns are linked with corresponding websites, and CNVR position columns are linked with the genome browser. The genome browser is integrated based on the open source project GBrowse2. Users can seek or zoom in/out of CNVs across the chromosome by entering positions or clicking zoom buttons. GBrowse2 can also display interesting areas by dragging the region bar without reloading the entire page. Gene information of the selected area is also displayed, and details will be given on separate pop-up page by clicking on it.

Fig. 2

Database search page and genome browser.

Discussion

For a fast-paced research environment, a viewer for searching and observing data in one step is very handy. We hope our system can help researchers who are interested not only in our target polymorphism study but also in a viewer for polymorphisms for general purposes.

We chose a web-based platform for the tool because of its usability and ease of maintenance. Since its workload is very small as an input query for a viewer, a web-based platform does not have drawbacks, like other calculation applications with very large data.

Acknowledgements

This study was supported by a grant of the Korea Health 21 R&D Project, Ministry of Health and Welfare, Republic of Korea (A040002). We thank the KARE consortium for providing the original genotyping data.

Notes

Availability: The East Asian CNV database (EACDB) can be accessed at www.ircgp.com/EACNVDB.html. Some configuration and server installation should be done before the system integration. Contact yejun@catholic.ac.kr for detailed information.

References

1. Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 2007;315:848–853. 17289997.
2. Hastings PJ, Lupski JR, Rosenberg SM, Ira G. Mechanisms of change in gene copy number. Nat Rev Genet 2009;10:551–564. 19597530.
3. Beckmann JS, Estivill X, Antonarakis SE. Copy number variants and genetic traits: closer to the resolution of phenotypic to genotypic variability. Nat Rev Genet 2007;8:639–646. 17637735.
4. Lam KW, Jeffreys AJ. Processes of copy-number change in human DNA: the dynamics of {alpha}-globin gene deletion. Proc Natl Acad Sci U S A 2006;103:8921–8927. 16709669.
5. Conrad DF, Hurles ME. The population genetics of structural variation. Nat Genet 2007;39(7 Suppl):S30–S36. 17597779.
6. Hegele RA. Copy-number variations add a new layer of complexity in the human genome. CMAJ 2007;176:441–442. 17296953.
7. Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, et al. A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 2007;80:91–104. 17160897.
8. Feuk L, Marshall CR, Wintle RF, Scherer SW. Structural variants: changing the landscape of chromosomes and design of disease studies. Hum Mol Genet 2006;15(Spec No 1):R57–R66. 16651370.
9. Tan EC, Loh M, Chuon D, Lim YP. Singapore Human Mutation/Polymorphism Database: a country-specific database for mutations and polymorphisms in inherited disorders and candidate gene association studies. Hum Mutat 2006;27:232–235. 16429432.
10. Ruangrit U, Srikummool M, Assawamakin A, Ngamphiw C, Chuechote S, Thaiprasarnsup V, et al. Thailand mutation and variation database (ThaiMUT). Hum Mutat 2008;29:E68–E75. 18484585.
11. Yim SH, Kim TM, Hu HJ, Kim JH, Kim BJ, Lee JY, et al. Copy number variations in East-Asian population and their evolutionary and functional implications. Hum Mol Genet 2010;19:1001–1008. 20026555.
12. Donlin MJ. Using the generic genome browser (GBrowse). Curr Protoc Bioinformatics 2009. Chapter 9:Unit 9.9.

Article information Continued

Fig. 1

System Structures of East Asian Copy Number Variation Database (EACDB). JSP: PHP (Hypertext Preprocessor).

Fig. 2

Database search page and genome browser.