Hepatitis is a common and serious disease in the Korean population. It is caused by a virus, of which types A and B are prevalent among Koreans. In this study, we sought genetic factors for hepatitis through a genome-wide association study. We took 368 cases and 1,500 controls from the Anseong and Ansan cohort data and analyzed about 300,000 single-nucleotide polymorphisms and 20 epidemiological variables. We did not find any significant single-nucleotide polymorphisms, but we confirmed the influence of major epidemiological variables on hepatitis.
Hepatitis is inflammation of the liver, most commonly caused by a viral infection [
In this study, we performed a genome-wide association study of hepatitis in the Korean population. We sought significant single-nucleotide polymorphisms (SNPs) and epidemiological traits related to hepatitis.
The study subjects are based on the Anseong and Ansan cohort data, part of the Korea Association Resource (KARE) projects. The genotypes and phenotypes of the cohort population are described in Cho et al. [
The chosen dataset was imbalanced; the number of cases was smaller than controls. A dataset is imbalanced if it contains many more samples from one class than from the rest of the classes [
To find significant SNPs, we used PLINK, version 1.07 [
Using the traits in
From the epidemiological analysis, we identified variables associated with hepatitis and confirmed that hepatitis is related to a wide range of other diseases. A disease network, in which each node is a disease and each edge is weighted by the correlation coefficient between the two diseases it connects, would make the relationships among diseases clearer. Current known disease networks [
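The disease network described in this paragraph can be illustrated with a minimal sketch. It assumes each disease is recorded as a 0/1 diagnosis indicator per subject and that the edge weight is the Pearson correlation between two indicator vectors; the text does not say which correlation coefficient is meant. The function names and the toy data are hypothetical, not taken from the KARE cohort.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant indicator carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def disease_network(diagnoses, threshold=0.0):
    """Return {(disease_a, disease_b): r} for pairs with |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(diagnoses), 2):
        r = pearson(diagnoses[a], diagnoses[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical 0/1 diagnosis indicators for eight subjects (illustration only).
toy_diagnoses = {
    "hepatitis": [1, 1, 1, 0, 0, 0, 0, 0],
    "diabetes":  [1, 1, 0, 0, 0, 0, 0, 1],
    "gastritis": [0, 1, 0, 1, 0, 1, 0, 0],
}

network = disease_network(toy_diagnoses)
for (a, b), r in network.items():
    print(f"{a} -- {b}: r = {r:.3f}")
```

Raising `threshold` keeps only the stronger edges, which is one simple way to make such a network readable when many diseases are included.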
KARE data are the result of a cohort study. They contain a small number of samples for any specific disease, whereas the whole population is very large, which yields an imbalanced dataset for statistical analysis. This is a basic limitation of our study, even though we tried to compensate for it. We also did not find any significant SNPs related to hepatitis. Combining our results with knowledge from other biological databases may allow a more meaningful interpretation.
This work was supported by grants from the Korea Centers for Disease Control and Prevention, Republic of Korea (4845-301, 4851-302, 4851-307).
This paper received the 2014 KNIH KARE best paper award.
Receiver operating characteristic plot for 4 single nucleotide polymorphisms derived from logistic regression (area under the curve, 0.700).
Receiver operating characteristic plot for 8 epidemiological variables derived from logistic regression (area under the curve, 0.693).
Statistics of hepatitis type C
Year | No. of patients |
---|---|
2008 | 40,683 |
2009 | 42,365 |
2010 | 41,525 |
2011 | 43,879 |
2012 | 45,890 |
Clinical characteristics of variables in this study
Variable | Description | Control | Case | p-value |
---|---|---|---|---|
No. of population | | 1,500 (17) | 368 (4.2) | - |
AS1 Sex | Male (%) | 931 (62) | 238 (64.7) | 3.91 × 10⁻⁹ |
AS1 Age | Age | 51.7 ± 8.9 | 50.2 ± 8 | 0.000405 |
AS1 Height | Height | 160 ± 8.8 | 163 ± 8.9 | 2 × 10⁻¹⁰ |
AS1 BMI | Body mass index | 24.5 ± 3.1 | 24.9 ± 3.2 | 1.02 × 10⁻⁵ |
AS1 SBP | SBP (mm Hg) | 121.2 ± 18.7 | 120.2 ± 17.8 | 0.446807 |
AS1 DBP | DBP (mm Hg) | 80 ± 11.4 | 80 ± 11.5 | 0.884027 |
AS1 PdDm | Diagnosis of diabetes | 51 (3.4) | 46 (12.5) | 2 × 10⁻¹⁶ |
AS1 PdUl | Diagnosis of gastritis | 334 (22.2) | 112 (30.4) | 1.68 × 10⁻⁶ |
AS1 PdAl | Diagnosis of allergy | 81 (5.4) | 35 (9.5) | 1.77 × 10⁻⁵ |
AS1 PdHn | Diagnosis of external head injury | 3 (0.2) | 5 (1.4) | 0.008527 |
AS1 DrugAr | Taking arthritis drug | 50 (3.33) | 17 (4.6) | 0.000566 |
AS1 Albumin | Degree of albumin | 4.3 ± 0.33 | 4.2 ± 0.35 | 2.89 × 10⁻¹⁶ |
Values are presented as number (%) or mean ± SD.
SBP, systolic blood pressure; DBP, diastolic blood pressure.
Top-ranked SNPs of genome-wide association analysis
CHR | RSID | BP | Gene | Minor allele | CHISQ | p-value | OR |
---|---|---|---|---|---|---|---|
11 | rs11025185 | 19550382 | | A | 23.67 | 1.45 × 10⁻⁹ | 1.15 |
16 | rs4467099 | 11450395 | | A | 19.71 | 5.72 × 10⁻¹³ | 1.09 |
15 | rs1432133 | 24811092 | | T | 19.47 | 6.84 × 10⁻⁹ | 1.02 |
12 | rs6582709 | 46104168 | | T | 19.39 | 0.00293 | 2.07 |
5 | rs17568725 | 171103246 | | T | 19.38 | 1.27 × 10⁻⁸ | 0.90 |
12 | rs2097726 | 46105143 | | T | 19.05 | - | 0.62 |
6 | rs6569628 | 130137425 | | T | 18.3 | 0.99228 | 1.00 |
14 | rs8014067 | 61623010 | | T | 18.09 | 2.52 × 10⁻⁹ | 0.59 |
6 | rs9375664 | 130134371 | | T | 17.95 | - | 1.77 |
6 | rs2326864 | 130136091 | | A | 17.73 | 0.32261 | 0.61 |
8 | rs2607612 | 24662484 | | G | 17.72 | 0.68156 | 1.04 |
12 | rs6582710 | 46104230 | | C | 17.5 | - | 0.83 |
15 | rs2174866 | 51251512 | | T | 17.41 | 8.78 × 10⁻⁷ | 1.19 |
6 | rs10484389 | 22183241 | | T | 17.13 | 1.55 × 10⁻⁸ | 1.18 |
8 | rs7814301 | 24645945 | | C | 16.93 | - | 1.05 |
8 | rs4368986 | 24641948 | | A | 16.89 | 0.57515 | 0.92 |
13 | rs9522267 | 110994368 | | T | 16.85 | 4.48 × 10⁻¹⁰ | 0.92 |
1 | rs7518687 | 166899607 | | A | 16.78 | 1.45 × 10⁻¹⁰ | 1.19 |
8 | rs6985699 | 24658799 | | G | 16.76 | 0.06382 | 1.09 |
10 | rs4474337 | 10819693 | | T | 16.7 | 1.76 × 10⁻⁵ | 0.95 |
RSID, reference SNP ID obtained from dbSNP database; BP, base pair based on the human reference genome, ver. 36 (NCBI); CHISQ, chi-square value; OR, odds ratio.
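The CHISQ and OR columns are the kind of statistics that a basic case-control allelic association test reports. As a minimal sketch, assuming a 1-degree-of-freedom chi-square test on a 2 × 2 table of allele counts (the exact model and options used for this table are not shown here), the calculation for a single SNP looks as follows. The function name and the allele counts are hypothetical; they only mirror the study's scale of 368 cases and 1,500 controls, each contributing two alleles.

```python
import math


def allelic_test(case_minor, case_major, control_minor, control_major):
    """Chi-square (1 df), p-value, and odds ratio for a 2x2 allele-count table."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chisq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Odds of carrying the minor allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chisq / 2))
    return chisq, p_value, odds_ratio


# Hypothetical counts: 736 case alleles and 3,000 control alleles.
chisq, p_value, odds_ratio = allelic_test(200, 536, 600, 2400)
print(f"CHISQ = {chisq:.2f}, p = {p_value:.2e}, OR = {odds_ratio:.2f}")
```

In a genome-wide scan the same test is repeated for every SNP, so the p-value threshold has to be adjusted for the number of tests before a SNP is called significant.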
Logistic regression test for SNP data
SNP | Coefficient value | p-value |
---|---|---|
rs4467099 | 0.090529 | 5.72 × 10⁻¹³ |
rs9522267 | -0.081189 | 4.48 × 10⁻¹⁰ |
rs7518687 | 0.178053 | 1.45 × 10⁻¹⁰ |
rs1432133 | 0.096962 | 6.84 × 10⁻⁹ |
rs8014067 | -0.115841 | 2.52 × 10⁻⁹ |
rs11025185 | 0.140190 | 1.45 × 10⁻⁹ |
rs10484389 | 0.161836 | 1.55 × 10⁻⁸ |
rs17568725 | -0.104275 | 1.27 × 10⁻⁸ |
rs2174866 | 0.170567 | 8.78 × 10⁻⁷ |
rs4474337 | -0.055057 | 1.76 × 10⁻⁵ |
rs6582709 | 0.728803 | 0.00293 |
SNP, single nucleotide polymorphism.
Logistic regression test for epidemiological data
ID | Variable | Coefficient value | p-value | OR |
---|---|---|---|---|
T1 | Sex | -0.1688894 | 3.91 × 10⁻⁹*** | 0.84 |
T2 | Age | -0.0041674 | 0.000405*** | 0.99 |
T3 | Diagnosis of diabetes | 0.2717830 | 2 × 10⁻¹⁶*** | 1.31 |
T4 | Diagnosis of gastritis | 0.0960720 | 1.68 × 10⁻⁶*** | 1.1 |
T5 | Diagnosis of allergy | 0.1461045 | 1.77 × 10⁻⁵*** | 1.16 |
T6 | Diagnosis of external head injury | 0.2739906 | 0.008527** | 1.31 |
T7 | Taking arthritis drug | 0.1577881 | 0.000566*** | 1.17 |
T8 | Degree of albumin | -0.2201075 | 2.89 × 10⁻¹⁶*** | 0.8 |
T9 | Height | 0.0058235 | 0.000304 | 1.0 |
T10 | BMI | 0.0126750 | 1.02 × 10⁻⁵*** | 1.01 |
Significance codes: '***', p < 0.001; '**', p < 0.01.
OR, odds ratio; BMI, body mass index.
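The OR column in the table above follows from the coefficient column: in logistic regression, the odds ratio for a one-unit increase in a variable is the exponential of its coefficient. The short sketch below recomputes the odds ratios from the listed coefficients; the dictionary and function names are illustrative only.

```python
import math

# Coefficients copied from the logistic regression table above.
coefficients = {
    "Sex": -0.1688894,
    "Age": -0.0041674,
    "Diagnosis of diabetes": 0.2717830,
    "Diagnosis of gastritis": 0.0960720,
    "Diagnosis of allergy": 0.1461045,
    "Diagnosis of external head injury": 0.2739906,
    "Taking arthritis drug": 0.1577881,
    "Degree of albumin": -0.2201075,
    "Height": 0.0058235,
    "BMI": 0.0126750,
}


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(coefficient)


odds_ratios = {name: odds_ratio(beta) for name, beta in coefficients.items()}
for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.2f}")
```

A negative coefficient gives an odds ratio below 1 (lower odds of hepatitis per unit increase), and a positive coefficient gives an odds ratio above 1.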
Area under the curve (AUC) values
Variable | AUC |
---|---|
T1 + T2 + T3 + T10 | 0.647 |
T1 + T2 + T3 + T10 + T4 | 0.657 |
T1 + T2 + T3 + T10 + T4 + T5 | 0.662 |
T1 + T2 + T3 + T10 + T4 + T5 + T6 | 0.666 |
T1 + T2 + T3 + T10 + T4 + T5 + T6 + T7 | 0.670 |
T1 + T2 + T3 + T10 + T4 + T5 + T6 + T7 + T8 | 0.690 |
T1 + T2 + T3 + T10 + T4 + T5 + T6 + T7 + T8 + T9 | 0.693 |
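Each AUC value in the table above summarizes how well the predicted probabilities from one logistic regression model separate cases from controls. As a minimal sketch of what the number means (not the software used in the study, which is not stated here), the AUC can be computed as the fraction of case-control pairs in which the case receives the higher score, with ties counted as one half. The function name, labels, and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical labels (1 = case, 0 = control) and predicted probabilities.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the values between 0.647 and 0.693 in the table indicate modest discrimination. The pairwise loop is quadratic in the sample size; a rank-based formulation is preferable for large cohorts.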