Recently, new methods have been developed for estimating the current effective population size and recent changes in it. Using these methods, the effective population sizes of Korean populations were estimated from the data of the Korean Association Resource (KARE) project. The overall changes in the size of the total population were similar to those of CHB (Han Chinese in Beijing, China) and JPT (Japanese in Tokyo, Japan) of the HapMap project. Past changes in population size did not differ between an urban area and a rural area. Age-dependent current and recent effective population sizes reflect the modern history of Korean populations, including the effects of World War II, the Korean War, and urbanization. The oldest age group showed that the population growth of Koreans had already been substantial since at least the end of the 19th century.

Effective population size (N_{e}) is the theoretically useful population size of an ideal population, in which the influence of random genetic drift is identical to that of the actual population [

Recently, new methods have been developed to estimate the current N_{e} and recent changes in N_{e} [

HapMap Phase III data were used for studies to infer the past population histories of various human populations. There were similar data for Korean populations (the Korean HapMap data) [

Based on the formula for testing HWE [_{AA} is the observed genotype frequency of a variant, and _{AA} is the observed allele frequency of a variant. As with the HapMap data, the KARE data excluded variants that deviated from HWE at a significance level of 0.001. Therefore, as in the previous study, the mean of the HWE deviations should be adjusted by dividing it by the correction term of 0.9873 [

A variant in a chromosome is in LE with a variant in another chromosome; however, the random genetic drift induces deviation from LE. The deviations depend on the effective population size of recent generations, and the recent effective population size can be estimated using the deviations. Assuming a constant population size, the effective population size that reflects recent changes in population size (the recent N_{e}) can be estimated based on Eq. (2), which was derived in a previous study [_{e} estimate using unlinked loci [_{e}, in addition to 1/n, because the haplotype frequencies were estimated using maximum likelihood estimation [

The decay of LD between linked loci differs depending on the recombination rate, and the past population history can be inferred from this dependence. The LD estimates at a certain recombination rate contain the influences of all N_{e} values from the current generation back to a certain number of past generations [_{e} estimates, depending on recombination rate, just represent an overall picture of past population history by comparing the actual estimates to the estimates of various past population histories. As in the previous study, the effective population sizes were estimated using the linked variants, depending on the recombination rates between the variants (Eqs. 1 and 3-5). Following the previous assumptions [

Eq. (3) presents the relationship between LD estimates at equilibrium (r^{2}_{eq}), N_{e}, and recombination rate (C). The LD estimates in Eq. (3) indicate the estimates of the original population, excluding the sampling bias. Sampling causes more complicated relationships in the LD estimates of linked variants than in those of unlinked variants, due to the maximum likelihood estimation of haplotype frequencies [^{2}_{s}) include the influence of the N_{e} of the original population instead of a simpler factor, 1/(2n_{s}), in addition to the influence of sample size (n_{s}) and the LD estimates of the original population (r^{2}_{o}).

To eliminate sampling bias, the N_{e} of the original population must be known. It is possible to combine and solve Eqs. (3) and (4) for a given n_{s} and r^{2}_{s}. A better way is to use the N_{e} estimate of the current generation (Eq. 1) in the recurrence formula of a previous study [

In Eq. (5), N_{e} indicates the effective population size at equilibrium for recombination rate (C); N_{ec} indicates the effective population size of the current generation from Eq. (1); r^{2}_{o} indicates the LD estimates of the original population; and r^{2}_{eq} indicates the LD estimates at equilibrium from the generation when the LD estimates reached equilibrium to the generation immediately before the current generation, i.e., the parent generation. Therefore, to estimate N_{e} at equilibrium for a certain recombination rate, N_{ec} should be estimated first. The estimated N_{ec} is incorporated into Eq. (4) to derive r^{2}_{o}. In Eq. (5), substituting the expression in Eq. (3) for r^{2}_{eq} yields a cubic equation in N_{e}, which can be solved for N_{e}. The N_{e} estimates can then be examined across recombination rates to infer past population histories from which the impact of the current N_{e} has been excluded.

The KARE data were generated and kindly provided through a grant program by the Korea Centers for Disease Control and Prevention. The data consisted of a total of 8,842 individuals: 4,637 from Ansan (an urban area) and 4,205 from Anseong (a rural area) in Gyeonggi province. The individuals fell into 32 age groups, from 39 to 70 years, and the numbers of individuals in the 32 groups were: 21, 421, 469, 489, 485, 473, 426, 400, 368, 291, 325, 268, 215, 207, 241, 242, 259, 197, 225, 238, 231, 255, 239, 230, 232, 254, 270, 210, 254, 200, 194, and 13, respectively. Because the groups aged 39 and 70 contained few individuals, they were combined with the age groups of 40 and 69, respectively. Therefore, the total number of age groups was 30. It is neither necessary nor efficient to use all single-nucleotide polymorphisms (SNPs), because estimating N_{e} does not require many variants. Only 10,000 SNPs or 1,000 pairs of SNPs were enough to estimate the correct N_{e} in the simulations of previous studies [^{2} estimates, i.e., the squared correlation coefficient between two variants.
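The grouping described above can be checked with a short sketch. The counts are taken directly from the text; the variable names are illustrative only and are not part of the original analysis.

```python
# Per-age sample counts for ages 39..70, in the order listed in the text.
counts = [21, 421, 469, 489, 485, 473, 426, 400, 368, 291, 325, 268,
          215, 207, 241, 242, 259, 197, 225, 238, 231, 255, 239, 230,
          232, 254, 270, 210, 254, 200, 194, 13]
ages = list(range(39, 71))
by_age = dict(zip(ages, counts))

# Merge the sparse boundary ages into their neighbouring groups.
by_age[40] += by_age.pop(39)
by_age[69] += by_age.pop(70)

n_groups = len(by_age)          # number of age groups after merging
n_total = sum(by_age.values())  # total number of individuals
print(n_groups, n_total, by_age[40], by_age[69])
```

The merged table has 30 groups and still sums to the 8,842 individuals reported for Ansan and Anseong together (4,637 + 4,205).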

The current effective population size was 100,778 using the total population. The estimate was much larger than the estimates using the HapMap data. The largest N_{e} in the populations of the HapMap data was 10,437 using CEUp (the parents of Utah, USA residents with ancestry from northern and western Europe). The estimate of the Korean population was approximately 10 times larger than the largest estimates using the HapMap data. From the definition, the effective population size is influenced by many factors, such as mating structure, migration, and admixture [_{e} due to admixture could be excluded. The effects of migration on N_{e} estimates were studied previously [_{e}, based on LD, which is more robust than the joint estimation of N_{e} and migration rate, could be overestimated to be close to the global (metapopulation) N_{e} when the migration rate is high [

After the Korean War, the Republic of Korea (South Korea) experienced severe urbanization and extreme population concentration in a metropolitan area [_{e}. The population density of the Republic of Korea is among the highest among nations, ranking 20th among a total of 265 countries or areas (

^{2} decay plots of chromosome 14, depending on recombination rates, were prominently higher than the plots of other chromosomes [^{2} estimates of chromosome 14 (^{2} estimates in the previous study, was examined, similar allele frequency spectra were seen (

The most similar spectra among the HapMap data were JPT (Japanese in Tokyo, Japan) and CHB (Han Chinese in Beijing, China), and the frequency spectrum of KARE looked like a mixture of the spectra of JPT and CHB. In a previous study [^{2} estimates of chromosome 14 to JPT and CHB might be observed in

The past population history of the total population is shown in _{e} of CHB. The current N_{e} of CHB was 2,926 [_{e} sizes of the Korean and Chinese populations. The populations of both the urban and rural areas showed similar past population histories as that of the total population (

_{e} involves recent changes in the effective population sizes, usually within a few generations. Therefore, the recent effective population sizes were larger than the current effective population sizes in _{e} of individuals born in a certain year merely represents the effective population size of the population born in a certain year. However, the recent N_{e} represents the effective population sizes of several previous generations. The current effective population sizes were mostly between 5,000 and 30,000, and the recent effective population sizes were usually higher than the current N_{e} and increased as the year of birth increased.

The current and recent N_{e} of the sample aged 69 or older (born in 1933) were 9,457 and 20,165, respectively. For comparison, the maximum estimate of the current N_{e} of the HapMap data was 10,437. The current N_{e} of Koreans born in 1933 was already large, probably because of the high population density in Korea. The recent N_{e}, which is roughly double the current N_{e}, indicates that the population growth of Korea began long before the colonial domination by Japan in 1910. There were two age groups, 58 and 63, for which the current N_{e} estimates were less than 5,000. The corresponding years of birth are 1944 and 1939, respectively. World War II began in 1939 and ended in 1945, and comparatively fewer births would be expected during that period. However, there was a large increase in the current N_{e} in 1943, which is difficult to explain. There were several large increases in the current N_{e}, in 1943, 1947, 1952, and 1958. The increases in 1947 and 1958 could be explained by the baby booms right after the wars; however, 1943 and 1952 were 2 years and 1 year, respectively, before the end of the wars, which lasted for 6 and 3 years, respectively. The current and recent N_{e} estimates were compared to the number of the total population and to the rates of population growth recorded in the Korean Statistical Almanac of the Korean Statistical Information Service (_{e} in 1943 if the 1-year inaccuracies in assigning the year of birth are considered. The increments of the current N_{e} in 1943 and 1952 could be due to the decreased crude death rate and increased migration rates [
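The age-to-year conversion used in this paragraph can be sketched as follows. The reference year of 2002 is an assumption inferred from the pairs given in the text (69 and 1933, 63 and 1939, 58 and 1944); it is not stated explicitly, and the function names are hypothetical.

```python
SURVEY_YEAR = 2002  # assumed; chosen so that age 69 maps to 1933 as in the text

def birth_year(age, survey_year=SURVEY_YEAR):
    """Nominal year of birth for an age group."""
    return survey_year - age

def birth_year_window(age, slack=1, survey_year=SURVEY_YEAR):
    """Years of birth compatible with an age group, allowing +/- slack years."""
    nominal = birth_year(age, survey_year)
    return list(range(nominal - slack, nominal + slack + 1))

for age in (69, 63, 58, 42):
    print(age, birth_year(age), birth_year_window(age))
```

With a 1-year slack, the 58-year-old group covers 1943 to 1945, which illustrates how estimates assigned to adjacent years could be mixed.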

Using the KARE data, the current and recent effective population sizes of Korea were estimated, based on HWE and LE deviations, and the past changes in population sizes were derived, based on the pattern of LD decay. The results showed clear exponential growth of the Korean population, similar to CHB of the HapMap data. The population growth of China has been positive continuously since 1400, except for the period from 1683 to 1700 [_{e} of the 69-year-old age group (born in 1933) indicates that Korea might have experienced continuous population growth since at least 1873, assuming a generation time of 20 years for the 3 most recent generations. A simulation result of rapid growth (N_{e} changes: 1000, 2000, 4000, 10,000) showed half (5,212) the current N_{e} [_{e}. Therefore, it is likely that the population growth started much earlier than 1873 [_{e}. The migration effects and the discrepancy between the census and the results require further explanation.
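The 1873 figure follows from simple arithmetic on the stated assumptions (3 generations of 20 years before a birth year of 1933). A minimal sketch, with a hypothetical helper name:

```python
def earliest_reflected_year(birth_year, generations=3, generation_time=20):
    """Year reached by stepping back a number of generations from a birth year."""
    return birth_year - generations * generation_time

year = earliest_reflected_year(1933)
print(year)
```

Changing `generation_time` shows how sensitive the date is to this assumption; for example, 25 years per generation would move it back to 1858.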

The age-dependent N_{e} estimates showed good concordance with the actual modern history of Korea, although several of them require more explanation. Because of the inaccuracies in assigning the year of birth, a sample of the 58-year-old age group might be mixed with samples of the 57- and 59-year-old age groups. Although it would be a rare occurrence, individuals with the genomic potential to increase the current N_{e} in the 57 and 59 age groups might have been assigned to the 58-year-old age group accidentally during sample collection. In any case, the population growth in wartime needs more explanation. In _{e} followed the trend of current N_{e} in many cases. In particular, the increased current N_{e} in 1943 contributed to the rapid increase in the recent N_{e} in the late 1950s, as the individuals born in 1943 could contribute to reproduction. The recent N_{e} decreased rapidly in the 42-year-old age group (born in 1960), which needs explanation. The baby-boom individuals born in 1958 were 2 years old and could not contribute to births. The recent N_{e} increased again in 1961 and 1962, possibly due to the contributions of individuals who were born in the previous baby-boom period and had reached reproductive age.

In the previous studies, it was not necessary to analyze a large number of variants. However, the current study showed that the number of genotypes, as well as the number of individuals, might be important for the quality of the results to infer the past population history. In _{e}, as indicated in previous simulation studies [^{2} estimates in chromosome 14 could also be seen clearly in

The recent N_{e} of the total population was 2,518,501, which was extremely large. Considering that the current N_{e} of the total population was 100,778, the KARE data showed that the population is undergoing a serious reduction in effective population size. The ratio of the current to recent N_{e} was 0.04. This small ratio might be due to migration and extreme population concentration in a metropolitan area. Migration might have had more influence on the recent N_{e} than on the current N_{e}, because the recent N_{e} reflects the population that moved from all over the country. In addition, the division of the Korean peninsula after the Korean War might have contributed to the extreme ratio. The population moved to the metropolitan area from all over the country, but they could not move back to the northern part of the Korean peninsula. The ratio was not comparable to any of the HapMap data, in which 0.64 of JPT was the lowest, and all populations were concordant with their recent population changes. In _{e} estimates do not represent the actual population size exactly. Further studies of the effects of migration and confinement, with refined samples and data, could be helpful.
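The ratio quoted above, and the comparison with the largest HapMap estimate reported earlier, can be reproduced from the numbers in the text. The variable names are illustrative.

```python
current_ne = 100_778        # current N_e, total KARE population
recent_ne = 2_518_501       # recent N_e, total KARE population
hapmap_max_current = 10_437 # largest current N_e among HapMap populations (CEUp)

ratio = current_ne / recent_ne
fold_over_hapmap = current_ne / hapmap_max_current

print(round(ratio, 2))             # 0.04
print(round(fold_over_hapmap, 1))  # 9.7
```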

The estimates of the current effective population size were 91,433 in the rural area and 76,097 in the urban area. The result was surprising, because the urban area is more populated than the rural area. The age distributions of each region differed significantly (_{e} estimates of the urban area, the estimates were usually small, except for several estimates of relatively younger ages. The overall small uneven estimates and the uneven age distribution of the urban area might have resulted in the smaller current N_{e} estimate than the estimate of the rural area. The recent N_{e} of the rural area was 495,871, but the estimate of the recent N_{e} of the urban area was a negative value. As mentioned previously, the negative value might have come from a large N_{e} and sampling failure. Therefore, the negative recent N_{e} estimate of the urban area probably indicates an extremely large recent N_{e}. More studies are necessary to determine the effects of uneven age distributions, rapid population fluctuations, migrations, and population confinements on the N_{e} estimates.
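How a negative estimate can arise is easiest to see with a simplified model. The sketch below uses the widely cited approximation for unlinked loci under random mating, E[r^{2}] ≈ 1/(3N_{e}) + 1/n, which is an assumption made here for illustration and is not the paper's Eq. (2), whose sampling correction also depends on N_{e}. The function name and the numerical inputs are hypothetical.

```python
def ne_from_unlinked_r2(mean_r2, n):
    """Solve E[r^2] = 1/(3*Ne) + 1/n for Ne (simplified approximation).

    mean_r2: mean squared correlation between unlinked variants
    n:       number of sampled individuals
    """
    excess = mean_r2 - 1.0 / n  # LD left after removing the sampling term
    if excess == 0:
        return float("inf")
    return 1.0 / (3.0 * excess)

# A population with Ne = 5000 sampled with n = 1000 individuals.
r2_expected = 1.0 / (3 * 5000) + 1.0 / 1000
print(ne_from_unlinked_r2(r2_expected, 1000))

# If the observed mean r^2 falls below 1/n, the estimate turns negative.
print(ne_from_unlinked_r2(0.0002, 4637))
```

When N_{e} is very large, the drift term 1/(3N_{e}) is tiny compared with the sampling term, so small sampling fluctuations can push the observed mean below 1/n and flip the sign of the estimate. This is consistent with reading a negative estimate as evidence of a very large N_{e}.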

This work was supported by grants from the Korea Centers for Disease Control and Prevention, Republic of Korea (4845-301, 4851-302, 4851-307), and by National Research Foundation of Korea (NRF) grants, funded by the Korean Government (MSIP) (353-2009-2-C00061 and 2013R1A1A3006685). The key calculations were performed using the supercomputing resource at the Korea Institute of Science and Technology Information (KISTI), which provided support through grant no. KSC-2013-C2-023, and PLSI supercomputing resources.

This paper received the 2014 KNIH KARE best paper award.

Supplementary data including three figures can be found with this article at

Linkage disequilibrium decay of 30 age groups (y) from 40 to 69.

N_{e} estimates of 30 age groups (y) from 40 to 69.

(A) Histogram of ages depending on region. (B) Current N_{e} estimates depending on age and region (negative estimates were excluded in the plot).

Linkage disequilibrium decay and N_{e} estimates of the total population, the population of the urban region, and the population of the rural region, depending on recombination rate.

Region from 65,700 kb to 6,800 kb of chromosome 14. (A) Distribution of minor allele frequencies. (B) Linkage disequilibrium block. CHB, Han Chinese in Beijing, China; JPT, Japanese in Tokyo, Japan; KARE, Korean Association Resource.

Current and recent effective population sizes, depending on age groups, which were converted to years of birth (total population: the census at the end of the year; rate of population growth: the population growth per 1,000 from the census at the end of the year).