Gene Reports: Korean DNA





Methods Summary

Genomic DNA samples were obtained from an anonymous healthy Korean adult male (AK1) with normal karyotype (Supplementary Fig. 15), using guidelines approved by the Institutional Review Board of Seoul National University (approval C-0806-023-246). AK1 provided written consent for public release of genomic data. A BAC clone library was prepared using standard methods (ref. 20). End sequences of 96,768 BAC clones were generated with an ABI 3730xl DNA analyser. A minimally overlapping BAC tiling path of chromosome 20, and 1,132 BAC clones from 390 common CNV regions, were selected for haploid targeted sequencing. Genome-wide genotyping and CNV detection were performed using an Agilent custom 24 million feature CGH array set, as well as Illumina Human cnv370- and 610-quad Beadchips. Short and long insert paired-end read libraries were generated from pooled BAC clone DNA or genomic DNA as described (ref. 3). Paired-end and singleton reads of 36–106 nucleotides were generated using Illumina Genome Analyser (GA) and GAII instruments as described (ref. 3). Long reads were obtained by use of multiple 36 nucleotide sequencing kits with reformulated cleavage reagent provided by Illumina for evaluation. This reformulated reagent has now been made available in all new Illumina sequencing-by-synthesis (SBS) reagent kits. The total sequencing cost was less than US$200,000, with a total run time of 6 weeks using three GA instruments. Sequences and runs were used in analyses if the average Q scores were ≥20. Sequences were aligned to NCBI build 36.3 using GSNAP (ref. 21). SNPs and indels were identified using optimized filters through Alpheus (ref. 8). Custom scripts were developed to identify CNVs in short, paired-end reads by the occurrence of clusters of reads with aligned insert sizes deviating from the mean by >2 standard deviations, and based on contiguous regions with significantly increased or depressed coverage. Putative SNPs, indels and deletions were validated by targeted Sanger sequencing (Supplementary Fig. 16 and Supplementary Table 23). Variants that were previously associated with a clinical phenotype or risk in other studies were identified with Trait-o-matic. Statistical analysis was performed using JMP-Genomics (SAS Institute) or R (http://www.R-project.org).
Full methods accompany this paper.
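
The insert-size criterion described above can be illustrated with a minimal sketch. This is not the authors' custom script: the function name `flag_discordant_pairs` and the example insert sizes are hypothetical, and the clustering of flagged pairs and the coverage-based test mentioned in the text are omitted.

```python
from statistics import mean, stdev

def flag_discordant_pairs(insert_sizes, n_sd=2.0):
    """Return indices of read pairs whose aligned insert size deviates
    from the library mean by more than n_sd standard deviations.

    Hypothetical helper for illustration; a real pipeline would go on to
    look for clusters of such pairs at the same locus.
    """
    mu = mean(insert_sizes)
    sd = stdev(insert_sizes)  # sample standard deviation
    return [i for i, size in enumerate(insert_sizes)
            if abs(size - mu) > n_sd * sd]

# Made-up aligned insert sizes (bp); the last pair is "stretched",
# as would be expected for a pair spanning a deletion.
sizes = [200, 205, 198, 202, 199, 201, 203, 197, 200, 204, 196, 1500]
discordant = flag_discordant_pairs(sizes)
print(discordant)
```

In practice the mean and standard deviation would probably be estimated from a trimmed or robust sample, because a few extreme pairs inflate both; the plain estimates are used here only to mirror the wording of the text.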

A highly annotated whole-genome sequence of a Korean individual

Jong-Il Kim1,2,4,5,11, Young Seok Ju1,2,11, Hansoo Park1,5, Sheehyun Kim4, Seonwook Lee4, Jae-Hyuk Yi1, Joann Mudge6, Neil A. Miller6, Dongwan Hong1, Callum J. Bell6, Hye-Sun Kim4, In-Soon Chung4, Woo-Chung Lee4, Ji-Sun Lee4, Seung-Hyun Seo5, Ji-Young Yun5, Hyun Nyun Woo4, Heewook Lee4, Dongwhan Suh1,2,3, Seungbok Lee1,2,3, Hyun-Jin Kim1,3, Maryam Yavartanoo1,2, Minhye Kwak1,2, Ying Zheng1,2, Mi Kyeong Lee5, Hyunjun Park1, Jeong Yeon Kim1, Omer Gokcumen7, Ryan E. Mills7, Alexander Wait Zaranek8, Joseph Thakuria8, Xiaodi Wu8, Ryan W. Kim6, Jim J. Huntley9, Shujun Luo9, Gary P. Schroth9, Thomas D. Wu10, HyeRan Kim4, Kap-Seok Yang4, Woong-Yang Park1,2,3, Hyungtae Kim4, George M. Church8, Charles Lee7, Stephen F. Kingsmore6 & Jeong-Sun Seo1,2,3,4,5

Recent advances in sequencing technologies have initiated an era of personal genome sequences. To date, human genome sequences have been reported for individuals with ancestry in three distinct geographical regions: a Yoruba African, two individuals of northwest European origin, and a person from China (refs 1–4). Here we provide a highly annotated, whole-genome sequence for a Korean individual, known as AK1. The genome of AK1 was determined by an exacting, combined approach that included whole-genome shotgun sequencing (27.8× coverage), targeted bacterial artificial chromosome sequencing, and high-resolution comparative genomic hybridization using custom microarrays featuring more than 24 million probes. Alignment to the NCBI reference, a composite of several ethnic clades (refs 5, 6), disclosed nearly 3.45 million single nucleotide polymorphisms (SNPs), including 10,162 non-synonymous SNPs, and 170,202 deletion or insertion polymorphisms (indels). SNP and indel densities were strongly correlated genome-wide. Applying very conservative criteria yielded highly reliable copy number variants for clinical considerations. Potential medical phenotypes were annotated for non-synonymous SNPs, coding domain indels, and structural variants. The integration of several human whole-genome sequences derived from several ethnic groups will assist in understanding genetic ancestry, migration patterns and population bottlenecks.

A bacterial artificial chromosome (BAC) library was constructed from AK1 genomic DNA. The genomic locations of about 100,000 AK1 BAC clones were ascertained by end-sequencing (Supplementary Table 1). Massively parallel DNA sequencing was performed using sequencing-by-synthesis with reversible-terminator chemistry on Illumina Genome Analyzers using two complementary strategies (Table 1, Supplementary Table 2 and Supplementary Fig. 1). First, selected genomic regions were sequenced at very high depth using overlapping BAC clones. Chromosome 20 was sequenced in this manner at 155× coverage, as were 390 other regions of the genome that are commonly affected by copy number variants (CNVs) (at an average of 151× coverage). Second, whole-genome sequencing was performed for the entire genome to an average depth of 27.8× using libraries of AK1 genomic DNA with different insert sizes to provide even coverage. Some sequences were generated using a reformulated cleavage reagent that removed thymine fluorophores more completely. This improved phasing and reduced background signals, error rates, and GC bias in longer reads (Supplementary Fig. 2), increasing sequence yields to 18 gigabases (Gb) per flow cell and read lengths to 106 nucleotides. The average sequence quality was 24 (Q score; ref. 3), and 74.4% of sequences aligned to the human genome reference (NCBI build 36.3) using the GSNAP alignment tool tolerating 5% mismatches (refs 7–9). A total of 99.8% of the reference genome was represented, and no coverage bias was apparent apart from expected gaps at centromeres and other heterochromatic regions (Supplementary Discussion).
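
To give a sense of what the quality values quoted above mean, the following sketch converts a Q score to a per-base error probability. It assumes the standard Phred scaling, Q = -10 log10(p); whether the Q scores reported here follow exactly that scaling is an assumption, and the function name is hypothetical.

```python
def phred_to_error_probability(q):
    """Per-base error probability for a Phred-scaled quality score q.

    Assumes the standard definition Q = -10 * log10(p).
    """
    return 10 ** (-q / 10)

# Average quality reported in the text, and the Q >= 20 inclusion threshold.
for q in (24, 20):
    print(q, phred_to_error_probability(q))
```

Under this assumption, Q = 20 corresponds to an expected error rate of 1 in 100 bases, and Q = 24 to roughly 4 in 1,000.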

Bioinformatic filters were trained to detect and genotype SNPs in the aligned sequences. Filters ascertained by comparing SNP genotypes derived from sequencing with results from Illumina 370K genotyping array gave a positive predictive value and sensitivity of SNP detection of 99.9% and 95.0%, respectively (Supplementary Fig. 3 and Supplementary Table 3), and SNP genotype accuracy of 99.1% (Supplementary Fig. 4 and Supplementary Table 4). These filters detected 3,453,653 SNPs in the genome of AK1 (density of 1.21 per kilobase (kb)), of which 17.1% were new and 10,162 were non-synonymous (Supplementary Tables 5 and 6). These results were verified by hybridization of genomic DNA from AK1 to an Illumina 610K genotyping array, deep sequencing of chromosome 20 BAC clones, and Sanger resequencing of the AK1 genome (Supplementary Tables 7, 8 and Supplementary Discussion). The number of SNPs detected in the genome of AK1 was similar to that of James Watson, higher than Craig Venter and the Chinese YH, and less than the Yoruba African, NA18507 (Fig. 1a, b and Supplementary Table 9), which may reflect differences in technical procedures or inter-individual variability (refs 1–4). Overlap among 9,527,824 SNPs detected in these five sequenced genomes indicated that 21% of AK1's SNPs were unique, and 8% were shared by all (Fig. 1b). A total of 2,110,403 AK1 SNPs were heterozygous, yielding a higher SNP diversity than in the Venter, Watson or YH genomes, but less than the Yoruba individual (heterozygous/homozygous SNP ratio of 1.57, and nucleotide diversity (π) of 7.40 × 10⁻⁴; Supplementary Discussion). Sequencing of other genomes using uniform technical procedures is warranted to evaluate the proportion of genetic variance explained by differences within and between human populations.
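
The summary statistics in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes that every detected SNP is either heterozygous or homozygous, so that the homozygous count is the total minus the heterozygous count. The counts passed to `ppv_and_sensitivity` are made up purely to show the definitions; they are not the study's validation counts.

```python
def het_hom_ratio(total_snps, heterozygous_snps):
    """Heterozygous/homozygous ratio, assuming homozygous = total - heterozygous."""
    homozygous_snps = total_snps - heterozygous_snps
    return heterozygous_snps / homozygous_snps

def ppv_and_sensitivity(true_pos, false_pos, false_neg):
    """Positive predictive value and sensitivity from confusion counts."""
    ppv = true_pos / (true_pos + false_pos)
    sensitivity = true_pos / (true_pos + false_neg)
    return ppv, sensitivity

# SNP counts taken from the text.
ratio = het_hom_ratio(3_453_653, 2_110_403)
print(round(ratio, 2))

# Hypothetical counts, for illustration of the definitions only.
ppv, sens = ppv_and_sensitivity(true_pos=950, false_pos=1, false_neg=50)
print(ppv, sens)
```

The ratio computed from the two published counts agrees with the reported value of 1.57.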

We applied the same bioinformatic filters to the genome sequence of AK1 to detect indels (Supplementary Methods). The NCBI reference genome contained 7,910 exonic indel mismatches in comparison with the reference transcript (Supplementary Discussion). Excluding these, the SNP filters detected 170,202 indels (density of 0.060 per kb), of which 71,995 were homo- or hemizygous. Sixty-two per cent of indels were new and 55.9% were deletions (Supplementary Tables 10 and 11). The size range detected was −29 to +5 nucleotides, with approximately normal frequency distribution (Supplementary Fig. 5). Two-hundred-and-twelve AK1 indels mapped to coding domains, which was three times greater than that reported for the YH genome and one-fourth of that reported for the Venter genome (refs 1, 4; Supplementary Fig. 6 and Supplementary Table 12). These marked differences reflect substantial differences either between individuals or between technical procedures, highlighting the need for definition of foundational data standards. Indel results were confirmed by Sanger resequencing of AK1 genomic DNA and deep sequencing of chromosome 20 BAC clones, showing that the sensitivity of indel detection was less than 80%, whereas the positive predictive value was 100% (Supplementary Discussion). Indel underestimation was unavoidable in local repetitive or homopolymeric sequences containing indels at, or near, the ends of reads (Supplementary Discussion). Seventy coding-domain indels were homozygous, of which 26 were in genes with Online Mendelian Inheritance in Man (OMIM) entries, 13 of which had medical phenotypes (Supplementary Table 13).
Highly significant pairwise correlations of SNP and indel densities were observed throughout the genome (Pearson's correlation (ref. 10) was 0.40 genome-wide, P < 10⁻³⁰⁰; Fig. 1c, Supplementary Fig. 7 and Supplementary Table 14). This SNP–indel correlation seems to be a general phenomenon in individual human genomes, rather than a technical artefact, because it was also detected in the YH genome (Supplementary Table 14) and has been reported for other eukaryotes, including primates (refs 11–16). SNP–indel density covariation was not a function of coverage depth or gene density (Supplementary Table 14). Genome-wide correlation of SNP and indel densities in individual human genomes is a new finding, and suggests that unifying molecular or temporal considerations underpin the generation and/or removal of both types of variants (refs 11–16).
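
A density correlation of this kind can be sketched as follows: variants are counted in fixed-size windows, converted to densities per kb, and the two density vectors are compared with Pearson's correlation. The window size, the helper names and the toy positions are all hypothetical; the paper does not state these details in the text above.

```python
from math import sqrt

def window_density(positions, window_size, region_length):
    """Variants per kb in consecutive windows of window_size bases.

    positions are 0-based coordinates within a region of region_length bases.
    The last window is treated as full length even if it is shorter.
    """
    n_windows = -(-region_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window_size] += 1
    return [c / (window_size / 1000) for c in counts]

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy positions in a 4 kb region, 1 kb windows (made-up data).
snp_positions = [10, 500, 900, 1200, 2100, 2500, 2600, 2900, 3300]
indel_positions = [400, 950, 1800, 2200, 2700, 3900]
snp_density = window_density(snp_positions, 1000, 4000)
indel_density = window_density(indel_positions, 1000, 4000)
print(pearson_r(snp_density, indel_density))
```
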
Several complementary approaches were used for CNV detection in AK1 (Supplementary Fig. 8). For large deletions of the AK1 genome, we used events identified in deep sequencing of 1,132 BAC clones as a training set (Supplementary Fig. 9). Most showed reduced coverage, predominance of homozygous SNPs (for heterozygous deletions), and alignment of 'stretched' paired-end reads in whole-genome sequences (Supplementary Fig. 10). We used these criteria to filter candidate deletions detected by a new, custom-designed 24 million probe set array comparative genomic hybridization (CGH), which found 1,237 CNV regions in total (Supplementary Fig. 11 and Supplementary Table 15), as well as genotyping microarrays (Supplementary Table 16). Figure 2a shows an example of a deletion in the genome of AK1 detected both by BAC sequencing and by CGH with the characteristics mentioned above. Figure 2b shows another deletion where the coverage drop in whole-genome sequencing was not as prominent as in BAC sequencing. However, the latter region contains all other features of deletions, illustrating the usefulness of BAC sequencing for CNV detection. The 238 regions that met these conservative criteria represent the most reliable list of true deletions yet identified in an individual genome sequence (Supplementary Fig. 10 and Supplementary Table 17). Deletions in AK1 ranged from 277 to 196,900 bases in length, and totalled 2.4 Mb. One-hundred-and-forty-eight of these had not previously been described in the Database of Genomic Variants as of 10 November 2008 (DGV; http://projects.tcag.ca/variation/).

Copy number gains in AK1 were selected conservatively with three approaches for different sizes of insertions: (1) array CGH described above yielded insertions ranging from 2.15 kb to 1.06 Mb, (2) aligned end-sequences of BACs yielded insertions of 16.8 to 357.1 kb, and (3) aligned long-insert paired-end reads yielded insertions of 0.9 to 2.2 kb (Supplementary Tables 18–20). These regions were confirmed by increased sequencing coverage of these genomic regions (Supplementary Figs 12–14). An example is shown in Fig. 2c, in which an increased signal on a microarray coincided with a corresponding significant gain of sequencing coverage. The AK1 genome contained 77 copy number gains, totalling 7.0 Mb. Thirty-three (42.8%) of these were absent from the DGV and were therefore considered to be new.
Non-synonymous SNPs detected in AK1 were compared with those identified in the YH and the Yoruban genomes (refs 3, 4), which were ascertained using technical approaches similar to those used here (Fig. 3a). Although only 37% of AK1 SNPs were shared among these three genomes, 57% of genes that contained non-synonymous SNPs in AK1 were common to all three (Fig. 3a and Supplementary Discussion). These data indicate that a subset of genes is enriched for non-synonymous SNPs in these individuals. Ontology analysis of this gene subset showed enrichment for functions associated with environmental adaptation, such as sensory function, immunological function, and signal transduction (Supplementary Table 21). Possibly, these genes have heightened diversity and/or many pseudogenes.


Using Trait-o-matic, an algorithm for high-throughput variant annotation, 773 SNPs that were potentially associated with clinical phenotypes were identified (J. V. Thakuria and G. M. Church, manuscript in preparation; Supplementary Table 22). Of these, 269 were relatively common SNPs previously associated with risk of complex disorders or traits. For example, the genome of AK1 contained 90 SNPs that have shown associations with susceptibility to various cancers, 34 SNPs with type II diabetes mellitus, 13 with Alzheimer's disease, and seven with rheumatoid arthritis. These data should be interpreted cautiously, however, because risk factors for complex diseases, for example, rheumatoid arthritis, differ in northwest European and Korean populations (ref. 17), and because the translation of genetic burden into risk assessment for polygenic traits is rudimentary. The genome of AK1 also contained 504 non-synonymous SNPs in genes associated with complex or Mendelian disorders or traits. Of these, 22 were stop codons and five were homozygous. Among Mendelian traits, AK1 was homozygous for a variant conferring dry earwax (ref. 18) that has a high allele frequency in Koreans. Eighteen variants of pharmacogenetic relevance were identified, potentially affecting dosing, efficacy and/or toxicity of β-2-adrenoceptor agonists, statins, rosiglitazone, warfarin, citalopram, abacavir, debrisoquine, bleomycin, fluorouracil and aramycin-C (Fig. 3b).
CNVs have shown associations with common, complex disorders in humans. One-hundred-and-six genes were affected by CNV losses in AK1 (Fig. 3c). One gene deleted in the genome of AK1 was leukocyte immunoglobulin-like receptor (LILRA3). Most northeast Asians have functional loss or deletion of this locus, which has been suggested to be under positive or balancing selection (ref. 19).
We have obtained the genome sequence of a Korean individual by a unique combination of whole-genome shotgun sequencing, targeted BAC sequencing, and custom-designed high-resolution array CGH. This combination of approaches improved the accuracy of SNP, indel and CNV detection, and will assist in the assembly of contiguous sequences. Agreement on technical standards for individual genome sequences will aid in comparisons between genomes and, ultimately, in associations with phenotypic differences.




Journal of Human Genetics (2006) 51, 137–140. doi:10.1007/s10038-005-0338-5

Mutation analysis of the GNE gene in Korean patients with distal myopathy with rimmed vacuoles

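Byoung Joon Kim1, Chang-Seok Ki2, Jong-Won Kim2, Duk Hyun Sung3, Young-Chul Choi4 and Seung Hyun Kim5
  1. Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  2. Department of Laboratory Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  3. Department of Physical Medicine and Rehabilitation, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  4. Department of Neurology, Brain Korea 21 Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
  5. Department of Neurology, College of Medicine, Hanyang University, Seoul, Korea
Correspondence: Chang-Seok Ki, Department of Laboratory Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea. Fax: +82-2-34102719. E-mail: changski@skku.edu; Young-Chul Choi, Department of Neurology, Brain Korea 21 Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea. E-mail: ycchoi@yumc.yonsei.ac.kr
Received 8 September 2005; accepted 21 October 2005; published online 22 December 2005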

Abstract

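Distal myopathy with rimmed vacuoles (DMRV; MIM 605820) is an autosomal recessive neuromuscular disorder characterized by weakness of the anterior compartment of the lower limbs with sparing of the quadriceps muscles. Recently, mutations in the UDP-N-acetylglucosamine-2-epimerase/N-acetylmannosamine kinase (GNE) gene have been identified as the genetic basis of DMRV. To investigate the spectrum of GNE mutations in Korean patients with DMRV, we performed a clinical and genetic analysis of nine unrelated patients suspected of having DMRV. Direct sequencing analysis revealed that eight of the nine patients (88.9%) were either homozygous or compound heterozygous for GNE mutations, including three known mutations (C13S, R129Q and V572L) and two novel mutations (M29T and A591P). The allele frequencies of the V572L and C13S mutations were 68.8% (11/16) and 12.5% (2/16), respectively. These results suggest that screening for GNE mutations in patients suspected of having DMRV would be helpful for the molecular diagnosis of DMRV in the Korean population.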
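
The allele frequencies quoted in the abstract follow from simple counting. The sketch below assumes that the denominator of 16 is the eight mutation-positive patients counted as two alleles each; this is inferred from the abstract rather than stated in it, and the function name is hypothetical.

```python
def allele_frequency(mutant_alleles, n_patients, ploidy=2):
    """Fraction of alleles carrying a given mutation among n_patients.

    Assumes an autosomal locus with `ploidy` alleles per patient.
    """
    return mutant_alleles / (n_patients * ploidy)

# Counts from the abstract: 11 V572L and 2 C13S alleles among 16 alleles.
v572l = allele_frequency(11, 8)
c13s = allele_frequency(2, 8)
print(f"V572L: {v572l:.4f}, C13S: {c13s:.4f}")

# Proportion of patients with GNE mutations (8 of 9).
print(f"{8 / 9:.3f}")
```
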

Keywords:

Distal myopathy with rimmed vacuoles, DMRV, GNE gene, mutation

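Korean roots traced to "Neolithic people who lived in the Devil's cave": study draws a wave of doubts online, including "A conclusion from two specimens" and "Then what is the Mongolian spot on our bottoms?"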

5 February 2017, 00:50 (updated 7 February 2017, 00:00)

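According to South Korea's JoongAng Ilbo on 2 February 2017, the Genome Research Institute at the Ulsan National Institute of Science and Technology (UNIST) reported in the international journal Science Advances that the Korean people formed from the mixing of two groups: southern hunter-gatherers who moved from Southeast Asia along the eastern coast of China into the Far East 30,000 to 40,000 years ago and became northerners, and southern farming peoples who arrived by the same route 10,000 years ago, when the Neolithic began.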

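Until now, some anthropologists and archaeologists have presumed, on the basis of many shared features of language, customs and appearance, that the Korean people were a northern people who originated in the Altai Mountains and crossed the plains of Mongolia and Manchuria into the Korean Peninsula.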

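The scientific community, however, took a different view. In 2009, UNIST reported in the international journal Science that "the Korean people belong to a great southern current that moved northeast from Southeast Asia", and the present announcement makes that picture more specific.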

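The clue lay in the 7,700-year-old skulls of two women, one in her 20s and one in her 40s, found in a cave named "Devil's Gate" in the Primorye region of Russia, north of Vladivostok. In Korean history, this area is said to have been the land of Goguryeo, Eastern Buyeo and Okjeo. When the Genome Research Institute used a supercomputer to sequence and analyse the genomes from these skulls, it found that the Devil's Gate cave people were of southern origin, had settled in the area 30,000 to 40,000 years ago, and, like Koreans, carried genes for brown eyes and shovel-shaped incisors. They are also reported to have had genetic traits typical of modern East Asians, including a variant associated with the inability to digest milk, a gene conferring susceptibility to high blood pressure, a gene for low body odour, and a gene for thin earlobes. The Devil's Gate cave people are regarded as ancestors of the Ulchi people who live nearby, and among modern populations other than the nearby indigenous peoples, Koreans proved to be close to them.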

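Park Jong-hwa, director of the UNIST Genome Research Institute, said: "Having the same type of mitochondrial DNA means sharing the same maternal line. Even allowing for the long gap in time, the genes of the two groups are very close, and the Devil's cave people can be said to be almost identical to the ancestors of Koreans."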

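In response, South Korean internet users posted many comments raising objections and doubts. On the research method, comments included "They may just have ended up in that cave by chance, and two samples is far too few" and "A conclusion from two specimens? Doesn't this kind of study need many more?" Others asked, "But the language is a northern one, like Mongolian. Why is that?" and "Then what is the Mongolian spot on our bottoms?" Still others wrote, "So in other words, we are the world. That means that once science advances and can look back hundreds of thousands of years, we might turn out to be northern again" and "People often say the Korean people are a single ethnic group, but that's a joke." (Translated and edited by Matsumura)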