Beijing Institute of Genomics completes the "Human Multi-level, Multi-population Natural Selection Database"
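To better understand the genetic differences among human populations and the natural selection acting on them, and to compare how well different indicators corroborate one another, Cheng Feng and colleagues in the laboratory of Professor Zeng Changqing at the Beijing Institute of Genomics, Chinese Academy of Sciences, recently took the HapMap (International Haplotype Map Project) genotyping data as the basis of their study. This dataset currently has the largest number of genotyped SNPs (single nucleotide polymorphisms) and covers the most populations. The team examined genomic differentiation and natural selection across human populations at three levels: large genomic segments, functional genes, and individual SNP sites. They scanned for selection signals using several indicators (HET, Win_HET, FST, Win_FST, iHS, ES_HET, ES_FST, P_iHS, and others) and strategies, and placed the results in a single framework for comparison and validation so as to extract as much information as possible. The work produced the "Human Multi-level, Multi-population Natural Selection Database", that is, the positive selection database SNP@Evolution (http://bighapmap.big.ac.cn/), which is open to researchers in China and abroad. Since the accompanying article was published in BMC Evol Biol in late September, SNP@Evolution has received more than ten thousand visits and downloads from dozens of countries and regions around the world, giving researchers in the field a useful tool for finding selection signals.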
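SNP@Evolution has two parts, a data query interface and a graphical query interface, and it includes results for both HapMap Phase II and Phase III. The Phase II data cover 3,619,226 SNPs and analysis results for 21,859 genes; 1,606 large genomic segments show selection signals and 660 show differentiation signals. The Phase III data contain 1,389,498 SNPs and valid analysis results for 21,099 genes; across the 11 populations, 10,138 genomic segments under selection and 464 strongly differentiated genomic segments were found. For convenience, query results in SNP@Evolution link to other databases for further information.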
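As a rough illustration of the two per-SNP indicators named above, HET and FST, the following Python sketch computes expected heterozygosity and a simple Wright-style FST from allele frequencies. The text here does not state which estimators the authors used, so the formulas, the function names, and the sample-size weighting are all assumptions made for illustration and may differ from what the database actually computes.

```python
def heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs, sizes=None):
    """Simple Wright-style FST for one biallelic SNP: (H_T - H_S) / H_T.

    freqs: allele frequency of the same allele in each population.
    sizes: optional sample sizes used as weights (assumed equal if omitted).
    This is an illustrative estimator, not necessarily the published one.
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = float(sum(sizes))
    weights = [n / total for n in sizes]

    # Heterozygosity expected if all populations were pooled.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic site: no differentiation can be measured

    # Weighted mean of the within-population heterozygosities.
    h_s = sum(w * heterozygosity(p) for w, p in zip(weights, freqs))
    return (h_t - h_s) / h_t
```

Under these assumptions, two populations fixed for opposite alleles give an FST of 1, and populations with identical frequencies give an FST of 0.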
Database link: http://bighapmap.big.ac.cn/
Reference:
Cheng Feng, Chen Wei, Richards Elliott, Deng Libin, Zeng Changqing. SNP@Evolution: a hierarchical database of positive selection on the human genome. BMC Evolutionary Biology 2009, 9:221.
Original article:
http://www.biomedcentral.com/1471-2148/9/221
Original abstract:
Abstract
Background: Positive selection is a driving force that has shaped the modern human. Recent developments in high throughput technologies and corresponding statistics tools have made it possible to conduct whole genome surveys at a population scale, and a variety of measurements, such as heterozygosity (HET), FST, and Tajima's D, have been applied to multiple datasets to identify signals of positive selection. However, great effort has been required to combine various types of data from individual sources, and incompatibility among datasets has been a common problem. SNP@Evolution, a new database which integrates multiple datasets, will greatly assist future work in this area.
Description: As part of our research scanning for evolutionary signals in HapMap Phase II and Phase III datasets, we built SNP@Evolution as a multi-aspect database focused on positive selection. Among its many features, SNP@Evolution provides computed FST and HET of all HapMap SNPs, 5+ HapMap SNPs per qualified gene, and all autosome regions detected from whole genome window scanning. In an attempt to capture multiple selection signals across the genome, selection-signal enrichment strength (ES) values of HET, FST, and P-values of iHS of most annotated genes have been calculated and integrated within one frame for users to search for outliers. Genes with significant ES or P-values (with thresholds of 0.95 and 0.05, respectively) have been highlighted in color. Low diversity chromosome regions have been detected by sliding a 100 kb window in a 10 kb step. To allow this information to be easily disseminated, a graphical user interface (GBrowser) was constructed with the Generic Model Organism Database toolkit.
Conclusion: Available at http://bighapmap.big.ac.cn, SNP@Evolution is a hierarchical database focused on positive selection of the human genome. Based on HapMap Phase II and III data, SNP@Evolution includes 3,619,226/1,389,498 SNPs with their computed HET and FST, as well as qualified genes of 21,859/21,099 with ES values of HET and FST. In at least one HapMap population group, window scanning for selection signals has resulted in 1,606/10,138 large low HET regions. Among Phase II and III geographical groups, 660 and 464 regions show strong differentiation.
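The window scan mentioned in the abstract (a 100 kb window moved in 10 kb steps) can be sketched as follows. This is a minimal Python illustration, assuming per-SNP HET values are already available for one chromosome and that the window statistic is the mean HET. The abstract does not specify the statistic or the cutoff for calling a region "low HET", so `window_mean_het`, `low_het_windows`, and the `threshold` parameter are hypothetical.

```python
from bisect import bisect_left


def window_mean_het(positions, het_values, window=100_000, step=10_000):
    """Slide a fixed-size window along one chromosome.

    positions: SNP coordinates in bp, sorted ascending.
    het_values: per-SNP heterozygosity, in the same order as positions.
    Returns a list of (start, end, n_snps, mean_het) for non-empty windows,
    where each window covers the half-open interval [start, end).
    """
    results = []
    if not positions:
        return results
    start = 0
    last = positions[-1]
    while start <= last:
        end = start + window
        lo = bisect_left(positions, start)  # first SNP at or after start
        hi = bisect_left(positions, end)    # first SNP at or after end
        n = hi - lo
        if n > 0:
            results.append((start, end, n, sum(het_values[lo:hi]) / n))
        start += step
    return results


def low_het_windows(windows, threshold):
    """Keep windows whose mean HET is below a caller-chosen (assumed) cutoff."""
    return [w for w in windows if w[3] < threshold]
```

In practice, overlapping low-HET windows would likely be merged into larger regions before being reported; that step is omitted here.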