Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Highly Scalable Genotype Phasing by Entropy Minimization


Abstract

A Single Nucleotide Polymorphism (SNP) is a position in the genome at which two or more of the four possible nucleotides occur in a large percentage of the population. SNPs account for most of the genetic variability between individuals, and mapping SNPs in the human population has become the next high priority in genomics after the completion of the Human Genome Project. In diploid organisms such as humans, there are two non-identical copies of each autosomal chromosome. A description of the SNPs in a chromosome is called a haplotype. At present, it is prohibitively expensive to directly determine the haplotypes of an individual, but it is possible to obtain rather easily the conflated SNP information in the so-called genotype. Computational methods for genotype phasing, i.e., inferring haplotypes from genotype data, have received much attention in recent years, as haplotype information leads to increased statistical power of disease association tests. However, many of the existing algorithms have impractical running times for phasing large genotype datasets such as those generated by the International HapMap Project. In this paper we propose a highly scalable algorithm based on entropy minimization. Our algorithm is capable of phasing both unrelated and related genotypes coming from complex pedigrees. Experimental results on both real and simulated datasets show that our algorithm achieves a phasing accuracy worse than, but close to, that of the best existing methods while being several orders of magnitude faster. The open source code implementation of the algorithm and a web interface are publicly available at http://dna.engr.uconn.edu/~software/ent/.
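To make the terms in the abstract concrete, the following minimal Python sketch shows how a genotype conflates two haplotypes and what "phasing by entropy minimization" means on a toy input. It is an illustration under stated assumptions, not the authors' algorithm: the 0/1/2 encoding (2 marks a heterozygous site), the function names, and the exhaustive search are all hypothetical choices made here, and the brute-force search is exponential in the number of heterozygous sites, whereas the paper's contribution is a scalable method.

```python
from collections import Counter
from itertools import product
from math import log2


def genotype(h1, h2):
    """Conflate two haplotypes (strings over '0'/'1') into a genotype.

    Assumed encoding: the shared allele where both copies agree,
    '2' where they differ (heterozygous site).
    """
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


def compatible_pairs(g):
    """Enumerate all unordered haplotype pairs that explain genotype g."""
    het = [i for i, c in enumerate(g) if c == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"  # the other copy carries the other allele
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def entropy(phasing):
    """Shannon entropy (in bits) of the empirical haplotype distribution."""
    counts = Counter(h for pair in phasing for h in pair)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def min_entropy_phasing(genotypes):
    """Brute-force search for the phasing with the lowest haplotype entropy.

    Toy illustration only: exponential time, unrelated individuals only.
    """
    candidates = product(*(compatible_pairs(g) for g in genotypes))
    return list(min(candidates, key=entropy))


if __name__ == "__main__":
    gs = ["0101", "0122", "0110"]
    phasing = min_entropy_phasing(gs)
    for g, pair in zip(gs, phasing):
        print(g, "->", pair)
    print("entropy (bits):", entropy(phasing))
```

In this toy input the ambiguous genotype 0122 can be explained by either (0100, 0111) or (0101, 0110); the second explanation reuses haplotypes already present in the other two individuals, so it yields the lower entropy and is the one selected.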