Journal: Comparative and Functional Genomics

Sequence Search Algorithms for Single Pass Sequence Identification: Does One Size Fit All?


Abstract

Bioinformatic tools have become essential to biologists in their quest to understand the vast quantities of sequence data, and now whole genomes, which are being produced at an ever-increasing rate. Much of these sequence data are single-pass sequences, such as sample sequences from organisms closely related to other organisms of interest which have already been sequenced, or cDNAs or expressed sequence tags (ESTs). These single-pass sequences often contain errors, including frameshifts, which complicate the identification of homologues, especially at the protein level. Therefore, sequence searches with this type of data are often performed at the nucleotide level. The most commonly used sequence search algorithms for the identification of homologues are Washington University’s and the National Center for Biotechnology Information’s (NCBI) versions of the BLAST suites of tools, which are to be found on websites all over the world. The work reported here examines the use of these tools for comparing sample sequence datasets to a known genome. It shows that care must be taken when choosing the parameters to use with the BLAST algorithms. NCBI’s version of gapped BLASTn gives much shorter, and sometimes different, top alignments to those found using Washington University’s version of BLASTn (which also allows for gaps), when both are used with their default parameters. Most of the differences in performance were found to be due to the choices of default parameters rather than underlying differences between the two algorithms. Washington University’s version, used with defaults, compares very favourably with the results obtained using the accurate but computationally intensive Smith–Waterman algorithm.
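For readers unfamiliar with the Smith–Waterman algorithm mentioned above, the following is a minimal Python sketch of local alignment by dynamic programming. It is not the implementation used in the paper: the function name `smith_waterman`, the linear gap penalty, and the default scores (`match=1`, `mismatch=-1`, `gap=-2`) are illustrative assumptions, and none of them correspond to the defaults of either BLAST version.

```python
def smith_waterman(query, subject, match=1, mismatch=-1, gap=-2):
    """Return (best_score, aligned_query, aligned_subject) for one best local alignment.

    Scoring values are illustrative assumptions; a linear gap penalty is used.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    # score[i][j] = best score of a local alignment ending at query[i-1], subject[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            cell = max(
                0,                          # start a new local alignment
                score[i - 1][j - 1] + pair, # align the two residues
                score[i - 1][j] + gap,      # gap in the subject
                score[i][j - 1] + gap,      # gap in the query
            )
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    aligned_q, aligned_s = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if query[i - 1] == subject[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            aligned_q.append(query[i - 1])
            aligned_s.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1

    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_s))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```

Because the table is filled for every pair of positions, the cost grows with the product of the two sequence lengths, which is why the abstract describes the method as accurate but computationally intensive. Changing `match`, `mismatch`, or `gap` can change where the best local alignment starts and stops, which loosely mirrors the abstract's point that parameter choices affect the alignments reported.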
