Filtering redundancies for sequence similarity search programs.

Cantalloube H; Chomilier J; Chiusa S; Lonquety M; Spadoni JL; Zagury JF

Abstract

Database scanning programs such as BLAST and FASTA are now used by most biologists for the post-genomic processing of DNA or protein sequence information, in particular to retrieve the structure or function of uncharacterized proteins. Unfortunately, their results can be polluted by identical alignments (called redundancies) that arise when the same protein or DNA sequence is present in different entries of the database. This makes it difficult to use the listed alignments efficiently. Pretreatment of databases has been proposed to suppress strictly identical entries. However, many identical alignments still remain, because redundancies may occur locally, either for entries corresponding to different fragments of the same sequence or for entries corresponding to highly homologous sequences that differ by only a few residues, such as orthologous proteins. In the present work, we show that redundant alignments can indeed be numerous even when working with a pretreated non-redundant data bank, reaching as much as 60% of the output results depending on the query and the bank. The accuracy and efficiency of post-genomic work will therefore be greatly increased if these redundancies are removed. To solve this previously unaddressed problem, we have developed an algorithm that allows the efficient and safe suppression of all redundancies with no loss of information. This algorithm is based on several filtering steps that we describe here in the context of the Automat similarity search program; such an algorithm should also be added to other similarity search programs (BLAST, FASTA, etc.).
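The abstract does not spell out the filtering steps, so the following is only a minimal sketch of the general idea, not the authors' algorithm or the Automat implementation. It assumes a hypothetical `Hit` record and treats two alignments as redundant when they cover the same query region with an identical aligned subject segment. Redundant hits are collapsed into one representative, and the entry identifiers of every collapsed hit are kept so that no information is lost.

```python
from collections import OrderedDict
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical alignment record (field names are assumptions)."""
    entry_id: str      # database entry the alignment came from
    query_start: int   # start of the aligned region in the query
    query_end: int     # end (exclusive) of the aligned region in the query
    subject_seg: str   # aligned subject residues, gaps included
    score: float


def collapse_redundant(hits: List[Hit]) -> List[Tuple[Hit, List[str]]]:
    """Group hits whose query range and aligned subject segment are identical.

    Returns one (representative hit, [entry ids]) pair per distinct alignment,
    ordered by decreasing score. The id list records every database entry
    that produced the same alignment.
    """
    groups = OrderedDict()
    # sorted() is stable, so hits with equal scores keep their input order.
    for hit in sorted(hits, key=lambda h: -h.score):
        key = (hit.query_start, hit.query_end, hit.subject_seg)
        if key in groups:
            groups[key][1].append(hit.entry_id)
        else:
            groups[key] = (hit, [hit.entry_id])
    return list(groups.values())


hits = [
    Hit("P1", 0, 5, "MKTAY", 30.0),
    Hit("P1_frag", 0, 5, "MKTAY", 30.0),  # fragment entry, same alignment
    Hit("Q9", 0, 5, "MKSAY", 25.0),       # differs by one residue, kept apart
]
result = collapse_redundant(hits)
```

In this sketch only exactly identical alignments are merged; near-identical alignments, such as the one-residue variant above, remain separate entries in the output.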


Bibliographic details

Source
Journal of Biomolecular Structure and Dynamics | 2005, Issue 4 | 6 pages
Authors
Cantalloube H; Chomilier J; Chiusa S; Lonquety M; Spadoni JL; Zagury JF
Original format: PDF
Language of text: English
CLC classification: Molecular biology
Keywords
Algorithms; Databases, Factual; Sequence Alignment

