Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Soft Ngram Representation and Modeling for Protein Remote Homology Detection


Abstract

Remote homology detection is a central problem in bioinformatics: the challenge is to detect functionally related proteins when their sequence similarity is low. Recent solutions employ representations derived from the sequence profile, obtained by replacing each amino acid of the sequence with the most probable amino acid at the corresponding profile position. However, the information contained in the profile could be exploited more fully, provided that a representation can capture and properly model this evolutionary information. In this paper, we propose a novel profile-based representation for sequences, called soft Ngram. This representation extends the traditional Ngram scheme (obtained by grouping N consecutive amino acids) and takes into account all of the evolutionary information in the profile: Ngrams are extracted from the whole profile, and each is assigned a weight computed directly from the corresponding evolutionary frequencies. We illustrate two different approaches to modeling the proposed representation and deriving a feature vector that can be used for classification with a support vector machine (SVM). A thorough evaluation on three benchmarks demonstrates that the new approach outperforms other Ngram-based methods and also shows very promising results in comparison with a broader spectrum of techniques.
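To make the contrast between the traditional and the soft Ngram schemes concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a profile is given as a list of per-position dictionaries mapping amino acids to frequencies, and it assumes the weight of a soft Ngram is the product of the frequencies of its amino acids; the abstract states only that the weight is computed from the evolutionary frequencies, so the product rule, the function names, and the `min_freq` cutoff are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import product


def hard_ngrams(profile, n):
    """Traditional scheme: keep only the most probable amino acid at each
    position, then count the Ngrams of the resulting sequence."""
    consensus = "".join(max(col, key=col.get) for col in profile)
    counts = defaultdict(float)
    for i in range(len(consensus) - n + 1):
        counts[consensus[i:i + n]] += 1.0
    return dict(counts)


def soft_ngrams(profile, n, min_freq=0.0):
    """Soft scheme (illustrative): enumerate every Ngram that the profile
    supports in each window of n positions and accumulate a weight for it.
    ASSUMPTION: weight = product of the per-position frequencies."""
    counts = defaultdict(float)
    for i in range(len(profile) - n + 1):
        window = profile[i:i + n]
        # Candidate (amino acid, frequency) pairs at each window position.
        choices = [
            [(aa, f) for aa, f in col.items() if f > min_freq]
            for col in window
        ]
        for combo in product(*choices):
            gram = "".join(aa for aa, _ in combo)
            weight = 1.0
            for _, f in combo:
                weight *= f
            counts[gram] += weight
    return dict(counts)


# Toy three-position profile (made-up frequencies, for illustration only).
profile = [
    {"A": 0.7, "G": 0.3},
    {"L": 0.6, "V": 0.4},
    {"K": 1.0},
]

hard = hard_ngrams(profile, 2)  # only "AL" and "LK" are counted
soft = soft_ngrams(profile, 2)  # "AV", "GL", "GV", "VK" also receive weight
```

In this toy case the hard scheme sees only the consensus sequence `ALK`, while the soft scheme also assigns weight to the less probable alternatives. Either dictionary could then be mapped onto a fixed Ngram vocabulary to obtain a feature vector for an SVM; how the paper actually models the representation is not specified in the abstract.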
