IEEE/ACM Transactions on Computational Biology and Bioinformatics

Aligning and Clustering Patterns to Reveal the Protein Functionality of Sequences


Abstract

Discovering sequence patterns with variations unveils significant functions of a protein family. Existing combinatorial methods for discovering patterns with variations are computationally expensive, and probabilistic methods require a more elaborate probabilistic representation of the amino acid associations. To overcome these shortcomings, this paper presents a new, computationally efficient method that represents patterns with variations in a compact form called the Aligned Pattern Cluster (AP Cluster). To reduce runtime, our method discovers a shortened list of non-redundant, statistically significant sequence associations based on our previous work. To represent protein functional regions, the pattern alignment and clustering step presented in this paper captures the conservations and variations of the aligned patterns. We further refine the solution to cover more sequences by extending AP Clusters, which contain only statistically significant patterns, to Weak and Conserved AP Clusters. When applied to the cytochrome c, ubiquitin, and triosephosphate isomerase protein families, our algorithm identifies both the binding segments and the binding residues. Compared with other methods, ours discovers all binding sites in the AP Clusters, with superior entropy and coverage. The identification of patterns with variations helps biologists avoid time-consuming simulations and experiments. (Software available upon request.)
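The abstract does not spell out how an AP Cluster displays conservation and variation, or how entropy and coverage are scored. The following is a minimal Python sketch under simplifying assumptions of our own, not the authors' algorithm: patterns are assumed to be already aligned to equal length with `-` as the gap symbol, cluster entropy is taken as the mean per-column Shannon entropy over non-gap residues, and coverage is taken as the fraction of sequences that contain at least one pattern (gaps removed) as an exact substring. The function names (`column_profile`, `summarize`, `cluster_entropy`, `coverage`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

GAP = "-"  # assumed gap symbol in aligned patterns (illustrative convention)


def column_profile(aligned_patterns):
    """Residue counts per alignment column, ignoring gaps (assumption)."""
    if not aligned_patterns:
        raise ValueError("need at least one aligned pattern")
    width = len(aligned_patterns[0])
    if any(len(p) != width for p in aligned_patterns):
        raise ValueError("aligned patterns must all have the same length")
    return [Counter(p[i] for p in aligned_patterns if p[i] != GAP) for i in range(width)]


def summarize(aligned_patterns):
    """Show a conserved column as one residue and a variable column as a [..] set."""
    parts = []
    for i, counts in enumerate(column_profile(aligned_patterns)):
        symbols = sorted(counts)
        if any(p[i] == GAP for p in aligned_patterns):
            symbols.append(GAP)  # some patterns have a gap in this column
        parts.append(symbols[0] if len(symbols) == 1 else "[" + "".join(symbols) + "]")
    return "".join(parts)


def column_entropy(counts):
    """Shannon entropy (bits) of one column's residue distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cluster_entropy(aligned_patterns):
    """Mean per-column entropy; lower means more conserved (assumed metric)."""
    profile = column_profile(aligned_patterns)
    return sum(column_entropy(c) for c in profile) / len(profile)


def coverage(aligned_patterns, sequences):
    """Fraction of sequences containing any gap-stripped pattern as a substring (assumed)."""
    if not sequences:
        return 0.0
    motifs = {m for m in (p.replace(GAP, "") for p in aligned_patterns) if m}
    hits = sum(1 for s in sequences if any(m in s for m in motifs))
    return hits / len(sequences)


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    patterns = ["CAQCH", "CKQCH", "CAQC-"]
    sequences = ["MGCAQCHTV", "MGCKQCHTV", "MGDAQCHTV", "CAQCLL"]
    print(summarize(patterns))                 # C[AK]QC[H-]
    print(f"{cluster_entropy(patterns):.3f}")  # 0.184
    print(coverage(patterns, sequences))       # 0.75
```

In this toy cluster, only the second column varies (A or K) and the last column is sometimes a gap, so the mean entropy stays low. A real pipeline would also need the pattern discovery and alignment steps, which this sketch assumes have already run.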
