Journal of Bioinformatics and Computational Biology

Sequence-based enzyme catalytic domain prediction using clustering and aggregated mutual information content

Abstract

Characterizing enzyme sequences and identifying their active sites is an important task. Current experimental methods are too expensive and labor-intensive to keep pace with the rapidly accumulating protein sequence and structure data, so accurate, high-throughput in silico methods for identifying catalytic residues and predicting enzyme function are much needed. In this paper, we propose a novel sequence-based catalytic domain prediction method that combines sequence clustering with an information-theoretic approach. The first step is a clustering analysis of enzyme sequences from the same functional category (those with the same EC label), which addresses the widely varying levels of sequence similarity among enzyme sequences. The clustering analysis constructs a sequence graph in which nodes are enzyme sequences and edges connect pairs of sequences with a certain degree of sequence similarity, and then uses graph properties, such as biconnected components and articulation points, to generate sequence segments common to the enzyme sequences. The amino acid subsequences in these shared regions are then aligned, and an information-theoretic approach called the aggregated column related scoring scheme is applied to highlight potential active sites in the enzyme sequences. This aggregated information content scoring scheme is shown to highlight active-site residues effectively. The proposed combination of clustering and aggregated information content scoring successfully highlighted known catalytic sites, as annotated in the Catalytic Site Atlas database, in enzymes of Escherichia coli K12.
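The column-scoring step can be illustrated with a minimal Python sketch. The abstract does not define how mutual information is aggregated, so this code assumes, purely for illustration, that each alignment column is scored by summing its mutual information with every other column, where MI(i, j) = Σ p(a, b) log2[p(a, b) / (p(a) p(b))] over residue pairs (a, b) observed in columns i and j. Gaps are treated as an ordinary symbol and no pseudocounts are used. The function names `mutual_information`, `aggregated_mi`, and `top_columns` are hypothetical, not the authors' code.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns, from raw frequencies."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def aggregated_mi(alignment):
    """Score each column by summing its MI with every other column.

    ASSUMPTION: this summation is an illustrative aggregation, not the
    paper's published definition. Gap characters ('-') count as an
    ordinary symbol; no pseudocounts or small-sample correction is applied.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    # Transpose the alignment into a list of columns.
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    scores = [0.0] * length
    for i, j in combinations(range(length), 2):
        mi = mutual_information(columns[i], columns[j])
        scores[i] += mi
        scores[j] += mi
    return scores


def top_columns(scores, k):
    """Indices of the k highest-scoring columns (candidate active-site positions)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy alignment: columns 0 and 2 co-vary perfectly (A<->D, G<->H).
    toy = ["ACDE", "ACDF", "GCHE", "GCHF"]
    scores = aggregated_mi(toy)
    print(scores)                  # [1.0, 0.0, 1.0, 0.0]
    print(top_columns(scores, 2))  # [0, 2]
```

On the toy alignment, columns 0 and 2 co-vary perfectly and score 1 bit each, while the fully conserved column 1 scores 0. Because pure mutual information ignores conservation, and catalytic residues are often highly conserved, a practical scheme would probably also need a per-column conservation term; the abstract does not state how the authors handle this.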
Our method is shown to be not only accurate in predicting potential active sites in enzyme sequences but also computationally efficient: the clustering approach relies on two graph properties that can be computed in time linear in the number of edges of the sequence graph, and computing mutual information is inexpensive. We believe that the proposed method can be useful for identifying the active sites of enzyme sequences produced by the many genome projects.
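The efficiency claim rests on the fact that articulation points and biconnected components can both be found in a single depth-first search in O(V + E) time (the Hopcroft-Tarjan algorithm). The following minimal Python sketch assumes the similarity edges have already been computed, since the abstract does not specify the similarity measure or threshold; the function names `build_sequence_graph` and `articulation_points_and_bccs` are hypothetical. It recovers only the graph structure, not the paper's procedure for turning components into common sequence segments.

```python
from collections import defaultdict


def build_sequence_graph(similar_pairs):
    """Undirected graph from (seq_id, seq_id) pairs already judged similar."""
    graph = defaultdict(set)
    for u, v in similar_pairs:
        if u != v:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def articulation_points_and_bccs(graph):
    """Hopcroft-Tarjan DFS: articulation points and biconnected components in O(V + E).

    Returns (articulation_points, components), where each component is the
    set of nodes (sequence IDs) spanned by one biconnected component.
    Recursive for readability; very large graphs need an iterative DFS.
    """
    disc, low = {}, {}
    articulation = set()
    components = []
    edge_stack = []
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        children = 0
        for v in graph[u]:
            if v not in disc:  # tree edge
                children += 1
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # v's subtree cannot reach above u, so u closes a component.
                    if parent is not None:
                        articulation.add(u)
                    nodes = set()
                    while True:
                        edge = edge_stack.pop()
                        nodes.update(edge)
                        if edge == (u, v):
                            break
                    components.append(nodes)
            elif v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:  # a DFS root is a cut vertex iff it has 2+ children
            articulation.add(u)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return articulation, components


if __name__ == "__main__":
    # Hypothetical similarity pairs: two triangles sharing "c", plus a pendant "f".
    pairs = [("a", "b"), ("b", "c"), ("c", "a"),
             ("c", "d"), ("d", "e"), ("e", "c"), ("e", "f")]
    points, comps = articulation_points_and_bccs(build_sequence_graph(pairs))
    print(sorted(points))                    # ['c', 'e']
    print(sorted(sorted(c) for c in comps))  # [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f']]
```

Each articulation point marks a sequence that bridges otherwise separate groups of similar sequences, which is one plausible way such a graph can split a functional class with uneven similarity into tighter clusters.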