Journal: BMC Genomics

Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns

Abstract

Background: The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based method for protein function prediction and structural classification: the Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition (SVD) is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies, as well as on other datasets derived from SCOP release 1.75.

Results: CSM achieved a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies, and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in the latest SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a compatible precision level.

Conclusions: We showed that the patterns derived from CSMs can be used effectively to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data.
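
The abstract says only that CSM feature vectors "represent distance patterns between protein residues". One plausible reading is sketched below in Python: scan a range of distance cutoffs and, for each cutoff, count the residue pairs that lie within it. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function name csm_feature_vector, the one-point-per-residue representation, and the cutoff range and step are all hypothetical choices that do not come from the abstract.

```python
from itertools import combinations
from math import dist


def csm_feature_vector(coords, d_min=0.0, d_max=30.0, step=0.2):
    """Count residue pairs within each scanned distance cutoff.

    coords: one (x, y, z) point per residue (assumed representation).
    d_min, d_max, step: cutoff range and increment (illustrative values).
    Returns one cumulative pair count per cutoff.
    """
    # All pairwise Euclidean distances between residue positions.
    pair_distances = [dist(a, b) for a, b in combinations(coords, 2)]

    # Index the cutoffs by integer to avoid accumulating float error.
    n_cutoffs = int(round((d_max - d_min) / step)) + 1
    features = []
    for i in range(n_cutoffs):
        cutoff = d_min + i * step
        features.append(sum(1 for d in pair_distances if d <= cutoff))
    return features


# Toy input: four hypothetical residue positions spaced 1 unit apart.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
vec = csm_feature_vector(toy, d_min=0.0, d_max=3.0, step=1.0)
print(vec)  # [0, 3, 5, 6]
```

In a pipeline like the one the abstract describes, one such vector per protein would form a row of a matrix. That matrix would then be reduced with singular value decomposition before being passed to a classifier.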
