Journal: Fuzzy Sets and Systems

Mining of protein-protein interfacial residues from massive protein sequential and spatial data


Abstract

Processing big data is a major challenge in bioinformatics. In this paper, we address the problem of identifying protein-protein interfacial residues from massive protein structural data. A protein set comprising 154993 residues was analyzed. We applied three-dimensional alpha shape modeling to search for surface and interfacial residues in this set, and adopted spatially neighboring residue profiles to characterize each residue. These residue profiles, which capture the sequential and spatial information of the proteins, translated the original data into a large matrix. After refining this matrix vertically and horizontally, we implemented and compared a series of popular learning procedures, including neuro-fuzzy classifiers (NFCs), CART, neighborhood classifiers (NECs), extreme learning machines (ELMs) and naive Bayesian classifiers (NBCs), to predict the interfacial residues, with the aim of investigating how sensitive these massive structural data are to different learning mechanisms. ELMs, CART and NFCs performed better in terms of computational cost, while NFCs, NBCs and ELMs provided favorable prediction accuracies. Overall, NFCs, NBCs and ELMs are favorable choices for handling this type of data quickly and accurately. More importantly, the marginal differences between the prediction performances of these methods imply that this type of data is insensitive to the choice of learning mechanism.
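To make the idea of a spatially neighboring residue profile more concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not define the profile, so this sketch assumes that a profile is a count of residue types among the k spatially nearest residues. The names `neighbor_profile` and `profile_matrix`, the parameter `k`, and the toy coordinates are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def neighbor_profile(coords, residues, index, k=3):
    """Count residue types among the k residues spatially nearest to
    residue `index` (the residue itself is excluded).

    Assumption: this definition of a "profile" is illustrative only.
    """
    center = coords[index]
    # Rank every other residue by Euclidean distance to the center residue.
    others = sorted(
        (i for i in range(len(coords)) if i != index),
        key=lambda i: math.dist(center, coords[i]),
    )
    profile = [0] * len(AMINO_ACIDS)
    for i in others[:k]:
        profile[AMINO_ACIDS.index(residues[i])] += 1
    return profile


def profile_matrix(coords, residues, k=3):
    """Stack the profiles: one row per residue, one column per residue type."""
    return [neighbor_profile(coords, residues, i, k) for i in range(len(coords))]


# Toy data: five residues on a line, with made-up coordinates.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (11.4, 0.0, 0.0), (40.0, 0.0, 0.0)]
residues = "AGLKA"
matrix = profile_matrix(coords, residues, k=2)
```

Each row of `matrix` describes one residue through its spatial neighborhood, which is one plausible way a residue set could become the large matrix the abstract mentions. A classifier such as a naive Bayesian classifier would then be trained on these rows, with interface or non-interface labels.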
