Journal: Expert Systems with Applications
Association classification algorithm based on structure sequence in protein secondary structure prediction


Abstract

Objective: To propose a novel associative classification algorithm, SAC (structural association classification), and to develop a compound pyramid model for accurate and precise protein secondary structure prediction.

Method: Based on sliding-window theory, the protein sequence was processed with a window of length 13, in which the target amino acid resided in the center, while the remaining positions were treated as secondary amino acid structures. At the head and tail of the sequence, the mirror method was employed to fill the empty positions with the structure at the opposite position relative to the central position. In the mining process, the KDD' model focuses not only on rules with high support and high confidence, but also on rules with high confidence and low support, which are called 'knowledge in shortage'. Towards the end of the mining process, sets H, E and C, consisting of the rules whose consequents are α-helix, β-sheet and C-coil, respectively, were created to meet the basic requirements of protein secondary structure prediction. The knowledge base of protein secondary structure was then established from these three newly acquired rule sets. Through the CMAR (Classification based on Multiple Association Rules) algorithm, a novel multi-classifier was developed to determine the most likely secondary structure for a given window, using the adjacent information in the amino acid sequence window and screening the three rule sets.

Result: A protein knowledge base of 8049 rules was obtained, with 2642, 1895 and 3512 rules in sets H, E and C, respectively. Experiments show that, theoretically, the accuracy ratio exceeded 85% when the confidence threshold was 70% and 90%. In four experiments using the multi-classifier SAC, accuracy and recall ratios of up to 83.06% (according to the Q_3 criterion, abbreviated hereafter) on RS126 (Chen & Chaudhari, 2007; Guo et al., 2004; Hu et al., 2004; Liu et al., 2004) and 80.49% on CB513 (Guo et al., 2004; Liu et al., 2004; Wang & Liu, 2004) were demonstrated.

Conclusion: The structural association classification algorithm with pyramid classification developed in the present study demonstrated significantly high accuracy in protein secondary structure prediction. The results suggest a highly reliable and accurate alternative for contemporary protein structure prediction.
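
The windowing step and the Q_3 score mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `mirrored_windows` and `q3`, the `pad` placeholder, and the reading of the "mirror method" as a reflection about the window's central position are all assumptions made for illustration; the abstract does not give these details. Q_3 is computed here in its usual sense, as the percentage of residues whose three-state label (H, E or C) is predicted correctly.

```python
WINDOW = 13          # window length stated in the abstract
HALF = WINDOW // 2   # six positions on each side of the center


def mirrored_windows(seq, pad="-"):
    """Return one length-13 window per position, centered on that position.

    Positions that fall outside the sequence are filled with the symbol at
    the opposite offset from the center (assumed reading of the "mirror
    method"). If that position is also outside the sequence, the
    hypothetical `pad` symbol is used instead.
    """
    n = len(seq)
    windows = []
    for i in range(n):
        chars = []
        for d in range(-HALF, HALF + 1):
            j = i + d
            if not 0 <= j < n:
                j = i - d  # reflect about the central position
            chars.append(seq[j] if 0 <= j < n else pad)
        windows.append("".join(chars))
    return windows


def q3(predicted, observed):
    """Percentage of positions whose three-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `mirrored_windows("ABCDEFGHIJKLMNOP")[0]` gives `"GFEDCBABCDEFG"`: the six missing positions to the left of the first symbol are filled by reflecting the six symbols to its right.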
