Advances in Bioinformatics

IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction



Abstract

Protein coding and promoter region prediction are important challenges in bioinformatics (Attwood and Teresa, 2000). Identifying these regions plays a crucial role in understanding genes. Many novel computational and mathematical methods have been introduced, and existing methods are being refined, for predicting each of the two regions separately; still, there is scope for improvement. We propose a classifier built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) that predicts both regions with a single classifier. For protein coding region prediction, the proposed classifier is trained and tested with the Fickett and Tung (1992) datasets for DNA sequences of lengths 54, 108, and 162, and with the MMCRI datasets for DNA sequences of lengths 252 and 354. For promoter prediction, it is trained and tested with promoter sequences from the DBTSS (Yamashita et al., 2006) dataset and nonpromoters from the EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model predicts both regions with an average accuracy of 90.5% for promoter prediction and 89.6% for protein coding region prediction. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively.
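
The abstract reports accuracy, specificity, and sensitivity but does not state the formulas used. The following is a minimal sketch, assuming the standard confusion-matrix definitions for a binary classifier (1 = promoter or coding, 0 = not). The function names and the toy labels are hypothetical illustrations, not code or data from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of binary prediction outcomes (hypothetical helper type)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    tn: int  # negatives predicted as negative
    fn: int  # positives predicted as negative


def count_outcomes(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally outcomes for labels encoded as 1 (positive) and 0 (negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


# The ratios below assume the evaluation set contains at least one
# positive and one negative example; otherwise a denominator is zero.

def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all sequences classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true positives that were detected (recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true negatives that were correctly rejected."""
    return c.tn / (c.tn + c.fp)


# Toy labels for illustration only (not taken from any dataset in the paper).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_counts = count_outcomes(toy_true, toy_pred)
```

With definitions like these, accuracy summarizes overall correctness, while sensitivity and specificity separate the two kinds of error, which matters when positive and negative sequences are not equally represented.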
