Journal: In silico biology: An international on computational biology

Analysis of n-Gram based Promoter Recognition Methods and Application to Whole Genome Promoter Prediction


Abstract

Promoter prediction is an important and complex problem. Pattern recognition algorithms typically require features that can capture this complexity. Promoter sequences may show a bias towards certain combinations of base pairs. To detect such biases, n-grams are usually extracted and analyzed. An n-gram is a run of n contiguous characters from a given character stream, here a DNA sequence segment. This work systematically studies the efficacy of n-grams for n = 2, 3, 4, 5 in promoter prediction, using n-grams as features for a neural network classifier for E. coli and Drosophila promoters. n = 3 for E. coli and n = 4 for Drosophila seem to give optimal prediction values. Using the 3-gram features, promoter prediction is carried out on the genome sequence of E. coli. The results are encouraging in positive identification of promoters in the genome compared to software packages such as BPROM, NNPP, and SAK. Whole genome promoter prediction was also performed for the Drosophila genome, using 4-gram features.
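
To make the n-gram definition concrete, here is a minimal Python sketch of how n-gram features could be extracted from a DNA segment. It is an illustration only, not the authors' implementation: the function name `ngram_features`, the overlapping sliding window, the fixed lexicographic vocabulary over A, C, G, T, and the normalization to relative frequencies are all assumptions that the abstract does not specify.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; windows with other symbols (e.g. N) are ignored


def ngram_features(sequence: str, n: int) -> list[float]:
    """Return relative frequencies of all 4**n possible n-grams.

    Hypothetical helper: the feature order is lexicographic over ALPHABET,
    so n = 3 yields a 64-dimensional vector and n = 4 a 256-dimensional one.
    """
    seq = sequence.upper()
    # Slide a window of width n over the sequence, one position at a time.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=n)]
    # Normalize only over n-grams made of A, C, G, T (an assumption).
    total = sum(counts[g] for g in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[g] / total for g in vocabulary]
```

A vector like this, computed per candidate segment, is the kind of fixed-length input a neural network classifier can consume. The vector length grows as 4^n, which is one practical reason to compare small values of n such as 2 to 5.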
