Journal: BMC Genomics

Bayesian prediction of bacterial growth temperature range based on genome sequences


Abstract

Background: The preferred habitat of a given bacterium can hint at which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to predict the preferred temperature range accurately from a genomic sequence would thus allow an efficient and targeted search for production organisms, reducing the need for culturing experiments.

Results: This study found a total of 40 protein families useful for distinguishing between three thermophilicity classes (thermophiles, mesophiles and psychrophiles). The predictive performance of these protein families was compared to that of 87 basic sequence features (relative use of amino acids and codons, genomic and 16S rDNA AT content, and genome size). Using naïve Bayesian inference, it was possible to correctly predict the optimal temperature range with a Matthews correlation coefficient of up to 0.68. The best predictive performance was always achieved by combining protein families with structural features, rather than by using either alone. A dedicated computer program was created to perform these predictions.

Conclusions: This study shows that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naïve Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between the genomes of thermophilic, mesophilic and psychrophilic bacteria.
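To make the idea of naïve Bayesian inference over protein families concrete, the following is a minimal sketch and not the program described in the abstract. It assumes each genome is reduced to the set of protein families it contains, treats every family as an independent present/absent feature, and applies Laplace smoothing. The family identifiers (`FAM_HOT`, `FAM_COLD`, `FAM_COMMON`), the toy training genomes and the function names are all hypothetical.

```python
import math

# The three thermophilicity classes named in the abstract.
CLASSES = ("thermophile", "mesophile", "psychrophile")


def train(genomes, labels, families, alpha=1.0):
    """Estimate class priors and per-family presence probabilities.

    genomes  : list of sets of protein family IDs, one set per genome
    labels   : list of class names, one per genome
    families : the family IDs used as features (hypothetical IDs here)
    alpha    : Laplace smoothing constant (an assumption of this sketch)
    """
    n_genomes = {c: 0 for c in CLASSES}
    n_present = {c: {f: 0 for f in families} for c in CLASSES}
    for fams, label in zip(genomes, labels):
        n_genomes[label] += 1
        for f in fams:
            if f in n_present[label]:
                n_present[label][f] += 1

    total = len(labels)
    model = {}
    for c in CLASSES:
        prior = (n_genomes[c] + alpha) / (total + alpha * len(CLASSES))
        # Smoothed probability that a family is present in a genome of class c.
        probs = {
            f: (n_present[c][f] + alpha) / (n_genomes[c] + 2 * alpha)
            for f in families
        }
        model[c] = (prior, probs)
    return model


def predict(model, fams):
    """Return the class with the highest log posterior for one genome."""
    best_class, best_score = None, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for f, p in probs.items():
            # Presence and absence both contribute under the Bernoulli model.
            score += math.log(p if f in fams else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data, invented purely for illustration.
FAMILIES = ["FAM_HOT", "FAM_COLD", "FAM_COMMON"]
TRAIN_GENOMES = [
    {"FAM_HOT", "FAM_COMMON"}, {"FAM_HOT"},
    {"FAM_COMMON"}, {"FAM_COMMON"},
    {"FAM_COLD", "FAM_COMMON"}, {"FAM_COLD"},
]
TRAIN_LABELS = [
    "thermophile", "thermophile",
    "mesophile", "mesophile",
    "psychrophile", "psychrophile",
]

MODEL = train(TRAIN_GENOMES, TRAIN_LABELS, FAMILIES)
print(predict(MODEL, {"FAM_HOT", "FAM_COMMON"}))
```

Working in log space avoids numerical underflow once the number of features grows, for example to the 40 protein families and 87 sequence features mentioned in the abstract. Continuous features such as AT content would need a different likelihood model than the present/absent one assumed here.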
