Journal: Standards in Genomic Sciences

Toward a standard in structural genome annotation for prokaryotes

Abstract

Background: In an effort to identify the best practice for finding genes in prokaryotic genomes, and to propose it as a standard for automated annotation pipelines, 1,004,576 peptides were collected from various publicly available resources and used as a basis for evaluating gene-calling methods. The peptides came from 45 bacterial replicons with average GC contents ranging from 31% to 74%, with a bias toward high-GC genomes. Automated, manual, and semi-manual methods were used to tally errors made by three widely used gene-calling methods, where an error was evidenced by a peptide mapped outside the boundaries of the called genes.

Results: We found that the consensus set of identical genes predicted by all three methods (start and stop required to coincide) constitutes only about 70% of the genes predicted by each individual method. The peptide data were useful for evaluating some of the differences between gene callers but, owing to limitations inherent in any proteogenomic study, were not reliable enough to make the results conclusive.

Conclusions: A single, unambiguous, unanimous best practice did not emerge from this analysis, because the available proteomics data were not adequate to provide an objective measurement of differences in accuracy among these methods. However, as a result of this study, software, reference data, and procedures have been better matched among participants, representing a step toward a much-needed standard. In the absence of sufficient experimental data to achieve a universal standard, our recommendation is that the community may use any of these methods, as long as a single method is employed across all datasets to be compared.
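The abstract does not say how the consensus set or the peptide-based error tally was computed. As a rough illustration only, the Python sketch below shows one way both ideas could be expressed, under assumptions that are not from the paper. Each gene call is a hypothetical `GeneCall` of (replicon, leftmost coordinate, rightmost coordinate, strand). Peptides are assumed to be already mapped to genome coordinates as hypothetical `PeptideHit` records. Two calls count as "identical" only if all four fields match. A peptide counts as "outside" if no call on the same replicon and strand fully contains it, and the reading frame is deliberately ignored. The method names are placeholders, not the three gene callers evaluated in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class GeneCall(NamedTuple):
    """Hypothetical gene-call record: 1-based inclusive coordinates, left <= right."""
    replicon: str
    left: int    # leftmost coordinate (the stop-codon end for '-' strand genes)
    right: int   # rightmost coordinate (the stop-codon end for '+' strand genes)
    strand: str  # '+' or '-'


class PeptideHit(NamedTuple):
    """Hypothetical peptide already mapped to genome coordinates."""
    replicon: str
    left: int
    right: int
    strand: str


def consensus(calls_by_method: Dict[str, Iterable[GeneCall]]) -> Set[GeneCall]:
    """Genes called identically (same replicon, both ends, strand) by every method."""
    sets = [set(calls) for calls in calls_by_method.values()]
    return set.intersection(*sets) if sets else set()


def consensus_fraction(calls_by_method: Dict[str, Iterable[GeneCall]]) -> Dict[str, float]:
    """Share of each method's calls that fall in the all-method consensus set."""
    as_sets = {name: set(calls) for name, calls in calls_by_method.items()}
    common = consensus(as_sets)
    return {name: (len(common) / len(s) if s else 0.0) for name, s in as_sets.items()}


def peptides_outside_calls(peptides: Iterable[PeptideHit],
                           calls: Iterable[GeneCall]) -> List[PeptideHit]:
    """Peptides not fully contained in any call on the same replicon and strand.

    Simplification: the reading frame of the peptide is not checked.
    """
    by_key = defaultdict(list)
    for gene in calls:
        by_key[(gene.replicon, gene.strand)].append(gene)
    outside = []
    for pep in peptides:
        genes = by_key.get((pep.replicon, pep.strand), [])
        if not any(g.left <= pep.left and pep.right <= g.right for g in genes):
            outside.append(pep)
    return outside


if __name__ == "__main__":
    # Toy data: method_B calls the '+' gene's start 12 nt downstream of the others.
    a = [GeneCall("chr1", 100, 399, "+"), GeneCall("chr1", 500, 898, "-")]
    b = [GeneCall("chr1", 112, 399, "+"), GeneCall("chr1", 500, 898, "-")]
    c = [GeneCall("chr1", 100, 399, "+"), GeneCall("chr1", 500, 898, "-")]
    calls = {"method_A": a, "method_B": b, "method_C": c}
    print(consensus_fraction(calls))
    # {'method_A': 0.5, 'method_B': 0.5, 'method_C': 0.5}

    hits = [PeptideHit("chr1", 103, 132, "+"), PeptideHit("chr1", 602, 631, "-")]
    print(peptides_outside_calls(hits, b))
    # [PeptideHit(replicon='chr1', left=103, right=132, strand='+')]
```

In the toy data only the minus-strand gene is called identically by all three methods, so each method's consensus fraction is 0.5. The plus-strand peptide lies between the two candidate start positions, so it is flagged against `method_B`'s calls but not against `method_A`'s, which is the kind of boundary disagreement that peptide evidence can expose. Whether such a flag reflects a true start error or a peptide-mapping problem still has to be judged case by case.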