
Improvement of ab initio methods of gene prediction in genomic and metagenomic sequences.



Abstract

A metagenome originating from shotgun sequencing of a microbial community is a heterogeneous mixture of rather short sequences. The vast majority (99%) of microbial species in a given community are likely to be non-cultivable. Many protein-coding regions in a new metagenome are likely to code for barely detectable homologs of already known proteins. Therefore, an ab initio method that accurately identifies the new genes is a vitally important tool for metagenomic sequence analysis. The standard tools for ab initio prokaryotic gene prediction, such as EasyGene, GeneMarkS or Glimmer, were not designed to work with short sequence fragments from unknown genomes. However, a heuristic model method for finding genes in short prokaryotic sequences of anonymous origin was proposed in 1999, prior to the advent of metagenomics.

The idea was to bypass traditional ways of parameter estimation, such as supervised training on a set of validated genes or unsupervised training on an anonymous sequence assumed to contain a large enough number of genes. It was proposed to use dependencies between the codon frequencies and the genome nucleotide composition. In this way, the codon frequencies, critical for the model parameterization, could be derived from the frequencies of nucleotides observed in the short sequence.

With hundreds of new prokaryotic genomes available, it is now possible to enhance the original approach and to utilize direct polynomial and logistic approximations of oligonucleotide frequencies. This method could be further applied to initialize the algorithms for iterative parameter estimation in prokaryotic as well as eukaryotic gene finders.

The research of this dissertation contributed to the following publications:

(1) Zhu W., Lomsadze A. and Borodovsky M. (2010). Ab initio Gene Identification in Metagenomic Sequences. Accepted, Nucleic Acids Research.
(2) Martin J., Zhu W., Bergman N. and Borodovsky M. (2009). Assessment of Gene Annotation Accuracy by Inferring Transcripts from RNA-Seq. BIBM 2009: 54-59.
(3) Martin J., Zhu W., Passalacqua K., Bergman N. and Borodovsky M. (2010). Bacillus anthracis genome organization in light of whole transcriptome sequencing. BMC Bioinformatics 2010, 11(Suppl 3):S10.
(4) Zhu W., Lomsadze A. and Borodovsky M. GeneMarkS Plus: Improving gene annotation in complete prokaryotic genomes. In preparation.
(5) Bakkeren G., Zhu W., Antonov I. and Borodovsky M. Gene prediction in Puccinia triticina based on EST data. In preparation.
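
The central idea of the abstract, deriving codon frequencies from the nucleotide composition of a short fragment, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the function names, the use of GC content as the only composition variable, the first-degree (linear) fit standing in for the polynomial and logistic approximations mentioned above, and the `TOY_TRAINING` numbers, which are made up and cover only two codons. It is not the dissertation's actual model.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C among the A/C/G/T characters of seq."""
    counts = Counter(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return (counts["G"] + counts["C"]) / total


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_codon_freqs(fragment, training, floor=1e-6):
    """Predict codon frequencies for a fragment from its GC content.

    training maps each codon to (genome GC, codon frequency) pairs
    observed in reference genomes (hypothetical input format).
    """
    gc = gc_content(fragment)
    raw = {}
    for codon, points in training.items():
        slope, intercept = fit_line([p[0] for p in points],
                                    [p[1] for p in points])
        # Clip at a small positive floor so no frequency is zero or negative.
        raw[codon] = max(slope * gc + intercept, floor)
    total = sum(raw.values())
    # Renormalize so the predicted frequencies sum to 1.
    return {codon: value / total for codon, value in raw.items()}


# Made-up toy values for illustration only, not real genome statistics.
TOY_TRAINING = {
    "GCC": [(0.3, 0.2), (0.5, 0.5), (0.7, 0.8)],
    "AAA": [(0.3, 0.8), (0.5, 0.5), (0.7, 0.2)],
}

if __name__ == "__main__":
    print(estimate_codon_freqs("GGCCAATT", TOY_TRAINING))
```

In a realistic setting the training pairs would come from annotated genes in many sequenced genomes, and the fitted frequencies would parameterize the coding model of a gene finder; the sketch only shows the regression-then-predict step.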

Bibliographic record

  • Author

    Zhu, Wenhan

  • Author affiliation

    Georgia Institute of Technology

  • Degree-granting institution: Georgia Institute of Technology
  • Subject: Biology; Bioinformatics
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 176 p.
  • Total pages: 176
  • Original format: PDF
  • Language of text: English
