
Combinatorial and statistical approaches for some challenges in oligonucleotide fingerprinting on ribosomal RNA genes.


Abstract

Oligonucleotide fingerprinting of ribosomal RNA genes (OFRG) is a high-throughput, cost-effective, array-based method designed to identify microorganisms. During the development of the OFRG method, various computational challenges have arisen. In this work, we present combinatorial and statistical solutions for several critical problems in OFRG.

The first problem is a sequence acquisition problem whose goal is to obtain an rRNA gene sequence database for a specific taxonomic group. In the proposed combinatorial approach, a fast and accurate approximate string-matching algorithm was designed to fetch rRNA gene sequences sandwiched by two given primers from GenBank. A homology search algorithm, which combines a chaining algorithm with the Basic Local Alignment Search Tool (BLAST), was then used to extract rRNA gene sequences that do not contain the primers. An improved string-matching algorithm, called Fast Algorithm for Approximate String maTching (FAAST), was further developed for the approximate string-matching problem. FAAST generalizes the well-known Tarhio-Ukkonen algorithm by requiring two or more matches when calculating shift distances. Both theoretical analysis and experimental results demonstrate that the algorithm achieves a significant speed-up without loss of accuracy.

The second challenge arises in the analysis of microarray data. In OFRG, the presence of specific rRNA gene sequences is determined by the intensity values of hybridization with a series of oligonucleotide probes. Due to noise and technological limitations, these intensity values are sometimes too ambiguous for a reliable classification. In such a situation, the traditional Bayes classification method could lead to an invalid prediction, affecting the accuracy of OFRG. A statistical model called Modified Bayes Rule (MBR) was proposed to allow a "no prediction." MBR formulated a cost structure to weigh the penalty for not making a definite prediction against that for making an incorrect definite prediction. Experiments demonstrated that MBR outperforms a neutral-zone rule that had been routinely used in OFRG.

Finally, software packages that implement the above algorithms and other related methods were developed. A central database was also designed to manage the data from OFRG.
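
The trade-off between abstaining and risking a wrong call can be illustrated with a generic reject-option Bayes rule. The Python sketch below is not the thesis's MBR, whose exact formulation is not given in this abstract. It assumes a two-class problem (probe hybridizes or not), Gaussian class-conditional intensity distributions, and hypothetical names, labels ("1", "0", "N"), and cost values chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution (assumed intensity model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_positive(intensity, prior_pos, pos_params, neg_params):
    """Posterior probability that the probe hybridized, via Bayes' theorem.

    pos_params and neg_params are (mean, std) pairs; all are hypothetical.
    Intensities far from both means can underflow to 0/0, which this
    sketch does not guard against.
    """
    joint_pos = gaussian_pdf(intensity, *pos_params) * prior_pos
    joint_neg = gaussian_pdf(intensity, *neg_params) * (1.0 - prior_pos)
    return joint_pos / (joint_pos + joint_neg)


def classify_with_reject(p_pos, cost_error=1.0, cost_reject=0.2):
    """Return "1", "0", or "N" (no prediction), whichever has least expected cost.

    A correct call costs 0, a wrong call costs cost_error, and abstaining
    costs cost_reject (illustrative values, not taken from the thesis).
    """
    risk_pos = cost_error * (1.0 - p_pos)  # expected cost of calling "1"
    risk_neg = cost_error * p_pos          # expected cost of calling "0"
    if risk_pos <= risk_neg:
        label, risk = "1", risk_pos
    else:
        label, risk = "0", risk_neg
    # Abstain when even the better definite call costs more than abstaining.
    return label if risk <= cost_reject else "N"


if __name__ == "__main__":
    for intensity in (0.0, 0.5, 1.0):
        p = posterior_positive(intensity, 0.5, (1.0, 0.25), (0.0, 0.25))
        print(intensity, round(p, 4), classify_with_reject(p))
```

Raising `cost_reject` relative to `cost_error` shrinks the range of posteriors that yield "N"; once `cost_reject` reaches half of `cost_error`, the rule never abstains and reduces to the ordinary two-class Bayes decision.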

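Bibliographic record

  • Author

    Liu, Zheng

  • Author affiliation

    University of California, Riverside

  • Degree-granting institution: University of California, Riverside
  • Subject: Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 105 p.
  • Total pages: 105
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification
  • Keywords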
