IEEE/ACM Transactions on Computational Biology and Bioinformatics

Heuristic Bayesian Segmentation for Discovery of Coexpressed Genes within Genomic Regions

Abstract

Segmentation aims to separate homogeneous areas in sequential data and plays a central role in data mining. Its applications range from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation: locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requiring user-given parameters. To perform the segmentation within a reasonable time, we use heuristics. Most heuristic segmentation algorithms require a decision on the number of segments. This decision is usually made with asymptotic model selection methods such as the Bayesian information criterion. Such methods rest on simplifications, which can limit their use. In this paper, we propose a Bayesian model selection criterion for choosing the most appropriate result from heuristic segmentation. Our Bayesian model uses a simple prior over segmentation solutions with different numbers of segments and a modified Dirichlet prior for modeling multinomial data. Using various artificial data sets in our benchmark system, we show that our model selection criterion has the best overall performance. Applying our method to yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
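
The selection step described in the abstract can be illustrated with a minimal sketch: given several candidate segmentations of a categorical sequence, score each by a prior over segmentations plus a Dirichlet-multinomial marginal likelihood per segment, and keep the best one. This is not the paper's model. It assumes a plain symmetric Dirichlet prior in place of the "modified Dirichlet prior", and a prior that is uniform over the number of segments and over boundary placements, since the abstract does not specify either. All function names and the `alpha` parameter are hypothetical.

```python
import math
from collections import Counter


def segment_log_marginal(symbols, num_categories, alpha=1.0):
    """Log marginal likelihood of one segment of categorical symbols.

    Assumption: symmetric Dirichlet(alpha) prior on the category
    probabilities (a stand-in for the paper's modified prior).
    """
    counts = Counter(symbols)
    n = len(symbols)
    total_alpha = alpha * num_categories
    log_ml = math.lgamma(total_alpha) - math.lgamma(total_alpha + n)
    # Categories with zero count contribute lgamma(alpha) - lgamma(alpha) = 0.
    for count in counts.values():
        log_ml += math.lgamma(alpha + count) - math.lgamma(alpha)
    return log_ml


def log_segmentation_prior(seq_len, num_segments):
    """Assumed prior: uniform over segment counts 1..seq_len, then
    uniform over all boundary placements for that count."""
    num_boundary_sets = math.comb(seq_len - 1, num_segments - 1)
    return -math.log(seq_len) - math.log(num_boundary_sets)


def log_posterior_score(seq, boundaries, num_categories, alpha=1.0):
    """Unnormalized log posterior of one segmentation.

    `boundaries` lists the start index of every segment after the first.
    """
    edges = [0, *boundaries, len(seq)]
    score = log_segmentation_prior(len(seq), len(edges) - 1)
    for start, end in zip(edges, edges[1:]):
        score += segment_log_marginal(seq[start:end], num_categories, alpha)
    return score


def select_segmentation(seq, candidates, num_categories, alpha=1.0):
    """Return the candidate boundary list with the highest score."""
    return max(
        candidates,
        key=lambda b: log_posterior_score(seq, b, num_categories, alpha),
    )


# Toy sequence with one true change point at index 20.
toy_seq = [0] * 20 + [1] * 20
toy_candidates = [[], [20], [10, 20, 30]]
best = select_segmentation(toy_seq, toy_candidates, num_categories=2)
```

In this toy case the candidates would come from a heuristic segmentation run at several segment counts. The prior term penalizes extra boundaries, so an over-segmented candidate has to earn its additional segments through a better fit.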