Venue: International Conference on Bioinformatics and Biomedicine Workshops

A Novel Quasi-Alignment-Based Method for Discovering Conserved Regions in Genetic Sequences

Abstract

This paper presents an alignment-free technique to efficiently discover similar regions in large sets of biological sequences using position-sensitive p-mer frequency clustering. Each sequence in the set is broken down into segments, and a frequency distribution over all oligomers of size p (referred to as p-mers) is computed to summarize each segment. These summaries are clustered, while the order of segments within each sequence is preserved in a Markov-type model. Sequence segments within each cluster have very similar DNA/RNA patterns and form a so-called quasi-alignment. Quasi-alignments can be used for a variety of tasks such as species characterization and identification, phylogenetic analysis, functional analysis of sequences and, as in this paper, discovering conserved regions. Our method is computationally more efficient than multiple sequence alignment because it can apply modern data stream clustering algorithms, which run in time linear in the number of segments, and can therefore help discover highly similar regions across a large number of sequences. In this paper, we apply the approach to discover and visualize conserved regions in 16S rRNA.
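
The pipeline described in the abstract (segment each sequence, summarize each segment by its p-mer frequencies, cluster the summaries, and record the order of clusters) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the segment length, the Manhattan distance, and the fixed distance threshold are all assumptions made for illustration. A simple single-pass threshold clustering stands in for the data stream clustering algorithm, which the abstract does not specify.

```python
from collections import Counter
from itertools import product


def pmer_profile(segment, p=3, alphabet="ACGT"):
    """Normalized frequency of every p-mer over `alphabet` in `segment`."""
    pmers = ["".join(t) for t in product(alphabet, repeat=p)]
    counts = Counter(segment[i:i + p] for i in range(len(segment) - p + 1))
    # Only p-mers made entirely of alphabet symbols contribute to the total.
    total = sum(counts[m] for m in pmers)
    return [counts[m] / total if total else 0.0 for m in pmers]


def segment_profiles(sequence, segment_len=100, p=3):
    """Cut a sequence into consecutive segments and profile each one."""
    segments = [sequence[i:i + segment_len]
                for i in range(0, len(sequence), segment_len)]
    return [pmer_profile(s, p) for s in segments]


def manhattan(a, b):
    """L1 distance between two frequency vectors (an assumed choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def quasi_align(sequences, segment_len=100, p=3, threshold=0.3):
    """Hypothetical single-pass clustering of segment profiles.

    Returns one list of cluster ids per sequence, plus counts of
    cluster-to-cluster transitions (a simple Markov-type summary).
    """
    centers = []             # first profile seen in each cluster
    transitions = Counter()  # (previous cluster, next cluster) -> count
    paths = []
    for seq in sequences:
        path, prev = [], None
        for profile in segment_profiles(seq, segment_len, p):
            best, best_dist = None, None
            for idx, center in enumerate(centers):
                dist = manhattan(profile, center)
                if best_dist is None or dist < best_dist:
                    best, best_dist = idx, dist
            # Open a new cluster when nothing is close enough.
            if best is None or best_dist > threshold:
                centers.append(profile)
                best = len(centers) - 1
            if prev is not None:
                transitions[(prev, best)] += 1
            path.append(best)
            prev = best
        paths.append(path)
    return paths, transitions


if __name__ == "__main__":
    toy = ["ACGT" * 25 + "A" * 100, "ACGT" * 25 + "A" * 100]
    paths, transitions = quasi_align(toy)
    print(paths)
    print(dict(transitions))
```

In this sketch, segments that receive the same cluster id at the same position across many sequences would be candidates for a conserved region. Each segment is compared against every existing cluster center, so the running time is linear in the number of segments only while the number of clusters stays bounded.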
