Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Mining Contiguous Sequential Generators in Biological Sequences

Abstract

The discovery of conserved sequential patterns in biological sequences is essential to unveiling shared functions. Mining sequential generators, like mining closed sequential patterns, yields a more concise result set than mining all sequential patterns, especially in the analysis of big data in bioinformatics. Previous studies have also presented convincing arguments that generators are preferable to closed patterns in inductive inference and classification. However, classic sequential generator mining algorithms do not consider the contiguous constraint together with the lower-closed constraint, so they still spawn a large number of inefficient and redundant patterns, too many for effective use. Motivated by the wide range of applications for contiguous patterns, we propose ConSgen, an efficient algorithm for discovering contiguous sequential generators. It adopts the n-gram model, with the n-grams called shingles, to generate potential frequent subsequences and leverages several pruning techniques to prune the unpromising parts of the search space. The contiguous sequential generators are then identified using an equivalence class-based lower-closure checking scheme. Our experiments on both DNA and protein data sets demonstrate the compactness, efficiency, and scalability of ConSgen.
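
To make the idea of a contiguous sequential generator concrete, the following is a minimal brute-force sketch in Python. It is not the ConSgen algorithm: the abstract does not detail its pruning techniques or its equivalence-class scheme, so none of that is reproduced here. The sketch assumes that support is the number of sequences containing a pattern as a substring, that a frequent pattern is a generator when no proper contiguous subsequence has the same support, and that the empty pattern is ignored (so every frequent single symbol counts as a generator). The function names `shingles`, `frequent_shingles`, and `contiguous_generators` are hypothetical.

```python
from collections import defaultdict


def shingles(seq, k):
    """Return the distinct contiguous k-grams (shingles) of one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def frequent_shingles(db, k, min_sup):
    """Count, per k-gram, how many sequences contain it; keep the frequent ones."""
    counts = defaultdict(int)
    for seq in db:
        for s in shingles(seq, k):
            counts[s] += 1
    return {s: c for s, c in counts.items() if c >= min_sup}


def contiguous_generators(db, min_sup):
    """Level-wise search for contiguous generators (illustrative assumption,
    not the paper's method).

    Because support never increases when a pattern is extended, it is enough
    to compare a k-gram with its two (k-1)-gram substrings: if neither has
    the same support, no shorter contiguous subsequence does either.
    """
    generators = {}
    prev = {}  # frequent (k-1)-grams and their supports
    k = 1
    while True:
        cur = frequent_shingles(db, k, min_sup)
        if not cur:
            break  # no frequent k-gram, so no longer pattern can be frequent
        for pat, sup in cur.items():
            # Prefix and suffix of a frequent k-gram are always in prev.
            if k == 1 or (prev[pat[:-1]] != sup and prev[pat[1:]] != sup):
                generators[pat] = sup
        prev = cur
        k += 1
    return generators


if __name__ == "__main__":
    toy_db = ["ACGT", "ACT", "CGA", "AC"]  # made-up toy sequences
    print(contiguous_generators(toy_db, min_sup=2))
```

On the toy input, `CG` is frequent but is not reported, because its suffix `G` occurs in exactly the same number of sequences; `AC` is reported because both `A` and `C` have strictly higher support.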
