Conference: Knowledge-Based Systems for Safety Critical Applications

Fast alignment of large genome databases: a demonstration

Abstract

We demonstrate an efficient algorithm for aligning large genome strings. Our algorithm constructs a Boolean match table for a given query string and database string with the help of the MRS index structure, whose size is approximately 1-2% of that of the database. Each entry of the match table corresponds to a query/database substring pair. An entry is marked True if the corresponding query substring and database substring potentially contain similar patterns, and False otherwise. The size of the match table is negligible compared to that of the database. Once the match table is computed, we build hash tables on the query and database strings. Once the hash table of one string is constructed, the marked substrings of the other string are read sequentially, and exactly matching substrings of a prespecified size are found using this hash table. We call this technique MAP (match table based pruning). Experimental results show that MAP runs up to 97 times faster than BLAST.
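The two-phase pipeline described in the abstract (prune substring pairs with a Boolean match table, then find exact seeds of a fixed size with a hash table) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The MRS index is replaced by a simplified, hypothetical stand-in: both strings are split into fixed-size blocks, each block is summarized by its nucleotide frequency vector, and a block pair is marked True when the frequency distance between the vectors (which never exceeds the edit distance of the underlying blocks) is at most a threshold. The function names `build_match_table` and `find_seeds` and the parameters `block`, `k`, and `max_dist` are illustrative assumptions. For simplicity, seeds that cross block boundaries are ignored, and extending seeds into full alignments is not shown.

```python
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # assumption: DNA strings over A, C, G, T


def freq_vector(s):
    """Count of each nucleotide in s, in ALPHABET order."""
    counts = Counter(s)
    return [counts[a] for a in ALPHABET]


def frequency_distance(u, v):
    """Frequency distance between two frequency vectors.

    It never exceeds the edit distance between the underlying strings,
    so a pair whose distance exceeds the threshold can be pruned safely.
    """
    pos = neg = 0
    for x, y in zip(u, v):
        if x > y:
            pos += x - y
        else:
            neg += y - x
    return max(pos, neg)


def blocks(s, block):
    """Split s into consecutive non-overlapping blocks (last may be shorter)."""
    return [s[i:i + block] for i in range(0, len(s), block)]


def build_match_table(query, database, block, max_dist):
    """Hypothetical stand-in for the MRS-based match table.

    Entry [qi][di] is True if query block qi and database block di
    may contain similar patterns (frequency distance <= max_dist).
    """
    q_freq = [freq_vector(b) for b in blocks(query, block)]
    d_freq = [freq_vector(b) for b in blocks(database, block)]
    return [[frequency_distance(qf, df) <= max_dist for df in d_freq]
            for qf in q_freq]


def find_seeds(query, database, block, k, table):
    """Exact k-mer matches (query_pos, db_pos) restricted to True table entries."""
    if k > block:
        raise ValueError("k must not exceed the block size")
    # Hash table over the query: k-mer -> start positions, keeping only
    # k-mers that lie entirely inside one query block.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        if i // block == (i + k - 1) // block:
            index[query[i:i + k]].append(i)
    # Database blocks marked True for at least one query block.
    marked = sorted({di for row in table for di, ok in enumerate(row) if ok})
    seeds = []
    for di in marked:
        start = di * block
        end = min(start + block, len(database))
        # Read the marked block sequentially and probe the hash table.
        for j in range(start, end - k + 1):
            for i in index.get(database[j:j + k], []):
                if table[i // block][di]:  # keep only hits in marked pairs
                    seeds.append((i, j))
    return seeds


query = "ACGTACGTGG"
database = "TTTTACGTACGTCCCC"
table = build_match_table(query, database, block=4, max_dist=0)
print(find_seeds(query, database, block=4, k=4, table=table))
# prints [(0, 4), (4, 4), (0, 8), (4, 8)]
```

Only database blocks marked True are scanned, and only hits falling in marked pairs are kept, which is where the pruning avoids work compared with probing every k-mer of the database. Raising `max_dist` marks more pairs and trades speed for sensitivity.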