Fast compression of huge DNA sequence data

Abstract

DNA sequences can be enormous. Several DNA-oriented compression methods exist, including Biocompress, DNACompress, Cfact, CTW+LZ, and DNADP. These methods achieve high compression ratios but at a large cost in running time. For example, CTW+LZ takes several hours to compress the 227 KB sequence HEMCMVCG, and DNADP takes about 20 minutes to compress the standard benchmark sequences. Here we introduce an improved run-length encoding (RLE) method with lower computational complexity, which makes it significantly faster than previous DNA compression programs. Our improved RLE achieves a compression ratio of 1.862 bits per base. It takes only about 1 minute on a 2.1 GHz Core 2 Duo processor to compress a 250 MB chromosome sequence file. We also use delta encoding to reduce the second sequence to 4.8 MB.
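The abstract names two techniques, RLE and delta encoding, without saying how the improved RLE differs from the textbook version or how the delta step is applied. The sketch below is therefore only an illustration of the basic ideas, not the authors' method: it shows plain run-length encoding and a substitution-only delta against an equal-length reference sequence. All function names are hypothetical, and the reading that the "second sequence" is stored as differences from a first one is an assumption.

```python
from itertools import groupby


def rle_encode(seq: str) -> list[tuple[str, int]]:
    """Plain RLE: collapse each run of identical bases into (base, run_length)."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Invert rle_encode by expanding every (base, run_length) pair."""
    return "".join(base * count for base, count in runs)


def delta_encode(reference: str, target: str) -> list[tuple[int, str]]:
    """Store only the positions where target differs from reference.

    Assumption: both sequences have the same length and differ only by
    substitutions; insertions and deletions are not handled in this sketch.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def delta_decode(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Rebuild the target by applying the recorded substitutions to reference."""
    bases = list(reference)
    for position, base in diffs:
        bases[position] = base
    return "".join(bases)


def bits_per_base(compressed_bytes: int, n_bases: int) -> float:
    """Compression ratio expressed as bits of output per input base."""
    return 8 * compressed_bytes / n_bases


if __name__ == "__main__":
    runs = rle_encode("AAAACCGTTT")
    print(runs)
    print(rle_decode(runs))
    diffs = delta_encode("ACGTACGT", "ACGAACGT")
    print(diffs)
    print(delta_decode("ACGTACGT", diffs))
```

For scale, a four-letter alphabet can always be stored at 2 bits per base with a fixed-width code, so the reported 1.862 bits per base is a modest gain over that baseline; `bits_per_base(1, 4)` gives the 2.0 baseline figure for one byte holding four bases.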
