
CoGI: Towards Compressing Genomes as an Image


Abstract

Genomic science now faces explosive data growth thanks to the rapid development of sequencing technology, which poses serious challenges for genomic data storage and transfer. Compressing the data is desirable because it reduces storage and transfer costs and thus improves the efficiency of data distribution and utilization. A number of algorithms and tools have been developed for compressing genomic sequences. Most existing algorithms treat genomes as one-dimensional text strings and compress them based on dictionaries or probability models. In contrast, this paper proposes a novel approach called CoGI (an abbreviation of Compressing Genomes as an Image), which transforms the genomic sequences into a two-dimensional binary image (or bitmap) and then applies a rectangular partition coding algorithm to compress the binary image. CoGI can be used as either a reference-based or a reference-free compressor. For the reference-based mode, we develop two entropy-based algorithms to select a proper reference genome. Performance is evaluated on various genomes. Experimental results show that reference-based CoGI significantly outperforms two state-of-the-art reference-based genome compressors, GReEn and RLZ-opt, in both compression ratio and compression efficiency. It also achieves a comparable compression ratio but two orders of magnitude higher compression efficiency than XM, a state-of-the-art reference-free genome compressor. Furthermore, our approach performs much better than Gzip, a widely used general-purpose compressor, in both compression speed and compression ratio. CoGI can therefore serve as an effective and practical genome compressor. The source code and other related documents of CoGI are available at: http://admis.fudan.edu.cn- projects/cogi.htm.
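The abstract does not specify how bases are mapped to pixels or how the rectangular partition is computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the CoGI implementation. It assumes a 2-bit code per base (A=00, C=01, G=10, T=11), a fixed number of bases per image row, and a simple greedy partition of the 1-pixels into non-overlapping rectangles. All function names are hypothetical.

```python
from typing import List, Tuple

# Assumed 2-bit code per base; the abstract does not give the actual mapping.
BASE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}

Bitmap = List[List[int]]
Rect = Tuple[int, int, int, int]  # (row, col, height, width)


def sequence_to_bitmap(seq: str, bases_per_row: int) -> Bitmap:
    """Lay the sequence out row by row, two pixels per base."""
    rows = []
    for start in range(0, len(seq), bases_per_row):
        chunk = seq[start:start + bases_per_row]
        row = [bit for base in chunk for bit in BASE_BITS[base]]
        # Pad the last row with zeros so the image stays rectangular.
        row += [0] * (2 * bases_per_row - len(row))
        rows.append(row)
    return rows


def bitmap_to_sequence(bitmap: Bitmap, length: int) -> str:
    """Invert sequence_to_bitmap; `length` drops the zero padding."""
    bits = [b for row in bitmap for b in row]
    return "".join(BITS_BASE[(bits[2 * i], bits[2 * i + 1])] for i in range(length))


def partition_into_rectangles(bitmap: Bitmap) -> List[Rect]:
    """Greedily cover every 1-pixel with non-overlapping rectangles."""
    height = len(bitmap)
    width = len(bitmap[0]) if bitmap else 0
    covered = [[False] * width for _ in range(height)]
    rects = []
    for r in range(height):
        for c in range(width):
            if bitmap[r][c] != 1 or covered[r][c]:
                continue
            # Grow to the right along the current row.
            w = 1
            while c + w < width and bitmap[r][c + w] == 1 and not covered[r][c + w]:
                w += 1
            # Grow downward while the whole next row segment is free 1-pixels.
            h = 1
            while r + h < height and all(
                bitmap[r + h][c + k] == 1 and not covered[r + h][c + k]
                for k in range(w)
            ):
                h += 1
            for dr in range(h):
                for dc in range(w):
                    covered[r + dr][c + dc] = True
            rects.append((r, c, h, w))
    return rects


def rectangles_to_bitmap(rects: List[Rect], height: int, width: int) -> Bitmap:
    """Rebuild the binary image by painting each rectangle with 1s."""
    bitmap = [[0] * width for _ in range(height)]
    for r, c, h, w in rects:
        for dr in range(h):
            for dc in range(w):
                bitmap[r + dr][c + dc] = 1
    return bitmap


if __name__ == "__main__":
    seq = "ACGTTTTTGGGA"
    image = sequence_to_bitmap(seq, bases_per_row=4)
    rects = partition_into_rectangles(image)
    restored = rectangles_to_bitmap(rects, len(image), len(image[0]))
    print(bitmap_to_sequence(restored, len(seq)) == seq)
```

The list of rectangles stands in for the compressed representation: the image, and hence the sequence, can be rebuilt from it without loss. A real compressor would additionally entropy-code the rectangle coordinates, and a reference-based variant would presumably encode differences against the chosen reference genome rather than the raw sequence; neither step is shown here.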
