Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Benchmark Dataset for Whole Genome Sequence Compression

Abstract

Research in DNA data compression lacks a standard dataset for testing DNA-specific compression tools. This paper argues that the state of the art in DNA compression cannot be benchmarked without a scientifically compiled whole genome sequence dataset, and it proposes such a benchmark dataset built with a multistage sampling procedure. Taking the genome sequences of organisms available from the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. The paper reports the results of running three established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only in a comparison based on the scientifically compiled benchmark dataset. Availability: The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.
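The abstract gives the per-category sample sizes but not the individual stages of the sampling procedure. As a rough illustration only, the final stage of such a design could resemble a seeded random draw of a fixed quota from each category. The sketch below assumes a hypothetical `catalogue` mapping each category name to a list of accession IDs; the function name, the input layout, and the seed are illustrative assumptions, not the authors' method.

```python
import random

# Per-category sample sizes as reported in the abstract.
QUOTAS = {"prokaryote": 1105, "plasmid": 200, "virus": 164, "eukaryote": 65}


def sample_by_category(catalogue, quotas, seed=0):
    """Draw quotas[c] accession IDs without replacement from each category c.

    `catalogue` is a hypothetical input: a dict mapping a category name to a
    list of accession IDs. This is a single-stage stratified draw, not the
    multistage procedure used in the paper.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    selected = {}
    for category, quota in quotas.items():
        # Sort the pool so the result depends only on the seed, not input order.
        pool = sorted(catalogue[category])
        if quota > len(pool):
            raise ValueError(
                f"{category}: quota {quota} exceeds {len(pool)} available entries"
            )
        selected[category] = rng.sample(pool, quota)
    return selected
```

With these quotas the draw yields 1,534 sequences in total. Fixing the seed means the same catalogue always produces the same sample, which matters if others are to reproduce a benchmark.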
