Benchmark Dataset for Whole Genome Sequence Compression

Biji C. L.; Achuthsankar S. Nair

Abstract

Research in DNA data compression lacks a standard dataset for testing DNA-specific compression tools. This paper argues that, in the absence of a scientifically compiled whole genome sequence dataset, the current state of DNA compression cannot be benchmarked, and it proposes a benchmark dataset built with a multistage sampling procedure. Taking the genome sequences of organisms available at the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. The paper reports the results of running three established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only when they are compared on a scientifically compiled benchmark dataset. Availability: the sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.
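
The abstract does not spell out the individual sampling stages, so the following is only a hypothetical sketch of what a two-stage draw (partition by organism category, then simple random sampling within each category) using the category counts reported above might look like. The `(accession, category)` record format, the function name `multistage_sample`, and the choice of simple random sampling within each category are assumptions for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict

# Category quotas as reported in the abstract (prokaryotes, plasmids, viruses, eukaryotes).
QUOTAS = {"prokaryote": 1105, "plasmid": 200, "virus": 164, "eukaryote": 65}


def multistage_sample(records, quotas, seed=0):
    """Hypothetical two-stage sample of genome accessions.

    Stage 1: partition the universe of records into strata by category.
    Stage 2: draw a simple random sample without replacement of the
    requested size from each stratum.

    records: iterable of (accession, category) pairs (assumed input format).
    quotas:  mapping of category -> number of genomes to select.
    """
    rng = random.Random(seed)  # seeded so the same benchmark can be regenerated

    # Stage 1: group accessions by category.
    strata = defaultdict(list)
    for accession, category in records:
        strata[category].append(accession)

    # Stage 2: sample within each category.
    sample = {}
    for category, k in quotas.items():
        # Sort first so the seed alone determines the draw, whatever the input order.
        pool = sorted(strata.get(category, []))
        if len(pool) < k:
            raise ValueError(f"only {len(pool)} {category} records, need {k}")
        sample[category] = rng.sample(pool, k)
    return sample


if __name__ == "__main__":
    # Synthetic universe: twice as many placeholder accessions as each quota.
    universe = [(f"{cat}_{i}", cat) for cat, k in QUOTAS.items() for i in range(2 * k)]
    chosen = multistage_sample(universe, QUOTAS, seed=42)
    for cat, accs in chosen.items():
        print(cat, len(accs))
```

Sorting each pool before sampling keeps the selection reproducible for a given seed, which matters if others are expected to regenerate the same benchmark from the same universe.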

Bibliographic details

Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017, no. 6, pp. 1228-1236 (9 pages)
Authors: Biji C. L.; Achuthsankar S. Nair
Original format: PDF
Language: English
Keywords: Bioinformatics; Genomics; DNA; Sociology; Statistics; Benchmark testing; Encoding; Sequential analysis

Similar articles

1. Integration of complete chloroplast genome sequences with small amplicon datasets improves phylogenetic resolution in Acacia [J]. Williams Anna V., Miller Joseph T., Small Ian. Molecular phylogenetics and evolution, 2016.
2. Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences [J]. Kryukov Kirill, Ueda Mahoko Takahashi, Nakagawa So. GigaScience, 2020, no. 7.
3. Genomic sequence assembly for the malaria-like parasite Hepatocystis using primate whole genome sequencing datasets [J]. Trujillo Amber E., Chaney Morgan E., Bergey Christina M. American Journal of Physical Anthropology, 2020, no. S69.
4. Genome Sequences as Media Files Towards Effective, Efficient, and functional Compression of Genomic Data [C]. Tom Paridaens, Wesley De Neve, Peter Lambert. International Joint Conference on Biomedical Engineering Systems and Technologies, 2014.
5. Optimizing DCT-Based Lossy Compression for Scientific Datasets [D]. Chen, Jiaxi. 2020.
6. Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences [O]. Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa. 2020.
7. Whole Genome and Exome Sequencing Reference Datasets from A Multi-center and Cross-platform Benchmark Study [O]. Yongmei Zhao, Li Tai Fang, Tsai-wei Shen. 2021.
