Journal: Database

Damming the genomic data flood using a comprehensive analysis and storage data structure

Abstract

Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, researchers keep adding hardware and software resources to accommodate data sets whose structure was never designed for analysis. This leads to unnecessarily long processing times and excessive data-handling and storage costs. Current efforts to address the problem have centered on new indexing schemas and analysis algorithms, whereas its root lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and a novel multi-table, multidimensional database structure, we have eliminated (i) unnecessarily large data set sizes caused by high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting data structure divides the data horizontally to circumvent the problems traditionally associated with using databases for very large genomic data sets. Compared with a standard approach, the resulting data set required 86% less disk space, and analytical calculations on it ran 6248 times faster, without any loss of information. Database URL: http://castor.pharmacogenomics.ca
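To make the ideas of normalization and horizontal division concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the authors' schema: the table names (`variant`, `sample`, `genotype_chr<N>`), the 0/1/2 alt-allele-count encoding and the one-table-per-chromosome split are illustrative assumptions. It shows how repeated variant attributes in a flat export move into a single lookup table, how each genotype shrinks to a small integer, and how an analysis such as allele frequency becomes an indexed aggregate query instead of a sequential scan of a file.

```python
import sqlite3

# Hypothetical flat, denormalized input: one row per (sample, variant).
# Variant attributes (rsid, chrom, pos, ref, alt) repeat for every sample.
FLAT_ROWS = [
    ("S1", "rs1", "1", 1000, "A", "G", "A/G"),
    ("S2", "rs1", "1", 1000, "A", "G", "G/G"),
    ("S3", "rs1", "1", 1000, "A", "G", "A/A"),
    ("S1", "rs2", "2", 5000, "C", "T", "C/C"),
    ("S2", "rs2", "2", 5000, "C", "T", "C/T"),
    ("S3", "rs2", "2", 5000, "C", "T", "C/T"),
]


def encode(alt, gt):
    """Encode a diploid genotype string as its count of alt alleles (0, 1 or 2)."""
    a1, a2 = gt.split("/")
    return int(a1 == alt) + int(a2 == alt)


def build_normalized(conn, rows):
    """Load flat rows into normalized tables, one genotype table per chromosome."""
    cur = conn.cursor()
    # Each variant and each sample is stored exactly once.
    cur.execute("CREATE TABLE variant (variant_id INTEGER PRIMARY KEY, rsid TEXT UNIQUE,"
                " chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)")
    cur.execute("CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    chroms = sorted({r[2] for r in rows})
    for c in chroms:
        # Horizontal division (assumed here to be by chromosome).
        # Table names come from trusted toy labels; sanitize them in real use.
        cur.execute(f"CREATE TABLE genotype_chr{c} (sample_id INTEGER, variant_id INTEGER,"
                    " dosage INTEGER, PRIMARY KEY (variant_id, sample_id))")
    for name, rsid, chrom, pos, ref, alt, gt in rows:
        cur.execute("INSERT OR IGNORE INTO sample (name) VALUES (?)", (name,))
        cur.execute("INSERT OR IGNORE INTO variant (rsid, chrom, pos, ref, alt)"
                    " VALUES (?, ?, ?, ?, ?)", (rsid, chrom, pos, ref, alt))
        sid = cur.execute("SELECT sample_id FROM sample WHERE name = ?", (name,)).fetchone()[0]
        vid = cur.execute("SELECT variant_id FROM variant WHERE rsid = ?", (rsid,)).fetchone()[0]
        # Only two integer keys and a small integer code are stored per genotype.
        cur.execute(f"INSERT INTO genotype_chr{chrom} VALUES (?, ?, ?)",
                    (sid, vid, encode(alt, gt)))
    conn.commit()
    return chroms


def alt_allele_frequency(conn, rsid):
    """Alt-allele frequency of one variant, computed by an aggregate query."""
    chrom, vid = conn.execute("SELECT chrom, variant_id FROM variant WHERE rsid = ?",
                              (rsid,)).fetchone()
    total, n = conn.execute(f"SELECT SUM(dosage), COUNT(*) FROM genotype_chr{chrom}"
                            " WHERE variant_id = ?", (vid,)).fetchone()
    return total / (2 * n)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_normalized(conn, FLAT_ROWS)
    print(alt_allele_frequency(conn, "rs1"))  # 0.5
```

Because each query touches only the partition for one chromosome and uses the primary-key index on `variant_id`, the work grows with the number of samples for that variant rather than with the size of the whole data set. This illustrates the general principle only; the space and speed figures reported above come from the authors' own structure, not from this sketch.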