Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Data Management for Heterogeneous Genomic Datasets

Abstract

Next Generation Sequencing (NGS), a family of technologies for reading DNA and RNA, is changing biological research, and will soon change medical practice, by quickly providing sequencing data and high-level features of numerous individual genomes in different biological and clinical conditions. The availability of millions of whole genome sequences may soon become the biggest and most important "big data" problem of mankind. In this exciting framework, we recently proposed a new paradigm to raise the level of abstraction in NGS data management, by introducing a GenoMetric Query Language (GMQL) and demonstrating its usefulness through several biological query examples. Building on that effort, here we motivate and formalize GMQL operations, focusing especially on the most characteristic and domain-specific ones. Furthermore, we address their efficient implementation and illustrate the architecture of the new software system that we have developed for their execution on big genomic data in a cloud computing environment, and we evaluate its performance. The new system implementation is available for download at the GMQL website (http://www.bioinformatics.deib.polimi.it/GMQL/); GMQL can also be tested through a set of predefined queries on ENCODE and Roadmap Epigenomics data at http://www.bioinformatics.deib.polimi.it/GMQL/queries/.
