Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient

Abstract

The maximal information coefficient (MIC) has been proposed to discover relationships and associations between pairs of variables. Accelerating the MIC calculation poses significant challenges for bioinformatics scientists, especially in genome sequencing and biological annotation. In this paper, we explore a parallel approach that uses the MapReduce framework to improve the computing efficiency and throughput of the MIC computation. The acceleration system includes biological data storage on HDFS, preprocessing algorithms, a distributed memory cache mechanism, and the partitioning of MapReduce jobs. Based on this acceleration approach, we extend the traditional two-variable algorithm to a multiple-variable algorithm. The experimental results show that our parallel solution provides a linear speedup compared with the original algorithm without affecting correctness or sensitivity.
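The abstract does not give implementation details, so the following is only a minimal single-process sketch of the general idea of splitting pairwise scoring into map and reduce steps. It is not the SuperMIC system: there is no Hadoop, HDFS, or cache layer here, and `approx_mic` is a simplified stand-in that searches equal-width grids only, whereas the real MIC optimizes grid boundaries. The names `partition_pairs`, `map_task`, `reduce_scores`, and `approx_mic` are hypothetical, and the grid budget n^0.6 is the commonly used MIC default, assumed here rather than taken from the abstract.

```python
import itertools
import math
from collections import Counter


def grid_mutual_information(xs, ys, nx, ny):
    """Mutual information (in bits) of xs and ys binned on an equal-width nx-by-ny grid."""
    def bin_index(v, lo, hi, k):
        if hi == lo:  # constant variable: everything falls in one bin
            return 0
        return min(int((v - lo) / (hi - lo) * k), k - 1)

    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cells = Counter(
        (bin_index(x, x_lo, x_hi, nx), bin_index(y, y_lo, y_hi, ny))
        for x, y in zip(xs, ys)
    )
    row, col = Counter(), Counter()
    for (i, j), c in cells.items():
        row[i] += c
        col[j] += c
    return sum(
        (c / n) * math.log2(c * n / (row[i] * col[j]))
        for (i, j), c in cells.items()
    )


def approx_mic(xs, ys):
    """Simplified MIC stand-in: best normalized MI over equal-width grids only."""
    budget = max(4, int(len(xs) ** 0.6))  # assumed grid-size bound nx * ny <= n^0.6
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        for ny in range(2, budget // nx + 1):
            mi = grid_mutual_information(xs, ys, nx, ny)
            best = max(best, mi / math.log2(min(nx, ny)))
    return best


def partition_pairs(num_vars, num_partitions):
    """Split all variable pairs round-robin into independent work units."""
    pairs = list(itertools.combinations(range(num_vars), 2))
    return [pairs[p::num_partitions] for p in range(num_partitions)]


def map_task(pairs, columns):
    """Map step: score every pair in one partition, emitting (pair, score)."""
    return [((i, j), approx_mic(columns[i], columns[j])) for i, j in pairs]


def reduce_scores(mapped_outputs):
    """Reduce step: merge the per-partition outputs into one result table."""
    merged = {}
    for output in mapped_outputs:
        merged.update(output)
    return merged


# Toy data: two linearly related variables and one constant variable.
columns = [
    list(range(50)),
    [2 * v + 1 for v in range(50)],
    [7] * 50,
]
partitions = partition_pairs(len(columns), num_partitions=2)
scores = reduce_scores(map_task(p, columns) for p in partitions)
```

Because each pair is scored independently, the partitions share no state, which is the property that would let a framework such as MapReduce run them on separate workers.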
