Journal: 《中国科技论文》 (China Sciencepaper)

A MapReduce-Based Motif Search Algorithm

         

Abstract

Motif search plays an important role in gene finding and in understanding gene regulatory relationships, and it is one of the most challenging problems in bioinformatics. This paper presents three data partitioning methods for the PMSP algorithm and, building on them, proposes PMSPMR, a MapReduce-based motif search algorithm. For problem instances of varying difficulty, experimental results on a Hadoop cluster show that PMSPMR scales well. In particular, for difficult motif search instances, the speedup of PMSPMR is close to the number of nodes in the Hadoop cluster. In experiments on real biological data, PMSPMR identifies known transcriptional regulatory motifs in eukaryotes and in actual promoter sequences extracted from Saccharomyces cerevisiae, which demonstrates the effectiveness of the algorithm.
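To make the idea of partitioning a motif search concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract does not describe the three partitioning methods, so the round-robin split of the first sequence's l-mers is an assumption, as are all function names (`neighbors`, `map_task`, `find_motifs`). The map and reduce phases are simulated in-process rather than on Hadoop, and the candidate check is brute force without the pruning a real PMSP implementation would use.

```python
from itertools import combinations, product

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def neighbors(lmer, d):
    """All strings over ALPHABET within Hamming distance d of lmer."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(ALPHABET, repeat=k):
                candidate = list(lmer)
                for pos, ch in zip(positions, letters):
                    candidate[pos] = ch
                result.add("".join(candidate))
    return result

def occurs_within(candidate, seq, d):
    """True if some l-mer of seq is within distance d of candidate."""
    l = len(candidate)
    return any(hamming(candidate, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def map_task(lmers, sequences, d):
    """Simulated map phase: test the d-neighborhood of each assigned
    l-mer of the first sequence against all remaining sequences."""
    found = set()
    for x in lmers:
        for cand in neighbors(x, d):
            if all(occurs_within(cand, s, d) for s in sequences[1:]):
                found.add(cand)
    return found

def find_motifs(sequences, l, d, num_partitions):
    """Split the l-mers of the first sequence into partitions (assumed
    round-robin scheme), run one map task per partition, then union
    the partial results (simulated reduce phase)."""
    first = sequences[0]
    lmers = [first[i:i + l] for i in range(len(first) - l + 1)]
    partitions = [lmers[p::num_partitions] for p in range(num_partitions)]
    partial = [map_task(part, sequences, d) for part in partitions]
    return set().union(*partial)

# Toy input: "ACGTAC" planted with at most one mismatch per sequence.
seqs = ["TTACGTACTT", "GGACGAACGG", "CCTCGTACCC"]
motifs = find_motifs(seqs, l=6, d=1, num_partitions=3)
```

Because each map task only needs its own share of l-mers plus the full sequence set, the tasks are independent, which is the property that lets such a search be spread across cluster nodes. The result does not depend on the number of partitions; only the work per task does.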
