Journal: IEEE Transactions on Parallel and Distributed Systems

Space and time efficient parallel algorithms and software for EST clustering


Abstract

Expressed sequence tags, abbreviated as ESTs, are DNA molecules experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and for understanding important genetic variations such as those resulting in diseases. We present the algorithmic foundations and implementation of PaCE, a parallel software system we developed for large-scale EST clustering. The novel features of our approach include 1) design of space-efficient algorithms to limit the space required to linear in the size of the input data set, 2) a combination of algorithmic techniques to reduce the total work without sacrificing the quality of EST clustering, and 3) use of parallel processing to reduce runtime and facilitate clustering of large data sets. Using a combination of these techniques, we report the clustering of 327,632 rat ESTs in 47 minutes, and 420,694 Triticum aestivum ESTs in 3 hours and 15 minutes, using a 60-processor IBM xSeries cluster. These problems are well beyond the capabilities of state-of-the-art sequential software. We also present thorough experimental evaluation of our software including quality assessment using benchmark Arabidopsis EST data.
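To make the clustering task concrete, the following is a minimal sequential sketch, not PaCE's actual algorithm, which the abstract does not detail. It assumes a deliberately simple overlap criterion (two sequences share an exact substring of length k) and merges clusters with a union-find structure. The names `shared_kmer`, `DisjointSet`, and `cluster_ests`, and the default `k=8`, are hypothetical choices made for illustration only.

```python
from itertools import combinations


def shared_kmer(a, b, k):
    """Return True if a and b share at least one exact substring of length k.

    Assumption: this stands in for a real overlap test; it is not the
    alignment criterion used by PaCE.
    """
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[j:j + k] in kmers for j in range(len(b) - k + 1))


class DisjointSet:
    """Union-find over the integers 0..n-1, with path halving in find()."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx


def cluster_ests(seqs, k=8):
    """Group sequence indices into clusters; returns a sorted list of lists."""
    ds = DisjointSet(len(seqs))
    for i, j in combinations(range(len(seqs)), 2):
        # Pairs already in the same cluster need no comparison.
        if ds.find(i) == ds.find(j):
            continue
        if shared_kmer(seqs[i], seqs[j], k):
            ds.union(i, j)
    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(ds.find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    ests = ["ACGTACGTAA", "GTACGTAACC", "TTTTTTTTTT", "CCTTTTTTTTTG"]
    print(cluster_ests(ests, k=8))
```

This sketch examines all pairs, so its work grows quadratically with the number of sequences. It only illustrates the general idea of skipping comparisons between sequences that are already clustered together; a system aiming at hundreds of thousands of ESTs would need a far more selective way to pick candidate pairs.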
