British National Conference on Databases (BNCOD 23); 20060718-23; Belfast (GB)

A Heterogeneous Computing System for Data Mining Workflows

Abstract

The computing-intensive Data Mining (DM) process calls for the support of a Heterogeneous Computing (HC) system, which consists of multiple computers with different configurations connected by a high-speed LAN, to provide increased computational power and resources. The DM process can be described as a multi-phase pipeline in which each phase may offer many optional methods. This makes the DM workflow very complex, so that it can be modelled only by a Directed Acyclic Graph (DAG). An HC system needs an effective and efficient scheduling framework that orchestrates all the computing hardware to execute multiple competing DM workflows. Motivated by the need for a practical solution to the scheduling problem for DM workflows, this paper proposes a dynamic DAG scheduling algorithm designed around the characteristics of the execution time estimation model for DM jobs. Based on an approximate estimate of job execution time, the algorithm first maps DM jobs to machines in a decentralized and diligent (as defined in this paper) manner. The performance of this initial mapping can then be improved through job migrations when necessary. The scheduling heuristic considers both the minimal completion time criterion and the critical path in a DAG. We implement the system in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems are used to test and measure system performance. The detailed experimental procedure and result analysis are also discussed.
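
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the two factors it names: a critical-path priority and a minimal-completion-time machine choice. It is a static, centralized list scheduler, not the paper's dynamic, decentralized method, and it omits job migration and communication cost. The function name `schedule`, the job and machine names, and all cost figures are hypothetical.

```python
from typing import Dict, List, Tuple

Plan = Dict[str, Tuple[str, float, float]]  # job -> (machine, start, finish)


def schedule(succ: Dict[str, List[str]],
             cost: Dict[str, Dict[str, float]]) -> Tuple[Plan, float]:
    """Static list scheduling on a DAG of jobs (illustrative only).

    succ[job]          -> list of successor jobs (every job must be a key)
    cost[job][machine] -> estimated execution time, assumed positive
    """
    machines = sorted({m for per_job in cost.values() for m in per_job})

    # Build predecessor lists from the successor lists.
    pred: Dict[str, List[str]] = {j: [] for j in succ}
    for job, children in succ.items():
        for child in children:
            pred[child].append(job)

    # Critical-path priority: mean cost of the job plus the longest
    # mean-cost path through its successors.
    rank: Dict[str, float] = {}

    def upward_rank(job: str) -> float:
        if job not in rank:
            mean = sum(cost[job].values()) / len(cost[job])
            below = max((upward_rank(c) for c in succ[job]), default=0.0)
            rank[job] = mean + below
        return rank[job]

    for job in succ:
        upward_rank(job)

    free_at = {m: 0.0 for m in machines}  # time each machine becomes idle
    plan: Plan = {}

    # With positive costs a parent always outranks its children, so
    # descending rank visits every job after all of its predecessors.
    for job in sorted(succ, key=lambda j: -rank[j]):
        ready = max((plan[p][2] for p in pred[job]), default=0.0)
        # Minimal completion time: pick the machine that finishes earliest.
        best = min(cost[job],
                   key=lambda m: max(ready, free_at[m]) + cost[job][m])
        start = max(ready, free_at[best])
        finish = start + cost[job][best]
        plan[job] = (best, start, finish)
        free_at[best] = finish

    return plan, max(finish for _, _, finish in plan.values())


# Hypothetical workflow: one preprocessing job feeds two alternative
# classifiers, whose results are then evaluated together.
succ = {"preprocess": ["tree", "bayes"], "tree": ["evaluate"],
        "bayes": ["evaluate"], "evaluate": []}
cost = {"preprocess": {"fast": 2.0, "slow": 4.0},
        "tree": {"fast": 3.0, "slow": 6.0},
        "bayes": {"fast": 4.0, "slow": 5.0},
        "evaluate": {"fast": 1.0, "slow": 2.0}}

plan, makespan = schedule(succ, cost)
for job, (machine, start, finish) in plan.items():
    print(f"{job:10s} {machine:4s} {start:4.1f} -> {finish:4.1f}")
print("makespan:", makespan)  # 8.0 for the figures above
```

In this example the two classifiers run in parallel: `tree` stays on the fast machine while `bayes` goes to the slow one, because waiting for the fast machine would finish later. A dynamic scheduler of the kind the abstract describes would additionally revise such placements at run time as the time estimates are corrected.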