Conference: International Conference on Computer Communication and Networks

LASER: A Deep Learning Approach for Speculative Execution and Replication of Deadline-Critical Jobs in Cloud



Abstract

Meeting application deadlines is crucial as cloud applications become increasingly mission-critical and deadline-sensitive. Empirical studies on large-scale clusters reveal that a few slow tasks, known as stragglers, can significantly stretch job execution times. A number of strategies have been proposed to mitigate stragglers by launching speculative or clone (task) attempts. These strategies often rely on a model-based approach to optimize key operating parameters and are therefore vulnerable to inaccuracies or incompleteness in the underlying models. In this paper, we present LASER, a deep learning approach for speculative execution and replication of deadline-critical jobs. Machine learning has been used successfully to solve a large variety of classification and prediction problems. In particular, a deep neural network (DNN), which consists of multiple hidden layers of units between the input and output layers, can provide more accurate regression (prediction) than traditional machine learning algorithms. We compare LASER with SRQuant, a speculative-resume strategy based on quantitative analysis. Both scheduling algorithms aim to improve the Probability of Completion before Deadlines (PoCD), i.e., the probability that MapReduce jobs meet their desired deadlines, and to reduce the cost of speculative execution, measured by the total (virtual) machine time. We evaluate and compare the two strategies through testbed experiments. The results show that our two strategies outperform Hadoop without speculation (Hadoop-NS) and Hadoop with speculation (Hadoop-S) by up to 89% in PoCD and 13% in cost.
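To make the two metrics named in the abstract concrete, the following is a minimal Monte Carlo sketch of how PoCD and machine-time cost could be estimated for a job whose tasks are cloned. It is not LASER or SRQuant: the Pareto task-time model, the kill-on-first-finish cloning policy, and every name and parameter (`simulate_job`, `estimate`, `t_min`, `alpha`) are assumptions made purely for illustration.

```python
import random


def simulate_job(num_tasks, clones, deadline, rng, t_min=1.0, alpha=2.0):
    """Simulate one job; return (met_deadline, machine_time).

    Assumption: each attempt's duration is t_min times a Pareto(alpha)
    variate, a common heavy-tailed stand-in for straggling tasks.
    """
    job_finish = 0.0
    machine_time = 0.0
    for _ in range(num_tasks):
        attempts = [t_min * rng.paretovariate(alpha) for _ in range(clones)]
        fastest = min(attempts)
        # Assumption: all clones of a task start together and are killed
        # when the fastest one finishes, so each occupies a machine for
        # `fastest` time units.
        machine_time += clones * fastest
        # The job completes when its slowest task completes.
        job_finish = max(job_finish, fastest)
    return job_finish <= deadline, machine_time


def estimate(num_tasks, clones, deadline, trials=2000, seed=0):
    """Estimate (PoCD, mean machine time) over many simulated jobs."""
    rng = random.Random(seed)
    met = 0
    total_cost = 0.0
    for _ in range(trials):
        ok, cost = simulate_job(num_tasks, clones, deadline, rng)
        met += ok
        total_cost += cost
    return met / trials, total_cost / trials


if __name__ == "__main__":
    for r in (1, 2, 3):
        pocd, cost = estimate(num_tasks=10, clones=r, deadline=3.0)
        print(f"clones={r}  PoCD={pocd:.3f}  mean machine time={cost:.2f}")
```

Under these assumptions, adding clones raises the estimated PoCD because each task finishes as soon as its fastest attempt does, while the machine-time figure shows what that redundancy costs. This is the trade-off that both strategies in the abstract aim to balance.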
