Journal: IEEE Transactions on Parallel and Distributed Systems

iMLBench: A Machine Learning Benchmark Suite for CPU-GPU Integrated Architectures


Abstract

Utilizing heterogeneous accelerators, especially GPUs, to accelerate machine learning tasks has proven to be a great success in recent years. GPUs bring huge performance improvements to machine learning and have greatly promoted its widespread adoption. However, the discrete CPU-GPU architecture design, with its high PCIe transmission overhead, diminishes the benefits of GPU computing in machine learning training tasks. To overcome this limitation, hardware vendors have released CPU-GPU integrated architectures with shared unified memory. In this article, we design a benchmark suite for machine learning training on CPU-GPU integrated architectures, called iMLBench, covering a wide range of machine learning applications and kernels. We mainly explore two features of integrated architectures: 1) zero-copy, which means that the PCIe overhead is eliminated for machine learning tasks, and 2) co-running, which means that the CPU and the GPU run together to process a single machine learning task. Our experimental results on iMLBench show that the integrated architecture brings an average 7.1x performance improvement over the original implementations. Specifically, the zero-copy design brings a 4.65x performance improvement, and co-running brings a 1.78x improvement. Moreover, integrated architectures exhibit promising results from both performance-per-dollar and energy perspectives, achieving a 6.50x performance-price ratio and 4.06x energy efficiency over discrete GPUs. The benchmark is open-sourced at https://github.com/ChenyangZhang-cs/iMLBench.
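The two features named in the abstract can be illustrated with a small sketch. This is not iMLBench's implementation: it assumes two Python threads as stand-ins for the CPU and the GPU, and the names `co_run`, `partial_sum_of_squares`, and the `cpu_ratio` parameter are hypothetical. Both workers read the same buffer in place (loosely mirroring zero-copy), and one task is split between them by a ratio (loosely mirroring co-running).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(data, start, end):
    """Work done by one worker on the index range [start, end) of a shared buffer."""
    # Index into the shared list directly; slicing would create a copy.
    return sum(data[i] * data[i] for i in range(start, end))


def co_run(data, cpu_ratio=0.3):
    """Split a single task between two workers that share one buffer.

    cpu_ratio is a hypothetical knob: the fraction of elements assigned
    to the 'CPU' worker; the remainder goes to the 'GPU' worker.
    """
    if not 0.0 <= cpu_ratio <= 1.0:
        raise ValueError("cpu_ratio must be in [0, 1]")
    split = int(len(data) * cpu_ratio)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both workers receive a reference to the same list; nothing is transferred.
        cpu_part = pool.submit(partial_sum_of_squares, data, 0, split)
        gpu_part = pool.submit(partial_sum_of_squares, data, split, len(data))
        return cpu_part.result() + gpu_part.result()


if __name__ == "__main__":
    print(co_run(list(range(10)), cpu_ratio=0.3))
```

On real integrated hardware the split ratio would presumably be tuned per workload, since the two processors differ in throughput; the fixed default here is only a placeholder.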
