iMLBench: A Machine Learning Benchmark Suite for CPU-GPU Integrated Architectures

Zhang Chenyang; Zhang Feng; Guo Xiaoguang; He Bingsheng; Zhang Xiao; Du Xiaoyong

Abstract

Using heterogeneous accelerators, especially GPUs, to accelerate machine learning tasks has proven highly successful in recent years. GPUs bring large performance improvements to machine learning and have greatly promoted its widespread adoption. However, in the discrete CPU-GPU architecture, the high overhead of PCIe data transfers erodes the benefit of GPU computing in machine learning training tasks. To overcome this limitation, hardware vendors have released CPU-GPU integrated architectures with shared unified memory. In this article, we design iMLBench, a benchmark suite for machine learning training on CPU-GPU integrated architectures that covers a wide range of machine learning applications and kernels. We mainly explore two features of integrated architectures: 1) zero-copy, which eliminates the PCIe transfer overhead for machine learning tasks, and 2) co-running, in which the CPU and the GPU work together to process a single machine learning task. Our experimental results on iMLBench show that the integrated architecture brings an average 7.1x performance improvement over the original implementations. Specifically, the zero-copy design brings a 4.65x performance improvement, and co-running brings a 1.78x improvement. Moreover, integrated architectures show promising results in terms of both performance per dollar and energy, achieving a 6.50x higher performance-price ratio and 4.06x higher energy efficiency than discrete GPUs. The benchmark is open-sourced at https://github.com/ChenyangZhang-cs/iMLBench.
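The two features named above, zero-copy and co-running, can be illustrated with a simple analytical cost model. This is a hypothetical sketch, not iMLBench's code: it assumes each device processes work at a constant throughput, models the discrete GPU's PCIe transfer as bytes divided by a fixed bandwidth, and splits a co-run task statically in proportion to device throughput so that both devices finish at the same time. All names (`Device`, `discrete_time`, `zero_copy_time`, `co_run_split`, `co_run_time`) and all numbers are made up for illustration.

```python
# Hypothetical cost model, NOT taken from iMLBench: constant device throughput,
# a fixed PCIe bandwidth, and a static throughput-proportional co-run split.
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    throughput: float  # abstract work units processed per second (assumed constant)


def discrete_time(work: float, bytes_moved: float, gpu: Device, pcie_bw: float) -> float:
    """Discrete GPU: copy data over PCIe (bytes / bytes-per-second), then compute."""
    return bytes_moved / pcie_bw + work / gpu.throughput


def zero_copy_time(work: float, gpu: Device) -> float:
    """Integrated GPU with shared memory: the copy term disappears."""
    return work / gpu.throughput


def co_run_split(cpu: Device, gpu: Device) -> float:
    """GPU share r that makes both devices finish together.

    Solving r * W / T_gpu == (1 - r) * W / T_cpu gives r = T_gpu / (T_gpu + T_cpu).
    """
    return gpu.throughput / (gpu.throughput + cpu.throughput)


def co_run_time(work: float, cpu: Device, gpu: Device) -> float:
    """CPU and GPU process disjoint parts of one task concurrently; the slower part dominates."""
    r = co_run_split(cpu, gpu)
    return max(r * work / gpu.throughput, (1 - r) * work / cpu.throughput)


if __name__ == "__main__":
    cpu = Device("cpu", 2.0e9)   # made-up throughput
    gpu = Device("gpu", 6.0e9)   # made-up throughput
    work, data = 1.2e10, 4.0e9   # made-up task size (work units, bytes)
    pcie = 1.0e10                # made-up PCIe bandwidth in bytes/s
    print(f"discrete:  {discrete_time(work, data, gpu, pcie):.2f} s")  # 2.40 s
    print(f"zero-copy: {zero_copy_time(work, gpu):.2f} s")             # 2.00 s
    print(f"co-run:    {co_run_time(work, cpu, gpu):.2f} s (GPU share {co_run_split(cpu, gpu):.2f})")  # 1.50 s, 0.75
```

With these made-up inputs, zero-copy removes the 0.4 s transfer term, and co-running lowers the compute time from 2.0 s to 1.5 s by giving the GPU 75% of the work. On real integrated chips the CPU and GPU share memory bandwidth, so this linear model is likely optimistic. The speedups reported in the abstract come from the authors' measurements, not from a model like this one.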

机译：利用异构加速器，尤其是GPU，加速机器学习任务已显示出近年来的巨大成功。 GPU对机器学习带来巨大的性能改进，大大推动了机器学习的广泛采用。然而，具有高PCIe传输开销的离散CPU-GPU架构设计减少了机器学习培训任务中的GPU计算益处。为了克服此类限制，硬件供应商通过共享统一内存释放CPU-GPU集成体系结构。在本文中，我们为CPU-GPU集成架构上的机器学习培训设计了一个基准套件，称为IMLBench，涵盖了各种机器学习应用程序和内核。我们主要探讨集成架构上的两个功能：1）零拷贝，这意味着PCIe开销已被淘汰为机器学习任务和2）共同运行，这意味着CPU和GPU共同运行一起处理a单机学习任务。我们对Imlbench的实验结果表明，整合架构在原始实施方面带来了平均7.1倍的性能改进。具体而言，零拷贝设计带来了4.65倍的性能改进，共同运行带来1.78倍的改进。此外，集成架构表现出具有每美元和能源观点的有希望的结果，实现了6.50x的性能 - 价格比，而在离散GPU上的4.06倍的能效。基准在https://github.com/chenyangzhang -cs/imlbench开放。

Bibliographic Details

Source
IEEE Transactions on Parallel and Distributed Systems, 2021, no. 7, pp. 1740-1752 (13 pages)
Authors
Zhang Chenyang; Zhang Feng; Guo Xiaoguang; He Bingsheng; Zhang Xiao; Du Xiaoyong
Affiliations

Zhang Chenyang, Zhang Feng, Guo Xiaoguang, Zhang Xiao, Du Xiaoyong: Key Laboratory of Data Engineering and Knowledge Engineering (MOE), Renmin University of China, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

He Bingsheng: School of Computing, National University of Singapore, Singapore 119077
Indexing Information
Original format: PDF
Language: English
Keywords
Computer architecture; Machine learning; Benchmark testing; Graphics processing units; Task analysis; Hardware; Training; benchmark; CPU; GPU; integrated architectures

Similar Articles

1. MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance. Mattson Peter, Tang Hanlin, Wei Gu-Yeon. IEEE Micro, 2020, no. 2.
2. Market Model Benchmark Suite for Machine Learning Techniques. Martin Prause, Jurgen Weigand. IEEE Computational Intelligence Magazine, 2018, no. 4.
3. PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison. Randal S. Olson, William La Cava, Patryk Orzechowski. BioData Mining, 2017, no. 1.
4. VGM-Bench: FPU Benchmark Suite for Computer Vision, Computer Graphics and Machine Learning Applications. Luca Cremona, William Fornaciari, Andrea Galimberti. International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, 2020.
5. Benchmarking Statistical and Machine-Learning Methods for Single-Cell RNA Sequencing Data. Xi, Nan. Dissertation, 2021.
6. PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison. Randal S. Olson, William La Cava, Patryk Orzechowski. Online, 2017.
7. PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison. Olson, Randal S., La Cava, William, Orzechowski, Patryk. Online, 2017.