A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs

Guo P.; Wang L.; Chen P.

Abstract

This paper presents a performance modeling and optimization analysis tool to predict and optimize the performance of sparse matrix-vector multiplication (SpMV) on GPUs. We make the following contributions: 1) We present an integrated analytical and profile-based performance modeling approach that accurately predicts the kernel execution times of CSR, ELL, COO, and HYB SpMV kernels. Our proposed approach is general: it is neither limited by GPU programming languages nor restricted to specific GPU architectures. In this paper, we use CUDA-based SpMV kernels and an NVIDIA Tesla C2050 for our performance modeling and experiments. According to our experiments, for 77 out of 82 test cases, the differences between the predicted and measured execution times are less than 9 percent; for the remaining five test cases, the differences are between 9 and 10 percent. For CSR, ELL, COO, and HYB SpMV CUDA kernels, the average differences are 6.3, 4.4, 2.2, and 4.7 percent, respectively. 2) Based on the performance modeling, we design a dynamic-programming-based SpMV optimal solution auto-selection algorithm that automatically reports an optimal solution (i.e., optimal storage strategy, storage format(s), and execution time) for a target sparse matrix. In our experiments, the average performance improvements of the optimal solutions are 41.1, 49.8, and 37.9 percent, compared to NVIDIA’s CSR, COO, and HYB CUDA kernels, respectively.
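For readers unfamiliar with the four storage formats named in the abstract, the following is a minimal CPU-side sketch in plain Python of what an SpMV computes in each of them. It is illustrative only: it is not the paper's CUDA kernels or its performance model, and all function names, the tiny 3x3 example matrix, and the ELL padding convention (value 0.0, column 0) are assumptions made for this example.

```python
# Reference SpMV (y = A * x) in COO, CSR, ELL, and HYB layouts.
# Assumption: pure-Python illustration, not the CUDA kernels from the paper.

def spmv_coo(rows, cols, vals, x, n_rows):
    """COO: one (row, col, value) triple per nonzero."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y

def spmv_csr(row_ptr, col_idx, vals, x):
    """CSR: row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y

def spmv_ell(ell_cols, ell_vals, x):
    """ELL: every row padded to the same width (padding value 0.0)."""
    return [sum(v * x[c] for c, v in zip(cols, vals))
            for cols, vals in zip(ell_cols, ell_vals)]

def spmv_hyb(ell_cols, ell_vals, coo_rows, coo_cols, coo_vals, x):
    """HYB: a fixed-width ELL part plus a COO part for the overflow entries."""
    y_ell = spmv_ell(ell_cols, ell_vals, x)
    y_coo = spmv_coo(coo_rows, coo_cols, coo_vals, x, len(y_ell))
    return [a + b for a, b in zip(y_ell, y_coo)]

# Example matrix A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] and vector x.
x = [1.0, 2.0, 3.0]
y_coo = spmv_coo([0, 0, 1, 2, 2], [0, 2, 1, 0, 2],
                 [1.0, 2.0, 3.0, 4.0, 5.0], x, 3)
y_csr = spmv_csr([0, 2, 3, 5], [0, 2, 1, 0, 2],
                 [1.0, 2.0, 3.0, 4.0, 5.0], x)
y_ell = spmv_ell([[0, 2], [1, 0], [0, 2]],
                 [[1.0, 2.0], [3.0, 0.0], [4.0, 5.0]], x)
# HYB with an ELL width of 1; the second nonzero of rows 0 and 2 spills to COO.
y_hyb = spmv_hyb([[0], [1], [0]], [[1.0], [3.0], [4.0]],
                 [0, 2], [2, 2], [2.0, 5.0], x)
```

All four layouts produce the same result vector; they differ only in how the nonzeros are stored and traversed, which is why the best-performing format depends on the sparsity pattern of the target matrix.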

Bibliographic details

Source: IEEE Transactions on Parallel and Distributed Systems, 2014, Issue 5, pp. 1112-1123 (12 pages)
Authors: Guo P.; Wang L.; Chen P.
Author affiliation: Department of Computer Science, Department 3315, 1000 E. University Ave., University of Wyoming, Laramie
Original format: PDF
Language: English
Keywords: CUDA; GPU; performance modeling; sparse matrix-vector multiplication

Similar literature

1. Accurate cross-architecture performance modeling for sparse matrix-vector multiplication (SpMV) on GPUs [J]. Ping Guo, Liqiang Wang. Concurrency, practice and experience, 2015, Issue 13.
2. GPU-Accelerated Sparse LU Factorization for Circuit Simulation with Performance Modeling [J]. Xiaoming Chen, Ling Ren, Yu Wang. IEEE Transactions on Parallel and Distributed Systems, 2015, Issue 3.
3. SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs [J]. Li Kaiwei, Chen Jianfei, Chen Wenguang. IEEE Transactions on Parallel and Distributed Systems, 2020, Issue 9.
4. Acceleration of Sparse Vector Autoregressive Modeling Using GPUs [C]. Shreenivas Bharadwaj Venkataramanan, Rahul Garg, Yogish Sabharwal. International Conference on High Performance Computing, Data, and Analytics, 2019.
5. Performance and Power Analytical Models for GPUs and Mobile Devices [D]. Issa, Joseph. 2012.
6. Accuracy and Performance of Functional Parameter Estimation Using a Novel Numerical Optimization Approach for GPU-Based Kinetic Compartmental Modeling [O]. Igor Svistoun, Brandon Driscoll, Catherine Coolens. 2019.
7. Accurate cross-architecture performance modeling for sparse matrix-vector multiplication (SpMV) on GPUs [O]. Ping Guo, Liqiang Wang. 2014.
