International Conference on Computational Science and Its Applications
Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs

Abstract

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific and engineering computing. This paper presents optimization techniques for SpMV in the Compressed Row Storage (CRS) format on NVIDIA Kepler architecture GPUs using CUDA. Our implementation is based on an existing method proposed for the Fermi architecture, an earlier GPU generation, and takes advantage of some of the new features of the Kepler architecture. On a Tesla K20 (Kepler architecture) GPU, in double-precision arithmetic, our implementation is on average approximately 1.29 times faster than the Fermi-optimized implementation across 200 matrices of different types. As a result, our implementation outperforms the CRS-format SpMV of the NVIDIA cuSPARSE library in CUDA 5.0 on 174 of the 200 matrices, with an average speedup over the cuSPARSE SpMV routine of approximately 1.45 across all 200 matrices.
