International Conference on Computational Science and Its Applications
Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs

Abstract

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific and engineering computing. This paper presents optimization techniques for SpMV in the Compressed Row Storage (CRS) format on NVIDIA Kepler architecture GPUs using CUDA. Our implementation is based on an existing method proposed for the Fermi architecture, an earlier GPU generation, and takes advantage of some of the new features of the Kepler architecture. On a Tesla K20 (Kepler architecture) GPU, in double-precision arithmetic, our implementation is on average approximately 1.29 times faster than the Fermi-optimized implementation across 200 matrices of different types. As a result, our implementation outperforms the CRS-format SpMV of the NVIDIA cuSPARSE library in CUDA 5.0 on 174 of the 200 matrices, with an average speedup over the cuSPARSE SpMV routine of approximately 1.45 across all 200 matrices.
