Published in: 《计算机工程与科学》 (Computer Engineering & Science)

Optimization of Sparse Diagonal Matrix-Vector Multiplication Based on the CUDA Programming Model

Abstract

Sparse matrix-vector multiplication (SpMV) is a core computational kernel in many scientific applications. This paper targets n-diagonal sparse matrices and proposes CDIA, a new compressed storage format derived from the DIA format. Following the CUDA programming model, the computation is split into fine-grained tasks assigned to individual threads. To satisfy CUDA's requirement for coalesced memory access, the compressed matrix is stored in transposed form, and a fine-grained algorithm and program are designed around this layout, with further optimizations tailored to the characteristics of SpMV. The experiments indicate that this storage format makes good use of CUDA's data-processing strengths: on the test data, the best implementation reaches up to 39.6 Gflop/s in single precision and 19.6 Gflop/s in double precision, improving on the implementation of Nathan Bell and Michael Garland by about 7.6% and 17.4%, respectively.
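The abstract does not spell out the CDIA layout, so the following is only a minimal CPU-side sketch of SpMV over the plain DIA format that CDIA builds on, written in C++ so it can be compiled without a GPU. The names `DiaMatrix`, `dia_row`, and `dia_spmv` are hypothetical and are not taken from the paper. The values are stored diagonal by diagonal (`data[d * n + row]`), which is assumed here to correspond to the transposed layout the abstract mentions: if one GPU thread handled each row, neighbouring threads would read neighbouring memory locations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain DIA storage (an assumption for illustration, not the paper's CDIA).
// Each stored diagonal has one offset (column - row) and n value slots,
// zero-padded where the diagonal falls outside the matrix.
struct DiaMatrix {
    int n;                     // matrix is n x n
    std::vector<int> offsets;  // offsets[d] = column - row for diagonal d
    std::vector<float> data;   // data[d * n + row], diagonal-major layout
};

// The work of one fine-grained task: compute a single output element y[row].
// In a CUDA kernel, `row` would typically be derived from
// blockIdx.x * blockDim.x + threadIdx.x.
float dia_row(const DiaMatrix& A, const std::vector<float>& x, int row) {
    float sum = 0.0f;
    for (std::size_t d = 0; d < A.offsets.size(); ++d) {
        int col = row + A.offsets[d];
        if (col >= 0 && col < A.n) {  // skip padded entries
            sum += A.data[d * A.n + row] * x[col];
        }
    }
    return sum;
}

// Serial driver that emulates launching one task per row.
std::vector<float> dia_spmv(const DiaMatrix& A, const std::vector<float>& x) {
    assert(x.size() == static_cast<std::size_t>(A.n));
    assert(A.data.size() == A.offsets.size() * static_cast<std::size_t>(A.n));
    std::vector<float> y(A.n);
    for (int row = 0; row < A.n; ++row) {
        y[row] = dia_row(A, x, row);
    }
    return y;
}
```

For example, a 4 x 4 tridiagonal matrix with -1, 2, -1 on its three diagonals would use `offsets = {-1, 0, 1}` and twelve stored values, with one padded zero at the start of the lower diagonal and one at the end of the upper diagonal.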
