Conference: International Conference on Parallel Processing and Applied Mathematics

NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch


Abstract

Solving tridiagonal systems is one of the most computationally expensive parts of many applications, and multiple studies have explored the use of NVIDIA GPUs to accelerate this computation. However, these studies have mainly focused on parallel algorithms, which efficiently exploit shared memory and can saturate the GPU's capacity with a small number of systems, but scale poorly when dealing with a relatively large number of systems. We propose a new implementation (cuThomasBatch) based on the Thomas algorithm. To achieve good scalability with this approach, it is necessary to transform the way the inputs are stored in memory so as to exploit coalescence (contiguous threads accessing contiguous memory locations). The results of this study show that the implementation is able to beat the reference code when dealing with a relatively large number of tridiagonal systems (2,000-256,000), being close to 3× (in double precision) and 4× (in single precision) faster on one Kepler NVIDIA GPU.
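The layout idea in the abstract can be illustrated with a small CPU-side sketch. This is not the cuThomasBatch code: the function names `interleave` and `thomas_batch_interleaved`, and the index formula `i * batch + s`, are assumptions chosen to match the description of contiguous threads touching contiguous memory. The loop over `s` stands in for what would be one GPU thread per system, and the sketch assumes each system is well conditioned (for example, diagonally dominant), since the Thomas algorithm does no pivoting.

```python
def interleave(systems):
    """Flatten per-system lists so element i of system s lands at i * batch + s.

    With this layout, neighbouring systems (threads, on a GPU) read
    neighbouring addresses at every step of the Thomas algorithm.
    """
    batch = len(systems)
    n = len(systems[0])
    flat = [0.0] * (n * batch)
    for s, values in enumerate(systems):
        for i, v in enumerate(values):
            flat[i * batch + s] = v
    return flat


def thomas_batch_interleaved(lower, diag, upper, rhs, n, batch):
    """Solve `batch` tridiagonal systems of size n stored interleaved.

    lower[0] and upper[n-1] of every system are ignored.
    Returns the solutions in the same interleaved layout.
    """
    c = list(upper)  # becomes the modified upper diagonal
    d = list(rhs)    # becomes the modified right-hand side, then the solution
    for s in range(batch):  # hypothetical mapping: one GPU thread per system
        # Forward elimination.
        c[s] = c[s] / diag[s]
        d[s] = d[s] / diag[s]
        for i in range(1, n):
            k = i * batch + s   # row i of system s
            p = k - batch       # row i - 1 of the same system
            denom = diag[k] - lower[k] * c[p]
            c[k] = c[k] / denom
            d[k] = (d[k] - lower[k] * d[p]) / denom
        # Back substitution.
        for i in range(n - 2, -1, -1):
            k = i * batch + s
            d[k] -= c[k] * d[k + batch]
    return d
```

To use it, pass each diagonal and the right-hand sides through `interleave` first, then call `thomas_batch_interleaved` with the system size and batch count; solution component `i` of system `s` is read back from index `i * batch + s`.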
