Journal: IEEE Transactions on Parallel and Distributed Systems

Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization


Abstract

The Sony/Toshiba/IBM (STI) CELL processor introduces pioneering solutions in processor architecture. At the same time, it presents new challenges for the development of numerical algorithms. One is effective exploitation of the differential between the speed of single and double precision arithmetic; the other is efficient parallelization across the short-vector SIMD cores. The first challenge is addressed by applying the well-known technique of iterative refinement to the solution of a dense symmetric positive definite system of linear equations. The result is a mixed-precision algorithm that delivers double precision accuracy while performing the bulk of the work in single precision. The main contribution of this paper lies in addressing the second challenge through successful thread-level parallelization, exploiting fine-grained task granularity and lightweight decentralized synchronization. The implementation of the computationally intensive sections gets within 90 percent of peak floating point performance, while the implementation of the memory intensive sections reaches within 90 percent of peak memory bandwidth. On a single CELL processor, the algorithm achieves over 170 Gflop/s when solving a symmetric positive definite system of linear equations in single precision and over 150 Gflop/s when delivering the result in double precision accuracy.
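
The mixed-precision idea in the abstract can be illustrated with a small sketch. It is a plain-Python illustration, not the paper's CELL implementation. It assumes that single precision can be emulated by rounding every intermediate result through `struct`, and all function names (`f32`, `cholesky_f32`, `solve_f32`, `residual`, `refine_solve`) are hypothetical. The factorization and the triangular solves run in emulated single precision, while the residual and the solution update are kept in double precision.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cholesky_f32(a):
    """Lower-triangular factor L with A ~ L L^T; each operation is rounded to single."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = f32(a[j][j])
        for k in range(j):
            s = f32(s - f32(l[j][k] * l[j][k]))
        l[j][j] = f32(math.sqrt(s))
        for i in range(j + 1, n):
            s = f32(a[i][j])
            for k in range(j):
                s = f32(s - f32(l[i][k] * l[j][k]))
            l[i][j] = f32(s / l[j][j])
    return l


def solve_f32(l, b):
    """Solve L L^T x = b by forward and back substitution in emulated single precision."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                      # forward substitution: L y = b
        s = f32(b[i])
        for k in range(i):
            s = f32(s - f32(l[i][k] * y[k]))
        y[i] = f32(s / l[i][i])
    x = [0.0] * n
    for i in reversed(range(n)):            # back substitution: L^T x = y
        s = y[i]
        for k in range(i + 1, n):
            s = f32(s - f32(l[k][i] * x[k]))
        x[i] = f32(s / l[i][i])
    return x


def residual(a, x, b):
    """Residual r = b - A x, computed in double precision."""
    n = len(b)
    return [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]


def refine_solve(a, b, tol=1e-12, max_iter=20):
    """Mixed-precision solve: single-precision Cholesky plus double-precision refinement."""
    l = cholesky_f32(a)                     # expensive O(n^3) step, single precision
    x = solve_f32(l, b)                     # initial single-precision solution
    for _ in range(max_iter):
        r = residual(a, x, b)               # cheap O(n^2) step, double precision
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            break
        z = solve_f32(l, r)                 # correction reuses the single-precision factor
        x = [xi + zi for xi, zi in zip(x, z)]   # update accumulated in double
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    x = refine_solve(a, b)
    print(x, max(abs(v) for v in residual(a, x, b)))
```

The design point is that the cubic-cost factorization is done once in the fast precision, and each refinement step adds only quadratic-cost work. This scheme relies on the matrix being reasonably well conditioned relative to single precision; the stopping rule and iteration cap used here are illustrative assumptions.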
