IEEE Transactions on Parallel and Distributed Systems

A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

Abstract

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. We consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM) and tall-skinny matrix operations. We present techniques that significantly improve the performance of SpMM and of its transpose operation, SpMM^T, by using the compressed sparse blocks (CSB) format. On the requisite operations we achieve a speedup of 3-4 times over good implementations based on the widely used compressed sparse row (CSR) format. We develop a performance model that correctly estimates the performance of our SpMM kernel implementations, and we identify cache bandwidth, in addition to DRAM bandwidth, as a potential performance bottleneck. We also analyze and optimize the performance of the LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient) kernels, namely inner products and linear combinations of multiple vectors, and show a speedup of up to 15 times over using high-performance BLAS libraries for these operations. The resulting high-performance LOBPCG solver achieves a speedup of 1.4 to 1.8 times over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeon processors). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
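One point in the abstract is that a single CSB copy of the matrix can serve both SpMM and SpMM^T. A minimal pure-Python sketch can make this concrete, under some assumptions:

- It uses a simplified CSB layout: a dictionary of beta x beta blocks holding local-index triples.
- The block of vectors is stored as a list of rows.
- The class `CSBMatrix`, the block-size parameter `beta`, and the methods `spmm`/`spmm_t` are hypothetical names chosen for illustration. They are not the paper's implementation.
- The sketch is not tuned for performance.

```python
from collections import defaultdict


class CSBMatrix:
    """Illustrative sketch of a compressed-sparse-blocks (CSB) style matrix.

    Hypothetical, simplified layout: the m x n matrix is tiled into beta x beta
    blocks, and each nonempty block stores its nonzeros as
    (local_row, local_col, value) triples whose local indices are below beta.
    Production CSB also orders nonzeros inside a block (e.g. Z-Morton order)
    and packs indices into narrow integers; both are omitted here.
    """

    def __init__(self, m, n, beta, triples):
        self.m, self.n, self.beta = m, n, beta
        self.nbr = (m + beta - 1) // beta  # number of block rows
        self.nbc = (n + beta - 1) // beta  # number of block columns
        blocks = defaultdict(list)
        for i, j, v in triples:
            # Split the global (i, j) into a block coordinate and a local offset.
            blocks[(i // beta, j // beta)].append((i % beta, j % beta, v))
        self.blocks = dict(blocks)

    def spmm(self, X):
        """Return Y = A X, where X is a list of n rows, each holding k values."""
        k = len(X[0]) if X else 0
        Y = [[0.0] * k for _ in range(self.m)]
        # Block row bi only writes rows [bi*beta, (bi+1)*beta) of Y, so distinct
        # block rows could be given to different threads without locking.
        for bi in range(self.nbr):
            for bj in range(self.nbc):
                for li, lj, v in self.blocks.get((bi, bj), ()):
                    y_row = Y[bi * self.beta + li]
                    x_row = X[bj * self.beta + lj]
                    for c in range(k):
                        y_row[c] += v * x_row[c]
        return Y

    def spmm_t(self, X):
        """Return Y = A^T X from the same blocks; X is a list of m rows of k values."""
        k = len(X[0]) if X else 0
        Y = [[0.0] * k for _ in range(self.n)]
        # Block column bj only writes rows [bj*beta, (bj+1)*beta) of Y, so the
        # transpose product parallelizes over block columns; no copy of A^T is built.
        for bj in range(self.nbc):
            for bi in range(self.nbr):
                for li, lj, v in self.blocks.get((bi, bj), ()):
                    y_row = Y[bj * self.beta + lj]
                    x_row = X[bi * self.beta + li]
                    for c in range(k):
                        y_row[c] += v * x_row[c]
        return Y


if __name__ == "__main__":
    # A = [[2, 0, 1], [0, 3, 0], [4, 0, 0]] stored with 2 x 2 blocks (beta = 2).
    A = CSBMatrix(3, 3, 2, [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0)])
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # a block of k = 2 vectors
    print(A.spmm(X))    # [[3.0, 1.0], [0.0, 3.0], [4.0, 0.0]]
    print(A.spmm_t(X))  # [[6.0, 4.0], [0.0, 3.0], [1.0, 0.0]]
```

In this sketch, each block row (for SpMM) or block column (for SpMM^T) writes to a disjoint slice of the output. Either product can therefore be split across threads without atomics. With CSR the situation differs. A X parallelizes naturally by rows, but A^T X scatters updates across the output. That typically calls for atomics, per-thread buffers, or a separately stored transpose.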