A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

Hasan Metin Aktulga; Md. Afibuzzaman; Samuel Williams; Aydın Buluç; Meiyue Shao; Chao Yang; Esmond G. Ng; Pieter Maris; James P. Vary

Abstract

As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. We consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM) and tall-skinny matrix operations. We present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve a 3-4 times speedup on the requisite operations over good implementations that use the common compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to a 15 times speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves a 1.4 to 1.8 times speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
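To make the two sparse kernels named in the abstract concrete, here is a minimal sketch of SpMM (Y = A X) and SpMM^T (Y = A^T X) over the CSR baseline format. It is an illustration only: the function names, the tiny example matrix, and the pure-Python loops are assumptions made for this sketch, and it does not reproduce the CSB layout or any of the optimizations from the paper.

```python
# Illustrative sketch only: CSR-based SpMM and SpMM^T in pure Python.
# Function names and the example matrix are hypothetical; this is not the
# paper's CSB implementation.

def spmm_csr(n_rows, row_ptr, col_idx, vals, X):
    """Compute Y = A @ X, with A in CSR form and X a dense block of vectors."""
    m = len(X[0])
    Y = [[0.0] * m for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, j = vals[k], col_idx[k]
            for c in range(m):
                Y[i][c] += a * X[j][c]  # gather rows of X into row i of Y
    return Y

def spmm_t_csr(n_cols, row_ptr, col_idx, vals, X):
    """Compute Y = A^T @ X from the same CSR arrays, without forming A^T."""
    m = len(X[0])
    Y = [[0.0] * m for _ in range(n_cols)]
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, j = vals[k], col_idx[k]
            for c in range(m):
                Y[j][c] += a * X[i][c]  # scatter row i of X into row j of Y
    return Y

# Example matrix A (2 x 3):
#   [[2, 0, 1],
#    [0, 3, 0]]
ROW_PTR = [0, 2, 3]
COL_IDX = [0, 2, 1]
VALS = [2.0, 1.0, 3.0]

X3 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # block of 2 vectors of length 3
X2 = [[1.0, 0.0], [0.0, 1.0]]              # block of 2 vectors of length 2

Y = spmm_csr(2, ROW_PTR, COL_IDX, VALS, X3)     # A @ X3
YT = spmm_t_csr(3, ROW_PTR, COL_IDX, VALS, X2)  # A^T @ X2
```

In the transpose kernel, the writes to Y are scattered by column index. That access pattern makes the transpose product harder to parallelize over rows with a row-oriented format such as CSR, which gives one intuition for why a block-oriented format is attractive when both products are needed.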

Bibliographic details

Source
IEEE Transactions on Parallel and Distributed Systems | 2017, Issue 6 | pp. 1550-1563 | 14 pages
Authors
Hasan Metin Aktulga; Md. Afibuzzaman; Samuel Williams; Aydın Buluç; Meiyue Shao; Chao Yang; Esmond G. Ng; Pieter Maris; James P. Vary
Author affiliations

Michigan State University, 428 S. Shaw Lane, Room 3115, East Lansing, MI;

Michigan State University, 428 S. Shaw Lane, Room 3115, East Lansing, MI;

Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, MS 50F-1650, Berkeley, CA;

Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, MS 50F-1650, Berkeley, CA;

Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, MS 50F-1650, Berkeley, CA;

Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, MS 50F-1650, Berkeley, CA;

Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, MS 50F-1650, Berkeley, CA;

Department of Physics and Astronomy, Iowa State University, Ames, IA;

Department of Physics and Astronomy, Iowa State University, Ames, IA;

Indexing information
Original format: PDF
Language: English
Keywords
Sparse matrices; Eigenvalues and eigenfunctions; Symmetric matrices; Kernel; Wave functions; Bandwidth; Computer architecture;

Similar literature

1. Accelerating nuclear configuration interaction calculations through a preconditioned block iterative eigensolver [J]. Meiyue Shao, H. Metin Aktulga, Chao Yang. Computer Physics Communications. 2018.
2. Calculations of neon nuclear-spin optical rotation, Verdet and hyperfine constants with configuration-interaction many-body perturbation theory [J]. Savukov Igor, Filin Dmytro, Zhu Yue. The European Physical Journal D: Atomic, Molecular, and Optical Physics. 2019, Issue 7.
3. Nuclear Structure Calculations in 20Ne with No-Core Configuration–Interaction Model [J]. M. Konieczka, W. Satuła. Acta Physica Polonica B: Particle Physics and Field Theory, Nuclear Physics, Theory of Relativity. 2017, Issue 3.
4. Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations [C]. Aktulga Hasan Metin, Buluc Aydin, Williams Samuel. IEEE International Parallel Distributed Processing Symposium. 2014.
5. Study of Local Environment and Nuclear Interactions in Magnesium and Sulfur Containing Materials by Magnesium-25 and Sulfur-33 Solid-State Nuclear Magnetic Resonance Spectroscopy and First-Principles Calculations [D]. Pallister, Peter J. 2010.
6. Configuration Interaction in the Calculation of Oscillatory and Rotatory Intensities of Nonplanar π-Electronic Systems [O]. Kam-Khow Cheong, Allen Oshita, Dennis J. Caldwell. 1970.
7. Accelerating Nuclear Configuration Interaction Calculations through a Preconditioned Block Iterative Eigensolver [O]. Shao, Meiyue, Aktulga, Hasan Metin, Yang, Chao. 2017.
