International Conference on Parallel and Distributed Computing

Exploiting Data Sparsity for Large-Scale Matrix Computations

Abstract

Exploiting data sparsity in dense matrices is an algorithmic bridge between extreme-scale applications and architectures that are increasingly memory-austere on a per-core basis. In this work, we leverage the Hierarchical matrix Computations on Manycore Architectures (HiCMA) library to tackle this challenging problem, achieving significant reductions in time to solution and memory footprint while preserving the accuracy required by the application. We have extended HiCMA to provide a high-performance implementation, on distributed-memory systems, of one of the most widely used matrix factorizations in large-scale scientific applications: the Cholesky factorization. The implementation employs the tile low-rank (TLR) data format to compress the dense, data-sparse off-diagonal tiles of the matrix. It then decomposes the matrix computations into interdependent tasks and relies on the dynamic runtime system StarPU for asynchronous, out-of-order scheduling, while allowing high user productivity. Comparisons of performance and memory footprint on matrix dimensions of up to eleven million, run on thousands of cores against state-of-the-art open-source and vendor-optimized numerical libraries, show gains of more than an order of magnitude in both metrics. This represents an important milestone in enabling large-scale matrix computations for big data problems in geospatial statistics, such as climate and weather forecasting applications.
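The two ideas in the abstract, compressing off-diagonal tiles into low-rank form and expressing Cholesky as a set of interdependent tile tasks, can be illustrated with a small NumPy sketch. This is not HiCMA's API or its TLR Cholesky algorithm. It assumes truncated-SVD compression with a hypothetical relative threshold `tol`. It uses an exponential covariance kernel on evenly spaced 1-D points as a hypothetical data-sparse test matrix. It also simplifies the factorization: a real TLR Cholesky works directly on the `U`, `V` factors and recompresses after each update, whereas this sketch expands the compressed matrix back to dense before factorizing. All function names are hypothetical, and `n` must be a multiple of the tile size `nb`.

```python
import numpy as np


def tile(i, j, nb):
    """Index of tile (i, j) in a matrix split into nb x nb tiles."""
    return slice(i * nb, (i + 1) * nb), slice(j * nb, (j + 1) * nb)


def exp_covariance(n, length_scale=0.1):
    """Hypothetical SPD test matrix: exponential kernel on evenly spaced 1-D points."""
    x = np.linspace(0.0, 1.0, n)
    return np.exp(-np.abs(x[:, None] - x[None, :]) / length_scale)


def compress(block, tol):
    """Truncated SVD (assumed method): keep singular values > tol * sigma_max."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    k = max(1, int(np.sum(s > tol * s[0])))
    return u[:, :k] * s[:k], vt[:k].T  # block ~= U @ V.T


def to_tlr(a, nb, tol):
    """Lower-triangular TLR layout: dense diagonal tiles, (U, V) off-diagonal tiles."""
    nt = a.shape[0] // nb
    return {(i, j): a[tile(i, j, nb)].copy() if i == j else compress(a[tile(i, j, nb)], tol)
            for i in range(nt) for j in range(i + 1)}


def from_tlr(tiles, nt, nb):
    """Expand a TLR matrix back to a dense symmetric array (simplification, see lead-in)."""
    a = np.zeros((nt * nb, nt * nb))
    for (i, j), t in tiles.items():
        blk = t if i == j else t[0] @ t[1].T
        a[tile(i, j, nb)] = blk
        a[tile(j, i, nb)] = blk.T
    return a


def cholesky_tasks(nt):
    """Right-looking tile Cholesky as a list of (kernel, i, j, k) tasks."""
    tasks = []
    for k in range(nt):
        tasks.append(("POTRF", k, k, k))
        tasks += [("TRSM", i, k, k) for i in range(k + 1, nt)]
        for i in range(k + 1, nt):
            tasks.append(("SYRK", i, i, k))
            tasks += [("GEMM", i, j, k) for j in range(k + 1, i)]
    return tasks


def tile_cholesky(a, nb):
    """Execute the tasks in list order. A task-based runtime such as StarPU can
    instead run them out of order once their data dependencies are satisfied."""
    l, nt = a.copy(), a.shape[0] // nb
    for op, i, j, k in cholesky_tasks(nt):
        if op == "POTRF":  # L_kk = chol(A_kk)
            l[tile(k, k, nb)] = np.linalg.cholesky(l[tile(k, k, nb)])
        elif op == "TRSM":  # L_ik = A_ik L_kk^{-T}
            l[tile(i, k, nb)] = np.linalg.solve(l[tile(k, k, nb)], l[tile(i, k, nb)].T).T
        elif op == "SYRK":  # A_ii -= L_ik L_ik^T
            l[tile(i, i, nb)] -= l[tile(i, k, nb)] @ l[tile(i, k, nb)].T
        else:  # GEMM: A_ij -= L_ik L_jk^T
            l[tile(i, j, nb)] -= l[tile(i, k, nb)] @ l[tile(j, k, nb)].T
    return np.tril(l)


n, nb, tol = 512, 64, 1e-8
nt = n // nb
a = exp_covariance(n)
tiles = to_tlr(a, nb, tol)
dense_words = nt * (nt + 1) // 2 * nb * nb
tlr_words = sum(t.size if i == j else t[0].size + t[1].size for (i, j), t in tiles.items())
print(f"lower-triangle storage: dense {dense_words} words, TLR {tlr_words} words")

l = tile_cholesky(from_tlr(tiles, nt, nb), nb)
print("relative error ||A - L L^T|| / ||A||:", np.linalg.norm(a - l @ l.T) / np.linalg.norm(a))
```

Storage for an off-diagonal tile drops from `nb * nb` to `2 * nb * k` words, where `k` is the rank kept at the chosen threshold, so the saving depends on how data-sparse the matrix is. The task list is the part a runtime system exploits: for example, every `TRSM` for a given `k` depends only on that step's `POTRF` and can run concurrently with the others.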