Partitioning GPUs for Improved Scalability

Abstract

To port applications to GPUs, developers need to express computational tasks as highly parallel executions with tens of thousands of threads to fill the GPU's compute resources. However, while this will fill the GPU's resources, it does not necessarily deliver the best efficiency, as the task may scale poorly when run with sufficient parallelism to fill the GPU. In this work we investigate how we can improve throughput by co-scheduling poorly-scaling tasks on sub-partitions of the GPU to increase utilization efficiency. We first investigate the scalability of typical HPC tasks on GPUs, and then use this insight to improve throughput by extending the StarPU framework to co-schedule tasks on the GPU. We demonstrate that co-scheduling poorly-scaling GPU tasks accelerates the execution of the critical tasks of a Cholesky Factorization and improves the overall performance of the application by 9% across a wide range of block sizes.
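The throughput argument in the abstract can be illustrated with a toy strong-scaling model. The sketch below is illustrative only: the Amdahl-style cost function, the serial/parallel work amounts, and the 64-unit GPU are assumptions, not figures from the paper. It shows why two poorly-scaling tasks co-scheduled on half-partitions can finish sooner than the same tasks run back to back on the full GPU:

```python
def task_time(p, serial=1.0, work=8.0):
    """Toy Amdahl-style cost of a poorly-scaling task on p compute units:
    a fixed serial/latency-bound part plus a perfectly parallel part."""
    return serial + work / p

P = 64  # assumed total compute units (e.g., SMs) on the GPU

# Run two identical tasks back to back, each occupying the whole GPU.
serial_makespan = 2 * task_time(P)

# Co-schedule the two tasks concurrently, each on half the GPU.
coscheduled_makespan = task_time(P // 2)

print(f"serial: {serial_makespan:.3f}, co-scheduled: {coscheduled_makespan:.3f}")
print(f"speedup: {serial_makespan / coscheduled_makespan:.2f}x")
# → speedup: 1.80x
```

In this model the serial portion is paid twice when the tasks run sequentially but only once under co-scheduling, which is exactly the kind of utilization gain the paper exploits by running StarPU tasks on GPU sub-partitions.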
