Partitioning GPUs for Improved Scalability

Abstract

To port applications to GPUs, developers need to express computational tasks as highly parallel executions with tens of thousands of threads to fill the GPU's compute resources. However, while this will fill the GPU's resources, it does not necessarily deliver the best efficiency, as the task may scale poorly when run with sufficient parallelism to fill the GPU. In this work we investigate how we can improve throughput by co-scheduling poorly-scaling tasks on sub-partitions of the GPU to increase utilization efficiency. We first investigate the scalability of typical HPC tasks on GPUs, and then use this insight to improve throughput by extending the StarPU framework to co-schedule tasks on the GPU. We demonstrate that co-scheduling poorly-scaling GPU tasks accelerates the execution of the critical tasks of a Cholesky Factorization and improves the overall performance of the application by 9% across a wide range of block sizes.
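The throughput argument in the abstract can be illustrated with a toy strong-scaling model. The sketch below is illustrative only: the Amdahl-style cost function, the serial/parallel work amounts, and the 64-unit GPU are assumptions, not figures from the paper. It shows why two poorly-scaling tasks co-scheduled on half-partitions can finish sooner than the same tasks run back to back on the full GPU:

```python
def task_time(p, serial=1.0, work=8.0):
    """Toy Amdahl-style cost of a poorly-scaling task on p compute units:
    a fixed serial/latency-bound part plus a perfectly parallel part."""
    return serial + work / p

P = 64  # assumed total compute units (e.g., SMs) on the GPU

# Run two identical tasks back to back, each occupying the whole GPU.
serial_makespan = 2 * task_time(P)

# Co-schedule the two tasks concurrently, each on half the GPU.
coscheduled_makespan = task_time(P // 2)

print(f"serial: {serial_makespan:.3f}, co-scheduled: {coscheduled_makespan:.3f}")
print(f"speedup: {serial_makespan / coscheduled_makespan:.2f}x")
# → speedup: 1.80x
```

In this model the serial portion is paid twice when the tasks run sequentially but only once under co-scheduling, which is exactly the kind of utilization gain the paper exploits by running StarPU tasks on GPU sub-partitions.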
