Journal: IEEE Transactions on Parallel and Distributed Systems

Cacheminer: A runtime approach to exploit cache locality on SMP



Abstract

Exploiting the cache locality of parallel programs at runtime complements compiler optimization, and it is particularly important for applications with dynamic memory-access patterns. We propose a memory-layout-oriented technique to exploit the cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and target-architecture-dependent hints, our system, called Cacheminer, reorganizes and partitions a parallel loop using the memory-access space of its execution. Through effective runtime transformations, our system maximizes data reuse within each partitioned data region assigned to a cache, and minimizes data sharing among the partitioned data regions assigned to all caches. The tasks in the partitions are scheduled in an adaptive, locality-preserving way that minimizes program execution time by trading off load balance against locality. We have implemented the Cacheminer runtime library on two commercial SMP servers and a SimCS-simulated SMP. Our simulation and measurement results show that, for programs with regular computation and memory-access patterns, whose load balance and cache locality can already be optimized well by tiling and other program transformations, our runtime approach achieves performance comparable to that of compiler optimizations. For applications with irregular computation and dynamic memory-access patterns, which are usually hard to optimize with static compiler techniques, our experimental results show that our approach significantly improves memory performance.
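The general idea in the abstract (partition loop iterations by the memory they touch, then schedule with a preference for local work) can be illustrated with a minimal Python sketch. This is not Cacheminer's actual algorithm or API: the function names `partition_by_region` and `next_chunk`, the fixed `region_size`, and the steal-from-the-most-loaded-queue policy are all assumptions made for illustration only.

```python
from collections import defaultdict, deque


def partition_by_region(iterations, address_of, region_size, num_procs):
    """Group iterations by the memory region they access (hypothetical helper).

    Iterations touching the same region form one chunk, and contiguous
    regions are dealt to the same processor queue, so data reused by a
    chunk tends to stay in one cache.
    """
    regions = defaultdict(list)
    for it in iterations:
        regions[address_of(it) // region_size].append(it)

    queues = [deque() for _ in range(num_procs)]
    ordered = sorted(regions)
    per_proc = -(-len(ordered) // num_procs)  # ceiling division
    for idx, region in enumerate(ordered):
        queues[idx // per_proc].append(regions[region])
    return queues


def next_chunk(queues, proc):
    """Pick the next chunk for `proc` (assumed policy, not the paper's).

    Local work is preferred to preserve locality. An idle processor
    steals from the tail of the most loaded queue to restore load balance.
    """
    if queues[proc]:
        return queues[proc].popleft()
    victim = max(range(len(queues)), key=lambda p: len(queues[p]))
    if queues[victim]:
        return queues[victim].pop()
    return None


# Eight iterations over 8-byte elements, 16-byte regions, two processors.
queues = partition_by_region(range(8), lambda i: i * 8, 16, 2)
```

For an irregular loop, `address_of` could instead look up an index array that is only known at runtime, which is the case where static compiler analysis has little to work with.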
