International Conference on Computer Design

Static Function Prefetching for Efficient Code Management on Scratchpad Memory

Abstract

As the cache-based memory hierarchy becomes a primary factor limiting the scalability and power efficiency of multi-core systems, scratchpad memory (SPM) has been studied as an alternative to caches. When SPM is used as instruction memory, code management techniques are required to load code blocks into SPM using DMAs. In these techniques, code blocks are generally loaded on demand to avoid loading an incorrect block: unlike a cache (e.g., with its tag arrays), SPM has no mechanism to detect and recover from faults. While on-demand loading guarantees that no fault occurs, it incurs considerable performance overhead because it serializes the execution of the DMA engine and the CPU. This paper presents a technique that inserts prefetching instructions for function-level code management so that DMA transfers and CPU execution can overlap. Our technique inserts DMA instructions statically at compile time and does not rely on any profiling or run-time resources. Our evaluation shows that static prefetching reduces CPU idle time due to DMAs by 58.5% and achieves an average performance improvement of 14.7% on the benchmarks that show high overhead due to DMAs.
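
The serialization cost of on-demand loading, and what overlapping DMA with execution recovers, can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the paper's algorithm: the function names, the cycle counts, and the one-function-ahead prefetch policy are all hypothetical. It also assumes a single DMA engine, a call sequence that is known statically, and enough SPM space to hold the running function and the next one at the same time.

```python
# Toy model (hypothetical, not the paper's method): each entry is
# (dma_cycles, exec_cycles) for one function, listed in call order.

def on_demand_time(calls):
    """On-demand loading: the CPU stalls for the whole DMA before each function."""
    return sum(dma + exe for dma, exe in calls)


def prefetch_time(calls):
    """One-ahead prefetch: the DMA for function i+1 is issued when function i
    starts running, so only the part of the transfer that outlasts function i's
    execution stalls the CPU. Assumes the prefetched function is always correct."""
    if not calls:
        return 0
    total = calls[0][0]  # the first load cannot be hidden behind any execution
    for i, (_, exe) in enumerate(calls):
        total += exe
        if i + 1 < len(calls):
            next_dma = calls[i + 1][0]
            total += max(0, next_dma - exe)  # un-overlapped remainder of the DMA
    return total


def idle_time(total, calls):
    """CPU idle cycles: total time minus the cycles spent executing code."""
    return total - sum(exe for _, exe in calls)


# Made-up cycle counts, for illustration only.
example_calls = [(100, 300), (200, 150), (250, 400)]
print(on_demand_time(example_calls))  # 1400
print(prefetch_time(example_calls))   # 1050
```

In this made-up trace the CPU is idle for 550 cycles with on-demand loading and 200 cycles with prefetching. The third transfer (250 cycles) is only partly hidden behind the second function (150 cycles), which shows why prefetching reduces DMA stalls without necessarily eliminating them.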
