Conference: IEEE International Symposium on Computer Architecture and High Performance Computing

Building a Low Latency, Highly Associative DRAM Cache with the Buffered Way Predictor

Abstract

The emerging die-stacked DRAM technology allows computer architects to design a last-level cache (LLC) with high memory bandwidth and large capacity. There are four key requirements for DRAM cache design: minimizing on-chip tag storage overhead, optimizing access latency, improving hit rate, and reducing off-chip traffic. These requirements seem mutually incompatible. For example, to reduce the tag storage overhead, the recently proposed LH-Cache co-locates tags and data in the same DRAM cache row, and the Alloy Cache alloys data and tags into the same cache line in a direct-mapped design. However, these ideas either incur significant tag-lookup latency or sacrifice hit rate for hit latency. To optimize all four key requirements, we propose the Buffered Way Predictor (BWP). The BWP predicts the way ID of a DRAM cache request with high accuracy and coverage, allowing the tag and data to be fetched back to back. Thus, the data read latency can be completely hidden, so requests that hit in the DRAM cache have low access latency. The BWP technique is designed for highly associative block-based DRAM caches and achieves a low miss rate and low off-chip traffic. Our evaluation with multi-programmed workloads and a 128MB DRAM cache shows that a 128KB BWP achieves a 76.2% hit rate. The BWP improves performance by 8.8% and 12.3% compared to the LH-Cache and the Alloy Cache, respectively.
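To make the way-prediction idea concrete, the following is a minimal C++ sketch of a prediction table that maps a block address to a predicted way and is corrected whenever the DRAM cache tag check resolves the true way. The class name WayPredictor, the table size, the partial-tag filtering, and the 29-way associativity are illustrative assumptions for exposition, not details taken from the paper.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Illustrative way-prediction table for a highly associative DRAM cache.
// All sizes and field widths below are assumptions, not the paper's design.
class WayPredictor {
public:
    static constexpr std::size_t kEntries = 16 * 1024;  // assumed table size
    static constexpr unsigned    kWays    = 29;         // assumed set associativity

    // Predict which way of the DRAM cache set holds the block, if any.
    std::optional<unsigned> predict(uint64_t block_addr) const {
        const Entry& e = table_[index(block_addr)];
        if (e.valid && e.partial_tag == partialTag(block_addr))
            return e.way;          // caller can issue the data read for this way
        return std::nullopt;       // fall back to a tag-then-data access
    }

    // Update the table once the tag check has resolved the true way
    // (correct_way is in [0, kWays)).
    void update(uint64_t block_addr, unsigned correct_way) {
        Entry& e      = table_[index(block_addr)];
        e.valid       = true;
        e.partial_tag = partialTag(block_addr);
        e.way         = static_cast<uint8_t>(correct_way);
    }

private:
    struct Entry {
        bool     valid       = false;
        uint16_t partial_tag = 0;   // short tag to filter address aliasing
        uint8_t  way         = 0;
    };

    static std::size_t index(uint64_t addr)   { return (addr >> 6) % kEntries; }
    static uint16_t    partialTag(uint64_t a) { return static_cast<uint16_t>(a >> 20); }

    std::array<Entry, kEntries> table_{};
};
```

On a confident prediction, the cache controller can schedule the tag read and the data read for the predicted way back to back, which is how the abstract's hidden data read latency would be realized; if the tag check later disagrees with the prediction, the speculatively fetched data is discarded and the access falls back to the normal path, so correctness never depends on the predictor.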
