...
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Achieving High-Performance On-Chip Networks With Shared-Buffer Routers


Abstract

On-chip routers typically have buffers dedicated to their input or output ports, which temporarily store packets when contention occurs on output physical channels. Unfortunately, buffers consume a significant portion of a router's area and power budget. While a traffic trace is running, however, not all input ports of a router have incoming packets that need to be transferred simultaneously. As a result, many buffer queues in the network are empty while others are mostly busy. This observation motivates us to design a router architecture with shared queues (RoShaQ), which maximizes buffer utilization by allowing multiple buffer queues to be shared among input ports. Sharing queues makes buffer use more efficient and therefore achieves higher throughput when the network load becomes heavy. At light traffic load, on the other hand, our router achieves low latency by allowing packets to bypass these shared queues. Experimental results in a 65-nm CMOS standard-cell process show that, on synthetic traffic, RoShaQ has 17% lower latency and 18% higher saturation throughput than a typical virtual-channel (VC) router. Because of its higher performance, RoShaQ consumes 9% less energy per transferred packet than a VC router with the same buffer capacity. On real multitask applications and E3S embedded benchmarks mapped with the near-optimal NMAP mapping algorithm, RoShaQ has 32% lower latency than a VC router and, when targeting the same application throughput, 30% lower energy per packet.
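The buffer-utilization argument above can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not RoShaQ's actual microarchitecture. The class names, the rule that an input port "owns" the shared queues it claims, and the "bypass when the requested output is idle" test are all hypothetical simplifications. Queues never drain, and no timing, arbitration, or flow control is modeled. The baseline also omits bypass, so the comparison isolates buffer capacity under skewed traffic.

```python
from collections import deque


class DedicatedBufferRouter:
    """Hypothetical baseline: each input port owns one FIFO of fixed depth."""

    def __init__(self, num_ports: int, depth: int) -> None:
        self.depth = depth
        self.queues = [deque() for _ in range(num_ports)]

    def accept(self, in_port: int, packet: object) -> bool:
        q = self.queues[in_port]
        if len(q) >= self.depth:
            return False  # this port's queue is full; idle ports' queues cannot help
        q.append(packet)
        return True


class SharedBufferRouter:
    """Hypothetical toy model: any input port may claim any free queue in a common pool."""

    def __init__(self, num_queues: int, depth: int) -> None:
        self.depth = depth
        self.queues = [deque() for _ in range(num_queues)]
        self.owner = [None] * num_queues  # input port currently holding each queue
        self.busy_outputs = set()          # output ports currently transmitting

    def accept(self, in_port: int, packet: object, out_port: int) -> str:
        # Bypass (assumed rule): if the requested output is idle, skip buffering entirely.
        if out_port not in self.busy_outputs:
            self.busy_outputs.add(out_port)
            return "bypass"
        # Prefer a queue this port already holds that still has space...
        for i, q in enumerate(self.queues):
            if self.owner[i] == in_port and len(q) < self.depth:
                q.append(packet)
                return "buffered"
        # ...otherwise claim an unowned queue from the shared pool.
        for i, q in enumerate(self.queues):
            if self.owner[i] is None:
                self.owner[i] = in_port
                q.append(packet)
                return "buffered"
        return "stall"  # every shared queue is full or held by another port


if __name__ == "__main__":
    # Skewed traffic: 20 packets arrive at input port 0, all headed to output port 2.
    dedicated = DedicatedBufferRouter(num_ports=4, depth=4)
    shared = SharedBufferRouter(num_queues=4, depth=4)
    d = sum(dedicated.accept(0, p) for p in range(20))
    s = [shared.accept(0, p, out_port=2) for p in range(20)]
    print(d, s.count("bypass"), s.count("buffered"), s.count("stall"))  # prints: 4 1 16 3
```

With the same total storage (four queues of depth 4), the dedicated design can hold only the four packets that fit in port 0's own queue. In the shared design, the busy port can claim the queues that the idle ports are not using. A fuller model would release a queue back to the pool once it drains.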
