20th IEEE Symposium on Computer Arithmetic

Radix-8 Digit-by-Rounding: Achieving High-Performance Reciprocals, Square Roots, and Reciprocal Square Roots

Abstract

We describe a high-performance digit-recurrence algorithm for computing exactly rounded reciprocals, square roots, and reciprocal square roots in hardware at a rate of three result bits (one radix-8 digit) per recurrence iteration. To achieve a single-cycle recurrence at a short cycle time, we adapted the digit-by-rounding algorithm, which is normally applied at much higher radices, for efficient operation at radix 8. This approach removes from the recurrence step the lookup table required by SRT, the algorithm usually used for hardware digit recurrences. The size of that table grows superlinearly in the radix, and its increasing access latency limits high-frequency SRT implementations to radix 4 or lower. We also developed a series of novel optimizations focused on further reducing the critical path through the recurrence. We propose, for example, decreasing data path widths to a point where erroneous results sometimes occur and then correcting these errors off the critical path. We present a specific implementation that computes any of these functions to 31 bits of precision in 13 cycles. Our implementation achieves a cycle time only 11% longer than the best reported SRT design for the same functions, yet delivers results in five fewer cycles. Finally, we show that even at lower radices, a digit-by-rounding design is likely to have a shorter critical path than one using SRT at the same radix.
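The core idea of digit selection by rounding can be illustrated in software. The following is a minimal sketch for the reciprocal case only, written in exact rational arithmetic. It is not the paper's datapath: the function name, the `table_bits` parameter, and the way the prescale factor is computed are assumptions made for illustration. The divisor is first scaled so that it lies close to 1, after which each radix-8 digit is obtained by rounding the shifted residual, with no selection table.

```python
from fractions import Fraction

RADIX = 8  # one radix-8 digit = three result bits per iteration


def reciprocal_by_rounding(d, iterations, table_bits=8):
    """Approximate 1/d for 1 <= d < 2 using digit selection by rounding.

    Illustrative sketch only: names and parameters are hypothetical,
    and exact fractions stand in for a fixed-width hardware datapath.
    """
    d = Fraction(d)
    if not (1 <= d < 2):
        raise ValueError("d must lie in [1, 2)")

    # Prescale factor m ~ 1/d (assumed stand-in for a small initial lookup).
    m = Fraction(round(2**table_bits / d), 2**table_bits)
    z = m * d  # scaled divisor, within 2**-table_bits of 1
    w = m      # initial residual: m / z equals 1 / d

    result = Fraction(0)
    digits = []
    for j in range(1, iterations + 1):
        shifted = RADIX * w
        q = round(shifted)       # digit selection is plain rounding
        w = shifted - q * z      # residual update
        result += Fraction(q, RADIX**j)
        digits.append(q)
    return result, digits


if __name__ == "__main__":
    approx, digits = reciprocal_by_rounding(Fraction(3, 2), 11)
    print(float(approx), digits)
```

Because the scaled divisor is close to 1, rounding the shifted residual keeps the next residual bounded, so the error after n iterations stays below 8^-n in this sketch. Under these assumptions the first digit can be as large as 8, while later digits stay within -4..4.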
