Home > Foreign Journals > IEEE Transactions on Parallel and Distributed Systems > Efficient Methods for Mapping Neural Machine Translator on FPGAs

Efficient Methods for Mapping Neural Machine Translator on FPGAs



Abstract

Neural machine translation (NMT) is one of the most critical applications in natural language processing (NLP); its main idea is to convert text from one language to another using deep neural networks. In recent years, NMT has developed continuously by integrating emerging techniques such as bidirectional gated recurrent units (GRUs), attention mechanisms, and beam-search algorithms for improved translation quality. However, with increasing problem sizes, real-life NMT models have become much more complicated and difficult to implement on hardware for acceleration. In this article, we aim to exploit the capability of FPGAs to deliver highly efficient implementations of real-life NMT applications. We map the inference of a large-scale NMT model, with a total computation of 172 GFLOP, to a highly optimized high-level synthesis (HLS) IP and integrate the IP into the Xilinx VCU118 FPGA platform. The model contains the key features widely used in NMT, including a bidirectional GRU layer, an attention mechanism, and beam search. We quantize the model to a mixed-precision representation in which the parameters and portions of the calculations are in 16-bit half precision, while the rest remain in 32-bit floating point. Compared to the single-precision floating-point NMT implementation on the FPGA, we achieve a 13.1x speedup with an end-to-end performance of 22.0 GFLOPS and no accuracy degradation. To the best of our knowledge, this is the first work that successfully implements a real-life end-to-end NMT model on an FPGA board.
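The mixed-precision scheme described in the abstract (16-bit half-precision storage with 32-bit computation elsewhere) can be illustrated with a small NumPy sketch. This is an assumption-laden illustration, not the paper's HLS implementation: it shows a generic matrix-vector product with weights stored in float16 and the accumulation kept in float32, which is the general pattern such quantization follows.

```python
import numpy as np

# Illustrative sketch (not the paper's HLS code): a matrix-vector product,
# as found inside a GRU layer, with weights stored in 16-bit half precision
# while the arithmetic and accumulation stay in 32-bit float.
rng = np.random.default_rng(0)

W = rng.standard_normal((4, 8)).astype(np.float32)  # full-precision weights
x = rng.standard_normal(8).astype(np.float32)       # input activations

W_half = W.astype(np.float16)        # quantize parameters to half precision

# Upcast to float32 before the dot product so accumulation error stays small.
y_mixed = W_half.astype(np.float32) @ x
y_ref = W @ x

max_err = float(np.max(np.abs(y_mixed - y_ref)))
print(max_err)
```

At this scale the half-precision storage changes the result only on the order of float16's rounding step, which is why such quantization can leave translation accuracy unaffected while halving parameter bandwidth.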
