IEEE Transactions on Parallel and Distributed Systems

Implementation of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs

Abstract

In this paper, we propose efficient LDPC block-code decoders/simulators that run on graphics processing units (GPUs). We also implement a decoder for the LDPC convolutional code (LDPCCC). The LDPCCC is derived from a predesigned quasi-cyclic LDPC block code with good error performance. Compared with a decoder based on a randomly constructed LDPCCC, the proposed LDPCCC decoder has lower complexity because of the periodicity of the derived LDPCCC and the properties of the quasi-cyclic structure. In the proposed decoder architecture, Γ codewords are decoded together, where Γ is a multiple of the warp size, so the messages of the Γ codewords are also processed together. Since all Γ codewords share the same Tanner graph, the messages of the Γ distinct codewords that correspond to the same edge can be grouped into one package and stored linearly. By optimizing the data structures of the messages used in the decoding process, the GPU can perform both reads and writes in a highly parallel manner. In addition, a thread hierarchy that minimizes thread divergence is deployed to maximize the efficiency of parallel execution. By using the large number of cores in the GPU to perform the simple computations simultaneously, our GPU-based LDPC decoder achieves a speedup of hundreds of times over a serial CPU-based simulator and of more than 40 times over an eight-thread CPU-based simulator.
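
The packaging of messages described in the abstract can be illustrated with a small host-side sketch. It is not the authors' implementation: the edge-major indexing formula, the function names `message_index` and `pack_messages`, and the sample sizes are all assumptions chosen for illustration. The only fixed fact used is that a warp on NVIDIA GPUs has 32 threads. The sketch shows that when the Γ messages for one edge are stored contiguously, the threads of a warp (one per codeword) touch consecutive addresses.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def message_index(edge: int, codeword: int, gamma: int) -> int:
    """Flat offset of the message for `codeword` on Tanner-graph `edge`.

    Assumed edge-major layout: all gamma messages of one edge form one
    contiguous package, and packages follow each other in edge order.
    """
    return edge * gamma + codeword


def pack_messages(per_codeword, gamma):
    """Interleave per-codeword message lists into one edge-major buffer."""
    num_edges = len(per_codeword[0])
    packed = [0] * (num_edges * gamma)
    for cw in range(gamma):
        for e in range(num_edges):
            packed[message_index(e, cw, gamma)] = per_codeword[cw][e]
    return packed


# Hypothetical sizes: gamma is a multiple of the warp size, 3 edges.
gamma = 2 * WARP_SIZE
num_edges = 3

# Dummy integer "messages" that encode (codeword, edge) so they are easy to check.
per_codeword = [[cw * 100 + e for e in range(num_edges)] for cw in range(gamma)]
packed = pack_messages(per_codeword, gamma)

# Offsets read by the first warp (threads 0..31, one codeword each) for edge 1.
warp0_offsets = [message_index(1, t, gamma) for t in range(WARP_SIZE)]
print(warp0_offsets[0], warp0_offsets[-1])
```

Under this assumed layout, the offsets for a single edge form a run of consecutive integers, which is the access pattern GPUs can service with few memory transactions. Choosing Γ as a multiple of the warp size keeps every package aligned to whole warps.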