
Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation



Abstract

By reorganizing the execution order and optimizing the data structures, we proposed an efficient parallel framework for the H.264/AVC encoder on a massively parallel architecture, and implemented it with CUDA on NVIDIA GPUs. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also implemented efficiently. In addition, we proposed a series of optimization methods: multiresolution multiwindow motion estimation, a multilevel parallel strategy that maximizes the parallelism of intra coding, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the H.264 encoder's workload is offloaded to the GPU. Experimental results show that the parallel implementation achieves a 20× speedup over the serial program and meets the 30 fps requirement for real-time HD encoding. At the same bitrate, the PSNR loss ranges from 0.14 dB to 0.77 dB. Kernel-level analysis shows that the speedups of the compute-intensive algorithms are proportional to the computing power of the GPU, whereas the performance of the control-intensive part (CAVLC) depends strongly on memory bandwidth, which offers insight for the design of new architectures.
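The abstract does not describe how the multiresolution multiwindow motion estimation works. As a rough illustration of the general coarse-to-fine idea such methods build on, here is a minimal sketch of a plain two-level hierarchical block search in Python. It is a generic technique, not the authors' algorithm, and the function names, the 8×8 block size, and the search radii are hypothetical choices. The point relevant to GPU offloading is that every candidate's SAD in `full_search` is independent of the others, so a CUDA port could evaluate one candidate per thread and reduce to the minimum.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples indexed as frame[y][x]


def sad(cur: Frame, ref: Frame, bx: int, by: int,
        dx: int, dy: int, bs: int) -> Optional[int]:
    """SAD between the bs x bs block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref`; None if the candidate leaves the frame."""
    h, w = len(ref), len(ref[0])
    if bx + dx < 0 or by + dy < 0 or bx + dx + bs > w or by + dy + bs > h:
        return None
    total = 0
    for y in range(bs):
        row_c, row_r = cur[by + y], ref[by + dy + y]
        for x in range(bs):
            total += abs(row_c[bx + x] - row_r[bx + dx + x])
    return total


def full_search(cur: Frame, ref: Frame, bx: int, by: int, bs: int,
                cx: int, cy: int, radius: int) -> Optional[Tuple[int, int, int]]:
    """Exhaustive search in a (2*radius+1)^2 window centred on (cx, cy).
    Returns (cost, dx, dy) of the best candidate, or None if none fit.
    Candidates are independent: on a GPU each could be one thread."""
    best = None
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost is not None and (best is None or cost < best[0]):
                best = (cost, dx, dy)
    return best


def downsample(frame: Frame) -> Frame:
    """Halve the resolution by averaging each 2x2 group of samples."""
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(len(frame[0]) // 2)]
            for y in range(len(frame) // 2)]


def hierarchical_search(cur: Frame, ref: Frame, bx: int, by: int, bs: int = 8,
                        coarse_radius: int = 4,
                        fine_radius: int = 1) -> Tuple[int, int]:
    """Two-level coarse-to-fine motion search for the block at (bx, by).
    Assumes bx, by and bs are even so the block maps onto the half-size level."""
    # Wide, cheap search on the half-resolution frames.
    coarse = full_search(downsample(cur), downsample(ref),
                         bx // 2, by // 2, bs // 2, 0, 0, coarse_radius)
    # Scale the coarse vector back up; fall back to zero motion if nothing fit.
    cx, cy = (2 * coarse[1], 2 * coarse[2]) if coarse else (0, 0)
    # Narrow refinement at full resolution around the scaled vector.
    fine = full_search(cur, ref, bx, by, bs, cx, cy, fine_radius)
    return (fine[1], fine[2]) if fine else (0, 0)
```

With the defaults above, the coarse level covers a ±8-sample range at full resolution using 81 coarse candidates plus 9 refinement candidates, versus 289 candidates for a full-resolution exhaustive search over the same range. The trade-off is that the coarse level can miss fine detail, which is one plausible source of the kind of small PSNR loss a fast search introduces.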
