首页> 外文学位 >Fast and Scalable Architectures and Algorithms for the Computation of the Forward and Inverse Discrete Periodic Radon Transform with Applications to 2D Convolutions and Cross-Correlations.

【24h】

Fast and Scalable Architectures and Algorithms for the Computation of the Forward and Inverse Discrete Periodic Radon Transform with Applications to 2D Convolutions and Cross-Correlations.

机译：快速和可扩展的体系结构和算法，用于计算正向和反向离散周期Radon变换，并应用于2D卷积和互相关。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

The Discrete Radon Transform (DRT) is an essential component of a wide range of applications in image processing, e.g. image denoising, image restoration, texture analysis, line detection, encryption, compressive sensing and reconstructing objects from projections in computed tomography and magnetic resonance imaging. A popular method to obtain the DRT, or its inverse, involves the use of the Fast Fourier Transform, with the inherent approximation/rounding errors and increased hardware complexity due the need for floating point arithmetic implementations. An alternative implementation of the DRT is through the use of the Discrete Periodic Radon Transform (DPRT). The DPRT also exhibits discrete properties of the continuous-space Radon Transform, including the Fourier Slice Theorem and the convolution property. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions O(N3) and the need for a large number of memory accesses.;This PhD dissertation introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: (i) a parallel array of fixed-point adder trees, (ii) circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees, and (iii) an image block-based approach to DPRT computation that can fit the proposed architecture to available resources, and as a result, for an NxN image (N prime), the proposed approach can compute up to N 2 additions per clock cycle. Compared to previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For the fastest case, I introduce optimized architectures that can compute the DPRT and its inverse in just 2N +(log2 N)+1 and 2N +3(log2 N)+B+2 clock cycles respectively, where B is the number of bits used to represent each input pixel. In comparison, the prior state of the art method required N2 +N +1 clock cycles for computing the forward DPRT. For systems with limited resources, the resource usage can be reduced to O(N) with a running time of (N/2)(N + 9) + N + 2 for the forward DPRT and (N/2)(N + 2) + 3(log2 N) + B + 4 for the inverse.;The results also have important applications in the computation of fast convolutions and cross-correlations for large and non-separable kernels. For this purpose, I introduce fast algorithms and scalable architectures to compute 2-D Linear convolutions/cross-correlations using the convolution property of the DPRT and fixed point arithmetic to simplify the 2-D problem into a 1-D problem. Also an alternative system is proposed for non-separable kernels with low rank using the LU decomposition. As a result, for implementations with enough resources, for a an image and convolution kernel of size PxP, linear convolutions/cross correlations can be computed in just 6N + 4 log2 N + 17 clock cycles for N = 2P-1.;Finally, I also propose parallel algorithms to compute the forward and inverse DPRT using Graphic Processing Units (GPUs) and CPUs with multiple cores. The proposed algorithms are implemented in a GPU Nvidia Maxwell GM204 with 2048 cores 1367MHz, 348KB L1 cache (24KB per multiprocessor), 2048KB L2 cache (512KB per memory controller), 4GB device memory, and compared against a serial implementation on a CPU Intel Xeon E5-2630 with 8 physical cores (16 logical processors via hyper-threading) 3.2GHz, L1 cache 512K (32KB Instruction cache, 32KB data cache, per core), L2 cache 2MB (256KB per core), L3 cache 20MB (Shared among all cores), 32GB of system memory. For the CPU, there is a tenfold speedup using 16 logical cores versus a single-core serial implementation. For the GPU, there is a 715-fold speedup compared to the serial implementation. For real-time applications, for an 1021x1021 image, the forward DPRT takes 11.5ms and 11.4ms for the inverse.

机译：离散Radon变换（DRT）是图像处理等多种应用中必不可少的组成部分。图像降噪，图像恢复，纹理分析，线检测，加密，压缩感测以及在计算机断层扫描和磁共振成像中根据投影重建对象。获得DRT或其逆的一种流行方法是使用快速傅立叶变换，由于需要浮点算术实现，它具有固有的近似/舍入误差和增加的硬件复杂性。 DRT的另一种实现方式是使用离散定期Radon变换（DPRT）。 DPRT还表现出连续空间Radon变换的离散属性，包括傅立叶切片定理和卷积属性。不幸的是，由于需要计算大量的加法运算O（N3）和需要进行大量的内存访问，因此DPRT的使用受到了限制。;本博士学位论文介绍了一种快速且可扩展的方法来计算前向和后向。反向DPRT，其基于以下用途：（i）定点加法器树的并行数组;（ii）循环移位寄存器，在为加法器树选择输入数据时无需访问外部存储器组件;以及（ iii）一种基于图像块的DPRT计算方法，可以使建议的体系结构适合可用资源，因此，对于NxN图像（N素数），建议的方法每个时钟周期最多可以计算N 2个加法。与以前的方法相比，可伸缩方法为不同数量的计算资源提供了最快的已知实现。对于最快的情况，我介绍了优化的体系结构，它们可以分别仅在2N +（log2 N）+1和2N +3（log2 N）+ B + 2个时钟周期内计算DPRT及其逆数，其中B是位数用于表示每个输入像素。相比之下，现有技术方法需要N2 + N +1个时钟周期来计算前向DPRT。对于资源有限的系统，对于前向DPRT，运行时间为（N / 2）（N + 9）+ N + 2，而运行时间为（N / 2）（N + 2），则可以将资源使用降低为O（N））+ 3（log2 N）+ B + 4;结果在大型和不可分离核的快速卷积和互相关的计算中也具有重要的应用。为此，我介绍了快速算法和可伸缩体系结构，以使用DPRT的卷积属性和定点算法将2-D线性卷积/互相关计算为二维问题，从而简化为一维问题。还提出了使用LU分解为具有低秩的不可分离内核的替代系统。结果，对于具有足够资源的实现，对于大小为PxP的图像和卷积核，对于N = 2P-1，仅可以在6N + 4 log2 N + 17个时钟周期内计算线性卷积/互相关。我还提出了并行算法，以使用具有多个核的图形处理单元（GPU）和CPU计算正向和反向DPRT。拟议的算法在具有2048个内核1367MHz，348KB L1高速缓存（每个多处理器24KB），2048KB L2高速缓存（每个内存控制器512KB），4GB设备内存的GPU Nvidia Maxwell GM204中实现，并且与在CPU Intel Xeon上的串行实现进行了比较E5-2630具有8个物理核心（通过超线程的16个逻辑处理器）3.2GHz，L1高速缓存512K（32KB指令高速缓存，32KB数据高速缓存，每个核心），L2高速缓存2MB（每个核心256KB），L3高速缓存20MB（共享所有核心），系统内存为32GB。对于CPU，使用16个逻辑内核的速度比单内核串行实现的速度提高了十倍。与串行实现相比，GPU的加速比提高了715倍。对于实时应用，对于1021x1021图像，正向DPRT的时间为11.5ms，逆向的时间为11.4ms。

著录项

作者
Carranza, Cesar.;
展开▼
作者单位

The University of New Mexico.;

展开▼
授予单位 The University of New Mexico.;
学科 Engineering.;Computer engineering.
学位 Ph.D.
年度 2016
页码 202 p.
总页数 202
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. VLSI Architecture for Fast Computation of 2D-Discrete Wavelet Transform and Low Power Feed Forward Neural Network Architecture for Image Compression [J] . Mr. Murali Mohan. S, Dr. P.Satyanarayana American Journal of Engineering Research . 2013,第10期

机译：用于二维离散小波变换快速计算的VLSI架构和用于图像压缩的低功耗前馈神经网络架构
2. A Computationally Efficient Pipelined Architecture for 1D/2D Lifting Based Forward and Inverse Discrete Wavelet Transform for CDF 5/3 Filter [J] . CEKLI S. Advances in Electrical and Computer Engineering . 2018,第2期

机译：基于CDF 5/3滤波器的基于1D / 2D提升的正向和反向离散小波变换的高效计算流水线架构
3. Fast computation of two-dimensional discrete cosine transforms using fast discrete Radon transform [J] . Ta N., Attikiouzel Y. Electronics Letters . 1991,第1期

机译：使用快速离散Radon变换快速计算二维离散余弦变换
4. A scalable architecture for implementing the fast discrete periodic radon transform for prime sized images [C] . Carranza Cesar, Llamocca Daniel, Pattichis Marios IEEE International Conference on Image Processing . 2014

机译：一种可扩展的体系结构，用于为素尺寸图像实现快速离散的周期性radon变换
5. Synthesis and development of image coding algorithms based on 2D discrete wavelet transform. [D] . Yok Wooi, Matthew Teow. 2011

机译：基于二维离散小波变换的图像编码算法的综合与发展。
6. Discrete Radon transform has an exact fast inverse and generalizes to operations other than sums along lines [O] . William H. Press 2006

机译：离散Radon变换具有精确快速的逆运算并且可以推广到除沿线求和之外的其他运算
7. Fast computation of two-dimensional discrete cosine transforms using fast discrete radon transform [O] . Ta, N., Attikiouzel, Y., Crebbin, G. 1991

机译：使用快速离散radon变换快速计算二维离散余弦变换
8. Comparison of Arithmetic Requirements for the PFA (Prime Factor Algorithm), WFTA (Winograd Fourier Transform Algorithm), SWIFT, MFFT (Mixed Radix Fast Fourier Transform), FFT (Fast Fourier Transform) and DFT (Discrete Fourier Transform) Algorithms. [R] . Hicks, R. C. 1982

机译：pFa（素因子算法），WFTa（Winograd傅立叶变换算法），sWIFT，mFFT（混合基线快速傅里叶变换），FFT（快速傅立叶变换）和DFT（离散傅立叶变换）算法的算术要求的比较。

Fast and Scalable Architectures and Algorithms for the Computation of the Forward and Inverse Discrete Periodic Radon Transform with Applications to 2D Convolutions and Cross-Correlations.

摘要

著录项

相似文献

相关主题

期刊订阅