International Journal of High Performance Computing Applications
Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning



Abstract

Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate there is a wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. In addition, given the dramatic performance variation across platforms for both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
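To make the tuning strategy concrete, the following is a minimal, self-contained sketch (not the authors' code) of the kind of parameter sweep the abstract describes: a toy C++ raycaster over a synthetic volume is timed for several image-tile sizes, a hypothetical stand-in for the paper's tunable work-decomposition parameters, and the fastest configuration is reported. The actual study also measures L2 cache misses with hardware counters, which this sketch omits.

```cpp
// Minimal auto-tuning sketch: sweep a hypothetical tunable (image tile size)
// for a toy raycaster and report the fastest configuration.
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

constexpr int VOL = 128;   // synthetic volume resolution
constexpr int IMG = 256;   // output image resolution

// Nearest-neighbour sample of the synthetic density field, clamped to bounds.
static float sample(const std::vector<float>& vol, float x, float y, float z) {
    int xi = std::min(VOL - 1, std::max(0, (int)x));
    int yi = std::min(VOL - 1, std::max(0, (int)y));
    int zi = std::min(VOL - 1, std::max(0, (int)z));
    return vol[(zi * VOL + yi) * VOL + xi];
}

// Cast one axis-aligned ray with front-to-back compositing and
// early-ray termination once opacity saturates.
static float cast_ray(const std::vector<float>& vol, int px, int py, float step) {
    float color = 0.0f, alpha = 0.0f;
    float x = px * (float)VOL / IMG, y = py * (float)VOL / IMG;
    for (float z = 0.0f; z < VOL && alpha < 0.98f; z += step) {
        float s = sample(vol, x, y, z);
        float a = s * 0.05f;                 // toy transfer function
        color += (1.0f - alpha) * a * s;
        alpha += (1.0f - alpha) * a;
    }
    return color;
}

int main() {
    // Synthetic volume: a radial falloff so rays traverse varying densities.
    std::vector<float> vol(VOL * VOL * VOL);
    for (int z = 0; z < VOL; ++z)
        for (int y = 0; y < VOL; ++y)
            for (int x = 0; x < VOL; ++x) {
                float dx = x - VOL / 2.0f, dy = y - VOL / 2.0f, dz = z - VOL / 2.0f;
                vol[(z * VOL + y) * VOL + x] =
                    std::exp(-(dx * dx + dy * dy + dz * dz) / (VOL * VOL * 0.1f));
            }

    std::vector<float> image(IMG * IMG);
    const int tile_sizes[] = {8, 16, 32, 64};   // hypothetical tunable values
    double best_ms = 1e30;
    int best_tile = 0;

    for (int tile : tile_sizes) {
        auto t0 = std::chrono::steady_clock::now();
        // Tiled traversal: tile size controls the work-decomposition
        // granularity, one of the knobs an auto-tuner would sweep.
        #pragma omp parallel for collapse(2) schedule(dynamic)
        for (int ty = 0; ty < IMG; ty += tile)
            for (int tx = 0; tx < IMG; tx += tile)
                for (int py = ty; py < ty + tile && py < IMG; ++py)
                    for (int px = tx; px < tx + tile && px < IMG; ++px)
                        image[py * IMG + px] = cast_ray(vol, px, py, 0.5f);
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("tile %2d: %8.2f ms\n", tile, ms);
        if (ms < best_ms) { best_ms = ms; best_tile = tile; }
    }
    std::printf("best tile size: %d (%.2f ms)\n", best_tile, best_ms);
    return 0;
}
```

Compiled with OpenMP enabled (e.g. -fopenmp), the pragma parallelizes the tiled loop across cores, matching the shared-memory setting of the paper; without it the sketch still runs serially, which is enough to see how an auto-tuner would compare candidate configurations.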
