The kNN algorithm is a classical algorithm widely used in machine learning and data mining. As the amount of data grows, its execution time rises sharply. To use GPUs and other computing units of modern computers to cut this computation time, we present a parallel kNN algorithm based on OpenCL that parallelizes the algorithm's two bottlenecks: distance calculation and sorting. The distance-calculation phase uses a fine-grained parallelization strategy with an optimized thread model. The sorting phase uses a bitonic sort with an optimized memory model. With the UCI dataset letter as the test set, we ran the kNN algorithm on an E8400 CPU and on a GTS450 GPU. The GPU-accelerated parallel kNN algorithm runs 40.79 times faster than its CPU version.
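The two bottlenecks named above can be illustrated with a minimal CPU-side sketch in plain Python. It is not the authors' OpenCL implementation and does not model their thread or memory optimizations. It only shows why both stages parallelize well. Each training sample's distance is computed independently, which is the unit a fine-grained GPU version would assign to one work-item. Within each pass of the bitonic sorting network, every compare-exchange is also independent. Squared Euclidean distance, majority voting, padding to a power of two with `+inf`, and the function names `squared_distances`, `bitonic_sort`, and `knn_classify` are all assumptions made for illustration.

```python
import math
from collections import Counter


def squared_distances(query, train):
    # Assumption: squared Euclidean distance. Each row is handled independently,
    # mirroring a fine-grained "one work-item per training sample" mapping.
    return [sum((q - t) ** 2 for q, t in zip(query, row)) for row in train]


def bitonic_sort(keys, vals):
    """Sort (key, value) pairs ascending by key using a bitonic sorting network.

    The input is padded to the next power of two with +inf keys, which end up
    at the tail and are trimmed off before returning.
    """
    n = 1
    while n < len(keys):
        n *= 2
    k_arr = list(keys) + [math.inf] * (n - len(keys))
    v_arr = list(vals) + [-1] * (n - len(keys))

    size = 2
    while size <= n:                 # length of the bitonic sequences being merged
        stride = size // 2
        while stride > 0:            # compare-exchange distance within one pass
            # All iterations of this loop are independent; a GPU kernel would
            # run one work-item per index i for each (size, stride) pass.
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    ascending = (i & size) == 0
                    if (ascending and k_arr[i] > k_arr[partner]) or (
                        not ascending and k_arr[i] < k_arr[partner]
                    ):
                        k_arr[i], k_arr[partner] = k_arr[partner], k_arr[i]
                        v_arr[i], v_arr[partner] = v_arr[partner], v_arr[i]
            stride //= 2
        size *= 2
    return k_arr[: len(keys)], v_arr[: len(keys)]


def knn_classify(query, train, labels, k):
    # Stage 1: distance calculation.
    dists = squared_distances(query, train)
    # Stage 2: sort distances and carry the sample indices along as payload.
    _, order = bitonic_sort(dists, range(len(train)))
    # Assumption: simple majority vote among the k nearest samples.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Skipping the square root is a deliberate choice. It does not change the neighbor ranking, and it removes one operation per sample. A bitonic network is a common choice on GPUs because its compare-exchange pattern is fixed in advance and does not depend on the data. Its cost is O(n log² n) comparisons, compared with O(n log n) for a sequential comparison sort.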