Partition Selection for Large-Scale Data Management Using KNN Join Processing

Hu Yue; Peng Ge; Wang Zehua; Cui Yanrong; Qin Hang

Abstract

As data volumes grow rapidly, the k-nearest neighbors (KNN) algorithm becomes a particularly expensive operation for both classification and regression problems. To predict the values of new data points, it computes the feature similarity between each object in the test dataset and each object in the training dataset. Because of this computational cost, a single computer cannot handle large-scale datasets. In this paper, we propose an adaptive vKNN algorithm, which is based on the Voronoi diagram under the MapReduce parallel framework and makes full use of parallel computing for processing large-scale data. For partition selection, we design a new predictive strategy that finds the optimal relevant partition for each sample point. This lets us effectively filter out irrelevant data, reduce the KNN join computation, and improve efficiency. Finally, we conduct extensive experiments on a cluster using 54-dimensional datasets. The experimental results show that the proposed method is effective and scalable while preserving accuracy.
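
To make the idea of Voronoi-based partition selection concrete, the following is a minimal single-process sketch in Python. It is not the paper's vKNN algorithm: the abstract does not specify the predictive strategy, so the sketch substitutes a standard triangle-inequality lower bound to decide which partitions are irrelevant for a query. The function names (`assign_to_pivots`, `knn_join_partitioned`, `knn_brute_force`) and the choice of pivots are assumptions made for illustration. In a MapReduce setting, the cell assignment would roughly correspond to the map phase and the per-cell distance computation to the reduce phase.

```python
import heapq
import math


def assign_to_pivots(points, pivots):
    """Voronoi partitioning: put each point in the cell of its nearest pivot."""
    cells = {i: [] for i in range(len(pivots))}
    for p in points:
        nearest = min(range(len(pivots)), key=lambda i: math.dist(p, pivots[i]))
        cells[nearest].append(p)
    return cells


def knn_join_partitioned(queries, train, pivots, k):
    """KNN join that skips cells which cannot contain a closer neighbor.

    Assumption (not from the paper): a cell is pruned with the bound
    dist(q, s) >= dist(q, pivot) - radius(cell), which follows from the
    triangle inequality.
    """
    cells = assign_to_pivots(train, pivots)
    # Radius of a cell: largest distance from its pivot to any member.
    radius = {
        i: max((math.dist(pivots[i], s) for s in members), default=0.0)
        for i, members in cells.items()
    }
    result = {}
    for q in queries:
        # Lower bound on the distance from q to any point in each cell.
        lower = {
            i: max(0.0, math.dist(q, pivots[i]) - radius[i]) for i in cells
        }
        heap = []  # max-heap (negated distances) of the k best found so far
        for i in sorted(cells, key=lower.get):
            if len(heap) == k and lower[i] >= -heap[0][0]:
                break  # this cell and all later ones are irrelevant
            for s in cells[i]:
                d = math.dist(q, s)
                if len(heap) < k:
                    heapq.heappush(heap, (-d, s))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, s))
        result[q] = sorted((-neg_d, s) for neg_d, s in heap)
    return result


def knn_brute_force(q, train, k):
    """Reference answer: compare q against every training point."""
    return sorted((math.dist(q, s), s) for s in train)[:k]
```

Points are plain tuples of floats so they can serve as dictionary keys. Because cells are visited in order of increasing lower bound, the loop can stop at the first cell whose bound is no smaller than the current k-th distance, which is where the saving over the brute-force join comes from.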

Bibliographic information

Source
Mathematical Problems in Engineering: Theory, Methods and Applications | 2020, Issue 33 | 7898230.1-7898230.14 | 14 pages
Authors
Hu Yue; Peng Ge; Wang Zehua; Cui Yanrong; Qin Hang
Author affiliations

Yangtze Univ, Comp Sch, Jingzhou 434023, Hubei, Peoples R China|Jingpeng Software Grp Co Ltd, Hubei Grad Workstn, Jingzhou, Hubei, Peoples R China;

Yangtze Univ, Comp Sch, Jingzhou 434023, Hubei, Peoples R China;

Original format: PDF
Language of text: English