Towards Efficient KNN Joins on Data Streams

Abstract

We study the problem of efficiently processing kNN joins over high-dimensional data streams, an operation required by many big data applications. Specifically, we are concerned with the continuous evaluation of a set of k nearest neighbor queries Q on streams of high-dimensional items at consecutive snapshots of those streams. One possible solution is to evaluate the kNN joins from scratch at each snapshot, but this is too expensive for the large volumes of data encountered in big data applications. We consider the data stream over a time window and maintain the join results for Q at every snapshot in main memory. Our approach is to build indexes on Q and to update only the results of the queries affected by the changes in the streams at each snapshot. We propose a main-memory structure called the High-dimensional R-tree (HDR-tree) to index the queries; it finds affected queries efficiently at a reasonable maintenance cost. The HDR-tree exploits clustering and principal component analysis (PCA). Preliminary experimental results show that our index structures significantly outperform baseline methods.
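The incremental idea in the abstract (keep per-query results in memory and touch only the queries affected by a window change) can be illustrated with a minimal Python sketch. This is not the paper's HDR-tree: the class name `ContinuousKnnJoin`, its methods, and the count-based window are assumptions made for illustration, and a linear scan over Q stands in for the index lookup that the HDR-tree is meant to accelerate.

```python
import heapq
import math
from collections import deque


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ContinuousKnnJoin:
    """Hypothetical helper: keeps the k nearest window items for each query in Q."""

    def __init__(self, queries, k, window_size):
        self.queries = list(queries)
        self.k = k
        self.window_size = window_size   # assumed count-based sliding window
        self.window = deque()            # (item_id, point), oldest first
        self.next_id = 0
        # results[i] is a sorted list of (distance, item_id) for query i
        self.results = [[] for _ in self.queries]

    def _recompute(self, qi):
        """Rebuild one query's result from the current window contents."""
        q = self.queries[qi]
        self.results[qi] = heapq.nsmallest(
            self.k, ((dist(q, p), item_id) for item_id, p in self.window)
        )

    def insert(self, point):
        """Add one stream item; return the indexes of queries whose result changed."""
        affected = set()
        # Expire the oldest item once the window is full.
        if len(self.window) == self.window_size:
            old_id, _ = self.window.popleft()
            for qi, res in enumerate(self.results):
                if any(item_id == old_id for _, item_id in res):
                    affected.add(qi)
        item_id = self.next_id
        self.next_id += 1
        self.window.append((item_id, point))
        # Queries that lost a neighbor are rebuilt; the new item is already in the window.
        for qi in affected:
            self._recompute(qi)
        # Linear scan over Q: a stand-in for the index lookup (assumption, not the HDR-tree).
        for qi, q in enumerate(self.queries):
            if qi in affected:
                continue
            d = dist(q, point)
            res = self.results[qi]
            if len(res) < self.k or d < res[-1][0]:
                res.append((d, item_id))
                res.sort()
                del res[self.k:]
                affected.add(qi)
        return affected


join = ContinuousKnnJoin(queries=[(0, 0), (10, 10)], k=1, window_size=2)
join.insert((1, 0))    # both queries gain their first neighbor
join.insert((9, 10))   # only the query at (10, 10) is affected
join.insert((5, 5))    # item 0 expires, so only the query at (0, 0) is rebuilt
```

In this sketch a query is affected either when one of its current neighbors expires or when the new item is closer than its current k-th neighbor; every other query is left untouched, which is the saving over re-evaluating the join from scratch at each snapshot.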

Bibliographic details

Source
IEEE International Congress on Big Data | 2014 | pp. 782-783 | 2 pages
Authors
Yang Chong; Yu Xiaohui; Liu Yang
Original format: PDF
Keywords
Algorithm design and analysis; Big data; Clustering algorithms; Educational institutions; Indexes; Maintenance engineering; Principal component analysis; data stream; high dimensional data; k nearest neighbor join

Similar literature


1. Partition Selection for Large-Scale Data Management Using KNN Join Processing [J]. Yue Hu, Ge Peng, Zehua Wang. Mathematical Problems in Engineering: Theory, Methods and Applications. 2020, No. 1
2. FML-kNN: scalable machine learning on Big Data using k-nearest neighbor joins [J]. Georgios Chatzigeorgakidis, Sophia Karagiorgou, Spiros Athanasiou. Journal of Big Data. 2018, No. 1
3. An Efficient KNN Classification by using Combination of Additive and Multiplicative Data Perturbation for Privacy Preserving Data Mining [J]. Bhupendra Kumar Pandya, Umesh kumar Singh, Keerti Dixit. International Journal of Engineering Trends and Technology. 2015, No. 7
4. Towards Efficient KNN Joins on Data Streams [C]. Yang Chong, Yu Xiaohui, Liu Yang. IEEE International Congress on Big Data. 2014
5. Efficient Algorithms for Frequent Path Finding and Similarity Join in Big Multidimensional Data [D]. Luo, Wuman. 2012
6. Streaming MASSIF: Cascading Reasoning for Efficient Processing of IoT Data Streams [O]. Pieter Bonte, Riccardo Tommasini, Emanuele Della Valle. 2018
7. Efficient Parallel kNN Joins for Large Data in MapReduce [O]. Chi Zhang, Feifei Li, Jeffrey Jestes. 2012
