Conference: IEEE International Congress on Big Data

Towards Efficient KNN Joins on Data Streams

Abstract

We study the efficient processing of kNN joins over high-dimensional data streams, an operation required by many big data applications. Specifically, we are concerned with the continuous evaluation of a set Q of k-nearest-neighbor queries on streams of high-dimensional items at consecutive snapshots of those streams. One possible solution is to evaluate the kNN join from scratch at each snapshot, but this is too expensive for the large volumes of data encountered in big data applications. Instead, we consider the data stream over a time window and maintain the join results for Q at every snapshot in main memory. Our approach is to build an index on Q and, at each snapshot, update only the results of the queries affected by the changes in the streams. We propose a main-memory structure called the High-dimensional R-tree (HDR-tree) to index the queries; it finds the affected queries efficiently at a reasonable maintenance cost. The HDR-tree exploits clustering and principal component analysis (PCA). Preliminary experimental results show that our index structures significantly outperform baseline methods.
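The idea of updating only the affected queries can be illustrated with a minimal sketch. It is not the HDR-tree: a plain linear scan over the queries stands in for the index, and a count-based window stands in for the time window. The class name `WindowKnnJoin` and its methods are hypothetical, introduced only for this illustration.

```python
import math
from collections import deque


class WindowKnnJoin:
    """Keep the k nearest window items for each fixed query (illustrative only)."""

    def __init__(self, queries, k, window_size):
        self.queries = [tuple(q) for q in queries]
        self.k = k
        self.window_size = window_size
        self.window = deque()  # (item_id, point), oldest first
        # Per query: list of (distance, item_id), sorted ascending, length <= k.
        self.results = [[] for _ in self.queries]
        self.next_id = 0

    def _recompute(self, qi):
        # Fallback for one query: rebuild its result from the current window.
        q = self.queries[qi]
        ranked = sorted((math.dist(q, p), iid) for iid, p in self.window)
        self.results[qi] = ranked[: self.k]

    def insert(self, point):
        """Add one stream item and return the indices of the affected queries."""
        point = tuple(point)
        affected = set()

        # 1. Expire the oldest item if the window is full. Only queries that
        #    currently hold that item in their result are affected.
        if len(self.window) == self.window_size:
            old_id, _ = self.window.popleft()
            for qi, res in enumerate(self.results):
                if any(iid == old_id for _, iid in res):
                    affected.add(qi)

        # 2. Admit the new item, then repair the queries hit by the expiry.
        new_id = self.next_id
        self.next_id += 1
        self.window.append((new_id, point))
        for qi in affected:
            self._recompute(qi)

        # 3. A remaining query is affected by the arrival only if the new item
        #    is closer than its current k-th neighbour (or it has fewer than k).
        #    Assumption: this linear scan is where an index on Q would be used.
        for qi, q in enumerate(self.queries):
            if qi in affected:
                continue  # already rebuilt with the new item in the window
            d = math.dist(q, point)
            res = self.results[qi]
            if len(res) < self.k or d < res[-1][0]:
                res.append((d, new_id))
                res.sort()
                del res[self.k:]
                affected.add(qi)
        return affected
```

Calling `insert` once per arriving item keeps `results` equal to what a from-scratch kNN join over the current window would return, while touching only the queries returned in the affected set. The cost of step 3 is linear in the number of queries here; reducing that cost is the role the abstract assigns to the index on Q.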
