
Selecting Sketches for Similarity Search


Abstract

Hamming embedding techniques, which produce bit-string sketches, have recently been applied successfully to speed up similarity search. Sketches are usually compared using the Hamming distance and are used to filter out non-relevant objects during query evaluation. Because several sketching techniques exist and each can produce sketches of different lengths, it is hard to select a proper configuration for a particular dataset. We assume that the (dis)similarity of objects is expressed by an arbitrary metric function, and we propose a way to efficiently estimate the quality of sketches using just a small sample of the data. Our approach is based on a probabilistic analysis of sketches that describes how well objects are separated after projection into the Hamming space.
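To make the filtering idea concrete, the following is a minimal Python sketch, not the method analyzed in the paper. It assumes one common sketching scheme in which each bit records which of two pivot objects an object is closer to. It also assumes a crude empirical proxy for sketch quality: the mean Hamming distance of near pairs versus far pairs in a small sample. All function names, the pivot-pair scheme, and the `radius` parameter are hypothetical choices made for illustration.

```python
def euclidean(x, y):
    """Example metric; any metric function could be passed in instead."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def make_pivot_sketcher(pivot_pairs, dist):
    """Return a function mapping an object to a bit-string sketch (stored as an int).

    Assumed scheme: bit i is 1 when the object is strictly closer to the
    first pivot of pair i than to the second pivot.
    """
    def sketch(obj):
        bits = 0
        for i, (p, q) in enumerate(pivot_pairs):
            if dist(obj, p) < dist(obj, q):
                bits |= 1 << i
        return bits
    return sketch


def hamming(a, b):
    """Hamming distance between two sketches stored as ints."""
    return bin(a ^ b).count("1")


def filter_candidates(query_sketch, sketches, max_hamming):
    """Indices of objects whose sketch lies within max_hamming of the query sketch.

    Only these candidates would then be checked with the original metric.
    """
    return [i for i, s in enumerate(sketches)
            if hamming(query_sketch, s) <= max_hamming]


def mean_hamming_split(sample, sketch, dist, radius):
    """Crude separation estimate on a sample (an illustrative proxy only).

    Returns (mean Hamming distance of pairs with dist <= radius,
             mean Hamming distance of all other pairs).
    """
    sk = [sketch(o) for o in sample]
    near, far = [], []
    for i in range(len(sample)):
        for j in range(i + 1, len(sample)):
            h = hamming(sk[i], sk[j])
            if dist(sample[i], sample[j]) <= radius:
                near.append(h)
            else:
                far.append(h)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(near), mean(far)
```

Under these assumptions, a configuration (sketching technique and sketch length) would look more promising when near pairs have a clearly smaller mean Hamming distance than far pairs. The paper's probabilistic analysis is a more principled way to quantify that separation.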
