首页> 外文学位 >Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach.
【24h】

Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach.

机译:近似最近邻居搜索的快速局部敏感哈希算法:一种实用的数据挖掘方法。

获取原文
获取原文并翻译 | 示例

摘要

Locality Sensitive Hashing (LSH) is an index-based data structure that allows spatial item retrieval over a large dataset. The performance of LSH depends on three main parameters: a performance measure, denoted by ρ; the number of hash tables, denoted by L and the number of hash functions, denoted by k. These parameters influence the computational, memory space and query time complexities of the data structure. The minimization of ρ at a specific approximation factor c, is dependent on the load factor, α. Over the years, α=4 has been used by researchers as the optimal load factor that gives the minimum value for ρ. In addition, researchers use a one-degree-of-freedom design in which k is chosen and used to compute L. In this research, it is shown that the choice of α=4 does not give minimum value for ρ. To guarantee low computational, memory space and query time complexities, an optimal load factor of α=5 is proposed. In addition, a new LSH algorithm is proposed in which the L is chosen and used to compute k. To bridge the semantic gab between humans and computers, a Euclicosine similarity metric is proposed to rank retrieved items. Experiments on the Defense Meteorological Satellite Program imagery dataset have shown that the proposed techniques cut the computational and memory space complexities by 50% each and improve the query time complexity by a factor of 2.
机译:局部敏感哈希(LSH)是基于索引的数据结构,允许对大型数据集进行空间项检索。 LSH的性能取决于三个主要参数:性能指标,用ρ表示;哈希表的数量(由L表示)和哈希函数的数量(由k表示)。这些参数影响数据结构的计算,存储空间和查询时间复杂度。在特定的近似系数c下ρ的最小化取决于负载系数α。多年来,研究人员一直使用α= 4作为最佳负载因子,从而得出ρ的最小值。此外,研究人员使用一种自由度设计,其中选择了k并用于计算L。在这项研究中,研究表明,选择α= 4不会给出ρ的最小值。为了保证较低的计算,存储空间和查询时间复杂度,提出了最佳负载因子α= 5。此外,提出了一种新的LSH算法,其中L被选择并用于计算k。为了弥合人类和计算机之间的语义缺陷,提出了一种欧氏余弦相似度度量来对检索到的项目进行排名。对国防气象卫星计划影像数据集的实验表明,所提出的技术将计算和存储空间的复杂度分别降低了50%,并将查询时间的复杂度提高了2倍。

著录项

  • 作者

    Buaba, Ruben.;

  • 作者单位

    North Carolina Agricultural and Technical State University.;

  • 授予单位 North Carolina Agricultural and Technical State University.;
  • 学科 Engineering General.;Computer Science.
  • 学位 Ph.D.
  • 年度 2012
  • 页码 149 p.
  • 总页数 149
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号