Investigation of procedures for information retrieval based on pigeonhole principle.


Abstract

Big Data is the term for the exponential growth of data on the Internet. The importance of Big Data lies not in its size but in the information that can be extracted by analyzing it. Such analysis helps businesses make smarter decisions and reduces time and cost. Performing it, however, requires searching very large files, and in Big Data a sequential search is prohibitively inefficient in both time and energy. New techniques that allow efficient search in very large files are therefore in high demand. This research presents an innovative approach to efficient searching with fuzzy criteria in very large information systems (Big Data). Organizing efficient access to a large amount of information by an "approximate" or "fuzzy" specification is a difficult problem in computer science. It is usually solved by brute force, which amounts to a sequential look-up of the file and in many cases substantially degrades system performance. The suggested technique takes a different approach based on the pigeonhole principle: it searches for binary strings that approximately match a given request. In the problem considered, each data item to be searched is represented as a bit-attribute vector, and the search operation consists of finding the subset of these vectors that lie within a given Hamming distance of the request.

The analysis of the new method shows a significant performance gain in organizing this search. It substantially reduces the number of sequential search operations and is several orders of magnitude more efficient in terms of speed, cost, and energy.
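The abstract describes the search only at a high level. The following is a minimal sketch of the general pigeonhole idea, assuming fixed-width bit-attribute vectors stored as Python integers; it is the well-known segment-and-verify filter (often called multi-index hashing), not necessarily the dissertation's exact procedure. If two b-bit vectors differ in at most r bits and each vector is cut into r + 1 disjoint segments, the r differing bits can touch at most r segments, so at least one segment must match exactly. Looking up each segment of the request in an exact-match hash table therefore returns every true match as a candidate, and only those candidates need a full Hamming-distance check instead of a sequential scan of the whole file. The names `PigeonholeIndex`, `segment_bounds`, and `segments` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def segment_bounds(n_bits: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split bit positions [0, n_bits) into n_parts contiguous, disjoint ranges."""
    return [(i * n_bits // n_parts, (i + 1) * n_bits // n_parts) for i in range(n_parts)]


def segments(value: int, bounds: List[Tuple[int, int]]) -> List[int]:
    """Extract the integer value of each bit range (low bits first)."""
    return [(value >> lo) & ((1 << (hi - lo)) - 1) for lo, hi in bounds]


class PigeonholeIndex:
    """Hypothetical index: find all stored vectors within Hamming distance `radius`."""

    def __init__(self, n_bits: int, radius: int) -> None:
        if not 0 <= radius < n_bits:
            raise ValueError("require 0 <= radius < n_bits")
        self.n_bits = n_bits
        self.radius = radius
        # radius + 1 segments: at most `radius` differing bits can spoil at most
        # `radius` segments, so a true match agrees exactly on at least one segment.
        self.bounds = segment_bounds(n_bits, radius + 1)
        # One exact-match hash table per segment: segment value -> vector ids.
        self.tables: List[Dict[int, List[int]]] = [defaultdict(list) for _ in self.bounds]
        self.vectors: List[int] = []

    def _check(self, value: int) -> None:
        if not 0 <= value < (1 << self.n_bits):
            raise ValueError(f"value must fit in {self.n_bits} bits")

    def add(self, vector: int) -> int:
        """Store a bit-attribute vector and return its id."""
        self._check(vector)
        vid = len(self.vectors)
        self.vectors.append(vector)
        for table, seg in zip(self.tables, segments(vector, self.bounds)):
            table[seg].append(vid)
        return vid

    def query(self, request: int) -> List[int]:
        """Return sorted ids of stored vectors within `radius` bits of `request`."""
        self._check(request)
        # Filter: any vector sharing at least one exact segment with the request.
        candidates: Set[int] = set()
        for table, seg in zip(self.tables, segments(request, self.bounds)):
            candidates.update(table.get(seg, ()))
        # Verify: exact Hamming distance, computed only on the candidate set.
        return sorted(
            vid for vid in candidates
            if bin(self.vectors[vid] ^ request).count("1") <= self.radius
        )


# Usage: 8-bit vectors, radius 2, so 3 segments of 2, 3 and 3 bits.
index = PigeonholeIndex(n_bits=8, radius=2)
for v in (0b10110010, 0b10110001, 0b01001101, 0b10111111):
    index.add(v)
print(index.query(0b10110011))  # [0, 1, 3]
```

The number of segments is the main design choice in this sketch: using exactly r + 1 segments gives the longest segments that still guarantee no true match is missed, which tends to keep candidate lists short. How much work this saves over a sequential scan depends on the data distribution and is not quantified here.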

Bibliographic details

  • Author: Yammahi, Maryam
  • Affiliation: The George Washington University
  • Degree-granting institution: The George Washington University
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 86
  • Original format: PDF
  • Language: English
