Journal: International Journal of Information Retrieval Research

A Highest Sense Count Based Method for Disambiguation of Web Queries for Hindi Language Web Information Retrieval


Abstract

Ambiguity in word senses is recognized as a major challenge for information retrieval systems. Hindi-language web information retrieval, like retrieval in other languages, faces the problem of sense ambiguity. This problem degrades the performance of every natural language processing (NLP) application, and Hindi-language web information retrieval is no exception. In this paper, the author formalizes an approach to sense disambiguation that aims to improve the performance of Hindi web information retrieval. The system performs ambiguity detection before it disambiguates web queries. A test sample of 100 queries was selected. When these queries were subjected to ambiguity detection, 43% of them were detected as unambiguous. After ambiguity detection, the remaining queries are disambiguated with an approach based on HSC (Highest Sense Count). Query disambiguation is followed by query expansion. The expanded query generates a new result set with higher precision and a higher similarity score. The 57 expanded queries were tested against 1000 test document instances. The overall improvement is 45% in average precision and 23% in interpolated average precision, along with a significant improvement in the average similarity score of the newly generated result set. The overall accuracy of the approach is 61.4%, and it improves the performance of the system by 45%.
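The abstract reports gains in average precision and interpolated average precision but does not say which definitions were used. As a rough illustration only, the sketch below assumes the standard non-interpolated average precision and the common 11-point interpolated variant for a single query. The function names and the toy document IDs are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: mean of precision@k over ranks k that hold a relevant document.

    Assumption: the paper's "average precision" follows this standard definition.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at this rank
    return total / len(relevant)


def interpolated_average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """11-point interpolated AP over recall levels 0.0, 0.1, ..., 1.0.

    Assumption: the paper's "interpolated average precision" is this 11-point form.
    """
    if not relevant:
        return 0.0
    points = []  # (recall, precision) recorded at each relevant hit
    hits = 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / k))
    levels = [i / 10 for i in range(11)]
    # Interpolated precision at a level = best precision at any recall >= that level.
    interp = [max((p for r, p in points if r >= level), default=0.0) for level in levels]
    return sum(interp) / len(levels)


# Toy example with hypothetical document IDs (not data from the paper).
ranked_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3"}
ap = average_precision(ranked_docs, relevant_docs)
iap = interpolated_average_precision(ranked_docs, relevant_docs)
```

Under this reading, a system-level figure would be the mean of these per-query values over the 57 expanded queries, computed once for the original result sets and once for the result sets of the expanded queries.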
