Journal: Information Processing & Management

An algorithm to cluster documents based on relevance

Abstract

Search engines fail to make a clear distinction between items of varying relevance when presenting search results to users. Instead, they rely on the user to estimate which items are relevant, partially relevant, or not relevant, leaving the user the task of distinguishing between documents that are relevant to different degrees. This process often hinders access to relevant or partially relevant documents, particularly when the result set is large and documents of varying relevance are scattered throughout it. In this paper, we present a clustering scheme that groups documents into relevant, partially relevant, and not relevant regions for a given search. A clustering algorithm assigns documents to these regions based on relevance. The clusters were evaluated by end-users issuing categorical, interval, and descriptive relevance judgments for the documents returned from a search. The degree of overlap between users and the system for each of the clustered regions was measured to determine the overall effectiveness of the algorithm. This research showed that clustering documents on the Web by regions of relevance is highly necessary and quite feasible.
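
The abstract does not specify how documents are assigned to regions or how overlap is computed, so the following is only an illustrative sketch under stated assumptions. It assumes each document already has a relevance score in [0, 1], uses two hypothetical thresholds (`high`, `low`) to form the three regions, and defines overlap as the fraction of documents in a system region that the user placed in the same region. The function names, thresholds, and sample data are invented for illustration and are not taken from the paper.

```python
# Illustrative sketch only: thresholds, scores, and the overlap definition
# are assumptions, not the method described in the paper.

REGIONS = ("relevant", "partially relevant", "not relevant")


def assign_region(score, high=0.66, low=0.33):
    """Map a relevance score in [0, 1] to a region (hypothetical thresholds)."""
    if score >= high:
        return "relevant"
    if score >= low:
        return "partially relevant"
    return "not relevant"


def cluster_by_relevance(scores, high=0.66, low=0.33):
    """Group document ids into the three relevance regions."""
    clusters = {region: set() for region in REGIONS}
    for doc_id, score in scores.items():
        clusters[assign_region(score, high, low)].add(doc_id)
    return clusters


def region_overlap(system_clusters, user_labels):
    """Per region, the fraction of system-assigned documents the user agrees with.

    Returns None for a region the system left empty.
    """
    overlap = {}
    for region in REGIONS:
        docs = system_clusters[region]
        if not docs:
            overlap[region] = None
            continue
        agreed = sum(1 for doc_id in docs if user_labels.get(doc_id) == region)
        overlap[region] = agreed / len(docs)
    return overlap


if __name__ == "__main__":
    # Made-up scores and user judgments for four documents.
    scores = {"d1": 0.9, "d2": 0.5, "d3": 0.1, "d4": 0.7}
    user_labels = {
        "d1": "relevant",
        "d2": "partially relevant",
        "d3": "not relevant",
        "d4": "partially relevant",
    }
    clusters = cluster_by_relevance(scores)
    print(region_overlap(clusters, user_labels))
```

In this made-up example the system and the user disagree on `d4`, so the "relevant" region shows half agreement while the other two regions agree fully. Other overlap definitions (for instance, a set-intersection ratio over both the system and user regions) would be equally plausible readings of the abstract.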
