An algorithm to cluster documents based on relevance

Monica Desai; Amanda Spink

Abstract

Search engines fail to make a clear distinction between items of varying relevance when presenting search results to users. Instead, they rely on the user of the system to estimate which items are relevant, partially relevant, or not relevant. The user of the system is given the task of distinguishing between documents that are relevant to different degrees. This process often hinders the accessibility of relevant or partially relevant documents, particularly when the results set is large and documents of varying relevance are scattered throughout the set. In this paper, we present a clustering scheme that groups documents within relevant, partially relevant, and not relevant regions for a given search. A clustering algorithm accomplishes the task of clustering documents based on relevance. The clusters were evaluated by end-users issuing categorical, interval, and descriptive relevance judgments for the documents returned from a search. The degree of overlap between users and the system for each of the clustered regions was measured to determine the overall effectiveness of the algorithm. This research showed that clustering documents on the Web by regions of relevance is highly necessary and quite feasible.
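The abstract does not give the details of the clustering algorithm or of the overlap measure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each document already has a relevance score in [0, 1], splits documents into the three regions with two hypothetical thresholds (`high` and `low`), and compares system regions against user judgments with Jaccard overlap, which is a stand-in chosen for illustration. All names and values here are assumptions.

```python
from typing import Dict, Set

# Region names follow the abstract; everything else is hypothetical.
REGIONS = ("relevant", "partially relevant", "not relevant")


def cluster_by_relevance(
    scores: Dict[str, float], high: float = 0.66, low: float = 0.33
) -> Dict[str, Set[str]]:
    """Assign each document to one of three relevance regions.

    `scores` maps a document id to an assumed relevance score in [0, 1].
    `high` and `low` are illustrative cut-offs, not values from the paper.
    """
    regions: Dict[str, Set[str]] = {name: set() for name in REGIONS}
    for doc_id, score in scores.items():
        if score >= high:
            regions["relevant"].add(doc_id)
        elif score >= low:
            regions["partially relevant"].add(doc_id)
        else:
            regions["not relevant"].add(doc_id)
    return regions


def region_overlap(
    system: Dict[str, Set[str]], user: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Jaccard overlap between system and user assignments, per region.

    Two empty regions are treated as full agreement (overlap 1.0).
    """
    overlap: Dict[str, float] = {}
    for name in REGIONS:
        sys_docs = system.get(name, set())
        usr_docs = user.get(name, set())
        union = sys_docs | usr_docs
        overlap[name] = len(sys_docs & usr_docs) / len(union) if union else 1.0
    return overlap
```

A per-region score makes it visible where the system and users disagree, for example when the system places a document in the relevant region that users judged only partially relevant.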

Bibliographic details

Source: Information Processing & Management, 2005, Issue 5, pp. 1035-1049 (15 pages)
Authors: Monica Desai; Amanda Spink
Author affiliation: Department of Computing Science and Engineering, The Pennsylvania State University, 220 Pond Laboratories, University Park, PA 16802, USA
Original format: PDF
Language: English
Chinese Library Classification: Library science and librarianship; information science and information work

Similar literature

1. An Approach to Improve Quality of Document Clustering by Word Set Based Documenting Clustering Algorithm [J]. Sandeep Sharma, Ruchi Dave, Naveen Hemrajani. Oriental journal of computer science and technology, 2011, Issue 2.
2. MLK-Means - A Hybrid Machine Learning based K-Means Clustering Algorithms for Document Clustering [J]. P. PERUMAL, R. NEDUNCHEZHIAN. WSEAS Transactions on Information Science and Applications, 2012, Issue 7a9.
3. COMMON SENSE BASED TEXT DOCUMENT CLUSTERING ALGORITHM BY COARSE AND FINE GRAINED CLUSTERING TECHNIQUES [J]. G. LOSHMA, DR. NAGARATNA P HEDGE. Journal of Theoretical and Applied Information Technology, 2017, Issue 10.
4. Improving retrievability of patents with cluster-based pseudo-relevance feedback documents selection [C]. Shariq Bashir, Andreas Rauber. 18th ACM conference on information and knowledge management 2009, 2009.
5. Comparison of clustering algorithms and its application to document clustering [D]. Chen, Jie. 2005.
6. Swarm Intelligence Algorithms in Text Document Clustering with Various Benchmarks [O]. Suganya Selvaraj, Eunmi Choi. 2021.
7. An algorithm to cluster documents based on relevance [O]. Desai Monica, Spink Amanda H. 2005.
