Document Clustering with Committees

Abstract

Document clustering is useful in many information retrieval tasks: document browsing, organization and viewing of retrieval results, generation of Yahoo-like hierarchies of documents, etc. The general goal of clustering is to group data elements such that the intra-group similarities are high and the inter-group similarities are low. We present a clustering algorithm called CBC (Clustering By Committee) that is shown to produce higher quality clusters in document clustering tasks as compared to several well-known clustering algorithms. It initially discovers a set of tight clusters (high intra-group similarity), called committees, that are well scattered in the similarity space (low inter-group similarity). The union of the committees is but a subset of all elements. The algorithm proceeds by assigning elements to their most similar committee. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology that is based on the editing distance between output clusters and manually constructed classes (the answer key). This evaluation measure is more intuitive and easier to interpret than previous evaluation measures.
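
The final step described in the abstract, assigning each element to its most similar committee, can be illustrated with a minimal sketch. This is not the authors' implementation: the committee discovery phase is not shown (the committees are simply given), and the use of cosine similarity against a committee centroid, the sparse dictionary representation, and all names (`cosine`, `centroid`, `assign_to_committees`, the toy documents) are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Average a list of sparse vectors into one sparse vector."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def assign_to_committees(docs, committees):
    """Map each document id to the index of its most similar committee.

    docs: dict of document id -> sparse term vector (hypothetical format)
    committees: list of lists of document ids, assumed already discovered
    """
    centroids = [centroid([docs[d] for d in members]) for members in committees]
    assignment = {}
    for doc_id, vec in docs.items():
        sims = [cosine(vec, c) for c in centroids]
        assignment[doc_id] = max(range(len(sims)), key=sims.__getitem__)
    return assignment


# Toy data: two tight committees and two documents outside any committee.
docs = {
    "d1": {"goal": 2.0, "match": 1.0},
    "d2": {"goal": 1.0, "match": 2.0},
    "d3": {"vote": 2.0, "party": 1.0},
    "d4": {"vote": 1.0, "party": 2.0},
    "d5": {"match": 1.0, "team": 1.0},
    "d6": {"party": 1.0, "election": 1.0},
}
committees = [["d1", "d2"], ["d3", "d4"]]
assignment = assign_to_committees(docs, committees)
```

In this toy setup the committees cover only four of the six documents, mirroring the abstract's remark that the union of the committees is a subset of all elements; the remaining documents are placed by similarity alone.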

Bibliographic details

Source: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002, 8 pages
Authors: Patrick Pantel; Dekang Lin
Original format: PDF
Chinese Library Classification: information retrieval
Keywords: document clustering; evaluation methodology; machine learning; document representation

Similar literature

1. DIC-DOC-K-means: Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering [J] . Lakshmi R., Baskar S. Journal of Information Science . 2019, No. 6
2. An Approach to Improve Quality of Document Clustering by Word Set Based Documenting Clustering Algorithm [J] . Sandeep Sharma, Ruchi Dave, Naveen Hemrajani. Oriental Journal of Computer Science and Technology . 2011, No. 2
3. ACCF/AHA 2007 clinical expert consensus document on coronary artery calcium scoring by computed tomography in global cardiovascular risk assessment and in evaluation of patients with chest pain: a report of the American College of Cardiology Foundation Clinical Expert Consensus Task Force (ACCF/AHA Writing Committee to Update the 2000 Expert Consensus Document on Electron Beam Computed Tomography). Developed in Collaboration With the Society of Atherosclerosis Imaging and Prevention and the Society of Cardiovascular Computed Tomography [J] . Greenland P, Bonow RO, Brundage BH. Circulation: An Official Journal of the American Heart Association . 2007, No. 3
4. Document Clustering with Committees [C] . Patrick Pantel, Dekang Lin. The Twenty-Fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Aug 11-15, 2002, Tampere, Finland . 2002
5. Text document topical recursive clustering and automatic labeling of a hierarchy of document clusters. [D] . Li, Xiaoxiao. 2012
6. Venous thromboembolism prophylaxis in the trauma intensive care unit: an American Association for the Surgery of Trauma Critical Care Committee Clinical Consensus Document [O] . Joseph F Rappold, Forest R Sheppard, Samuel P Carmichael II, 2021
7. ACC/AHA/ESC guidelines for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the European Society of Cardiology Committee for Practice Guidelines and Policy Conferences (Committee to Develop Guidelines for the Management of Patients With Atrial Fibrillation). Developed in Collaboration With the North American Society of Pacing and Electrophysiology [O] . Fuster Valentin, Rydén Lars E., Asinger Richard W., 2001
