Conference: International Symposium on Intelligent Data Analysis

COBRAS: Interactive Clustering with Pairwise Queries

Abstract

Constraint-based clustering algorithms exploit background knowledge to construct clusterings that are aligned with the interests of a particular user. This background knowledge is often obtained by allowing the clustering system to pose pairwise queries to the user: should these two elements be in the same cluster or not? Answering yes yields a must-link constraint; answering no yields a cannot-link constraint. Ideally, the user should be able to answer a few of these queries, inspect the resulting clustering, and repeat these two steps until a satisfactory result is obtained. Such an interactive clustering process imposes three requirements on the clustering system: (1) it should be able to present a reasonable (intermediate) clustering to the user at any time, (2) it should produce good clusterings given few queries, i.e., it should be query-efficient, and (3) it should be time-efficient. We present COBRAS, an approach to clustering with pairwise constraints that satisfies these requirements. COBRAS constructs clusterings of super-instances, which are local regions in the data in which all instances are assumed to belong to the same cluster. By dynamically refining these super-instances during clustering, COBRAS is able to produce clusterings at increasingly fine levels of granularity. It quickly produces good high-level clusterings and refines them to find more detailed structure as more queries are answered. In our experiments we demonstrate that COBRAS is the only method able to produce good solutions at all stages of the clustering process with fast runtimes, and hence the most suitable method for interactive clustering.
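The query mechanism described in the abstract (a yes answer gives a must-link, a no answer gives a cannot-link) can be illustrated with a minimal Python sketch. This is not the COBRAS algorithm: it omits the super-instance refinement step entirely, and the function name `group_by_pairwise_queries`, the `oracle` callback, and the toy item labels are all assumptions made for illustration. It only shows how answers to pairwise queries can be recorded as constraints and used to group items, with a valid intermediate grouping available after every answer.

```python
def group_by_pairwise_queries(items, oracle):
    """Group items by querying one representative per existing cluster.

    items:  iterable of item ids (hypothetical; could stand for super-instances).
    oracle: callable (a, b) -> bool; True means "same cluster" (must-link),
            False means "different clusters" (cannot-link). Simulates the user.
    Returns (clusters, must_link, cannot_link).
    """
    clusters = []      # each cluster is a list; its first element is the representative
    must_link = []     # pairs the user said belong together
    cannot_link = []   # pairs the user said belong apart
    for item in items:
        for cluster in clusters:
            rep = cluster[0]
            if oracle(rep, item):
                must_link.append((rep, item))
                cluster.append(item)
                break
            cannot_link.append((rep, item))
        else:
            # No existing cluster accepted the item, so it starts a new one.
            clusters.append([item])
    return clusters, must_link, cannot_link


# Toy usage: the simulated user puts items with the same first letter together.
def same_first_letter(a, b):
    return a[0] == b[0]

clusters, must_link, cannot_link = group_by_pairwise_queries(
    ["a1", "a2", "b1", "a3", "b2"], same_first_letter
)
print(clusters)
print(len(must_link) + len(cannot_link), "queries asked")
```

The sketch relies on the answers being consistent, so that one query against a cluster representative stands in for queries against every member. That keeps the number of queries small, which is the query-efficiency concern raised in the abstract, but it would propagate any mistaken answer to the whole cluster.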
