Conference: International Conference on World Wide Web

Comment-based Multi-View Clustering of Web 2.0 Items

Abstract

Clustering Web 2.0 items (i.e., web resources such as videos and images) into semantic groups benefits many applications, such as organizing items, generating meaningful tags, and improving web search. In this paper, we systematically investigate how user-generated comments can be used to improve the clustering of Web 2.0 items. In our preliminary study of Last.fm, we find that the two data sources extracted from user comments - the textual comments and the commenting users - provide complementary evidence to the items' intrinsic features. These sources have varying levels of quality, but, importantly, we find that incorporating all three sources improves clustering. To accommodate this quality imbalance, we invoke multi-view clustering, in which each data source represents a view, aiming to best leverage the utility of the different views. To combine multiple views under a principled framework, we propose CoNMF (Co-regularized Non-negative Matrix Factorization), which extends NMF to multi-view clustering by jointly factorizing the multiple matrices through co-regularization. Under our CoNMF framework, we devise two paradigms - pair-wise CoNMF and cluster-wise CoNMF - and propose iterative algorithms for their joint factorization. Experimental results on Last.fm and Yelp datasets demonstrate the effectiveness of our solution. On Last.fm, CoNMF outperforms k-means with a statistically significant F_1 increase of 14%, while achieving performance comparable to the state-of-the-art multi-view clustering method CoSC [24]. On the Yelp dataset, CoNMF outperforms the best baseline, CoSC, with a statistically significant performance gain of 7%.
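
The abstract does not spell out the CoNMF objective, so the following is only a minimal sketch of what pair-wise co-regularized NMF could look like, not the authors' algorithm. It assumes each view v is a non-negative item-by-feature matrix X_v factorized as W_v H_v, that co-regularization penalizes the squared Frobenius distance between every pair of item-cluster matrices W_v and W_u with a single weight `lam`, and that multiplicative updates are used. The function name `pairwise_conmf`, its parameters, and the final step of averaging the W_v matrices to assign clusters are all illustrative assumptions.

```python
import numpy as np


def pairwise_conmf(views, k, lam=1.0, n_iter=200, seed=0, eps=1e-9):
    """Sketch of pair-wise co-regularized NMF (assumed objective, see lead-in).

    Minimizes  sum_v ||X_v - W_v H_v||_F^2 + lam * sum_{v<u} ||W_v - W_u||_F^2
    subject to W_v, H_v >= 0, using multiplicative updates.

    views: list of non-negative arrays, each of shape (n_items, d_v).
    Returns (labels, Ws, history), where history holds the objective value
    after every iteration.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    n_views = len(views)
    Ws = [rng.random((n, k)) for _ in views]
    Hs = [rng.random((k, X.shape[1])) for X in views]
    history = []

    for _ in range(n_iter):
        for v, X in enumerate(views):
            W, H = Ws[v], Hs[v]
            # Sum of the item-cluster matrices of all other views.
            others = sum(Ws[u] for u in range(n_views) if u != v)
            # W update: the co-regularizer pulls W_v toward the other views.
            numer = X @ H.T + lam * others
            denom = W @ (H @ H.T) + lam * (n_views - 1) * W + eps
            W = W * numer / denom
            # H update: the standard NMF multiplicative rule.
            H = H * (W.T @ X) / (W.T @ W @ H + eps)
            Ws[v], Hs[v] = W, H

        # Track the assumed objective to monitor convergence.
        recon = sum(np.linalg.norm(X - W @ H) ** 2
                    for X, W, H in zip(views, Ws, Hs))
        coreg = sum(np.linalg.norm(Ws[v] - Ws[u]) ** 2
                    for v in range(n_views) for u in range(v + 1, n_views))
        history.append(recon + lam * coreg)

    # Assumption: assign each item to the strongest column of the averaged W.
    labels = np.argmax(np.mean(Ws, axis=0), axis=1)
    return labels, Ws, history
```

Calling `pairwise_conmf([X_features, X_comments, X_users], k=10)` with three hypothetical view matrices would return one cluster label per item. Setting `lam=0` reduces the sketch to independent NMF on each view, which makes it easy to see what the co-regularization term contributes.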