Journal: Statistics and Computing

Regularized bi-directional co-clustering

Abstract

The simultaneous clustering of documents and words, known as co-clustering, has proved to be more effective than one-sided clustering in dealing with sparse high-dimensional datasets. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises-Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper, we propose a general co-clustering framework based on a matrix formulation of vMF model-based co-clustering. This formulation leads to a flexible framework for text co-clustering that can easily incorporate both word-word semantic relationships and document-document similarities. By contrast with existing methods, which generally use an additive incorporation of similarities, we propose a bi-directional multiplicative regularization that better encapsulates the underlying text data structure. Extensive evaluations on various real-world text datasets demonstrate the superior performance of our proposed approach over baseline and competitive methods, both in terms of clustering results and co-cluster topic coherence.
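To make the idea of a two-sided multiplicative regularization concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes the regularization amounts to multiplying the document-word matrix on the left by a document-document similarity matrix and on the right by a word-word similarity matrix, then rescaling each row to unit length so that documents lie on the unit sphere, as a vMF model expects. The function names, the toy matrix, and the use of cosine similarity for both similarity matrices are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize_rows(m):
    """Scale each row to unit L2 norm; all-zero rows are left unchanged."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0.0, 1.0, norms)


def cosine_similarity(m):
    """Pairwise cosine similarity between the rows of m."""
    unit = l2_normalize_rows(m)
    return unit @ unit.T


def multiplicative_regularization(x, s_doc, s_word):
    """Hypothetical two-sided smoothing: S_doc @ X @ S_word, rows rescaled to unit length."""
    return l2_normalize_rows(s_doc @ x @ s_word)


# Toy document-word count matrix: 3 documents, 4 words (illustrative data only).
x = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 3.0]])

s_doc = cosine_similarity(x)     # document-document similarities (assumed choice)
s_word = cosine_similarity(x.T)  # word-word similarities (assumed choice)

z = multiplicative_regularization(x, s_doc, s_word)

# Compare how similar documents 0 and 1 look before and after smoothing.
before = cosine_similarity(x)[0, 1]
after = cosine_similarity(z)[0, 1]
print(f"doc0/doc1 cosine: before={before:.3f}, after={after:.3f}")
```

In this toy setting the two documents that share vocabulary are pulled closer together, while the third document, which shares no words with them, keeps zero weight on their words. In practice the similarity matrices would more plausibly come from external sources such as word embeddings, which is again an assumption rather than something stated in the abstract.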
