Journal: IEEE Transactions on Knowledge and Data Engineering

Locally Consistent Concept Factorization for Document Clustering

Abstract

Previous studies have demonstrated that document clustering performance can be improved significantly in lower-dimensional linear subspaces. Recently, matrix factorization-based techniques, such as Nonnegative Matrix Factorization (NMF) and Concept Factorization (CF), have yielded impressive results. However, both effectively see only the global Euclidean geometry, whereas the local manifold geometry is not fully considered. In this paper, we propose a new approach to extract document concepts that are consistent with the manifold geometry, such that each concept corresponds to a connected component. Central to our approach is a graph model that captures the local geometry of the document submanifold. Thus, we call it Locally Consistent Concept Factorization (LCCF). By using the graph Laplacian to smooth the document-to-concept mapping, LCCF can extract concepts with respect to the intrinsic manifold structure, and thus documents associated with the same concept can be well clustered. Experimental results on TDT2 and Reuters-21578 show that the proposed approach provides a better representation and achieves better clustering results in terms of accuracy and mutual information.
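
To illustrate what smoothing a document-to-concept mapping with a graph Laplacian can mean, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the graph is built, so the cosine-similarity k-nearest-neighbour graph with 0/1 weights, the helper names `knn_graph`, `graph_laplacian`, and `smoothness`, and the toy data are all assumptions made for illustration. The sketch only evaluates the standard Laplacian penalty Tr(VᵀLV) for a given mapping V; it does not perform the factorization itself.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric 0/1 k-nearest-neighbour adjacency (assumed construction).

    X has shape (n_docs, n_terms); rows are documents with nonzero norm.
    Neighbours are chosen by cosine similarity.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)            # a document is not its own neighbour
    n = X.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # indices of the k most similar documents
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(S, S.T)                 # symmetrise: keep an edge if either side chose it

def graph_laplacian(S):
    """Unnormalised graph Laplacian L = D - S, where D is the degree matrix."""
    return np.diag(S.sum(axis=1)) - S

def smoothness(V, L):
    """Laplacian penalty Tr(V^T L V) for a document-to-concept mapping V.

    Equals half the sum over pairs (i, j) of S[i, j] * ||V[i] - V[j]||^2,
    so it is small when neighbouring documents map to similar concepts.
    """
    return float(np.trace(V.T @ L @ V))

# Toy corpus (made up): documents 0-2 and 3-5 use disjoint vocabularies.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 2],
], dtype=float)

L = graph_laplacian(knn_graph(X, k=2))

# Mapping that is constant on each connected component of the graph.
V_consistent = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
# Mapping that splits neighbouring documents across concepts.
V_mixed = np.array([[1, 0], [0, 1]] * 3, dtype=float)

print(smoothness(V_consistent, L))
print(smoothness(V_mixed, L))
```

In this toy setup the two vocabulary groups form two connected components of the graph. The mapping that assigns one concept per component incurs zero penalty, while the mapping that separates neighbouring documents is penalised, which is the intuition behind concepts that correspond to connected components.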