Journal: Information Processing & Management

Clustering tagged documents with labeled and unlabeled documents


Abstract

This study applies our proposed semi-supervised clustering method, Constrained-PLSA, to cluster tagged documents using a small number of labeled documents, and evaluates system performance on two data sets. In the first data set, the boundaries among clusters are not clear; in the second, they are clear. Documents are clustered using paper abstracts and the tags that users assigned to them, with four combinations of tags and words serving as feature representations. The experimental results indicate that almost all of the methods benefit from tags. However, the unsupervised learning methods fail to function properly on the data set with noisy information, whereas Constrained-PLSA still functions properly. In many real applications background knowledge is readily available, so it is appropriate to employ it in the clustering process to make learning faster and more effective.
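
The abstract does not spell out how Constrained-PLSA uses the labeled documents, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It runs plain PLSA with EM over bag-of-features documents and, as an assumed form of supervision, clamps the topic distribution of each labeled document to its known cluster. The function name `plsa_with_seeds`, the `seeds` argument, and the `tag:` prefix used to mix tags with words are all hypothetical choices made for this example.

```python
import random

def plsa_with_seeds(docs, n_topics, seeds=None, n_iter=50, rng_seed=0):
    """Cluster docs with PLSA; labeled docs (seeds) stay fixed to their cluster.

    docs:  list of dicts mapping a feature (word or tag) to its count.
    seeds: dict mapping a document index to its known cluster index.
    Returns the most probable cluster index for every document.
    """
    seeds = seeds or {}
    rng = random.Random(rng_seed)
    vocab = sorted({term for doc in docs for term in doc})

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # P(z|d): one-hot for labeled documents, random otherwise.
    p_z_d = []
    for i in range(len(docs)):
        if i in seeds:
            p_z_d.append([1.0 if z == seeds[i] else 0.0 for z in range(n_topics)])
        else:
            p_z_d.append(normalized([rng.random() + 0.1 for _ in range(n_topics)]))

    # P(w|z): random initialization over the vocabulary.
    p_w_z = []
    for _ in range(n_topics):
        probs = normalized([rng.random() + 0.1 for _ in vocab])
        p_w_z.append(dict(zip(vocab, probs)))

    for _ in range(n_iter):
        # Tiny pseudo-counts keep every probability strictly positive.
        new_w_z = [{term: 1e-12 for term in vocab} for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in docs]
        for i, doc in enumerate(docs):
            for term, count in doc.items():
                # E-step: responsibility of each topic for this (doc, term) pair.
                joint = [p_z_d[i][z] * p_w_z[z][term] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    resp = count * joint[z] / total
                    new_w_z[z][term] += resp
                    new_z_d[i][z] += resp
        # M-step: re-estimate P(w|z) for all topics.
        for z in range(n_topics):
            total = sum(new_w_z[z].values())
            p_w_z[z] = {term: v / total for term, v in new_w_z[z].items()}
        # M-step: re-estimate P(z|d) for unlabeled documents only.
        for i in range(len(docs)):
            if i not in seeds:
                p_z_d[i] = normalized(new_z_d[i])

    return [max(range(n_topics), key=lambda z: row[z]) for row in p_z_d]

# Toy corpus: abstract words plus user tags (prefixed "tag:") as features.
docs = [
    {"cluster": 3, "plsa": 2, "tag:ml": 1},     # labeled: cluster 0
    {"cluster": 2, "tag:ml": 2},                # unlabeled
    {"protein": 3, "gene": 2, "tag:bio": 1},    # labeled: cluster 1
    {"gene": 2, "tag:bio": 2},                  # unlabeled
]
labels = plsa_with_seeds(docs, n_topics=2, seeds={0: 0, 2: 1})
```

Clamping guarantees only that the labeled documents keep their clusters. The unlabeled documents are pulled toward whichever seeded cluster shares their words and tags, but EM offers no guarantee of a particular assignment, which is consistent with the abstract's point that supervision helps most when the data are noisy.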