Clustering tagged documents with labeled and unlabeled documents

Chien-Liang Liu; Wen-Hoar Hsaio; Chia-Hoang Lee; Chun-Hsien Chen

Abstract

This study applies Constrained-PLSA, a semi-supervised clustering method we proposed, to cluster tagged documents using a small number of labeled documents, and evaluates system performance on two data sets. In the first data set the boundaries among clusters are not clear, whereas the second has clear boundaries among clusters. Documents are clustered using paper abstracts together with the tags users annotated them with. Four combinations of tags and words are used as feature representations. The experimental results indicate that almost all of the methods benefit from tags. However, the unsupervised learning methods fail to work properly on the data set with noisy information, whereas Constrained-PLSA still works properly. In many real applications, background knowledge is readily available, so it is appropriate to use it in the clustering process to make learning faster and more effective.
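The abstract does not describe how Constrained-PLSA injects the labeled documents into PLSA. The sketch below is a rough, hypothetical illustration of the general idea only. It is not the authors' algorithm. It runs plain PLSA with EM over bags of mixed word and tag features and clamps the topic distribution P(z|d) of each labeled document to its label. The function name `seeded_plsa`, the `#`-prefixed tag features and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def seeded_plsa(docs, n_topics, seeds, n_iter=50):
    """Hypothetical seeded PLSA: EM in which labeled documents keep a fixed topic.

    docs   -- list of dicts/Counters mapping feature -> count (words and tags mixed)
    seeds  -- dict {doc_index: topic_index} for the labeled documents
    Returns the most probable topic index for every document.
    """
    vocab = sorted({f for d in docs for f in d})
    # P(w|z): start uniform for every topic.
    p_w_z = [{w: 1.0 / len(vocab) for w in vocab} for _ in range(n_topics)]
    # P(z|d): one-hot for labeled documents, uniform for unlabeled ones.
    p_z_d = [
        [1.0 if z == seeds[i] else 0.0 for z in range(n_topics)]
        if i in seeds else [1.0 / n_topics] * n_topics
        for i in range(len(docs))
    ]
    for _ in range(n_iter):
        acc_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in docs]
        for i, doc in enumerate(docs):
            for w, c in doc.items():
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[i][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    share = c * post[z] / total
                    acc_w_z[z][w] += share
                    acc_z_d[i][z] += share
        # M-step: renormalise the expected counts.
        for z in range(n_topics):
            s = sum(acc_w_z[z].values())
            if s > 0.0:
                p_w_z[z] = {w: v / s for w, v in acc_w_z[z].items()}
        for i in range(len(docs)):
            if i in seeds:
                continue  # labeled documents stay clamped to their label
            s = sum(acc_z_d[i])
            if s > 0.0:
                p_z_d[i] = [v / s for v in acc_z_d[i]]
    return [max(range(n_topics), key=lambda z: p_z_d[i][z]) for i in range(len(docs))]


# Toy data (hypothetical): words plus '#'-prefixed user tags in one bag per document.
docs = [
    Counter({"ball": 2, "goal": 1, "#sport": 1}),                   # labeled: topic 0
    Counter({"ball": 1, "goal": 2, "team": 1, "#sport": 1}),
    Counter({"team": 2, "ball": 1, "#sport": 1}),
    Counter({"vote": 2, "party": 1, "#politics": 1}),               # labeled: topic 1
    Counter({"vote": 1, "election": 2, "#politics": 1}),
    Counter({"party": 2, "election": 1, "vote": 1, "#politics": 1}),
]
labels = seeded_plsa(docs, n_topics=2, seeds={0: 0, 3: 1})
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Unlabeled documents and all topic-word distributions start uniform, so the clamped labeled documents alone break the symmetry between topics. Without at least one labeled document, this particular sketch would never leave its uniform starting point.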

Bibliographic record

Source
Information Processing & Management | 2013, No. 3 | pp. 596-606 | 11 pages
Authors
Chien-Liang Liu; Wen-Hoar Hsaio; Chia-Hoang Lee; Chun-Hsien Chen
Author affiliations

Department of Computer Science, 1001 University Road, Hsinchu 300, Taiwan, ROC (all four authors)

Record information
Original format: PDF
Text language: English
Keywords
Text mining; Document clustering; Semi-supervised clustering; Tagged document clustering

Similar articles

1. Clustering documents with labeled and unlabeled documents using fuzzy semi-Kmeans [J]. Chien-Liang Liu, Tao-Hsing Chang, Hsuan-Hsun Li. Fuzzy Sets and Systems, 2013 (16 June issue).

2. Text Classification from Labeled and Unlabeled Documents using EM [J]. Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun. Machine Learning, 2000, no. 2-3.

3. Fractional integral inequalities for generalized-$\mathbf{m}$-$((h_{1}^{p},h_{2}^{q});(\eta_{1},\eta_{2}))$-convex mappings via an extended generalized Mittag–Leffler function [J]. George Anastassiou, Artion Kashuri, Rozana Liko. Arabian Journal of Mathematics, 2020, no. 2.

4. Text Document Topical Recursive Clustering and Automatic Labeling of a Hierarchy of Document Clusters [C]. Xiaoxiao Li, Jiyang Chen, Osmar Zaiane. Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2013.

5. Text document topical recursive clustering and automatic labeling of a hierarchy of document clusters [D]. Xiaoxiao Li. 2012.

6. Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals [O]. Hamed Hassanzadeh, Mahnoosh Kholghi, Anthony Nguyen. 2018.

7. Text Document Topical Recursive Clustering and Automatic Labeling of a Hierarchy of Document Clusters [O]. Xiaoxiao Li, Jiyang Chen, Osmar Zaiane. 2013.

8. Using EM to Classify Text from Labeled and Unlabeled Documents [R]. K. Nigam, A. McCallum, S. Thrun. 1998.
