The Benefit of Document Embedding in Unsupervised Document Classification

Abstract

The aim of this article is to show that document embedding using the doc2vec algorithm can substantially improve the performance of the standard method for unsupervised document classification, K-means clustering. We performed a rather extensive set of experiments on one English and two Czech datasets, and the results suggest that representing documents by vectors generated with the doc2vec algorithm yields a consistent improvement across languages and datasets. The English dataset, 20NewsGroups, was processed in a way that allows direct comparison with previously published results of both supervised and unsupervised algorithms. The paper provides this comparison, together with the results of supervised classification achieved by a state-of-the-art SVM classifier.
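The pipeline described in the abstract (represent each document as a vector, then cluster the vectors with K-means) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the document vectors are hypothetical toy values standing in for doc2vec output (in practice they might come from a library implementation such as gensim's `Doc2Vec`), the farthest-point seeding replaces the usual random initialization only to keep the example deterministic, and purity is used as an illustrative cluster-quality score because the abstract does not name the evaluation measure. The helper names `farthest_point_init`, `kmeans`, and `purity` are hypothetical.

```python
import math
from collections import Counter


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def farthest_point_init(vectors, k):
    """Hypothetical deterministic seeding: start from the first vector, then
    repeatedly add the vector farthest from all centroids chosen so far
    (a simplified, non-random cousin of k-means++)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-means on document vectors; returns a cluster index per document."""
    centroids = farthest_point_init(vectors, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid (Euclidean).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member documents.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return assignment


def purity(assignment, labels):
    """Share of documents that carry the majority label of their cluster."""
    by_cluster = {}
    for cluster, label in zip(assignment, labels):
        by_cluster.setdefault(cluster, []).append(label)
    hits = sum(Counter(ls).most_common(1)[0][1] for ls in by_cluster.values())
    return hits / len(labels)


# Hypothetical 3-D stand-ins for doc2vec vectors (not data from the paper).
docs = [
    [0.9, 0.1, 0.0], [1.0, 0.2, 0.1], [0.8, 0.0, 0.1],  # "sport" documents
    [0.0, 0.9, 1.0], [0.1, 1.0, 0.8], [0.2, 0.8, 0.9],  # "politics" documents
]
labels = ["sport"] * 3 + ["politics"] * 3
clusters = kmeans(docs, k=2)
print(clusters)                  # → [0, 0, 0, 1, 1, 1]
print(purity(clusters, labels))  # → 1.0
```

Setting `k` to the number of target categories turns each cluster into a predicted class, which is what makes a comparison with supervised classifiers such as SVM possible.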

Bibliographic details

Source
International Conference on Speech and Computer | 2018 | pp. 470-478 | 9 pages
Authors
Jaromir Novotny; Pavel Ircing;
Original format: PDF
Keywords
Document embedding; Doc2vec; Classification; K-means; SVM

Similar documents

1. An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings [J]. Lamsiyah Salima, El Mahdaouy Abdelkader, Espinasse Bernard. Expert Systems with Applications, 2021.

2. An Unsupervised Classification Technique for Recognition of Scratched and Non-Scratched Words in Pre-Printed Documents [J]. N. Shobha Rani, Vasudev T., Vineeth P. Journal of Theoretical and Applied Information Technology, 2016, no. 2.

3. An Unsupervised Classification Technique for Detection of Flipped Orientations in Document Images [J]. International Journal of Electrical and Computer Engineering, 2016, no. 5.

4. The Benefit of Document Embedding in Unsupervised Document Classification [C]. Jaromir Novotny, Pavel Ircing. International Conference on Speech and Computer, 2018.

5. Unsupervised classification of text documents [D]. Aparicio Carrasco, Roxana K. 2008.

6. Unsupervised Machine Learning of Topics Documented by Nurses about Hospitalized Patients Prior to a Rapid-Response Event [O]. Zfania Tom Korach, Kenrick D. Cato, Sarah A. Collins. 2019.

7. Unsupervised Attention Embedding for Document Clustering [O]. Ji-kang Nie, Zhi-guo Zhang. 2019.

8. Unsupervised Non-topical Classification of Documents [R]. Bekkerman, R., Eguchi, K., Allan, J. 2006.

