Conference: International Conference on Speech and Computer

The Benefit of Document Embedding in Unsupervised Document Classification

Abstract

The aim of this article is to show that document embedding using the doc2vec algorithm can substantially improve the performance of the standard method for unsupervised document classification, K-means clustering. We performed a rather extensive set of experiments on one English and two Czech datasets, and the results suggest that representing the documents by vectors generated by the doc2vec algorithm brings a consistent improvement across languages and datasets. The English dataset, 20NewsGroups, was processed in a way that allows direct comparison with previously published results of both supervised and unsupervised algorithms. Such a comparison is provided in the paper, together with the results of supervised classification achieved by the state-of-the-art SVM classifier.
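
The clustering step named in the abstract can be illustrated with a minimal sketch. It assumes the document vectors have already been produced by doc2vec; the toy two-dimensional vectors, the function name `kmeans`, and the choice of the first k vectors as initial centroids are illustrative assumptions, not the authors' experimental setup.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

# Toy stand-ins for doc2vec document vectors (hypothetical values).
doc_vectors = [
    [0.9, 0.1],
    [0.1, 0.9],
    [0.8, 0.2],
    [0.2, 0.8],
]
labels = kmeans(doc_vectors, k=2)
print(labels)
```

In practice the vectors would have far more dimensions and would come from a trained embedding model; the sketch only shows how cluster labels are derived once each document is a fixed-length vector.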
