Journal: Journal of Computers

On Clustering Algorithms: Applications in Word-Embedding Documents


Abstract

In this paper, we study the effectiveness of classical clustering algorithms from the literature when applied to free-text documents. We analyze how varying their parameters affects performance and which aspects directly influence the results. We apply a word-embedding-based technique to represent each document's bag of words, which lets us compare and study how these algorithms perform in the task of clustering these documents. We use two metrics that capture different aspects of the partitions and analyze the algorithms in light of them. One of the main findings of this work is that some clustering algorithms produce a partition that matches up to 91% of the real partition, while others perform very poorly on the same dataset. We also find limitations of these techniques when trying to cluster hard datasets.
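
The pipeline the abstract describes (embed each document's bag of words, cluster the resulting vectors, then score the partition against the real one) can be illustrated with a minimal sketch. The abstract does not name the algorithms, the embedding model, or the two metrics, so everything specific here is an assumption chosen for illustration: the toy 2-d word vectors in `EMBEDDINGS`, averaging as the way to combine word vectors, k-means as the clustering algorithm, and purity as the evaluation metric.

```python
import numpy as np

# Hypothetical 2-d word vectors (assumption); a real study would load
# pretrained embeddings instead of this hand-made table.
EMBEDDINGS = {
    "goal": np.array([1.0, 0.1]),
    "match": np.array([0.9, 0.2]),
    "team": np.array([0.8, 0.0]),
    "stock": np.array([0.1, 1.0]),
    "market": np.array([0.0, 0.9]),
    "bank": np.array([0.2, 0.8]),
}


def embed_document(tokens):
    """Average the vectors of the known words in a document's bag of words."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return np.mean(vectors, axis=0)


def kmeans(points, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center.
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the centers.
        labels = np.argmin(np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def purity(true_labels, cluster_labels):
    """Fraction of documents that belong to their cluster's majority class."""
    true_labels = np.asarray(true_labels)
    total = 0
    for c in np.unique(cluster_labels):
        _, counts = np.unique(true_labels[cluster_labels == c], return_counts=True)
        total += counts.max()
    return total / len(true_labels)


# Toy corpus (assumption): three "sports" and three "finance" documents.
docs = [["goal", "match"], ["team", "goal"], ["match", "team"],
        ["stock", "market"], ["bank", "stock"], ["market", "bank"]]
true_labels = [0, 0, 0, 1, 1, 1]

X = np.array([embed_document(d) for d in docs])
clusters = kmeans(X, k=2)
score = purity(true_labels, clusters)
print(f"purity = {score:.2f}")
```

Purity is only one way to compare a computed partition with the real one; it rewards clusters dominated by a single class but does not penalize splitting a class across many clusters, which is one reason a study of this kind might pair it with a second, complementary metric.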
