Journal of Advanced Computational Intelligence and Intelligent Informatics

Topic Tracking Based on Identifying Proper Number of the Latent Topics in Documents

Abstract

In this paper, we propose a method for detecting and tracking topics in newspaper articles based on the latent semantics of the documents. We use Latent Dirichlet Allocation (LDA) to extract latent topics. LDA, however, requires the number of latent topics in the target documents to be specified in advance. Perplexity is widely used as a metric for estimating this number. Alternatively, Hierarchical Dirichlet Process LDA (HDP-LDA) estimates the number of latent topics without any prior information. We propose a method that estimates the number of latent topics in the target documents by calculating the similarity among the extracted topics, and we conduct an experiment on three data sets comparing it with these two representative approaches, i.e., HDP-LDA and LDA with perplexity. The experimental results confirm that our method yields results similar to those of HDP-LDA. We also detect and track topics using the proposed method and confirm its usefulness.
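The abstract contrasts two ways of choosing the number of topics K for LDA: scoring a fitted model by perplexity, and checking how similar the extracted topics are to one another. The following minimal sketch illustrates both ideas in plain Python, assuming the document-topic distributions (`theta`) and topic-word distributions (`phi`) have already been estimated by some LDA implementation. `perplexity` follows the standard definition: the exponential of the negative average per-token log-likelihood. The `count_distinct_topics` helper, with its cosine similarity, greedy grouping and 0.9 threshold, is a hypothetical illustration of a similarity-based criterion, not the authors' actual method.

```python
import math
from typing import List, Sequence


def perplexity(theta: Sequence[Sequence[float]],
               phi: Sequence[Sequence[float]],
               docs: Sequence[Sequence[int]]) -> float:
    """Perplexity of `docs` under an LDA model.

    theta[d][k]: probability of topic k in document d
    phi[k][w]:   probability of word id w in topic k
    docs[d]:     list of word ids in document d
    """
    log_lik = 0.0
    n_tokens = 0
    for d, words in enumerate(docs):
        for w in words:
            # p(w | d) = sum_k theta[d][k] * phi[k][w]
            # (a word with zero probability would make math.log raise ValueError)
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two word distributions (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def count_distinct_topics(phi: Sequence[Sequence[float]],
                          threshold: float = 0.9) -> int:
    """Hypothetical criterion: a topic whose word distribution has cosine
    similarity >= threshold with an already kept topic is treated as a
    duplicate; return the number of kept (distinct) topics."""
    kept: List[Sequence[float]] = []
    for topic in phi:
        if not any(cosine(topic, r) >= threshold for r in kept):
            kept.append(topic)
    return len(kept)


# Toy model: 3 topics over a 4-word vocabulary; topics 0 and 1 are near-duplicates.
phi = [
    [0.5, 0.5, 0.0, 0.0],
    [0.45, 0.55, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
theta = [[0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
docs = [[0, 2], [1, 3]]

print(count_distinct_topics(phi))  # 2
print(perplexity(theta, phi, docs))
```

In practice one would fit LDA for several candidate values of K and compare the resulting scores; with real data, perplexity is usually computed on held-out documents rather than on the training documents.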