Journal of Computers

An Improved LDA Model for Academic Document Analysis

Abstract

Electronic documents on the Internet are usually accompanied by many kinds of side information. Although this abundance of information makes analysis more difficult, models can fit and analyze the data well if they make full use of it. Building on the study of probabilistic topic models, this paper proposes an improved LDA model suited to the analysis of academic documents. By modifying the standard LDA model, the improved model can analyze documents together with their authors and references. To evaluate its generalization capability, the paper compares the new model with standard LDA and the DMR model on the widely used Rexa dataset. Experimental results show that the new model has a higher capability for document clustering and topic extraction than standard LDA and its modifications. In addition, the new model outperforms the DMR model on the task of author discrimination.
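For orientation, the standard LDA baseline named in the abstract can be illustrated by its generative process. The sketch below is not the paper's improved model, which adds authors and references; it only samples a toy corpus from plain LDA. The function names, the hyperparameter values, and the corpus sizes are illustrative assumptions and do not come from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the standard LDA generative process.

    alpha and beta are symmetric Dirichlet hyperparameters (assumed values).
    Each document is returned as a list of (topic_id, word_id) pairs.
    """
    rng = random.Random(seed)
    # One word distribution per topic: phi_k ~ Dirichlet(beta).
    topic_word = [sample_dirichlet([beta] * vocab_size, rng)
                  for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # One topic mixture per document: theta_d ~ Dirichlet(alpha).
        doc_topic = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_len):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=doc_topic)[0]
            w = rng.choices(range(vocab_size), weights=topic_word[z])[0]
            doc.append((z, w))
        corpus.append(doc)
    return corpus


if __name__ == "__main__":
    for doc in generate_corpus(num_docs=2, doc_len=8, num_topics=3, vocab_size=20):
        print(doc)
```

In plain LDA the document-topic mixture depends only on the fixed prior alpha. Models that use side information, such as DMR or the model proposed here, instead let document metadata influence the topic assignment.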
