Journal: Computer and Information Science

Topic Modelling in Bangla Language: An LDA Approach to Optimize Topics and News Classification

Abstract

Topic modeling is a powerful technique for unsupervised analysis of large document collections. Topic models have a wide range of applications in text mining, information retrieval and statistical language modeling, including tag recommendation, text categorization, keyword extraction and similarity search. Research on topic modeling is steadily gaining popularity. Various efficient topic modeling techniques are available for English, as it is one of the most widely spoken languages in the world, but not for other languages. Bangla is the seventh most spoken native language in the world by population, so it needs automation in various respects. This paper deals with finding the core topics of a Bangla news corpus and classifying news using similarity measures. The document models are built using LDA (Latent Dirichlet Allocation) with bigrams.
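
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: documents are expanded with bigram tokens, an LDA model is fitted, and an unlabeled news item takes the label of the most similar labeled item in topic space. Everything concrete here is an assumption made for illustration: the collapsed Gibbs sampler, the cosine similarity measure, the hyperparameters, the helper names (`add_bigrams`, `fit_lda`, `cosine`), and the toy English corpus standing in for tokenized Bangla news. For simplicity the unlabeled document is fitted together with the labeled ones rather than inferred afterwards.

```python
import math
import random
from collections import Counter


def add_bigrams(tokens):
    """Return the unigrams plus adjacent-word bigrams joined by '_'."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def fit_lda(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic mixtures."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                     # token counts per topic
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return theta


def cosine(p, q):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


# Toy corpus (assumption): the last document is the unlabeled news item.
raw_docs = [
    "cricket team wins match cricket team",
    "football team loses match football team",
    "government wins election vote count",
    "election vote government policy vote",
    "cricket match team wins",
]
labels = ["sports", "sports", "politics", "politics"]

docs = [add_bigrams(text.split()) for text in raw_docs]
theta = fit_lda(docs, num_topics=2)

# Classify the unlabeled document by its nearest labeled neighbour.
query = theta[-1]
scores = [cosine(query, theta[i]) for i in range(len(labels))]
predicted = labels[scores.index(max(scores))]
print("predicted label:", predicted)
```

On a corpus this small the sampled topics are noisy, so the predicted label should not be read as meaningful. A real Bangla pipeline would also need language-specific tokenization and stop-word handling, which this sketch omits.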