Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology

A novel fuzzy k-means latent semantic analysis (FKLSA) approach for topic modeling over medical and health text corpora

Abstract

Medical and health text documents pose a challenge for data handling and for retrieving relevant and meaningful documents. Automatically retrieving significant knowledge from such documents, with a better understanding of their content, is a challenging task. One popular approach for thematically understanding medical and health text documents and finding the topics in them is topic modeling. In this research, we propose a novel topic modeling approach, fuzzy k-means latent semantic analysis (FKLSA), which uses fuzzy clustering. Our method generates local and global term frequencies through the bag-of-words (BOW) model. Principal component analysis is used to remove the negative impact of high dimensionality on global term weighting. Previous work shows that redundancy in medical and health documents has a negative impact on the quality of text mining. Therefore, the main achievement of FKLSA is that it handles the redundancy issue in medical and health text documents and discovers semantically more precise topics. FKLSA is used to find the themes in medical and health text corpora. These topics are further used for text classification and clustering tasks in text mining. Experimental results show that FKLSA performs better than LDA and RedLDA on redundant corpora. FKLSA's time performance also remains stable as the number of topics increases, and is thus better than that of LDA and LSA on a large Twitter health dataset. Quantitative evaluations on a real-world dataset of health and medical documents show that FKLSA gives higher performance than state-of-the-art topic models such as latent Dirichlet allocation (LDA) and latent semantic analysis (LSA).
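
The abstract names the stages of the pipeline (bag of words, local and global term weights, principal component analysis, fuzzy clustering) but not their details. The following is a minimal sketch of how such stages could be chained, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the smoothed IDF formula used as the global weight, the standard fuzzy c-means update with fuzzifier m = 2, the toy documents, and the reading of the membership matrix as soft document-to-topic assignments.

```python
import numpy as np

def bag_of_words(docs):
    """Return (vocab, count matrix) with one row per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            counts[i, index[w]] += 1
    return vocab, counts

def tf_idf(counts):
    """Local weight: term frequency. Global weight: smoothed IDF (assumed choice)."""
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)
    idf = np.log((1 + counts.shape[0]) / (1 + df)) + 1
    return tf * idf

def pca(X, n_components):
    """Project the centered rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (membership matrix U, cluster centers)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # each row is a membership distribution
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)         # avoid division by zero
        inv = dist ** (-2.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Hypothetical toy corpus, for illustration only.
docs = [
    "diabetes insulin blood sugar",
    "insulin dose blood glucose",
    "flu vaccine fever cough",
    "cough fever cold vaccine",
]
vocab, counts = bag_of_words(docs)
X = pca(tf_idf(counts), n_components=2)
U, centers = fuzzy_c_means(X, n_clusters=2)
print(np.round(U, 3))  # one row per document, one column per fuzzy cluster
```

Each row of `U` sums to 1, so a document can belong partly to several clusters, which is the property that distinguishes fuzzy clustering from hard k-means. How FKLSA turns clusters into topic-word distributions is not stated in the abstract and is not modeled here.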