A novel fuzzy k-means latent semantic analysis (FKLSA) approach for topic modeling over medical and health text corpora

Rashid Junaid; Shah Syed Muhammad Adnan; Irtaza Aun

Abstract

Medical and health text documents pose a challenge for data handling and for retrieving relevant and meaningful documents. Automatic retrieval of significant knowledge, with a better understanding of medical and health documents, is a challenging task. One popular approach for thematically understanding medical and health text documents and finding the topics in these documents is topic modeling. In this research, we propose a novel topic modeling approach, fuzzy k-means latent semantic analysis (FKLSA), which uses fuzzy clustering. Our method generates local and global term frequencies through the bag-of-words (BOW) model. Principal component analysis is used to remove the negative impact of high dimensionality on global term weighting. Previous work shows that redundancy in medical and health documents has a negative impact on the quality of text mining. Therefore, the main achievement of FKLSA is handling the redundancy issue in medical and health text documents and discovering semantically more precise topics. FKLSA is used for finding the themes in medical and health text corpora. These topics are further used for text classification and clustering tasks in text mining. Experimental results show that FKLSA performs better than LDA and RedLDA on redundant corpora. FKLSA's time performance also remains stable as the number of topics increases, and is thus better than that of LDA and LSA on a large Twitter health dataset. Quantitative evaluations on a real-world dataset of health and medical documents show that FKLSA achieves higher performance than state-of-the-art topic models such as latent Dirichlet allocation (LDA) and latent semantic analysis (LSA).
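The pipeline named in the abstract (bag-of-words term frequencies, global term weighting, principal component analysis, fuzzy k-means) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the smoothed inverse-document-frequency weight, the choice to cluster documents in the reduced space, the membership-weighted topic-term scores, the toy corpus, and every function name below are assumptions made only for illustration.

```python
import numpy as np

def bow_matrix(docs):
    """Return (counts, vocab); counts[i, j] is the local frequency of term j in document i."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(tokenized):
        for t in doc:
            counts[i, index[t]] += 1
    return counts, vocab

def global_weighting(counts):
    """Scale local counts by a global weight (assumed here: smoothed IDF)."""
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)
    idf = np.log((1 + n_docs) / (1 + df)) + 1
    return counts * idf

def pca(X, n_components):
    """Project the centered rows of X onto the top principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_kmeans(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means updates; returns (memberships, centers)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # each row is a membership distribution
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)         # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Toy corpus (hypothetical, for illustration only).
docs = [
    "diabetes insulin glucose blood",
    "insulin glucose diabetes treatment",
    "flu virus fever cough",
    "virus cough fever vaccine",
]
counts, vocab = bow_matrix(docs)
weighted = global_weighting(counts)
reduced = pca(weighted, n_components=2)
memberships, _ = fuzzy_kmeans(reduced, n_clusters=2)

# Assumed topic-term score: membership-weighted sum of term weights.
topic_term = memberships.T @ weighted
for k, row in enumerate(topic_term):
    top_terms = [vocab[j] for j in np.argsort(row)[::-1][:3]]
    print(f"topic {k}: {top_terms}")
```

Because memberships are soft, each document contributes to every topic in proportion to its membership, which is the property that distinguishes fuzzy clustering from hard k-means in this setting.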


Bibliographic record

Source
Journal of intelligent & fuzzy systems: Applications in Engineering and Technology | 2019, Issue 2 | 16 pages
Authors
Rashid Junaid; Shah Syed Muhammad Adnan; Irtaza Aun
Author affiliations
Univ Engn & Technol, Dept Comp Sci, Taxila, Pakistan;
Univ Engn & Technol, Dept Comp Sci, Taxila, Pakistan;
Univ Engn & Technol, Dept Comp Sci, Taxila, Pakistan
Original format: PDF
Text language: English (eng)
Chinese Library Classification: Automation systems
Keywords
Topic modeling; bag-of-words; term weighting; fuzzy k-means; principal component analysis;


Similar literature

1. Text fuzzy c-means clustering method based on topic concept space [J]. Ji Xianghua (吉翔华), Chen Chao (陈超), Shao Zhengrong (邵正荣). Journal of Southeast University (English Edition). 2007, Issue 003
2. A novel fuzzy k-means latent semantic analysis (FKLSA) approach for topic modeling over medical and health text corpora [J]. Rashid Junaid, Shah Syed Muhammad Adnan, Irtaza Aun. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2019, Issue 5aPta2
3. A comparative analysis of Latent Semantic analysis and Latent Dirichlet allocation topic modeling methods using Bible data [J]. Vasantha Kumari Garbhapu, Prajna Bodapati. Indian Journal of Science and Technology. 2020, Issue 44
4. Overcoming Language Barriers: Assessing the Potential of Machine Translation and Topic Modeling for the Comparative Analysis of Multilingual Text Corpora [J]. Reber Ueli. Communication Methods and Measures. 2019, Issue 2
5. Topic Modeling Twitter Data Using Latent Dirichlet Allocation and Latent Semantic Analysis [C]. Siti Qomariyah, Nur Iriawan, Kartika Fithriasari. International Conference on Science, Mathematics, Environment, and Education. 2019
6. Modeling social systems processes found in text corpora through windowed latent semantic analysis and simulation of concept refreshment events [D]. Weaver, Christopher Adrian. 2005
7. A systematic study on latent semantic analysis model parameters for mining biomedical literature [O]. Mohammed Yeasin, Haritha Malempati, Ramin Homayouni. 2009
8. Topic Modeling Technique for Text Mining Over Biomedical Text Corpora Through Hybrid Inverse Documents Frequency and Fuzzy K-Means Clustering [O]. Junaid Rashid, Syed Muhammad Adnan Shah, Aun Irtaza. 2019
