Genetic Algorithm For Text Clustering Using Ontology And Evaluating The Validity Of Various Semantic Similarity Measures

Wei Song; Cheng Hua Li; Soon Cheol Park


Abstract

This paper proposes a self-organized genetic algorithm for text clustering based on an ontology method. A common problem in text clustering is that each document is represented as a bag of words, so conceptual similarity is ignored. We take advantage of thesaurus-based and corpus-based ontologies to overcome this problem. However, the traditional corpus-based method is rather difficult to tackle. A transformed latent semantic indexing (LSI) model, which can appropriately capture the associated semantic similarity, is proposed and demonstrated as a corpus-based ontology in this article. To investigate how ontology methods could be used effectively in text clustering, two hybrid strategies using various similarity measures are implemented. Experimental results show that our genetic algorithm in conjunction with the ontology strategy, which combines the transformed LSI-based measure with the thesaurus-based measure, clearly outperforms the same algorithm with traditional similarity measures. Our clustering algorithm also improves performance in comparison with standard GA and k-means in the same similarity environments.
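The bag-of-words problem described in the abstract can be illustrated with a minimal sketch. This is not the paper's transformed LSI model, which the abstract does not specify; it uses plain LSI via a truncated SVD on an invented toy term-document matrix. The function names, the toy data, the weight `alpha`, and the placeholder thesaurus score are all assumptions made for illustration.

```python
import numpy as np

# Toy term-document counts (rows: terms, columns: documents). Invented data.
# Rows: car, automobile, engine, fruit, apple.
A = np.array([
    [2, 0, 1, 0],
    [0, 2, 1, 0],
    [1, 1, 2, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 1],
], dtype=float)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def lsi_doc_vectors(term_doc, k):
    """Project documents into a k-dimensional latent space (plain LSI)."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per document


def hybrid_similarity(sim_a, sim_b, alpha=0.5):
    """Hypothetical hybrid: a weighted average of two similarity scores."""
    return alpha * sim_a + (1 - alpha) * sim_b


# Document 0 uses "car", document 1 uses "automobile"; they share only "engine".
bow_sim = cosine(A[:, 0], A[:, 1])

docs_k = lsi_doc_vectors(A, k=2)
lsi_sim = cosine(docs_k[0], docs_k[1])

# Placeholder for a thesaurus-based (e.g. WordNet-derived) score; not computed here.
thesaurus_sim = 0.8
combined = hybrid_similarity(lsi_sim, thesaurus_sim)

print(f"bag-of-words: {bow_sim:.3f}  LSI: {lsi_sim:.3f}  hybrid: {combined:.3f}")
```

In this toy case the two documents score low under raw bag-of-words cosine because they use different words for the same concept, while the latent-space vectors place them close together. A clustering algorithm such as a GA or k-means could take any of these scores as its similarity function.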


Bibliographic record

Source: Expert Systems with Applications, 2009, Issue 5, pp. 9095-9104 (10 pages)
Authors: Wei Song; Cheng Hua Li; Soon Cheol Park
Original format: PDF
Language: English
Keywords: genetic algorithm; text clustering; ontology; WordNet; latent semantic indexing


Similar literature


1. A Modified Ant-Based Text Clustering Algorithm with Semantic Similarity Measure [J]. Haoxiang Xia, Shuguang Wang, Taketoshi Yoshida. Journal of Systems Science and Systems Engineering, 2006, Issue 4.

2. Self-adaptive GA, Quantitative Semantic Similarity Measures and Ontology-based Text Clustering [C]. Chengzhi Zhang, Wei Song, Chenghua Li. Proceedings of the 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2008), 2008.

3. Using semantic similarity measures in the biomedical domain for computing functional similarity between genes based on gene ontology [D]. Elham Khabiri, 2007.

4. GOGO: An improved algorithm to measure the semantic similarity between gene ontology terms [O]. Chenguang Zhao, Zheng Wang.

5. Self-adaptive GA, quantitative semantic similarity measures and ontology-based text clustering [O]. Chengzhi Zhang, Wei Song, Chenghua Li, 2008.
