Journal: Expert Systems with Applications

Genetic Algorithm For Text Clustering Using Ontology And Evaluating The Validity Of Various Semantic Similarity Measures


Abstract

This paper proposes a self-organized genetic algorithm for text clustering based on an ontology method. A common problem in text clustering is that a document is represented as a bag of words, so conceptual similarity between terms is ignored. We use thesaurus-based and corpus-based ontologies to overcome this problem. However, the traditional corpus-based method is rather difficult to apply. A transformed latent semantic indexing (LSI) model that can appropriately capture the associated semantic similarity is therefore proposed and demonstrated as the corpus-based ontology in this article. To investigate how ontology methods can be used effectively in text clustering, two hybrid strategies using various similarity measures are implemented. Experimental results show that our genetic algorithm combined with the ontology strategy, that is, the combination of the transformed LSI-based measure with the thesaurus-based measure, clearly outperforms the same algorithm with traditional similarity measures. Our clustering algorithm also effectively improves performance compared with the standard GA and k-means in the same similarity settings.
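
The abstract does not give the chromosome encoding, the genetic operators, or the details of the transformed LSI model, so the following is only a minimal sketch of the general idea: a plain genetic algorithm that clusters documents using a similarity measure mixing word-level and concept-level evidence. Everything in it is an assumption made for illustration, including the toy `THESAURUS`, the weight `alpha`, the fitness function, and the operator choices. The corpus-based (LSI) component is left out entirely, and ordinary bag-of-words cosine stands in for it. It should not be read as the authors' method.

```python
import math
import random
from collections import Counter

# Hypothetical toy thesaurus: maps surface words to a shared concept label.
THESAURUS = {"car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
             "apple": "fruit", "banana": "fruit", "pear": "fruit"}

DOCS = ["car truck road", "automobile road", "truck automobile",
        "apple banana", "pear banana juice", "apple pear"]

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_similarity(doc_a, doc_b, alpha=0.5):
    """Weighted mix of word-level and thesaurus (concept-level) cosine."""
    words_a, words_b = doc_a.split(), doc_b.split()
    word_sim = cosine(Counter(words_a), Counter(words_b))
    concept_sim = cosine(Counter(THESAURUS.get(w, w) for w in words_a),
                         Counter(THESAURUS.get(w, w) for w in words_b))
    return alpha * word_sim + (1 - alpha) * concept_sim

def fitness(labels, sim):
    """Mean same-cluster similarity minus mean cross-cluster similarity."""
    intra, inter = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (intra if labels[i] == labels[j] else inter).append(sim[i][j])
    if not intra or not inter:
        return -1.0  # penalise partitions with no intra or no inter pairs
    return sum(intra) / len(intra) - sum(inter) / len(inter)

def ga_cluster(docs, k=2, pop_size=40, generations=100,
               mutation_rate=0.1, seed=0):
    """Evolve one cluster label per document; return the fittest labelling."""
    rng = random.Random(seed)
    n = len(docs)
    sim = [[hybrid_similarity(a, b) for b in docs] for a in docs]
    score = lambda chrom: fitness(chrom, sim)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = ranked[:2]  # elitism: carry over the two best unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_1 = max(rng.sample(ranked, 3), key=score)
            parent_2 = max(rng.sample(ranked, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = parent_1[:cut] + parent_2[cut:]
            # Mutation: resample a gene with probability mutation_rate.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)

labels = ga_cluster(DOCS)
print(labels)
```

In this toy corpus, "car" and "automobile" share no surface word, so a pure bag-of-words measure scores them as unrelated, while the concept-level term gives them a non-zero similarity. That is the gap the abstract attributes to bag-of-words representations.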
