Journal: IEICE Transactions on Information and Systems

Improving Text Categorization with Semantic Knowledge in Wikipedia


Abstract

Text categorization, especially short text categorization, is a challenging task because text data is sparse and high-dimensional. Traditional text classification methods represent documents with the “Bag of Words (BOW)” schema, which is based on word co-occurrence and has many limitations. In this paper, we map document texts to Wikipedia concepts and use a Wikipedia-concept-based document representation in place of the traditional BOW model for text classification. To overcome the document representation model's weakness of ignoring semantic relationships among terms, and to exploit the rich semantic knowledge in Wikipedia, we construct a semantic matrix that enriches the Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short texts shows that our approach outperforms the traditional BOW method.
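The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: map a text to a vector of Wikipedia concepts, then multiply it by a concept-by-concept semantic matrix so that related concepts also receive weight. The phrase dictionary, the concept list, the relatedness scores, and the function names `concept_vector` and `enrich` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical phrase-to-concept dictionary. A real system might derive this
# from Wikipedia article titles, redirects, and anchor texts (assumption).
PHRASE_TO_CONCEPT = {
    "puma": "Cougar",
    "cougar": "Cougar",
    "big cat": "Felidae",
    "jaguar": "Jaguar",
}

# Fixed concept order used for every vector and for the matrix rows/columns.
CONCEPTS = ["Cougar", "Felidae", "Jaguar"]

# Hypothetical symmetric relatedness scores between concepts (1.0 on the
# diagonal). The values are made up for this example.
SEMANTIC_MATRIX = [
    [1.0, 0.6, 0.4],
    [0.6, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]


def concept_vector(text):
    """Count concept mentions using naive case-insensitive substring matching."""
    lowered = text.lower()
    counts = Counter()
    for phrase, concept in PHRASE_TO_CONCEPT.items():
        counts[concept] += lowered.count(phrase)
    return [float(counts[c]) for c in CONCEPTS]


def enrich(vector, matrix):
    """Return the row vector times the matrix, spreading weight to related concepts."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


doc = "A puma was seen near the trail"
raw = concept_vector(doc)               # only "Cougar" is mentioned
enriched = enrich(raw, SEMANTIC_MATRIX)  # "Felidae" and "Jaguar" gain weight
print(raw, enriched)
```

In this toy setting, a short text that mentions only one concept ends up with non-zero weight on related concepts, which is one way such enrichment could reduce the sparsity of short-text representations.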
