Improving Text Categorization with Semantic Knowledge in Wikipedia

Xiang WANG; Yan JIA; Ruhua CHEN; Hua FAN; Bin ZHOU


Abstract

Text categorization, especially short text categorization, is a challenging task because text data is sparse and multidimensional. In traditional text classification methods, documents are represented with the “Bag of Words (BOW)” schema, which is based on word co-occurrence and has many limitations. In this paper, we mapped document texts to Wikipedia concepts and used a Wikipedia-concept-based document representation in place of the traditional BOW model for text classification. To overcome the weakness of ignoring semantic relationships among terms in the document representation model, and to utilize the rich semantic knowledge in Wikipedia, we constructed a semantic matrix to enrich the Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short text shows that our approach outperforms the traditional BOW method.
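The abstract does not give the formulas, so the following is only a minimal sketch of the general idea under stated assumptions: a document is mapped to a vector of Wikipedia concept counts, and that vector is multiplied by a concept-to-concept semantic matrix so that related concepts also receive weight. The phrase dictionary, the concept list, the relatedness scores, and the function names `concept_vector` and `enrich` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy mapping from surface words to Wikipedia concept titles.
# A real system would need a proper dictionary or entity linker (assumption).
PHRASE_TO_CONCEPT = {
    "jaguar": "Jaguar Cars",
    "sedan": "Sedan (automobile)",
    "engine": "Internal combustion engine",
}

CONCEPTS = ["Jaguar Cars", "Sedan (automobile)", "Internal combustion engine"]

# Hypothetical symmetric relatedness scores between concepts (1.0 on the diagonal).
SEMANTIC_MATRIX = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]


def concept_vector(text):
    """Count how often each known concept is mentioned in the text."""
    tokens = text.lower().split()
    counts = Counter(PHRASE_TO_CONCEPT[t] for t in tokens if t in PHRASE_TO_CONCEPT)
    return [float(counts[c]) for c in CONCEPTS]


def enrich(vector, matrix):
    """Multiply the row vector by the semantic matrix: d' = d * S."""
    n = len(vector)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]


doc = "new jaguar sedan"
raw = concept_vector(doc)
enriched = enrich(raw, SEMANTIC_MATRIX)
print(raw)
print(enriched)
```

In this toy run the document never mentions an engine, so the raw vector has a zero in that position, while the enriched vector gives it a positive weight through its relatedness to the two concepts that are mentioned. Either vector could then be passed to any standard classifier in place of BOW features.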


Bibliographic record

Source: IEICE Transactions on Information and Systems | 2013, Issue 12 | 9 pages
Authors: Xiang WANG; Yan JIA; Ruhua CHEN; Hua FAN; Bin ZHOU
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology

