Journal: International Journal of Computer Processing of Oriental Languages

Improving Domain Dictionary-Based Text Categorization Using Self-Partition Model


Abstract

In this paper, we present a novel model for improving the performance of Domain Dictionary-based text categorization. The proposed model is named the Self-Partition Model (SPM). SPM groups candidate words into predefined clusters, which are generated according to the structure of the Domain Dictionary. Using these learned clusters as features, we propose a novel text representation. The experimental results show that a text categorization system based on the proposed representation performs better than the Domain Dictionary-based text categorization system. It also performs better than a Bag-of-Words-based system when the number of features is small and the training corpus is small.
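The abstract does not say how SPM assigns words to clusters, so the following is only a minimal sketch of the general idea of using word clusters as features. It is not the authors' method. The word-to-cluster mapping, the cluster names, and the function `cluster_features` are all hypothetical and written by hand here; in the paper the assignment would be learned by SPM from the Domain Dictionary structure.

```python
from collections import Counter

# Hypothetical, hand-written word-to-cluster assignment (an assumption for
# illustration only; the paper learns this grouping with SPM).
WORD_TO_CLUSTER = {
    "goal": "sports",
    "striker": "sports",
    "match": "sports",
    "stock": "finance",
    "dividend": "finance",
    "bank": "finance",
}

# Fixed feature order, one dimension per cluster (hypothetical cluster names).
CLUSTERS = ["finance", "sports"]


def cluster_features(text):
    """Return per-cluster word counts for a whitespace-tokenized text."""
    counts = Counter(
        WORD_TO_CLUSTER[token]
        for token in text.lower().split()
        if token in WORD_TO_CLUSTER
    )
    return [counts[cluster] for cluster in CLUSTERS]


vector = cluster_features("The striker scored a goal before the bank sponsor left")
print(vector)  # one count per entry in CLUSTERS
```

A representation like this has one dimension per cluster instead of one per vocabulary word, which is one plausible reason a cluster-based system could compare well against Bag-of-Words when the feature budget and training corpus are small, as the abstract reports.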
