International Journal of Software Innovation

A novel approach for ontology-based dimensionality reduction for web text document classification

Abstract

Reducing the dimensionality of the feature vector plays a vital role in improving text processing; it aims to reduce the size of the feature vector used in mining tasks (classification, clustering, etc.). This paper proposes an efficient approach for reducing the size of the feature vector in web text document classification. The approach uses the WordNet ontology, taking advantage of its hierarchical structure, to eliminate from the generated feature vector those words that have no relation to any of the WordNet lexical categories; this reduces the feature vector size without losing information about the text. For mining tasks, the Vector Space Model (VSM) is used to represent text documents, and Term Frequency-Inverse Document Frequency (TF-IDF) is used as the term weighting method. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach in several experiments. The experimental results show the effectiveness of the authors' proposed approach over other traditional approaches, achieving better classification accuracy, F-measure, precision, and recall.
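
The pipeline the abstract describes (drop terms that the ontology does not recognize, then weight the remaining terms with TF-IDF in a vector space model) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `TOY_LEXICON`, `filter_terms`, and `tfidf_vectors` are hypothetical. The small word set stands in for a real WordNet lookup so that the example runs without external data. The weighting formula tf * log(N / df) is one common TF-IDF variant, assumed here because the abstract does not say which variant was used.

```python
import math
from collections import Counter

# ASSUMPTION: a tiny hand-made word set standing in for a WordNet lookup.
# A real system would instead ask WordNet whether a term has any entry.
TOY_LEXICON = {
    "ontology", "document", "web", "text",
    "feature", "vector", "classification",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def filter_terms(tokens, lexicon):
    """Keep only the tokens that the lexicon recognizes."""
    return [t for t in tokens if t in lexicon]


def tfidf_vectors(docs_tokens):
    """Build a vocabulary and one TF-IDF vector per tokenized document.

    ASSUMPTION: weight = raw term frequency * log(N / document frequency).
    """
    vocab = sorted({t for doc in docs_tokens for t in doc})
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in docs_tokens for t in set(doc))
    vectors = []
    for doc in docs_tokens:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


docs = [
    "web text document classification",
    "xyzzy feature vector qwerty",
    "ontology web foo",
]
raw_tokens = [tokenize(d) for d in docs]
kept_tokens = [filter_terms(toks, TOY_LEXICON) for toks in raw_tokens]

raw_vocab, _ = tfidf_vectors(raw_tokens)
kept_vocab, kept_vectors = tfidf_vectors(kept_tokens)

# The filtered vocabulary is smaller, so every document vector is shorter.
print(len(raw_vocab), "terms before filtering,", len(kept_vocab), "after")
```

In this toy data the unrecognized tokens ("xyzzy", "qwerty", "foo") are removed before weighting, so the vector length equals the size of the filtered vocabulary. Whether such filtering preserves classification quality on real data is the empirical question the paper evaluates.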