
Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment Analysis

Abstract

Sentiment classification, or sentiment analysis, has been acknowledged as an open research domain. In recent years, a great deal of research has been carried out in this field using a wide range of methodologies. Feature generation and selection are consequential for text mining, because a high-dimensional feature set can affect the performance of sentiment analysis. This paper investigates the limitations of the widely used feature selection methods (IG, Chi-square, and Gini Index) with unigram and bigram feature sets on four machine learning classification algorithms (MNB, SVM, KNN, and ME). The proposed methods are evaluated on three standard datasets, namely, the IMDb movie review dataset and the electronics and kitchen product review datasets. Initially, unigram and bigram features are extracted by applying the n-gram method. In addition, we generate a composite feature vector, CompUniBi (unigram + bigram), which is passed to the feature selection methods Information Gain (IG), Gini Index (GI), and Chi-square (CHI) to obtain an optimal feature subset by assigning a score to each feature. These methods rank the features by score, so a prominent feature vector (CompIG, CompGI, and CompCHI) can be generated easily for classification. Finally, the machine learning classifiers SVM, MNB, KNN, and ME use the prominent feature vector to classify each review document as either positive or negative. The performance of the algorithms is measured with evaluation metrics such as precision, recall, and F-measure. Experimental results show that the composite feature vector achieves better performance than the unigram features, which is encouraging as well as comparable to related research. The best results, in terms of accuracy, were obtained by combining Information Gain with SVM.
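The feature pipeline described in the abstract (n-gram extraction, the composite CompUniBi vector, then score-based ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`comp_uni_bi`, `information_gain`, `top_k`), the whitespace tokenisation, the binary present/absent feature encoding, and the four toy reviews are all assumptions made for illustration. Only Information Gain is shown, and the classification step (SVM, MNB, KNN, ME) is omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def comp_uni_bi(text):
    """Composite feature set (unigrams + bigrams); whitespace tokenisation is an assumption."""
    tokens = text.lower().split()
    return set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c) if total else 0.0


def information_gain(docs, labels):
    """Score every feature by the information gain of its presence/absence."""
    classes = sorted(set(labels))
    n = len(docs)
    class_counts = Counter(labels)
    base = entropy([class_counts[c] for c in classes])
    present = {}  # feature -> label counts among documents that contain it
    for feats, y in zip(docs, labels):
        for f in feats:
            present.setdefault(f, Counter())[y] += 1
    scores = {}
    for f, with_f in present.items():
        n_with = sum(with_f.values())
        h_with = entropy([with_f[c] for c in classes])
        h_without = entropy([class_counts[c] - with_f[c] for c in classes])
        scores[f] = base - (n_with / n) * h_with - ((n - n_with) / n) * h_without
    return scores


def top_k(scores, k):
    """Keep the k highest-scoring features; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f for f, _ in ranked[:k]]


# Toy data, invented for illustration only.
reviews = [
    "great movie loved it",
    "not good waste of time",
    "loved the acting great plot",
    "not good at all",
]
labels = ["pos", "neg", "pos", "neg"]

docs = [comp_uni_bi(r) for r in reviews]
scores = information_gain(docs, labels)
comp_ig = top_k(scores, 5)  # analogue of the CompIG feature vector
print(comp_ig)
```

On this toy data the bigram "not good" is ranked alongside the best unigrams, which is the kind of signal a unigram-only feature set cannot capture. Swapping `information_gain` for a Chi-square or Gini Index scorer would give analogues of CompCHI and CompGI.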

Bibliographic details

  • Source
    Applied Computational Intelligence and Soft Computing | 2018, Issue 2018 | 8909357.1-8909357.12 | 12 pages
  • Authors

    Monalisa Ghosh; Goutam Sanyal;

  • Author affiliations

    Department of Computer Science and Engineering National Institute of Technology, Durgapur, West Bengal, India;

    Department of Computer Science and Engineering National Institute of Technology, Durgapur, West Bengal, India;

  • Original format: PDF
  • Language of text: English
