Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment Analysis

Monalisa Ghosh; Goutam Sanyal

Abstract

Sentiment classification, or sentiment analysis, remains an open research area, and in recent years a large body of work has applied a wide range of methodologies to it. Feature generation and selection are consequential for text mining because a high-dimensional feature set can degrade the performance of sentiment analysis. This paper investigates the limitations of three widely used feature selection methods, Information Gain (IG), Chi-square (CHI), and Gini Index (GI), applied to unigram and bigram feature sets with four machine learning classification algorithms (MNB, SVM, KNN, and ME). The proposed methods are evaluated on three standard datasets: IMDb movie reviews and the electronics and kitchen product review datasets. First, unigram and bigram features are extracted with the n-gram method. Next, a composite feature vector, CompUniBi (unigram + bigram), is generated and passed to the feature selection methods IG, GI, and CHI, which assign a score to each feature in order to obtain an optimal feature subset. Because these methods rank the features by score, a prominent feature vector (CompIG, CompGI, or CompCHI) can be generated easily for classification. Finally, the classifiers SVM, MNB, KNN, and ME use the prominent feature vector to classify each review document as either positive or negative. Performance is measured by precision, recall, and F-measure. Experimental results show that the composite feature vector performs better than the unigram features alone, which is encouraging and comparable to related research. The best results, in terms of accuracy, were obtained by combining Information Gain with SVM.
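The composite-feature and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reviews, the whitespace tokenizer, the binary presence features, and all function names (`composite_features`, `information_gain`, `select_top_k`) are assumptions made for illustration, and only Information Gain is shown.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def composite_features(text):
    """Unigram + bigram presence features for one document (hypothetical helper)."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))

def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, feature):
    """IG of a binary presence feature: H(class) - H(class | feature)."""
    with_f = Counter(l for d, l in zip(docs, labels) if feature in d)
    without_f = Counter(l for d, l in zip(docs, labels) if feature not in d)
    n = len(docs)
    n_with = sum(with_f.values())
    h_class = entropy(list(Counter(labels).values()))
    h_cond = (n_with / n) * entropy(list(with_f.values())) \
        + ((n - n_with) / n) * entropy(list(without_f.values()))
    return h_class - h_cond

def select_top_k(docs, labels, k):
    """Rank all features by IG (ties broken alphabetically) and keep the top k."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]

# Toy labelled reviews (invented data, for illustration only).
reviews = [
    ("great movie loved it", "pos"),
    ("great acting great story", "pos"),
    ("not good boring plot", "neg"),
    ("boring and not funny", "neg"),
]
docs = [composite_features(text) for text, _ in reviews]
labels = [label for _, label in reviews]
top = select_top_k(docs, labels, 3)
print(top)  # ['boring', 'great', 'not']
```

On this toy data, the three features that perfectly separate the classes receive an Information Gain of 1 bit and are ranked first. Under the same assumptions, a Chi-square or Gini Index score could replace `information_gain` without changing the ranking step.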


Bibliographic record

Source
Applied Computational Intelligence and Soft Computing, 2018, issue 2018, pp. 8909357.1-8909357.12 (12 pages)
Authors
Monalisa Ghosh; Goutam Sanyal
Affiliation
Department of Computer Science and Engineering, National Institute of Technology, Durgapur, West Bengal, India (both authors)
Original format
PDF
Language
English

