Venue: IEEE International Congress on Big Data

Sparsity adjusted information gain for feature selection in sentiment analysis

Abstract

The widespread use of social media and the internet is an emerging trend that offers companies an additional interaction channel for better understanding customer sentiment about their brands and products. Sentiment analysis uses text data from social media, such as customer comments and reviews, which is inherently high-dimensional. Without selection, at least thousands of features (words or phrases) can typically be extracted from a text corpus, and many of them are redundant or irrelevant to the sentiment classification task. It is therefore critical to select a compact yet effective feature set, which avoids complex classifier designs and slow classification. However, very few existing metrics improve the efficacy of feature selection by addressing the sparsity of the feature matrix for text data, i.e., the fact that many features appear in only a few documents. This paper proposes an improved feature selection metric, sparsity adjusted information gain (SAIG), which modifies the conventional information gain metric to adjust feature ranking scores according to the sparsity of each feature vector. SAIG can reach a target performance level with fewer features. Experimental results show that SAIG improves the performance of sentiment classification.
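The abstract does not give the SAIG formula, so the following is only a minimal Python sketch of the surrounding machinery, assuming binary term-presence features and integer class labels `0..n_classes-1`. It computes conventional information gain, IG = H(C) - H(C | feature), and the sparsity of each feature vector (the fraction of documents where the feature is absent). The function `adjusted_score` and its `alpha` parameter are hypothetical placeholders that show where a sparsity adjustment plugs into the ranking; they are not the authors' SAIG definition.

```python
import math
from typing import List, Sequence


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(feature: Sequence[int], labels: Sequence[int], n_classes: int) -> float:
    """Conventional IG of a binary presence feature: H(C) - H(C | feature)."""
    n = len(labels)
    all_counts = [0] * n_classes
    present = [0] * n_classes  # class counts among documents containing the feature
    absent = [0] * n_classes   # class counts among documents lacking the feature
    for x, y in zip(feature, labels):
        all_counts[y] += 1
        if x > 0:
            present[y] += 1
        else:
            absent[y] += 1
    n_present = sum(present)
    n_absent = n - n_present
    conditional = (n_present / n) * entropy(present) + (n_absent / n) * entropy(absent)
    return entropy(all_counts) - conditional


def sparsity(feature: Sequence[int]) -> float:
    """Fraction of documents in which the feature does not appear."""
    return sum(1 for x in feature if x == 0) / len(feature)


def adjusted_score(feature, labels, n_classes, alpha=1.0) -> float:
    """HYPOTHETICAL adjustment (not the paper's SAIG): IG scaled by density ** alpha."""
    return information_gain(feature, labels, n_classes) * (1.0 - sparsity(feature)) ** alpha


def rank_features(X: List[List[int]], labels, n_classes, k, alpha=1.0) -> List[int]:
    """Indices of the top-k columns of document-term matrix X by adjusted score."""
    scored = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scored.append((adjusted_score(column, labels, n_classes, alpha), j))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by index
    return [j for _, j in scored[:k]]


# Toy corpus: 4 documents (rows) x 3 terms (columns), labels 1 = positive, 0 = negative.
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
labels = [1, 1, 0, 0]
print(rank_features(X, labels, n_classes=2, k=2))
```

In this toy corpus, term 0 separates the classes perfectly (IG = 1 bit), while terms 1 and 2 each occur in a single document and receive both a lower IG and a stronger sparsity penalty, so the top-2 selection is `[0, 1]`. Setting `alpha=0` turns the placeholder weighting off and recovers plain information-gain ranking.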