IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies

Evaluating Feature Sets and Classifiers for Sentiment Analysis of Financial News

Abstract

Work on sentiment analysis has thus far been limited in the news article domain. This has mainly been caused by 1) news articles lacking a clearly defined target, 2) the difficulty in separating good and bad news from positive and negative sentiment, and 3) the seeming necessity of, and complexity in, relying on domain-specific interpretations and background knowledge. In this paper we propose, define, experiment with, and evaluate four different feature categories, composed of 26 article features, for sentiment analysis. Using five different machine learning methods, we train sentiment classifiers for Norwegian financial internet news articles and achieve classification precision of up to ~71%. This is comparable to the state of the art in other domains and close to the human baseline. Our experiments with different feature subsets show that the category relying on a domain-specific sentiment lexicon (the 'contextual' category), which is able to grasp the jargon and lingo used in Norwegian financial news, is of cardinal importance in classification: these features yield a precision increase of ~21% when added to the other feature categories. When comparing different machine learning classifiers, we find J48 classification trees to yield the highest performance, closely followed by Random Forests (RF). This is in line with recent studies and contrary to the outdated notion that Support Vector Machines (SVMs) are superior in this domain.
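To make the lexicon-based 'contextual' feature category more concrete, here is a minimal sketch. It assumes a tiny hypothetical lexicon: the Norwegian terms below are illustrative only and are not the authors' domain-specific lexicon. The sketch counts positive and negative lexicon hits in an article and derives a length-normalised net score. It also includes a per-class precision helper. The names `contextual_features` and `precision` are hypothetical. The paper's 26 features and its J48, RF and SVM classifiers are not reproduced here.

```python
import re

# Hypothetical, illustrative lexicon entries (NOT the paper's lexicon):
# oppgang = rise, overskudd = profit/surplus, rekord = record,
# nedgang = decline, underskudd = deficit, kursfall = share-price drop.
POSITIVE_TERMS = {"oppgang", "overskudd", "rekord"}
NEGATIVE_TERMS = {"nedgang", "underskudd", "kursfall"}


def contextual_features(text):
    """Return lexicon-hit counts and a length-normalised net sentiment score."""
    # \w matches Unicode letters in Python 3, so æ, ø and å are kept in tokens.
    tokens = re.findall(r"\w+", text.lower())
    pos = sum(1 for t in tokens if t in POSITIVE_TERMS)
    neg = sum(1 for t in tokens if t in NEGATIVE_TERMS)
    # Guard against empty articles to avoid division by zero.
    net = (pos - neg) / len(tokens) if tokens else 0.0
    return {"pos_count": pos, "neg_count": neg, "net_score": net}


def precision(y_true, y_pred, positive_label):
    """Per-class precision: correct predictions of a label / all predictions of it."""
    predicted = sum(1 for p in y_pred if p == positive_label)
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if p == positive_label and t == positive_label
    )
    return true_pos / predicted if predicted else 0.0


if __name__ == "__main__":
    print(contextual_features("Kursfall og underskudd for selskapet"))
    print(precision(["pos", "neg", "pos"], ["pos", "pos", "pos"], "pos"))
```

In a full pipeline, these values would be appended to the features from the other categories before a classifier such as a decision tree is trained. The abstract does not say how precision was averaged across sentiment classes, so the per-class definition above is one assumption among several.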
