Sensors (Basel, Switzerland)

An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian

Abstract

Over the last decade, industrial and academic communities have increased their focus on sentiment analysis techniques, especially as applied to tweets. State-of-the-art results have recently been achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle Twitter jargon. This work introduces a different, two-step approach to Twitter sentiment analysis. First, the tweet jargon, including emojis and emoticons, is transformed into plain text using procedures that are language-independent or easily applicable to different languages. Second, the resulting tweets are classified using the language model BERT, pre-trained on plain text rather than on tweets, for two reasons: (1) models pre-trained on plain text are readily available in many languages, which avoids the resource- and time-consuming training of a model on tweets from scratch; (2) available plain-text corpora are larger than tweet-only ones, which allows better performance. A case study applying the approach to Italian is presented and compared with existing Italian solutions. The results show the effectiveness of the approach and indicate that, because its methodology is general, it may also be promising for other languages.
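
The first step described in the abstract, turning tweet jargon into plain text, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup tables `EMOJI_TO_TEXT` and `EMOTICON_TO_TEXT`, the function name `tweet_to_plain_text`, and the placeholder words `url` and `user` are all assumptions made for illustration, and the abstract does not say how URLs, mentions, or hashtags are handled.

```python
import re

# Hypothetical, deliberately tiny lookup tables (assumptions, not from the paper).
# A real pipeline would need much larger tables, with descriptions written in
# the target language (Italian in the case study).
EMOJI_TO_TEXT = {
    "\U0001F600": "grinning face",
    "\U0001F622": "crying face",
}
EMOTICON_TO_TEXT = {
    ":)": "happy",
    ":(": "sad",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def tweet_to_plain_text(tweet: str) -> str:
    """Rewrite Twitter-specific tokens as ordinary words (illustrative only)."""
    # URLs are replaced first so that the ':' in 'https://' can never be
    # mistaken for part of an emoticon.
    text = URL_RE.sub("url", tweet)
    # Assumed convention: mentions become a generic word, hashtags lose the '#'.
    text = MENTION_RE.sub("user", text)
    text = HASHTAG_RE.sub(r"\1", text)
    # Replace each emoji and emoticon with its textual description.
    for symbol, description in {**EMOJI_TO_TEXT, **EMOTICON_TO_TEXT}.items():
        text = text.replace(symbol, f" {description} ")
    # Collapse the extra whitespace introduced by the replacements.
    return " ".join(text.split())


if __name__ == "__main__":
    print(tweet_to_plain_text("@mario Che bella giornata :) #sole \U0001F600 https://t.co/abc"))
```

In the second step, the string returned by such a function would be passed to a BERT classifier pre-trained on plain text in the target language; that part is omitted here because it depends on the specific model and library chosen.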