Conference: International Conference on Computational Linguistics

Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from Modern Hebrew

Abstract

This paper empirically studies the effects of representation choices on neural sentiment analysis for Modern Hebrew, a morphologically rich language (MRL) for which no sentiment analyzer currently exists. We study two dimensions of representational choices: (i) the granularity of the input signal (token-based vs. morpheme-based), and (ii) the level of encoding of vocabulary items (string-based vs. character-based). We hypothesize that for MRLs, languages where multiple meaning-bearing elements may be carried by a single space-delimited token, these choices will have measurable effects on task performance, and that these effects may vary for different architectural designs: fully-connected, convolutional or recurrent. Specifically, we hypothesize that morpheme-based representations will have advantages in terms of their generalization capacity and task accuracy, due to their better OOV coverage. To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances thereof: token-based and morpheme-based. Our experiments show that the effects of representational choices vary with architectural types. While fully-connected and convolutional networks slightly prefer token-based settings, RNNs benefit from a morpheme-based representation, in accord with the hypothesis that explicit morphological information may help generalize. Our endeavor also delivers the first state-of-the-art broad-coverage sentiment analyzer for Hebrew, with over 89% accuracy, alongside an established benchmark to further study the effects of linguistic representation choices on neural networks' task performance.
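
The two representational dimensions and the OOV argument in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the segmentation table `SEGMENTATIONS`, the helper names `to_units`, `to_chars` and `oov_rate`, and the toy train/test tokens are all hypothetical, and the segmentations are hand-written rather than produced by a real morphological analyzer.

```python
# Toy illustration of token-based vs. morpheme-based input and
# string-based vs. character-based encoding. All data is hypothetical.

# Hand-written segmentations (assumption: no real analyzer is used here).
SEGMENTATIONS = {
    "בבית": ["ב", "ה", "בית"],    # "in the house"
    "מהגן": ["מ", "ה", "גן"],     # "from the garden"
    "מהבית": ["מ", "ה", "בית"],   # "from the house"
}


def to_units(tokens, granularity):
    """Return the input sequence at the chosen granularity."""
    if granularity == "token":
        return list(tokens)
    if granularity == "morpheme":
        # Tokens missing from the toy table are kept unsegmented.
        return [m for t in tokens for m in SEGMENTATIONS.get(t, [t])]
    raise ValueError(f"unknown granularity: {granularity}")


def to_chars(units):
    """Character-based encoding: each vocabulary item becomes a list of characters."""
    return [list(u) for u in units]


def oov_rate(train_units, test_units):
    """Fraction of test units that never occur in the training units."""
    vocab = set(train_units)
    if not test_units:
        return 0.0
    return sum(u not in vocab for u in test_units) / len(test_units)


train_tokens = ["בבית", "מהגן"]
test_tokens = ["מהבית"]

token_oov = oov_rate(to_units(train_tokens, "token"),
                     to_units(test_tokens, "token"))
morph_oov = oov_rate(to_units(train_tokens, "morpheme"),
                     to_units(test_tokens, "morpheme"))

print("token-level OOV rate:", token_oov)
print("morpheme-level OOV rate:", morph_oov)
```

In this toy setup the test token is unseen as a whole string, while each of its morphemes already occurs in the training data, which is the kind of coverage difference the hypothesis refers to. Whether that difference improves accuracy depends on the architecture, as the abstract reports.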