Computer speech and language

Multi-level embeddings for processing Arabic social media contents



Abstract

Embeddings are very popular representations that allow computing semantic and syntactic similarities between linguistic units from text co-occurrence matrices. Units range from character n-grams to words, and can include more coarse-grained units such as sentences and documents. Recently, multi-level embeddings combining representations from different units have been proposed as an alternative to single-level embeddings, in order to account for the internal structure of words (i.e., morphology) and to help systems generalise over out-of-vocabulary words. These representations, whether pre-trained or learned, have been shown to be quite effective, outperforming word-level baselines in several NLP tasks such as machine translation, part-of-speech tagging and named entity recognition. Our aim here is to contribute to this line of research by proposing, for the first time in Arabic NLP, an in-depth study of the impact of various subword configurations, ranging from characters to character n-grams (including words), for social media text classification. We propose several neural architectures to learn character, subword and word embeddings, as well as a combination of these three levels, exploring different composition functions to obtain the final representation of a given text. To evaluate the effectiveness of these representations, we perform extrinsic evaluations on three text classification tasks (sentiment analysis, emotion detection and irony detection) while accounting for different Arabic varieties (Modern Standard Arabic and the Levantine and Maghrebi dialects). For each task, we experiment with well-known dialect-agnostic and dialect-specific datasets, including those recently used in shared tasks, to better compare our results with those reported in previous studies on the same datasets.
The results show that the multi-level embeddings we propose outperform current static and contextualised embeddings, as well as the best-performing state-of-the-art models, in sentiment and emotion detection. In addition, we achieve competitive results in irony detection. Our models also perform best across dialects, although we observe that different dialects require different composition configurations. Finally, we show that performance tends to increase further when the multi-level representations are coupled with task-specific features.
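The multi-level idea described above — representing a token at the character, character-n-gram and word levels and composing these levels into one vector — can be sketched as follows. This is a minimal illustration, not the paper's architecture: the lookup-table interface, the toy 4-dimensional vectors, and the choice of averaging within a level and concatenating across levels are all assumptions made for the example.

```python
# Hedged sketch of multi-level embedding composition (illustrative only).
# Each level's vectors are averaged, then the three levels are concatenated.

def char_ngrams(word, n):
    """All character n-grams of a word, with boundary markers < and >."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def multi_level_embedding(word, char_table, ngram_table, word_table, n=3, dim=4):
    """Concatenate three levels: mean character vectors, mean n-gram
    vectors, and the word vector.  An out-of-vocabulary word gets a zero
    word vector, but the subword levels still produce a representation --
    the motivation for multi-level embeddings."""
    zero = [0.0] * dim
    char_vecs = [char_table.get(c, zero) for c in word]
    ngram_vecs = [ngram_table.get(g, zero) for g in char_ngrams(word, n)]
    word_vec = word_table.get(word, zero)
    # List concatenation here plays the role of vector concatenation.
    return average(char_vecs) + average(ngram_vecs) + word_vec
```

In this sketch each level contributes a `dim`-sized slice, so the final representation has `3 * dim` components; a classifier for sentiment, emotion or irony detection would then consume these composed vectors.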