Automatic Identification of Arabic Language Varieties and Dialects in Social Media

Abstract

Modern Standard Arabic (MSA) is the formal language in most Arab countries. Arabic dialects (AD), the language of daily use, differ from MSA, especially in social media communication. However, most Arabic social media texts mix forms and show many variations, especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using character n-gram Markov language models and Naive Bayes classifiers, with a detailed examination of which models perform best under different conditions in a social media context. Experimental results show that a Naive Bayes classifier based on a character bigram model can identify the 18 different Arabic dialects with an overall accuracy of 98%.
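To make the approach named in the abstract concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier over character bigram counts. It is not the authors' implementation: the class name `CharNgramNaiveBayes`, the add-one smoothing, the toy training strings and the labels `variety_a` and `variety_b` are all assumptions chosen for illustration, and no real dialect data is used.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the list of overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts.

    Hypothetical illustration class; add-one smoothing is an assumption.
    """

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram -> count
        self.label_counts = Counter()             # label -> number of training texts
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.label_counts[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Return log prior plus summed log likelihoods for each label."""
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, doc_count in self.label_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up training strings with placeholder labels (not real dialect data).
train_texts = ["abab abba baab", "abba abab", "xyxy yxxy xyyx", "yxxy xyxy"]
train_labels = ["variety_a", "variety_a", "variety_b", "variety_b"]

model = CharNgramNaiveBayes(n=2).fit(train_texts, train_labels)
print(model.predict("baba"))
print(model.predict("yxyx"))
```

Working at the character level means no tokenizer or dictionary is needed, which suits the mixed, non-standard spellings of social media text. Setting `n` to another value gives the other n-gram orders that the abstract mentions.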

Bibliographic details

Source
2nd Workshop on Natural Language Processing for Social Media, 2014, pp. 22-27 (6 pages)
Conference location: Dublin (IE)
Authors
Fatiha Sadat; Farnazeh Kazemi; Atefeh Farzindar
Author affiliations

University of Quebec in Montreal, 201 President Kennedy, Montreal, QC, Canada;

NLP Technologies Inc. 52 Le Royer Street W., Montreal, QC, Canada;

NLP Technologies Inc. 52 Le Royer Street W., Montreal, QC, Canada;

Original format: PDF
Language: English
