Venue: 2nd Workshop on Natural Language Processing for Social Media

Automatic Identification of Arabic Language Varieties and Dialects in Social Media

Abstract

Modern Standard Arabic (MSA) is the formal language of most Arab countries. Arabic dialects (AD), the language of daily use, differ from MSA, especially in social media communication. However, most Arabic social media texts mix forms and show many variations, especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using character n-gram Markov language models and Naive Bayes classifiers, with a detailed examination of which models perform best under different conditions in a social media context. Experimental results show that a Naive Bayes classifier based on a character bigram model can identify the 18 different Arabic dialects with an overall accuracy of 98%.
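
To make the idea of a Naive Bayes classifier over character bigrams concrete, here is a minimal sketch. It is not the authors' implementation: the class name `CharBigramNB`, the add-one smoothing, and the toy training strings with labels "A" and "B" are all assumptions chosen for illustration, and the abstract does not specify any of these details.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return the overlapping character bigrams of a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class CharBigramNB:
    """Multinomial Naive Bayes over character bigrams (hypothetical sketch)."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.docs = Counter(labels)         # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_bigrams(text))
        # Vocabulary of all bigrams seen in training, used for smoothing.
        self.vocab = {bg for c in self.counts.values() for bg in c}
        return self

    def log_scores(self, text):
        """Log prior plus summed log likelihoods for each label."""
        n_docs = sum(self.docs.values())
        scores = {}
        for label, bigram_counts in self.counts.items():
            # Add-one smoothing (an assumption, not taken from the paper).
            total = sum(bigram_counts.values()) + len(self.vocab)
            score = math.log(self.docs[label] / n_docs)
            for bg in char_bigrams(text):
                score += math.log((bigram_counts[bg] + 1) / total)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy placeholder data: invented strings, not real dialect samples.
texts = ["abab abba", "baab abab", "xyxy yxxy", "xyyx xyxy"]
labels = ["A", "A", "B", "B"]
clf = CharBigramNB().fit(texts, labels)
print(clf.predict("abba"))
print(clf.predict("yxxy"))
```

In a real setting the training texts would be labeled social media posts and the labels would be dialect names. Working in log space avoids numerical underflow when many bigram probabilities are multiplied together.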
