9th International Conference on Language Resources and Evaluation

Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon


Abstract

We introduce Tharwa, an electronic three-way lexicon of Dialectal Arabic, Modern Standard Arabic and English correspondents. The paper focuses on Egyptian Arabic as the first pilot dialect for the resource, with plans to expand to other Arabic dialects in later phases of the project. We describe Tharwa's creation process and report on its current status. Lexical entries are augmented with linguistic information such as part of speech (POS), gender, rationality, number, and root and pattern information. The lexicon compiles information from existing monolingual and bilingual resources, such as paper dictionaries and electronic corpus-based dictionaries. Multiple levels of quality checks are performed on the output of each step in the creation process. The lexicon's importance lies in its being the first resource of its kind to bridge multiple variants of Arabic with English. Furthermore, it is a wide-coverage lexical resource containing over 73,000 Egyptian entries. Tharwa is publicly available. We believe it will have a significant impact on both theoretical and computational linguistics research.
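The abstract lists the information attached to each entry (POS, gender, rationality, number, root and pattern). As a rough illustration of how such three-way entries might be modeled and queried, here is a minimal Python sketch. The names `LexiconEntry`, `build_index` and `translate`, the field names, and the two sample entries are all hypothetical: they are not Tharwa's actual file format or API, and pattern information is omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexiconEntry:
    """One hypothetical three-way entry: Egyptian Arabic, MSA, English.

    Field names are illustrative assumptions, not Tharwa's actual schema.
    """
    egyptian: str                        # Egyptian Arabic form
    msa: str                             # Modern Standard Arabic correspondent
    english: str                         # English gloss
    pos: str                             # part of speech
    gender: Optional[str] = None         # e.g. "fem"; None if not applicable
    number: Optional[str] = None         # e.g. "sg"; None if not applicable
    rationality: Optional[str] = None    # "rational" (human) vs "irrational"
    msa_root: Optional[str] = None       # consonantal root of the MSA form


def build_index(entries):
    """Map each Egyptian form to every entry (sense) that shares it."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.egyptian].append(entry)
    return index


# Hypothetical sample entries for illustration only (not taken from Tharwa).
ENTRIES = [
    LexiconEntry("عربية", "سيارة", "car", "noun",
                 gender="fem", number="sg", rationality="irrational",
                 msa_root="س ي ر"),
    LexiconEntry("دلوقتي", "الآن", "now", "adverb"),
]

INDEX = build_index(ENTRIES)


def translate(egyptian_form):
    """Return (MSA, English) pairs for an Egyptian form, or [] if unknown."""
    # .get() avoids inserting empty keys into the defaultdict on a miss.
    return [(e.msa, e.english) for e in INDEX.get(egyptian_form, [])]


if __name__ == "__main__":
    print(translate("عربية"))
```

Keying the index on the Egyptian form with a list of entries per key lets one dialectal word map to several MSA/English senses. A real lookup would also have to cope with spelling variation in dialectal Arabic, which has no standard orthography; this sketch ignores that.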
