Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

Extracting Arabic Composite Names Using Genitive Principles of Arabic Grammar



Abstract

Named Entity Recognition (NER) is a basic prerequisite of using Natural Language Processing (NLP) for information retrieval. Arabic NER is especially challenging as the language is morphologically rich and has short vowels with no capitalisation convention. This article presents a novel rule-based approach that uses linguistic grammar-based techniques to extract Arabic composite names from Arabic text. Our approach uniquely exploits the genitive Arabic grammar rules; in particular, the rules regarding the identification of definite nouns ((sic)) and indefinite nouns ((sic)) to support the process of extracting composite names. Based on domain knowledge and Arabic Genitive Rules (AGR), the developed approach formalises a set of syntactical rules and linguistic patterns that initially use genitive patterns to classify definiteness within phrases and then extracts proper composite names from the unstructured text. The developed novel approach does not place any constraints on the length of the Arabic composite name and our initial experimentation demonstrated high recall and precision results when the NER algorithm was applied to a financial domain corpus.
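
The abstract does not list the rules themselves, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a composite name opens with an indefinite head noun taken from a small domain list and continues through every following token that carries the definite article "ال", which is a rough stand-in for a genitive construct. The trigger list `TRIGGERS`, the helper `is_definite`, and the function `extract_composite_names` are hypothetical names introduced for illustration. The definiteness test is deliberately crude: it ignores attached prepositions and conjunctions (for example "بال" or "وال"), punctuation, and words that merely happen to begin with the same letters.

```python
# Hypothetical head nouns for a financial domain (bank, company, bank);
# this list is an assumption and is not taken from the paper.
TRIGGERS = {"بنك", "شركة", "مصرف"}

# The Arabic definite article, written as a prefix on the noun.
DEFINITE_PREFIX = "ال"


def is_definite(token: str) -> bool:
    """Crude test: the token starts with the article and has more after it."""
    return token.startswith(DEFINITE_PREFIX) and len(token) > len(DEFINITE_PREFIX)


def extract_composite_names(text: str) -> list[str]:
    """Return spans of the form: indefinite trigger noun + definite tokens."""
    tokens = text.split()
    names = []
    i = 0
    while i < len(tokens):
        if tokens[i] in TRIGGERS:
            j = i + 1
            # Absorb every following definite token, so the length of the
            # extracted name is not capped.
            while j < len(tokens) and is_definite(tokens[j]):
                j += 1
            # Keep the span only if at least one definite token followed.
            if j > i + 1:
                names.append(" ".join(tokens[i:j]))
                i = j
                continue
        i += 1
    return names


if __name__ == "__main__":
    print(extract_composite_names("أعلن بنك الرياض أرباحه"))
```

The open-ended inner loop mirrors the abstract's point that no limit is placed on the length of a composite name. A real system would need proper tokenisation and morphological analysis in place of the prefix check.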
