Journal: Language Resources and Evaluation

Studying the impact of language-independent and language-specific features on hybrid Arabic Person name recognition

Abstract

In this paper, extensive experiments are conducted to study the impact of features from different categories on Arabic Person name recognition, both in isolation and incrementally as categories are added one at a time. We present an integrated system that combines a rule-based approach with a machine learning (ML)-based approach to build a consolidated hybrid system. Our feature space consists of language-independent and language-specific features. The explored features fall naturally into six categories: Person named entity tags predicted by the rule-based component, word-level features, part-of-speech (POS) features, morphological features, gazetteer features, and other contextual features. Because the decision tree algorithm has proved comparatively more efficient than other classifiers in current state-of-the-art hybrid Named Entity Recognition (NER) for Arabic, it is adopted in this study as the ML technique of the hybrid system. With the classifier fixed, the experiments therefore vary along two dimensions: the standard dataset used and the set of selected features. Several standard datasets, including ACE (2003-2004) and ANERcorp, are used to train and test the hybrid system. The experimental analysis indicates that both language-independent and language-specific features play an important role in overcoming the challenges posed by the Arabic language and have a critical impact on optimizing the performance of the hybrid system.
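Two mechanisms in the abstract can be made concrete with a short sketch: the rule-based component's Person tags are reused as one feature category for the ML classifier, and the six feature categories are evaluated both one at a time and cumulatively. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The gazetteer, the title-word rule, the POS tags, the example sentence, and every function name (`rule_based_tag`, `extract_features`, `isolated_feature_sets`, `incremental_feature_sets`, `select`) are hypothetical. The decision tree itself is left out; each flattened feature dictionary is what one would hand to a decision tree learner.

```python
from typing import Dict, List, Sequence

# Hypothetical toy resources for illustration only; the abstract does not
# list the paper's actual rules, gazetteers, or POS tag set.
PERSON_GAZETTEER = {"محمد", "أحمد", "فاطمة"}
PERSON_TRIGGERS = {"السيد", "الرئيس", "الدكتور"}  # titles that often precede a name

# The six feature categories named in the abstract, in an assumed order.
CATEGORY_ORDER = ["rule", "word", "pos", "morph", "gazetteer", "context"]


def rule_based_tag(tokens: Sequence[str], i: int) -> str:
    """Toy stand-in for the rule-based component: tag PERSON after a title word."""
    return "PERSON" if i > 0 and tokens[i - 1] in PERSON_TRIGGERS else "O"


def extract_features(
    tokens: Sequence[str], pos_tags: Sequence[str], i: int
) -> Dict[str, Dict[str, object]]:
    """Return the features of token i, grouped by category.

    POS tags are taken as input, assuming an external tagger produced them.
    """
    tok = tokens[i]
    return {
        # Person tag predicted by the rule-based component, reused as a feature.
        "rule": {"rule_tag": rule_based_tag(tokens, i)},
        # Word-level features.
        "word": {"word": tok, "length": len(tok)},
        "pos": {"pos": pos_tags[i]},
        # Very crude morphological cue: the Arabic definite article prefix "ال".
        "morph": {"has_al_prefix": tok.startswith("ال")},
        "gazetteer": {"in_person_gazetteer": tok in PERSON_GAZETTEER},
        # Neighbouring tokens, padded at sentence boundaries.
        "context": {
            "prev": tokens[i - 1] if i > 0 else "<S>",
            "next": tokens[i + 1] if i + 1 < len(tokens) else "</S>",
        },
    }


def isolated_feature_sets(order: Sequence[str] = CATEGORY_ORDER) -> List[List[str]]:
    """One experiment per category, each category used on its own."""
    return [[c] for c in order]


def incremental_feature_sets(order: Sequence[str] = CATEGORY_ORDER) -> List[List[str]]:
    """Cumulative experiments: categories are added one at a time."""
    return [list(order[:k]) for k in range(1, len(order) + 1)]


def select(
    features: Dict[str, Dict[str, object]], categories: Sequence[str]
) -> Dict[str, object]:
    """Flatten the chosen categories into one feature dict for a classifier."""
    flat: Dict[str, object] = {}
    for c in categories:
        flat.update(features[c])
    return flat


# Hypothetical example sentence: "Dr. Ahmed said today" (verb-first order).
EXAMPLE_TOKENS = ["قال", "الدكتور", "أحمد", "اليوم"]
EXAMPLE_POS = ["VBD", "NN", "NNP", "NN"]
```

For instance, `select(extract_features(EXAMPLE_TOKENS, EXAMPLE_POS, 2), ["rule", "gazetteer"])` yields `{"rule_tag": "PERSON", "in_person_gazetteer": True}` for the token أحمد, because it follows the title الدكتور and appears in the toy gazetteer. Training one classifier per entry of `incremental_feature_sets()` gives the incremental comparison, and `isolated_feature_sets()` gives the per-category one.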