Conference: 2018 International Conference on Intelligent Systems and Computer Vision

An hybrid approach to improve part of speech tagging system

Abstract

Platforms that handle data in text format, such as social networks and search engines, face major challenges in storing, searching, and processing this flow of text. Disciplines such as natural language processing have emerged to address all aspects of language, spoken or written. In this context, we focus on part-of-speech (POS) tagging applied to the Arabic language, which consists of marking each word in a text with its correct tag. One of the most difficult problems affecting POS tagging is the ambiguity of the text; ambiguity is also the most important problem in natural language processing. We propose a hybrid approach that combines grammatical rules with an artificial neural network classifier to determine the appropriate tags for an Arabic text. The first phase extracts all affixes to identify the nature of each word and its tags according to grammatical rules. The second phase begins by transliterating the Arabic text into Roman letters; the transliterated text is then converted into numeric vectors that form the input to the neural network classifier. The results of the two phases are combined to identify the tag of each word.
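The two-phase pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the affix rule tables, the tag names, the Buckwalter-style transliteration subset, the character-code vector encoding, the combination rule in `tag_word`, and the `toy_classifier` stand-in for a trained neural network are all hypothetical, since the abstract does not specify any of them.

```python
# Illustrative sketch only. Rule tables, tag names, transliteration map,
# vector encoding and combination rule are assumptions, not from the paper.

# Phase 1: affix rules. Each affix maps to the tags it suggests (hypothetical).
PREFIX_RULES = {"ال": {"NOUN"}, "سي": {"VERB"}}
SUFFIX_RULES = {"ون": {"NOUN", "VERB"}, "ة": {"NOUN"}}

# Phase 2: small subset of a Buckwalter-style Arabic-to-Roman mapping (assumed).
TRANSLIT = {"ا": "A", "ب": "b", "ت": "t", "ك": "k", "ل": "l",
            "ن": "n", "و": "w", "ي": "y", "س": "s", "ة": "p", "م": "m"}


def rule_candidates(word):
    """Return the tags suggested by the word's affixes (empty set if none match)."""
    matched = [tags for affix, tags in PREFIX_RULES.items() if word.startswith(affix)]
    matched += [tags for affix, tags in SUFFIX_RULES.items() if word.endswith(affix)]
    if not matched:
        return set()
    common = set.intersection(*matched)
    # If the affixes disagree, keep every suggested tag as a candidate.
    return common if common else set.union(*matched)


def transliterate(word):
    """Map each Arabic letter to a Roman letter; unknown characters pass through."""
    return "".join(TRANSLIT.get(ch, ch) for ch in word)


def vectorize(roman, length=8):
    """Encode a transliterated word as a fixed-length list of scaled character codes."""
    codes = [ord(ch) / 127.0 for ch in roman[:length]]
    return codes + [0.0] * (length - len(codes))


def toy_classifier(vector):
    """Stand-in for a trained neural network: returns constant scores per tag."""
    return {"NOUN": 0.2, "VERB": 0.7, "PART": 0.1}


def tag_word(word, classifier):
    """Combine both phases: rules narrow the candidates, the classifier decides."""
    candidates = rule_candidates(word)
    if len(candidates) == 1:
        return next(iter(candidates))
    scores = classifier(vectorize(transliterate(word)))
    filtered = {tag: s for tag, s in scores.items() if tag in candidates}
    if filtered:
        scores = filtered
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for w in ["الكتاب", "سيكتب", "يكتبون"]:
        print(transliterate(w), tag_word(w, toy_classifier))
```

In this sketch the rules settle unambiguous words on their own, and the classifier is consulted only when the affixes leave more than one candidate tag or none at all. That is one plausible way to combine the two phases; the abstract does not say how the authors actually merge them.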