Journal: Semantic Web

Extending a CRF-based named entity recognition model for Turkish well formed text and user generated content



Abstract

Named entity recognition (NER), which provides useful information for many high-level NLP applications and semantic web technologies, is a well-studied topic for most languages, especially English. However, modelling morphologically rich languages (MRLs) for the NER task is still an open research area. Studies on Turkish, a strong representative of MRLs, have long lagged behind those on well-studied languages. In recent years, Turkish NER has attracted researchers because of its scarce data resources and the lack of high-performing systems. In particular, the need to semantically enrich the textual data arriving with user-generated content has prompted many studies in this field. This article presents a CRF-based NER system that successfully models the very rich morphology of Turkish, together with extensions that broaden the set of covered named entity types and handle the especially challenging user-generated content of Web 2.0. The article also introduces a re-annotation of the available datasets and a new dataset collected from Web 2.0. The approach achieves an exact-match F1 score of 92% on a dataset collected from Turkish news articles and ~65% on several datasets collected from Web 2.0. The authors expect the proposed model to be easily applicable to similar MRLs, given the relevant resources.
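The abstract reports exact-match F1 scores without defining the metric. A common reading, assumed here rather than confirmed by the source, is CoNLL-style entity-level scoring: a predicted entity counts as correct only if both its boundaries and its type match a gold entity exactly. The minimal sketch below assumes BIO-tagged token sequences; the tag names (`PERSON`, `LOCATION`) and the helpers `bio_to_spans` and `exact_match_f1` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# Hypothetical span representation: (start token, end token exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of (start, end, type) entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues_entity = tag.startswith("I-") and tag[2:] == etype
        if not continues_entity:
            # Close the currently open entity, if any
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            # B- starts an entity; a stray I- is also treated as a start
            if tag.startswith("B-") or tag.startswith("I-"):
                start, etype = i, tag[2:]
    return spans


def exact_match_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 where only exact span+type matches count."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g_spans & p_spans)  # exact boundary and type match
        fp += len(p_spans - g_spans)  # predicted but not in gold
        fn += len(g_spans - p_spans)  # gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # "Mustafa Kemal Ankara'ya gitti ." ("Mustafa Kemal went to Ankara.")
    gold = [["B-PERSON", "I-PERSON", "B-LOCATION", "O", "O"]]
    pred = [["B-PERSON", "O", "B-LOCATION", "O", "O"]]
    print(exact_match_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

In this example the system finds only "Mustafa" instead of the full name "Mustafa Kemal", so that prediction earns no credit under exact matching, even though it overlaps the gold entity. Partial-match metrics would score the same output higher, which is why the matching criterion matters when comparing reported F1 values.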
