Extending a CRF-based named entity recognition model for Turkish well formed text and user generated content

Şeker Gökhan Akın; Eryiğit Gülşen

Abstract

Named entity recognition (NER), which provides useful information for many high level NLP applications and semantic web technologies, is a well-studied topic for most of the languages and especially for English. However, the modelling of morphologically rich languages (MRLs) for the NER task is still an open research area. The studies for Turkish which is a strong representative of MRLs have fallen behind the well-studied languages for a long while. In recent years, Turkish NER intrigued researchers due to its scarce data resources and the unavailability of high-performing systems. Especially, the need to semantically enrich the textual data coming with user generated content initiated many studies in this field. This article presents a CRF-based NER system which successfully models the morphologically very rich nature of this language, its extensions to expand the covered named entity types, and also to process extra challenging user generated content coming with Web 2.0. The article introduces the re-annotation of the available datasets and a brand new dataset from Web 2.0. The introduced approach reveals an exact match F1 score of 92% on a dataset collected from Turkish news articles and ~65% on different datasets collected from Web 2.0. The proposed model is believed to be easily applied to similar MRLs with relevant resources.

Bibliographic details

Source
Semantic Web | 2017, Issue 5 | 18 pages
Authors
Şeker Gökhan Akın; Eryiğit Gülşen
Author affiliations

ITU Informatics Institute, Istanbul Technical University, Istanbul 34469, Turkey

Department of Computer Engineering, Istanbul Technical University, Istanbul 34469, Turkey

Original format: PDF
Language of text: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Named entity recognition; Turkish; user generated content; CRF; web data

Similar literature

1. Extending a CRF-based named entity recognition model for Turkish well formed text and user generated content [J]. Şeker Gökhan Akın, Eryiğit Gülşen. Semantic web. 2017, Issue 5
2. Transfer learning for Turkish named entity recognition on noisy text [J]. Emre Kagan Akkaya, Burcu Can. Natural language engineering. 2021, Issue Pta1
3. A neural model for text localization, transcription and named entity recognition in full pages [J]. Manuel Carbonell, Alicia Fornes, Mauricio Villegas. Pattern recognition letters. 2020, Issue Auga
4. Shared Tasks of the 2015 Workshop on Noisy User-generated Text: Twitter Lexical Normalization and Named Entity Recognition [C]. Timothy Baldwin, Marie Catherine de Marneffe, Bo Han. Workshop on noisy user-generated text. 2015
5. The relational model versus the extended entity relationship model: A comparison of representations developed by autonomous users [D]. Batra, Dinesh. 1989
6. Mining heart disease risk factors in clinical text with named entity recognition and distributional semantic models [O]. Jay Urbain
7. Named entity recognition in medical texts in Russian using deep learning models [O]. 2020
