Locating Complex Named Entities in Web Text

Abstract

Named Entity Recognition (NER) is the task of locating and classifying names in text. In previous work, NER was limited to a small number of predefined entity classes (e.g., people, locations, and organizations). However, NER on the Web is a far more challenging problem. Complex names (e.g., film or book titles) can be very difficult to pick out precisely from text. Further, the Web contains a wide variety of entity classes, which are not known in advance. Thus, hand-tagging examples of each entity class is impractical. This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text. Our key observation is that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus. We show that this statistical method's F1 score is 50% higher than that of supervised techniques including Conditional Random Fields (CRFs) and Conditional Markov Models (CMMs) when applied to complex names. The method also outperforms CMMs and CRFs by 117% on entity classes absent from the training data. Finally, our method outperforms a semi-supervised CRF by 73%.
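The idea of treating names as multiword units found through n-gram statistics can be illustrated with a small sketch. The abstract does not say which statistic the authors use, so the example assumes a symmetric-conditional-probability style score, count(xy)^2 / (count(x) * count(y)), and a simplified rule that only merges adjacent capitalized tokens. The function names (`ngram_counts`, `association`, `locate_names`), the toy corpus, and the 0.5 threshold are all hypothetical choices for illustration, not the paper's implementation.

```python
from collections import Counter


def ngram_counts(sentences, max_n=3):
    """Count every n-gram of length 1..max_n over tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def association(left, right, counts):
    """Assumed score: count(left+right)^2 / (count(left) * count(right))."""
    c_left, c_right = counts[left], counts[right]
    if c_left == 0 or c_right == 0:
        return 0.0
    joint = counts[left + right]
    return joint * joint / (c_left * c_right)


def locate_names(tokens, counts, threshold=0.5):
    """Greedily merge adjacent capitalized tokens whose score passes the threshold."""
    names, current = [], ()
    for tok in tokens:
        if not tok[:1].isupper():
            # A lowercase token ends any name in progress (a simplification:
            # titles containing lowercase words are not handled here).
            if current:
                names.append(" ".join(current))
            current = ()
        elif current and association(current, (tok,), counts) >= threshold:
            current = current + (tok,)
        else:
            if current:
                names.append(" ".join(current))
            current = (tok,)
    if current:
        names.append(" ".join(current))
    return names


# Toy stand-in for a Web-scale corpus (hypothetical data).
corpus = [
    "I visited New York last year".split(),
    "New York is large".split(),
    "She moved to New York from Boston".split(),
    "Boston is cold in winter".split(),
    "They like Boston".split(),
]
counts = ngram_counts(corpus)

query = "yesterday Boston New York flights resumed".split()
print(locate_names(query, counts))  # ['Boston', 'New York']
```

In this toy corpus "New" and "York" always co-occur, so they merge into one unit, while "Boston New" never occurs and the two adjacent names stay separate. No hand-tagged training examples are involved, which is the property the abstract emphasizes for entity classes not known in advance.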

Bibliographic details

Source
Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07) | 2007 | pp. 2733-2739 | 2 pages total
Authors
Doug Downey; Matthew Broadhead; Oren Etzioni
Original format: PDF
CLC classification: Artificial intelligence theory

