Journal: ACM Transactions on Asian Language Information Processing

Comparison of Methods to Annotate Named Entity Corpora


Abstract

The authors compared two methods for annotating a corpus for the named entity (NE) recognition task using non-expert annotators: (i) revising the output of an existing NE recognizer (semi-automatic annotation) and (ii) annotating the NEs entirely by hand (fully manual annotation). Annotation time, degree of agreement, and performance were evaluated against a gold standard. Because each text was annotated by two annotators under each method, two performance figures were evaluated: the average performance of the two annotators and the performance when at least one annotator is correct. The experiments reveal that semi-automatic annotation is faster, achieves better agreement, and performs better on average. However, they also indicate that fully manual annotation should be used for some texts whose document types differ substantially from those of the training data. In addition, machine learning experiments using the semi-automatically and fully manually annotated corpora as training data indicate that the F-measures could be better for some texts when manual rather than semi-automatic annotation was used. Finally, experiments using the annotated corpora as additional training corpora show that (i) NE recognition performance does not always correspond to the performance of the NE tag annotation and (ii) the system trained with the manually annotated corpus outperforms the system trained with the semi-automatically annotated corpus on newswires, even though the existing NE recognizer was mainly trained on newswires.
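The two performance figures mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption for illustration: entities are compared by exact match on span and type, the "average" figure is taken as the mean F-measure of the two annotators, and the "at least one annotator is correct" figure is approximated as the share of gold entities found by either annotator. The names `Entity`, `f_measure`, `average_f`, and `either_correct_recall`, as well as the toy data, are hypothetical.

```python
from typing import NamedTuple, Set


class Entity(NamedTuple):
    start: int   # index of the first token of the span
    end: int     # index one past the last token of the span
    label: str   # NE type, e.g. "PERSON"


def f_measure(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Exact-match F-measure of predicted entities against gold entities."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_f(gold: Set[Entity], ann_a: Set[Entity], ann_b: Set[Entity]) -> float:
    """Assumed 'average performance': mean F-measure of the two annotators."""
    return (f_measure(gold, ann_a) + f_measure(gold, ann_b)) / 2


def either_correct_recall(gold: Set[Entity], ann_a: Set[Entity], ann_b: Set[Entity]) -> float:
    """Assumed 'at least one correct': share of gold entities found by either annotator."""
    if not gold:
        return 0.0
    return len(gold & (ann_a | ann_b)) / len(gold)


# Toy data (invented for illustration, not taken from the paper).
gold = {
    Entity(0, 2, "PERSON"),
    Entity(5, 6, "LOCATION"),
    Entity(8, 10, "ORGANIZATION"),
    Entity(12, 13, "DATE"),
}
# Annotator A misses the date and gets the organization boundary wrong.
ann_a = {Entity(0, 2, "PERSON"), Entity(5, 6, "LOCATION"), Entity(8, 9, "ORGANIZATION")}
# Annotator B misses the location.
ann_b = {Entity(0, 2, "PERSON"), Entity(8, 10, "ORGANIZATION"), Entity(12, 13, "DATE")}

print(f"average F-measure:     {average_f(gold, ann_a, ann_b):.3f}")
print(f"either-correct recall: {either_correct_recall(gold, ann_a, ann_b):.3f}")
```

In this toy case the annotators make different mistakes, so the either-correct figure is higher than the average: each gold entity is found by at least one of them even though neither is fully correct alone.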
