An Efficient Tool for Building a Large Part-Of-Speech Annotated Corpus

Abstract

Large part-of-speech (POS) annotated corpora play an important role in many kinds of natural language processing, and they therefore require very high accuracy and consistency. To build such an accurate and consistent corpus, manual tagging is often used, but manual tagging is labor intensive and expensive. Furthermore, it is not easy to get consistent results from human experts. The goal of this work is to develop an efficient tool for building an accurate and consistent POS annotated corpus with minimal human labor. The tool helps minimize the amount of human labor and makes the results consistent by using lexical rules. The lexical rules are acquired from human experts in a manner similar to manual tagging and manual error correction, and are then used to annotate the same word in the same context throughout the whole corpus.
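The abstract does not specify how a lexical rule is represented, so the following is only a minimal sketch of the idea under stated assumptions: a rule is taken to be a mapping from a (previous word, word, next word) context to a tag, and one expert decision is propagated to every matching occurrence in the corpus. The names `context_of` and `apply_rules`, the one-word context window, and the tag labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Assumed rule key: (previous word, target word, next word).
# None marks a sentence boundary. The window size is an assumption.
Context = Tuple[Optional[str], str, Optional[str]]
# A token is (word, tag); the tag is None while the word is untagged.
Token = Tuple[str, Optional[str]]


def context_of(sentence: List[Token], i: int) -> Context:
    """Return the one-word-window context of the token at position i."""
    prev_word = sentence[i - 1][0] if i > 0 else None
    next_word = sentence[i + 1][0] if i + 1 < len(sentence) else None
    return (prev_word, sentence[i][0], next_word)


def apply_rules(corpus: List[List[Token]], rules: Dict[Context, str]) -> int:
    """Tag, in place, every token whose context matches a rule.

    Returns the number of tokens whose tag was set or changed.
    """
    changed = 0
    for sentence in corpus:
        for i, (word, tag) in enumerate(sentence):
            new_tag = rules.get(context_of(sentence, i))
            if new_tag is not None and new_tag != tag:
                sentence[i] = (word, new_tag)
                changed += 1
    return changed


if __name__ == "__main__":
    corpus = [
        [("the", None), ("can", None), ("rusts", None)],
        [("the", "DT"), ("can", "MD"), ("rusts", "VBZ")],  # inconsistent tag
        [("I", None), ("can", None), ("swim", None)],
    ]
    # One expert decision, recorded once as a rule (illustrative tag label).
    rules = {("the", "can", "rusts"): "NN"}
    print(apply_rules(corpus, rules))
    print(corpus)
```

In this sketch a single expert decision both tags the untagged occurrence and corrects the inconsistent one, while "can" in a different context is left for a separate decision. That is the consistency property the abstract attributes to lexical rules.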
