
EXPLOITING SUBTREES IN AUTO-PARSED DATA TO IMPROVE DEPENDENCY PARSING


Abstract

Dependency parsing has attracted considerable interest from researchers and developers in natural language processing. However, to obtain a high-accuracy dependency parser, supervised techniques require a large volume of hand-annotated data, which are extremely expensive. This paper presents a simple and effective approach for improving dependency parsing with subtrees derived from unannotated data, which are easy to obtain. First, we use a baseline parser to parse large-scale unannotated data. Then, we extract subtrees from dependency parse trees in the auto-parsed data. Next, the extracted subtrees are classified into several sets according to their frequency. Finally, we design new features based on the subtree sets for parsing algorithms. To demonstrate the effectiveness of our proposed approach, we conduct experiments on the English Penn Treebank and Chinese Penn Treebank. The results show that our approach significantly outperforms baseline systems. It also achieves the best accuracy for the Chinese data and an accuracy competitive with the best known systems for the English data.
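
The four steps in the abstract (parse unannotated text, extract subtrees, group them by frequency, turn the groups into features) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the restriction to head-dependent pairs, the rank cut-offs `top` and `mid`, and the set labels `HF`/`MF`/`LF`/`ZERO` are all assumptions made for illustration, and the hand-written parses stand in for the output of a baseline parser.

```python
from collections import Counter

# A parsed sentence is a list of (word, head_index) pairs. head_index is the
# 0-based position of the head word, or -1 for the root. (Hypothetical format.)

def extract_bigram_subtrees(sentence):
    """Yield one (head_word, dependent_word, direction) subtree per arc."""
    for i, (word, head) in enumerate(sentence):
        if head < 0:
            continue  # the root has no incoming arc
        head_word = sentence[head][0]
        # "R": the dependent sits to the right of its head, "L": to the left.
        direction = "R" if head < i else "L"
        yield (head_word, word, direction)

def classify_by_frequency(counts, top=0.1, mid=0.3):
    """Label each subtree HF, MF, or LF by its frequency rank.

    The cut-offs `top` and `mid` are assumed values, not taken from the paper.
    """
    ranked = [subtree for subtree, _ in counts.most_common()]
    hi_cut = max(1, int(len(ranked) * top))
    mid_cut = max(hi_cut, int(len(ranked) * mid))
    labels = {}
    for rank, subtree in enumerate(ranked):
        if rank < hi_cut:
            labels[subtree] = "HF"
        elif rank < mid_cut:
            labels[subtree] = "MF"
        else:
            labels[subtree] = "LF"
    return labels

def subtree_feature(labels, head_word, dep_word, direction):
    """Return a feature string for a candidate arc; unseen subtrees get ZERO."""
    return "SUBTREE=" + labels.get((head_word, dep_word, direction), "ZERO")

# Stand-in for auto-parsed data (hand-written here, not real parser output).
auto_parsed = (
    [[("ate", -1), ("fish", 0)]] * 3
    + [[("ate", -1), ("rice", 0)]] * 2
    + [[("fish", 1), ("swim", -1)]]
)

counts = Counter(
    subtree for sent in auto_parsed for subtree in extract_bigram_subtrees(sent)
)
labels = classify_by_frequency(counts, top=0.34, mid=0.67)
feature = subtree_feature(labels, "ate", "fish", "R")
```

In a setup like this, the feature string would be added to the feature vector of each candidate arc when training the final parser on the hand-annotated treebank, so that frequency evidence from the large auto-parsed corpus can inform its decisions.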

Bibliographic details

  • Source
    Computational Intelligence | 2012, Issue 3 | pp. 426-451 | 26 pages
  • Author affiliations

    Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology, Tokyo, Japan, and Human Language Technology, Institute for Infocomm Research, Singapore;

    Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology, Tokyo, Japan;

    Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology, Tokyo, Japan;

    Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology, Tokyo, Japan;

  • Original format: PDF
  • Language: English
  • Keywords

    natural language processing; dependency parsing; semi-supervised learning; subtree extraction;

