A discourse-oriented approach to automatic Chinese zero anaphora resolution.



Abstract

Anaphora resolution is a central task in natural language understanding systems. For Chinese, a major challenge is zero anaphora (ZA). Existing automatic Chinese ZA resolution algorithms, despite their differences, rely mainly on syntactic factors. In contrast, linguistic studies show that discourse information such as topic is important in Chinese ZA resolution.

This study addresses this discrepancy by incorporating topic features into the proposed automatic ZA resolution algorithm. The corpus for this study is Converse (2006a). A machine-learning approach is adopted.

First, topic structures were annotated in the training data by two native-speaker Chinese graduate students trained in linguistics. Each file in the training data was independently annotated and then adjudicated.

Then, four rounds of machine learning were run: (1) Round 1, the baseline, used Zhao & Ng's (2007) 26 features (mainly syntactic information); (2) Round 2 added the manually annotated topics as one feature to the baseline; (3) Round 3 used 21 topic-related features automatically extracted from the corpus; (4) after an error analysis, 25 topic-related features were re-selected for Round 4.

The performance of the four rounds was compared in three ways: (i) by running 5-fold cross-validation on the training data; (ii) by running the trained model on the test data; and (iii) by conducting ROC analysis. Results show that manually annotated topics (Round 2) and carefully selected topic-related features (Round 4) do help improve ZA resolution. For example, on the test data, the F-measure of Round 4 was 0.582, 29% higher than the baseline (0.452). McNemar's test shows that the error rate of Round 4 was significantly lower than that of the baseline (p < 0.01), with an odds ratio of 3.0.

In addition, Round 4 achieved results similar to Round 2. Because the features in Round 4 can be extracted automatically, this approach is cheaper and therefore more practical for ZA resolution than hand annotation.

This study therefore demonstrates that automatic Chinese ZA resolution can be improved beyond previous approaches by including in the model those syntactic features highly correlated with the discourse concept of topic.
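The two headline comparisons in the abstract (a relative F-measure gain, and McNemar's test with its odds ratio) can be illustrated with a minimal Python sketch. Only the F-measures 0.582 and 0.452 come from the abstract. The discordant-pair counts `b` and `c` are hypothetical, chosen solely so that their ratio matches the reported odds ratio of 3.0; the abstract does not report the actual counts, nor whether a continuity correction was applied, so the correction used here is an assumption.

```python
import math


def relative_gain(new_score, base_score):
    """Relative improvement of new_score over base_score, as a fraction."""
    return (new_score - base_score) / base_score


def mcnemar(b, c):
    """McNemar's test on discordant pairs.

    b: items the new model resolves correctly and the baseline does not.
    c: items the baseline resolves correctly and the new model does not.
    Returns (odds ratio, chi-square with continuity correction, p-value).
    """
    odds_ratio = b / c
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# F-measures reported in the abstract for Round 4 and the baseline.
gain = relative_gain(0.582, 0.452)
print(f"relative F-measure gain: {gain:.1%}")

# Hypothetical counts (NOT from the thesis), picked to give a ratio of 3.0.
odds_ratio, chi2, p_value = mcnemar(b=60, c=20)
print(f"odds ratio: {odds_ratio:.1f}, chi2: {chi2:.3f}, p: {p_value:.2g}")
```

Note that the "29% higher" figure is a relative gain, (0.582 - 0.452) / 0.452, not a difference of 29 percentage points; the absolute difference in F-measure is 0.130.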

Bibliographic record

  • Author: Tu, Xianghua
  • Author affiliation: Boston University
  • Degree-granting institution: Boston University
  • Subject: Language, Linguistics; Language, Modern
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 329 p.
  • Total pages: 329
  • Original format: PDF
  • Language: English
