
Apply syntactic features in a maximum entropy framework for English and Chinese reading comprehension.


Abstract

Automatic reading comprehension (RC) systems integrate various natural language processing (NLP) technologies to analyze a given passage and to generate or extract answers to questions about that passage. Previous work applied many NLP technologies within the bag-of-words (BOW) matching approach, including shallow syntactic analyses (e.g. base noun phrases), semantic analyses (e.g. named entities) and discourse analyses (e.g. pronoun referents). This thesis proposes a novel RC approach that integrates a set of NLP technologies in a maximum entropy (ME) framework to estimate the probability that each candidate sentence is the answer. Whereas previous RC approaches handle English only, the presented approach is the first to address both English and Chinese, the two languages with the most speakers in the world. To support the evaluation of bilingual RC systems, a parallel English-Chinese corpus is also designed and developed, and annotations deemed relevant to the RC task are included in it. In addition, useful NLP technologies are explored from a new perspective: drawing on pedagogical guidelines for human readers, reading skills are summarized and mapped to various NLP technologies. Practical NLP technologies, categorized as shallow syntactic analyses (i.e. part-of-speech tags, voice and tense) and deep syntactic analyses (i.e. syntactic parse trees and dependency parse trees), are then selected for integration. The proposed approach is evaluated on an English corpus, Remedia, and on our bilingual corpus. The experimental results show that our approach significantly improves RC performance on both the English and the Chinese corpora.
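The abstract does not spell out the model, so the following is only a minimal sketch of how a maximum entropy (log-linear) model can turn per-sentence features into answer probabilities. It assumes the probabilities are normalized over the candidate sentences of one passage, p(s | q) = exp(Σ_i λ_i f_i(s, q)) / Σ_s' exp(Σ_i λ_i f_i(s', q)). The feature names (`bow_overlap`, `verb_match`, `tense_match`), their weights, and the example passage are hypothetical and not taken from the thesis; in a trained system the weights λ_i would be estimated from annotated question-answer data.

```python
import math
from typing import Dict, List

# Hypothetical feature weights (lambda_i). A real ME model would learn these
# from annotated data; the values below are made up for illustration.
WEIGHTS: Dict[str, float] = {
    "bow_overlap": 1.0,   # fraction of question words found in the sentence
    "verb_match": 0.8,    # hypothetical: sentence shares the question's main verb
    "tense_match": 0.3,   # hypothetical: sentence tense agrees with the question's
}


def bow_overlap(question: str, sentence: str) -> float:
    """Bag-of-words feature: share of distinct question tokens present in the sentence."""
    q = set(question.lower().split())
    s = set(sentence.lower().split())
    return len(q & s) / len(q) if q else 0.0


def score(features: Dict[str, float]) -> float:
    """Linear score sum_i lambda_i * f_i(s, q); unknown features get weight 0."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())


def answer_probabilities(feature_rows: List[Dict[str, float]]) -> List[float]:
    """Normalize exp(score) over all candidate sentences (softmax)."""
    scores = [score(f) for f in feature_rows]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Usage on a toy passage.
question = "who wrote the letter"
candidates = [
    "Tom wrote the letter to his mother",
    "The letter arrived on Monday",
    "His mother was very happy",
]
# Syntactic features would normally come from a tagger or parser;
# here they are filled in by hand.
hand_features = [
    {"verb_match": 1.0, "tense_match": 1.0},
    {"verb_match": 0.0, "tense_match": 1.0},
    {"verb_match": 0.0, "tense_match": 1.0},
]
rows = [dict(f, bow_overlap=bow_overlap(question, s))
        for f, s in zip(hand_features, candidates)]
probs = answer_probabilities(rows)
best = max(range(len(candidates)), key=lambda i: probs[i])
print(candidates[best])
```

Under these made-up weights, the first sentence, which shares the question's verb and most of its words, receives the highest probability and would be returned as the answer. Adding deeper syntactic features (e.g. dependency relations) only changes how the feature dictionaries are filled; the normalization step stays the same.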

Bibliographic details

  • Author: Xu, Kui
  • Author affiliation: The Chinese University of Hong Kong (Hong Kong)
  • Degree-granting institution: The Chinese University of Hong Kong (Hong Kong)
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 141
  • Original format: PDF
  • Language: English
