Conference: China National Conference on Computational Linguistics

Reusable Phrase Extraction Based on Syntactic Parsing


Abstract

The Academic Phrasebank is an important resource of neutral and generic phrases for academic writers. In this paper, we call these neutral and generic phrases reusable phrases; student writers use them to organize their research articles. Because of its limited size, the Academic Phrasebank cannot meet every academic writing need, and a large number of reusable phrases remain in authentic research articles. To make up for this deficiency, we propose a reusable phrase extraction model, based on constituency parsing and dependency parsing, that automatically extracts reusable phrases from unlabelled research articles. The proposed model has three main components: a reusable words corpus module, a sentence simplification module, and a syntactic parsing module. We created a reusable words corpus of 2129 words to help judge whether a word is neutral and generic, and created two datasets under two scenarios to verify the feasibility of the proposed model.
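The abstract does not give implementation details, so the following is only a minimal sketch of how the three components might fit together. Everything in it is an assumption made for illustration: the tiny `REUSABLE_WORDS` set stands in for the 2129-word corpus, `simplify` is a hypothetical simplification rule that drops parenthetical asides and bracketed citations, and the constituency and dependency parsing stage is replaced by a plain heuristic that keeps maximal runs of consecutive reusable words. It is not the authors' model.

```python
import re

# Hypothetical toy lexicon. The paper's reusable words corpus has 2129 words
# and is not reproduced here.
REUSABLE_WORDS = {
    "in", "this", "paper", "we", "propose", "a", "the",
    "to", "model", "method", "new", "is", "of",
}


def simplify(sentence):
    """Assumed simplification step: drop (...) asides and [...] citations."""
    sentence = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", sentence)
    return re.sub(r"\s+", " ", sentence).strip()


def reusable_spans(tokens, min_len=3):
    """Return maximal runs of consecutive reusable tokens.

    This contiguous-run heuristic stands in for the syntactic parsing
    module; it does not perform any parsing.
    """
    spans, current = [], []
    for tok in tokens:
        if tok.lower() in REUSABLE_WORDS:
            current.append(tok)
        else:
            if len(current) >= min_len:
                spans.append(" ".join(current))
            current = []
    if len(current) >= min_len:
        spans.append(" ".join(current))
    return spans


def extract(sentence):
    """Simplify a sentence, tokenize on alphabetic runs, and collect spans."""
    tokens = re.findall(r"[A-Za-z]+", simplify(sentence))
    return reusable_spans(tokens)


if __name__ == "__main__":
    example = "In this paper, we propose a new method (see Section 2) to rank tweets [3]."
    print(extract(example))
```

In this sketch, content words such as "rank" and "tweets" are absent from the lexicon, so they end a run and the topic-specific part of the sentence is left out of the extracted phrase. A parser-based version would presumably choose phrase boundaries from the parse tree rather than from adjacency alone.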
