Journal: International Journal of Computer Processing of Oriental Languages

Identification of Boundaries in Parallel Noun Phrases: A Probabilistic Swapping Model


Abstract

Parallel structure factors out constituents shared across expressions, which simplifies them. Identifying the boundaries of parallel structures can greatly reduce complexity at the sentence-parsing stage. In this paper, we propose a probabilistic model that identifies the parallel cores (the corresponding constituents) as well as the boundaries of parallel noun phrases conjoined by "wa/gwa" (a conjunctive particle in Korean). The model is based on the idea of swapping constituents and exploits two properties of parallel structure: symmetry (two or more identical constituents are repeated) and reversibility (the order of the constituents can be changed). The probabilities are calculated from an unlabeled corpus containing parallel structures, which is an advantage over approaches trained on a labeled corpus. Moreover, the model is not language-dependent. We also show that the semantic features of the modifiers around a parallel noun phrase, together with patterns among words, can be used to further correct the boundaries identified by the swapping model. Experiments show that the probabilistic swapping model performs much better than a symmetry-based model and machine-learning-based approaches.
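The abstract does not give the model's formulas, so the following is only a rough sketch of the reversibility idea, not the authors' method. It assumes that a correct choice of conjunct boundaries yields a plausible sentence when the two conjuncts are swapped, and it scores plausibility with a stand-in: an add-one-smoothed bigram model estimated from an unlabeled corpus. The function names (`train_bigrams`, `swap`, `best_boundaries`), the bigram scoring, and the English toy tokens with "and" standing in for "wa/gwa" are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def train_bigrams(sentences):
    """Count unigrams and bigrams in an unlabeled, tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one-smoothed bigram log-probability (an illustrative stand-in)."""
    vocab = len(unigrams)
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(
        log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )


def swap(tokens, start, conj_index, end):
    """Exchange the conjuncts tokens[start:conj_index] and tokens[conj_index+1:end]."""
    left = tokens[start:conj_index]
    right = tokens[conj_index + 1:end]
    return tokens[:start] + right + [tokens[conj_index]] + left + tokens[end:]


def best_boundaries(tokens, conj_index, unigrams, bigrams):
    """Return the (start, end) whose swapped sentence scores highest, or None."""
    best = None
    for start in range(conj_index):                         # left conjunct is non-empty
        for end in range(conj_index + 2, len(tokens) + 1):  # right conjunct is non-empty
            score = log_prob(swap(tokens, start, conj_index, end), unigrams, bigrams)
            if best is None or score > best[0]:
                best = (score, start, end)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    # Hypothetical toy corpus; a real system would use a large unlabeled corpus.
    corpus = [
        ["the", "red", "pears", "fell"],
        ["pears", "and", "apples", "fell"],
    ]
    uni, bi = train_bigrams(corpus)
    sentence = ["the", "red", "apples", "and", "pears", "fell"]
    print(best_boundaries(sentence, 3, uni, bi))
```

Every candidate swap is a permutation of the same tokens, so the candidates have equal length and their scores are directly comparable. The symmetry property and the modifier-based boundary correction mentioned in the abstract are not modeled here.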
