Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised Compound Splitting With Distributional Semantics Rivals Supervised Methods


Abstract

In this paper we present a word decompounding method that is based on distributional semantics. Our method does not require any linguistic knowledge and is initialized using a large monolingual corpus. The core idea of our approach is that the parts of a compound (like "candle" and "stick") are semantically similar to the entire compound, which helps to exclude spurious splits (like "candles" and "tick"). We report results for German and Dutch. For German, our unsupervised method performs on par with a rule-based and a supervised method and significantly outperforms two unsupervised baselines. For Dutch, our method performs only slightly below a rule-based optimized compound splitter.
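The core idea stated in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify in detail. It is a toy illustration under loud assumptions: the three-dimensional vectors in `TOY_VECTORS` are hand-made stand-ins for embeddings that would normally be learned from a large monolingual corpus, and the helper names (`candidate_splits`, `score_split`, `best_split`), the two-part split restriction, and the averaging of cosine similarities are all hypothetical choices made for this example.

```python
import math

# Hand-made toy vectors (an assumption for illustration only). In practice
# these would come from a distributional model trained on a large corpus.
TOY_VECTORS = {
    "candlestick": (0.9, 0.8, 0.1),
    "candle": (1.0, 0.6, 0.0),
    "stick": (0.7, 1.0, 0.1),
    "candles": (1.0, 0.5, 0.1),
    "tick": (0.0, 0.1, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_splits(word, vocab, min_len=3):
    """All two-part splits of `word` whose parts both occur in `vocab`."""
    return [
        (word[:i], word[i:])
        for i in range(min_len, len(word) - min_len + 1)
        if word[:i] in vocab and word[i:] in vocab
    ]


def score_split(word, parts, vectors):
    """Average similarity between the whole word and each of its parts."""
    return sum(cosine(vectors[word], vectors[p]) for p in parts) / len(parts)


def best_split(word, vectors, min_len=3):
    """Return the highest-scoring split, or None if no candidate exists."""
    candidates = candidate_splits(word, vectors, min_len)
    if not candidates:
        return None
    return max(candidates, key=lambda parts: score_split(word, parts, vectors))


if __name__ == "__main__":
    print(best_split("candlestick", TOY_VECTORS))
```

With these toy vectors, both "candle" + "stick" and "candles" + "tick" are string-level candidates, but "tick" points in a very different direction from "candlestick", so the spurious split receives a much lower average similarity.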
