Conference: Annual Meeting of the Association for Computational Linguistics

What Kind of Language Is Hard to Language-Model?

Abstract

How language-agnostic are current state-of-the-art NLP tools? Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus. Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. In other words, the model is aware of inter-sentence variation and can handle missing data. Exploiting this model, we show that 'translationese' is not any easier to model than natively written language in a fair comparison. Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.
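The idea of a paired-sample multiplicative model that tolerates missing translations can be illustrated with a minimal sketch. It assumes the cost of encoding sentence i in language j factors as y_ij = n_i * exp(d_j) * noise, where n_i is a per-sentence effect and d_j is a per-language difficulty coefficient. This form, the function name `fit_difficulties`, and the toy data are assumptions made for illustration. The fit below is ordinary least squares in log space, which is a simplification and not necessarily the estimation procedure used in the paper.

```python
import numpy as np

def fit_difficulties(log_y):
    """Fit log y_ij = log n_i + d_j by least squares.

    log_y: (sentences, languages) array of log costs; NaN marks a
    sentence that has no translation in that language.
    Returns (log_n, d) with d constrained to sum to zero.
    """
    n_sent, n_lang = log_y.shape
    rows, cols = np.nonzero(~np.isnan(log_y))  # observed (sentence, language) pairs
    n_obs = rows.size

    # One equation per observed pair, plus one row that pins sum(d) = 0
    # so that log_n and d are jointly identifiable.
    A = np.zeros((n_obs + 1, n_sent + n_lang))
    A[np.arange(n_obs), rows] = 1.0
    A[np.arange(n_obs), n_sent + cols] = 1.0
    A[n_obs, n_sent:] = 1.0
    b = np.append(log_y[rows, cols], 0.0)

    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[:n_sent], sol[n_sent:]

# Hypothetical toy data: 50 sentences, 3 languages, no noise.
rng = np.random.default_rng(0)
true_log_n = rng.normal(5.0, 1.0, size=50)
true_d = np.array([-0.2, 0.0, 0.2])
log_y = true_log_n[:, None] + true_d[None, :]

# Drop some translations; every sentence still appears in at least two languages.
log_y[::4, 0] = np.nan
log_y[1::4, 2] = np.nan

est_log_n, est_d = fit_difficulties(log_y)
print(est_d)
```

Because each sentence contributes its own effect n_i, a language is never penalized merely for containing harder or longer sentences than another, and sentences missing from some languages simply contribute fewer equations.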
