Conference: Annual Meeting of the Association for Computational Linguistics
What Kind of Language Is Hard to Language-Model?

Abstract

How language-agnostic are current state-of-the-art NLP tools? Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus. Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. In other words, the model is aware of inter-sentence variation and can handle missing data. Exploiting this model, we show that 'translationese' is not any easier to model than natively written language in a fair comparison. Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.
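
As a rough illustration of what a paired-sample multiplicative model can look like, the sketch below assumes that the cost of modeling sentence i in language j factors as n_i * exp(d_j), where n_i is a per-sentence effect and d_j is a per-language difficulty. This functional form, the function name fit_difficulties, and the toy numbers are assumptions made for illustration only; the abstract does not spell out the model, and the full mixed-effects treatment is replaced here by plain alternating least squares in log space. Missing translations are marked with None and are simply skipped, which is one simple way of handling missing data.

```python
import math

def fit_difficulties(costs, n_iters=200):
    """Fit log(cost[i][j]) ~ a[i] + d[j] by alternating least squares.

    costs: one row per sentence, one column per language; None marks a
    missing translation. Returns (a, d): per-sentence effects and
    per-language difficulties in log space, with d centered at zero.
    Hypothetical helper, not the authors' implementation.
    """
    n_sent, n_lang = len(costs), len(costs[0])
    logs = [[math.log(c) if c is not None else None for c in row]
            for row in costs]
    a = [0.0] * n_sent   # log of the sentence effect n_i
    d = [0.0] * n_lang   # language difficulty d_j
    for _ in range(n_iters):
        # Update each sentence effect from the languages that have it.
        for i in range(n_sent):
            res = [logs[i][j] - d[j] for j in range(n_lang)
                   if logs[i][j] is not None]
            if res:
                a[i] = sum(res) / len(res)
        # Update each language difficulty from the sentences it covers.
        for j in range(n_lang):
            res = [logs[i][j] - a[i] for i in range(n_sent)
                   if logs[i][j] is not None]
            if res:
                d[j] = sum(res) / len(res)
        # Center d so that the decomposition is identifiable.
        shift = sum(d) / n_lang
        d = [x - shift for x in d]
        a = [x + shift for x in a]
    return a, d

# Toy data: 4 sentences, 3 languages, generated exactly from the
# assumed form, with two translations missing.
true_n = [10.0, 20.0, 40.0, 15.0]
true_d = [0.0, 0.3, -0.3]
costs = [[n * math.exp(dj) for dj in true_d] for n in true_n]
costs[0][2] = None
costs[3][0] = None

sentence_effects, difficulties = fit_difficulties(costs)
print([round(x, 3) for x in difficulties])
```

Because every sentence contributes its own effect, a language is compared with the others only on the sentences they share, so an inherently long or information-dense verse does not penalize whichever languages happen to include it.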
