Annual Meeting of the Association for Computational Linguistics

Variance of average surprisal: a better predictor for quality of grammar from unsupervised PCFG induction

Abstract

In unsupervised grammar induction, data likelihood is known to be only weakly correlated with parsing accuracy, especially at convergence after multiple runs. In order to find a better indicator for quality of induced grammars, this paper correlates several linguistically- and psycholinguistically-motivated predictors to parsing accuracy on a large multilingual grammar induction evaluation data set. Results show that variance of average surprisal (VAS) better correlates with parsing accuracy than data likelihood, and that using VAS instead of data likelihood for model selection provides a significant accuracy boost. Further evidence shows VAS to be a better candidate than data likelihood for predicting word order typology classification. Analyses show that VAS seems to separate content words from function words in natural language grammars, and to better arrange words with different frequencies into separate classes that are more consistent with linguistic theory.
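To make the quantity concrete, here is a minimal sketch of how a variance of average surprisal could be computed for one induced grammar. The abstract does not give the exact formula, so this assumes that a sentence's average surprisal is its negative log probability divided by its word count, and that VAS is the variance of that value across the sentences of a corpus. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def average_surprisal(log_prob: float, num_words: int) -> float:
    """Per-word surprisal of one sentence: -log P(sentence) / length.

    Assumed definition; log_prob is a natural-log sentence probability.
    """
    return -log_prob / num_words


def variance_of_average_surprisal(sentences) -> float:
    """Population variance of per-sentence average surprisal.

    `sentences` is an iterable of (log_prob, num_words) pairs, one per
    sentence, as scored by a single induced grammar.
    """
    return pvariance([average_surprisal(lp, n) for lp, n in sentences])


# Toy scores for two hypothetical induction runs on the same 3 sentences.
# Both runs have the same total log likelihood (-30.0), so likelihood
# alone cannot tell them apart, while VAS can.
run_a = [(-10.0, 5), (-12.0, 6), (-8.0, 4)]
run_b = [(-5.0, 5), (-18.0, 6), (-7.0, 4)]

vas_a = variance_of_average_surprisal(run_a)
vas_b = variance_of_average_surprisal(run_b)
loglik_a = sum(lp for lp, _ in run_a)
loglik_b = sum(lp for lp, _ in run_b)
```

In this toy case run A spreads surprisal evenly across sentences and run B does not, even though their data likelihoods are identical. Which direction of VAS indicates the better grammar, and how ties are handled during model selection, should be taken from the paper itself rather than from this sketch.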
