Learning structural dependencies of words in the Zipfian Tail

TEJASWINI DEOSKAR; MARKOS MYLONAKIS; KHALIL SIMAAN

Abstract

This article uses semi-supervised Expectation Maximization (EM) to learn lexico-syntactic dependencies, i.e. associations between words and the structures that occur with them. Due to Zipfian distributions in language, such dependencies are extremely sparse in labelled data, and unlabelled data are the only source for learning them. Specifically, we learn sparse lexical parameters of a generative parsing model (a Probabilistic Context-Free Grammar, PCFG) that is initially estimated over the Penn Treebank. Our lexical parameters are similar to supertags: they are fine-grained, and encode complex structural information at the pre-terminal level. Our goal is to use unlabelled data to learn these for words that are rare or unseen in the labelled data. We get large error reductions (up to 17.5%) in parsing ambiguous structures associated with unseen verbs, the most important case of learning lexico-structural dependencies, resulting in a statistically significant improvement in labelled bracketing score of the treebank PCFG. Our semi-supervised method incorporates structural and lexical priors from the labelled data to guide estimation from unlabelled data, and is the first successful use of semi-supervised EM to improve a generative structured model already trained over large labelled data. The method scales well to larger amounts of unlabelled data, and also gives substantial error reductions (up to 11.5%) for models trained on smaller amounts of labelled data, making it relevant to low-resource languages with small treebanks as well.
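As a rough illustration of the general idea of semi-supervised EM for lexical parameters, the following toy sketch may help. It is not the authors' PCFG model: it assumes a much simpler naive-Bayes-style setup in which each token has a fine-grained tag, a word, and a single structural cue. The function name `semi_supervised_em`, the tag and cue labels, and the `smoothing` parameter are all hypothetical. Structural parameters (tag and cue probabilities) are estimated once from labelled data and held fixed, while the word-given-tag probabilities are re-estimated from labelled counts plus expected counts over unlabelled data.

```python
from collections import defaultdict


def semi_supervised_em(labelled, unlabelled, tags, iterations=10, smoothing=0.1):
    """Toy semi-supervised EM (illustrative assumption, not the paper's model).

    labelled:   list of (tag, word, cue) triples
    unlabelled: list of (word, cue) pairs
    Returns a dict mapping (tag, word) to an estimate of P(word | tag).
    """
    vocab = {w for _, w, _ in labelled} | {w for w, _ in unlabelled}
    cues = {c for _, _, c in labelled} | {c for _, c in unlabelled}

    # Counts from labelled data. Tag and cue counts act as fixed
    # structural priors; word counts seed the lexical parameters.
    tag_count = defaultdict(float)
    cue_count = defaultdict(float)
    lab_word_count = defaultdict(float)
    for t, w, c in labelled:
        tag_count[t] += 1
        cue_count[(t, c)] += 1
        lab_word_count[(t, w)] += 1

    n = len(labelled)

    def p_tag(t):
        return (tag_count[t] + smoothing) / (n + smoothing * len(tags))

    def p_cue(c, t):
        return (cue_count[(t, c)] + smoothing) / (tag_count[t] + smoothing * len(cues))

    def normalise(counts):
        # Smoothed relative-frequency estimate of P(word | tag).
        probs = {}
        for t in tags:
            total = sum(counts[(t, w)] for w in vocab) + smoothing * len(vocab)
            for w in vocab:
                probs[(t, w)] = (counts[(t, w)] + smoothing) / total
        return probs

    p_word = normalise(lab_word_count)
    for _ in range(iterations):
        # E-step: start from labelled counts, add posterior-weighted
        # counts for every unlabelled token.
        expected = defaultdict(float, lab_word_count)
        for w, c in unlabelled:
            scores = {t: p_tag(t) * p_cue(c, t) * p_word[(t, w)] for t in tags}
            z = sum(scores.values())
            for t in tags:
                expected[(t, w)] += scores[t] / z
        # M-step: re-estimate only the lexical parameters.
        p_word = normalise(expected)
    return p_word


# Hypothetical data: "devour" never occurs in the labelled set.
tags = ["TRANS", "INTRANS"]
labelled = [("TRANS", "eat", "np"), ("TRANS", "see", "np"),
            ("INTRANS", "sleep", "none"), ("INTRANS", "arrive", "none")]
unlabelled = [("devour", "np")] * 5 + [("sleep", "none")] * 2
p_word = semi_supervised_em(labelled, unlabelled, tags)
```

In this toy setting, the unseen word "devour" ends up with more probability mass under the `TRANS` tag than under `INTRANS`, because the fixed structural cue distribution steers the expected counts. Holding the structural parameters fixed is one simple way to let labelled data guide estimation from unlabelled data; the article's actual priors and model are richer than this.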


Bibliographic details

Source
Journal of Logic and Computation, 2014, Issue 2, pp. 433-453 (21 pages)
Authors
TEJASWINI DEOSKAR; MARKOS MYLONAKIS; KHALIL SIMAAN
Author affiliations

School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK;

Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, 1098 XH, The Netherlands;

Institute for Logic, Language and Computation, University of Amsterdam, Amsterdam, 1098 XH, The Netherlands;

Original format: PDF
Language: English
Keywords
Semi-supervised EM; lexical learning; PCFG estimation; subcategorization

