Improving Lemmatization of Non-Standard Languages with Joint Learning

Abstract

Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword. In the present paper we aim to improve lemmatization performance on a set of non-standard historical languages in which the difficulty is increased by an additional aspect (iii): spelling variation due to lacking orthographic standards. We approach lemmatization as a string-transduction task with an encoder-decoder architecture which we enrich with sentence context information using a hierarchical sentence encoder. We show significant improvements over the state-of-the-art when training the sentence encoder jointly for lemmatization and language modeling. Crucially, our architecture does not require POS or morphological annotations, which are not always available for historical corpora. Additionally, we also test the proposed model on a set of typologically diverse standard languages showing results on par or better than a model without enhanced sentence representations and previous state-of-the-art systems. Finally, to encourage future work on processing of non-standard varieties, we release the dataset of non-standard languages underlying the present study, based on openly accessible sources.
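
The joint training idea mentioned in the abstract can be illustrated with a small sketch in plain Python. It is not the authors' implementation: the function names, the toy probability tables, and the weight `lm_weight` are all assumptions made for illustration. The sketch only shows the general shape of a multi-task objective, in which a character-level lemmatization loss is combined with an auxiliary language-modeling loss.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of `target` under a probability table."""
    return -math.log(probs[target])


def sequence_loss(step_probs, targets):
    """Mean cross-entropy over the prediction steps of one sequence."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, targets)]
    return sum(losses) / len(losses)


def joint_loss(lemma_loss, lm_loss, lm_weight=0.2):
    """Weighted sum of the two task losses.

    `lm_weight` is a hypothetical hyperparameter, not a value from the paper.
    """
    return lemma_loss + lm_weight * lm_loss


# Toy decoder output: one probability table per generated lemma character.
lemma_step_probs = [{"g": 0.9, "o": 0.1}, {"g": 0.2, "o": 0.8}]
lemma_targets = ["g", "o"]

# Toy language-model output: a probability table over a neighbouring word.
lm_step_probs = [{"home": 0.5, "away": 0.5}]
lm_targets = ["home"]

lemma_loss = sequence_loss(lemma_step_probs, lemma_targets)
lm_loss = sequence_loss(lm_step_probs, lm_targets)
total = joint_loss(lemma_loss, lm_loss)

print(f"lemma loss: {lemma_loss:.4f}")
print(f"LM loss:    {lm_loss:.4f}")
print(f"joint loss: {total:.4f}")
```

In a real model both losses would be computed from the outputs of a shared sentence encoder, so that gradients from the language-modeling task also shape the representations used for lemmatization.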

Bibliographic details

Source: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, xciii p. 1401-2101, 11 pages
Authors: Enrique Manjavacas; Akos Kadar; Mike Kestemont
Original format: PDF
Chinese Library Classification: Programming, software engineering

