Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data

Abstract

We present a new method for unsupervised learning of multilingual symbol (e.g. character) embeddings, without any parallel data or prior knowledge about correspondences between languages. It is able to exploit similarities across languages between the distributions over symbols' contexts of use within their language, even in the absence of any symbols in common to the two languages. In experiments with an artificially corrupted text corpus, we show that the method can retrieve character correspondences obscured by noise. We then present encouraging results of applying the method to real linguistic data, including for low-resourced languages. The learned representations open the possibility of fully unsupervised comparative studies of text or speech corpora in low-resourced languages with no prior knowledge regarding their symbol sets.
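The idea of comparing symbols through their distributions of contexts, rather than through their identities, can be illustrated with a toy sketch. This is not the authors' method, which the abstract does not specify. It assumes a simple hand-made signature per symbol (relative frequency plus the entropies of the following and preceding symbols) that does not change when symbols are relabelled, and a hypothetical `corrupt` helper that re-encodes a text with a disjoint symbol set and optional random noise. All function names are illustrative.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (bits) of the distribution given by a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def context_signature(text):
    """Map each symbol to (relative frequency, next-symbol entropy, previous-symbol entropy).

    The signature depends only on how a symbol is used, not on which symbol it is.
    """
    freq = Counter(text)
    nxt = defaultdict(Counter)
    prev = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        nxt[a][b] += 1
        prev[b][a] += 1
    n = len(text)
    return {s: (freq[s] / n, entropy(nxt[s]), entropy(prev[s])) for s in freq}


def match_symbols(sig_a, sig_b):
    """For each symbol of corpus A, pick the symbol of corpus B with the closest signature."""
    return {a: min(sig_b, key=lambda b: math.dist(va, sig_b[b])) for a, va in sig_a.items()}


def corrupt(text, cipher, noise, rng):
    """Re-encode text with `cipher`; with probability `noise` emit a random target symbol instead."""
    alphabet = sorted(set(cipher.values()))
    return "".join(
        rng.choice(alphabet) if rng.random() < noise else cipher[ch] for ch in text
    )


if __name__ == "__main__":
    rng = random.Random(0)
    text = "the quick brown fox jumps over the lazy dog " * 50
    # Hypothetical second "language": same text, disjoint symbol set, 5% noise.
    cipher = {s: chr(0x0400 + i) for i, s in enumerate(sorted(set(text)))}
    other = corrupt(text, cipher, noise=0.05, rng=rng)
    guess = match_symbols(context_signature(text), context_signature(other))
    correct = sum(guess[s] == cipher[s] for s in cipher)
    print(f"{correct} of {len(cipher)} symbols matched to their true counterpart")
```

Such a signature is far cruder than a learned embedding, and symbols with similar usage statistics will be confused. The sketch only shows why correspondences can in principle be recovered when the two symbol sets share nothing.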

Bibliographic record

Source
Annual meeting of the Society for Computation in Linguistics | 2019 | ii 297 p. | 10 pages
Authors
Mark Granroth-Wilding; Hannu Toivonen
Original format: PDF
Chinese Library Classification: Programming, software engineering
Date indexed: 2022-08-20 23:53:09

