Conference: Annual meeting of the Society for Computation in Linguistics

Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data

Abstract

We present a new method for unsupervised learning of multilingual symbol (e.g. character) embeddings, without any parallel data or prior knowledge about correspondences between languages. It is able to exploit similarities across languages between the distributions over symbols' contexts of use within their language, even in the absence of any symbols in common to the two languages. In experiments with an artificially corrupted text corpus, we show that the method can retrieve character correspondences obscured by noise. We then present encouraging results of applying the method to real linguistic data, including for low-resourced languages. The learned representations open the possibility of fully unsupervised comparative studies of text or speech corpora in low-resourced languages with no prior knowledge regarding their symbol sets.
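The abstract does not say how the artificially corrupted corpus was built. As a rough illustration only, the sketch below shows one way such a test set could be made: re-encode a text with a secret one-to-one cipher into a disjoint symbol set, then randomly replace a fraction of the symbols. The function names `make_cipher` and `corrupt`, the `noise_rate` parameter, and the use of private-use code points are all assumptions made for this example, not details taken from the paper.

```python
import random


def make_cipher(alphabet, rng):
    """Map each symbol to a distinct symbol from a disjoint set.

    Assumption: Unicode private-use code points stand in for the
    "second language" alphabet, so no symbol is shared with the source.
    """
    targets = [chr(0xE000 + i) for i in range(len(alphabet))]
    rng.shuffle(targets)
    return dict(zip(alphabet, targets))


def corrupt(text, cipher, noise_rate, rng):
    """Re-encode text with the cipher, then add substitution noise.

    Each position is replaced by a uniformly random cipher symbol with
    probability noise_rate; otherwise the true cipher symbol is kept.
    """
    symbols = list(cipher.values())
    out = []
    for ch in text:
        if rng.random() < noise_rate:
            out.append(rng.choice(symbols))
        else:
            out.append(cipher[ch])
    return "".join(out)


rng = random.Random(0)
text = "the cat sat on the mat"
cipher = make_cipher(sorted(set(text)), rng)  # hidden ground truth
pseudo_language = corrupt(text, cipher, noise_rate=0.1, rng=rng)
```

In a setup like this, the hidden `cipher` serves as the ground truth against which recovered character correspondences could be scored, and raising `noise_rate` makes the task harder.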
