Computer Speech and Language

Multilingual and unsupervised subword modeling for zero-resource languages



Abstract

Subword modeling for zero-resource languages aims to learn low-level representations of speech audio without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should capture phonetic content and abstract away from other types of variability, such as speaker differences and channel noise. Previous work in this area has primarily focused on unsupervised learning from target language data only, and has been evaluated only intrinsically. Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task. We find that combining two existing target-language-only methods yields better features than either method alone. Nevertheless, even better results are obtained by extracting target language bottleneck features using a model trained on other languages. Cross-lingual training using just one other language is enough to provide this benefit, but multilingual training helps even more. In addition to these results, which hold across both intrinsic measures and the extrinsic task, we discuss the qualitative differences between the different types of learned features.
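The abstract does not include code, but the cross-lingual bottleneck-feature idea it describes can be illustrated with a minimal PyTorch sketch: a supervised acoustic model with a narrow bottleneck layer is trained on frame-level phone labels from a transcribed (non-target) language, and the bottleneck activations are then extracted as features for untranscribed target-language speech. The architecture, input features (39-dim MFCCs), bottleneck width (40), phone inventory size, and the random dummy data below are all illustrative assumptions, not the authors' actual setup.

```python
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    """Feed-forward acoustic model with a narrow bottleneck layer (sketch)."""
    def __init__(self, n_input=39, n_hidden=512, n_bottleneck=40, n_phones=120):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_input, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_bottleneck),   # narrow bottleneck layer
        )
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(n_bottleneck, n_phones),   # phone posteriors, used only in training
        )

    def forward(self, x):
        return self.classifier(self.encoder(x))

    def extract_bnf(self, x):
        # After supervised training on other languages, only the encoder is
        # applied to target-language frames; its output is the bottleneck feature.
        with torch.no_grad():
            return self.encoder(x)

# Supervised training on a transcribed non-target language
# (random tensors stand in for real MFCC frames and phone labels).
model = BottleneckNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
frames = torch.randn(256, 39)            # e.g. MFCCs with deltas
labels = torch.randint(0, 120, (256,))   # frame-level phone labels
for _ in range(5):
    opt.zero_grad()
    loss_fn(model(frames), labels).backward()
    opt.step()

# Zero-resource use: extract 40-dim bottleneck features for target speech.
target_frames = torch.randn(100, 39)
bnf = model.extract_bnf(target_frames)   # shape: (100, 40)
print(bnf.shape)
```

For multilingual training, the same sketch would share the encoder across several transcribed languages (for example, with one classifier head per language), which matches the abstract's finding that multilingual training helps more than training on a single other language.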


