Conference: Language and Technology Conference

A First LVCSR System for Luxembourgish, a Low-Resourced European Language


Abstract

Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe's low-resourced languages. We describe our efforts in building a large vocabulary ASR system for such a "minority" language without resorting to any prior transcribed audio training data. Instead, acoustic models are derived from major European languages. Furthermore, most Luxembourgish written sources include significant parts in other languages. This poses specific challenges to Language Model estimation. Some scientific and technological issues addressed include: (ⅰ) how to build acoustic models if no labeled acoustic training data are available for the under-resourced target language? (ⅱ) how to make use of the new system to accelerate resource production for the target language? (ⅲ) how to build a vocabulary and a language model with multilingual written texts? (ⅳ) how to determine the "best" phonemic inventory for ASR? First ASR results illustrate the accuracy of the various sets of monolingual and multilingual acoustic models and what these suggest concerning language typology issues.
