Computer Speech and Language

Code-switched automatic speech recognition in five South African languages



Abstract

Most automatic speech recognition (ASR) systems are optimised for one specific language, and their performance consequently deteriorates drastically when confronted with multilingual or code-switched speech. We describe our efforts to improve an ASR system that can process code-switched South African speech containing English and four indigenous languages: isiZulu, isiXhosa, Sesotho and Setswana. We begin with a newly developed language-balanced corpus of code-switched speech compiled from South African soap operas, which are rich in spontaneous code-switching. The small size of this corpus makes the scenario under-resourced, and hence we explore several ways of addressing the sparsity of data. We consider augmenting the acoustic training sets with in-domain data at the expense of making them unbalanced and dominated by English. We further explore the inclusion of monolingual out-of-domain data in the constituent languages. For language modelling, we investigate the inclusion of out-of-domain text data sources as well as synthetically generated code-switch bigrams. In our experiments, we consider two system architectures. The first comprises four bilingual speech recognisers, each allowing code-switching between English and one of the indigenous languages. The second is a single pentalingual speech recogniser able to process switching between all five languages. We find that the additional inclusion of each acoustic and text data source leads to some improvement. While in-domain data is substantially more effective, performance gains were also achieved using out-of-domain data, which is often much easier to obtain. Improvements are achieved in all five languages, even when the training set becomes unbalanced and heavily skewed in favour of English. Finally, we find that TDNN-F acoustic models consistently outperform TDNN-BLSTM models in our data-sparse scenario.
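The abstract mentions augmenting language-model training text with synthetically generated code-switch bigrams. As a minimal illustration of the general idea (the exact generation procedure is described in the paper, not here), one simple heuristic is to pair words from two languages across a hypothetical switch point and add the resulting bigrams to the LM training text. All names and the cross-product pairing below are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: synthesising candidate code-switch bigrams by pairing
# words from two languages across an assumed switch point. The word lists and
# the cross-product heuristic are illustrative, not the paper's actual scheme.
from itertools import product

def synthesize_switch_bigrams(english_words, zulu_words):
    """Return candidate code-switch bigrams in both switch directions
    (English -> isiZulu and isiZulu -> English)."""
    en_to_zu = list(product(english_words, zulu_words))
    zu_to_en = list(product(zulu_words, english_words))
    return en_to_zu + zu_to_en

# Example: a few words observed near switch points in training transcripts.
bigrams = synthesize_switch_bigrams(["okay", "phone"], ["ukuthi", "manje"])
print(len(bigrams))  # 8 bigrams from two 2-word lists (2*2 per direction)
```

In practice such synthetic bigrams would be filtered or weighted before being interpolated with counts from the genuine code-switched transcripts, so that implausible pairs do not dominate the language model.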


