Computer Speech and Language

Code-switched automatic speech recognition in five South African languages



Abstract

Most automatic speech recognition (ASR) systems are optimised for one specific language, and their performance consequently deteriorates drastically when confronted with multilingual or code-switched speech. We describe our efforts to improve an ASR system that can process code-switched South African speech containing English and four indigenous languages: isiZulu, isiXhosa, Sesotho and Setswana. We begin with a newly developed language-balanced corpus of code-switched speech compiled from South African soap operas, which are rich in spontaneous code-switching. The small size of this corpus makes the scenario under-resourced, and hence we explore several ways of addressing the sparsity of data. We consider augmenting the acoustic training sets with in-domain data at the expense of making them unbalanced and dominated by English. We further explore the inclusion of monolingual out-of-domain data in the constituent languages. For language modelling, we investigate the inclusion of out-of-domain text data sources as well as synthetically generated code-switch bigrams. In our experiments, we consider two system architectures. The first comprises four bilingual speech recognisers, each allowing code-switching between English and one of the indigenous languages. The second is a single pentalingual speech recogniser able to process switching between all five languages. We find that the additional inclusion of each acoustic and text data source leads to some improvement. While in-domain data is substantially more effective, performance gains were also achieved using out-of-domain data, which is often much easier to obtain. Improvements are achieved in all five languages, even when the training set becomes unbalanced and heavily skewed in favour of English. Finally, we find that TDNN-F acoustic models consistently outperform TDNN-BLSTM models in our data-sparse scenario.
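The abstract mentions augmenting language-model training text with synthetically generated code-switch bigrams. As a minimal illustration of the general idea (the exact generation procedure is described in the paper, not here), one simple heuristic is to pair words from two languages across a hypothetical switch point and add the resulting bigrams to the LM training text. All names and the cross-product pairing below are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: synthesising candidate code-switch bigrams by pairing
# words from two languages across an assumed switch point. The word lists and
# the cross-product heuristic are illustrative, not the paper's actual scheme.
from itertools import product

def synthesize_switch_bigrams(english_words, zulu_words):
    """Return candidate code-switch bigrams in both switch directions
    (English -> isiZulu and isiZulu -> English)."""
    en_to_zu = list(product(english_words, zulu_words))
    zu_to_en = list(product(zulu_words, english_words))
    return en_to_zu + zu_to_en

# Example: a few words observed near switch points in training transcripts.
bigrams = synthesize_switch_bigrams(["okay", "phone"], ["ukuthi", "manje"])
print(len(bigrams))  # 8 bigrams from two 2-word lists (2*2 per direction)
```

In practice such synthetic bigrams would be filtered or weighted before being interpolated with counts from the genuine code-switched transcripts, so that implausible pairs do not dominate the language model.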


