Computer Speech and Language

Automated grapheme-to-phoneme conversion for Central Kurdish based on optimality theory



Abstract

The writing system of Central Kurdish has three cases in which there is no one-to-one mapping between orthographic letters and the phonemes of the language. Consequently, written words containing these cases may be pronounced in more than one way. Finding the correct pronunciation of a written word is called grapheme-to-phoneme (G2P) conversion, a key step in natural language processing tasks such as speech synthesis. Because Central Kurdish is a low-resourced language, we present a G2P conversion method based on the phonological rules of the language rather than on pronunciation dictionaries and data-driven learning methods. After reviewing the phonology and alphabet of the language within the framework of Optimality Theory, we generate all possible pronunciations of each word. Then, by specifying and applying ranked constraints, we eliminate undesirable candidates so that only one well-formed pronunciation per word remains. Evaluated on two datasets, the proposed method achieved an overall Phoneme Error Rate (PER) of 0.75%, a precision of 94.71% in detecting the short vowel /i/, and 100% accuracy in converting the letters ⟨و⟩ and ⟨ی⟩. These results suggest that the current Central Kurdish orthography does not need any additional letters. The approach also produces a ranked list of suggestions for manually checking the few ambiguous cases that remain unresolved.
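The generate-then-filter procedure described above follows the GEN/EVAL architecture of Optimality Theory. The sketch below illustrates that architecture only, under loudly stated assumptions: a toy romanized input in which "w" and "y" stand in for the two letters that can be read as either a consonant or a vowel, a hidden /i/ that may be inserted between any two letters, and four illustrative constraints (`star_ccc`, `star_complex_onset`, `star_hiatus`, `dep_i`) in a hypothetical ranking. None of these names, constraints, or the ranking come from the paper.

```python
from itertools import product

# TOY setup (assumption, not the paper's inventory): romanised graphemes where
# "w" and "y" stand in for the two ambiguous letters; all others map to themselves.
AMBIGUOUS = {"w": ("w", "u"), "y": ("j", "iː")}
VOWELS = {"a", "e", "o", "i", "u", "iː"}
HIDDEN_I = "i"  # the short vowel /i/, which the orthography never writes


def is_vowel(p: str) -> bool:
    return p in VOWELS


def gen(word: str) -> list[tuple[str, ...]]:
    """GEN: every reading of each letter, with or without a hidden /i/
    inserted between each pair of adjacent letters."""
    if not word:
        return []
    readings = [AMBIGUOUS.get(ch, (ch,)) for ch in word]
    slots = [(False, True)] * (len(word) - 1)  # insert /i/ here or not
    candidates = set()
    for choice in product(*readings):
        for inserts in product(*slots):
            phones = [choice[0]]
            for insert, phone in zip(inserts, choice[1:]):
                if insert:
                    phones.append(HIDDEN_I)
                phones.append(phone)
            candidates.add(tuple(phones))
    return sorted(candidates)


# Illustrative constraints (hypothetical); each returns a violation count.
def star_ccc(word, cand):
    """*CCC: no run of three consonants."""
    return sum(
        1 for i in range(len(cand) - 2)
        if not any(is_vowel(p) for p in cand[i:i + 3])
    )


def star_complex_onset(word, cand):
    """*COMPLEX-ONSET: no consonant cluster at the start of the word."""
    return int(len(cand) >= 2 and not is_vowel(cand[0]) and not is_vowel(cand[1]))


def star_hiatus(word, cand):
    """*HIATUS: no two adjacent vowels."""
    return sum(1 for a, b in zip(cand, cand[1:]) if is_vowel(a) and is_vowel(b))


def dep_i(word, cand):
    """DEP-i: one violation per inserted /i/ (each letter yields exactly one phone)."""
    return len(cand) - len(word)


# Hypothetical ranking, highest first.
RANKING = [star_ccc, star_complex_onset, star_hiatus, dep_i]


def evaluate(word, ranking=RANKING):
    """EVAL: order candidates by their violation profiles. Comparing tuples
    lexicographically means one violation of a higher-ranked constraint
    outweighs any number of violations of lower-ranked ones."""
    scored = [(tuple(con(word, c) for con in ranking), c) for c in gen(word)]
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep GEN order
    return scored


if __name__ == "__main__":
    for profile, cand in evaluate("kwr")[:3]:
        print("/" + "".join(cand) + "/", profile)
    # /kur/ (0, 0, 0, 0)
    # /kiwr/ (0, 0, 0, 1)
    # /kiwir/ (0, 0, 0, 2)
```

Sorting the whole candidate list, rather than keeping only the winner, is one simple way to obtain a ranked suggestion list: candidates with identical violation profiles would correspond to the ambiguous cases that still need a manual check.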
