Frontiers in Communication

Dialect Classification From a Single Sonorant Sound Using Deep Neural Networks


Abstract

During spoken communication, the fine acoustic properties of human speech can reveal vital sociolinguistic and linguistic information about speakers; these properties can therefore function as reliable markers of a speaker's identity. One key piece of information that speech reveals is the speaker's dialect. The first aim of this study is to provide a machine learning method that can distinguish dialects from the acoustic productions of sonorant sounds. The second aim is to determine the classification accuracy of dialects from the temporal and spectral information of a single sonorant sound, and the classification accuracy when additional coarticulatory information from the adjacent vowel is included. To this end, this paper provides two classification approaches. The first approach aims to distinguish two Greek dialects, namely Athenian Greek (the prototypical form of Standard Modern Greek) and Cypriot Greek, using measures of temporal and spectral information (i.e., spectral moments) from four sonorant consonants /m n l r/. The second approach aims to distinguish the dialects using coarticulatory information (e.g., formant frequencies F1-F5, F0, etc.) from the adjacent vowel in addition to the spectral and temporal information from the sonorants. In both classification approaches, we employed Deep Neural Networks, which we compared with Support Vector Machines, Random Forests, and Decision Trees. The findings show that neural networks distinguish the two dialects using a combination of spectral moments, temporal information, and formant frequency information with 81% classification accuracy, a 14% gain over employing temporal properties and spectral moments alone. In conclusion, Deep Neural Networks can classify the dialect from single consonant productions, making them capable of identifying sociophonetic shibboleths.
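
The abstract does not include code, but the comparison it describes can be illustrated with a short, self-contained sketch: the first four spectral moments computed from a consonant segment, combined with (hypothetical) duration, F0, and vowel formant features, fed to a small feed-forward network and compared against SVM, Random Forest, and Decision Tree baselines. The feature layout, model settings, and synthetic data below are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

def spectral_moments(segment, sr):
    """First four spectral moments (centroid, spread, skewness, kurtosis)
    of a consonant segment's power spectrum."""
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    p = spectrum / spectrum.sum()                 # normalise to a distribution
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    skew = np.sum(((freqs - centroid) ** 3) * p) / spread ** 3
    kurt = np.sum(((freqs - centroid) ** 4) * p) / spread ** 4
    return np.array([centroid, spread, skew, kurt])

# Placeholder feature matrix: one row per sonorant token, with hypothetical
# columns for duration, the four spectral moments, F0, and vowel F1-F5.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 11))
y = rng.integers(0, 2, size=400)      # 0 = Athenian Greek, 1 = Cypriot Greek

models = {
    "DNN (MLP)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=2000,
                               random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

# Compare the classifiers on the same feature set with cross-validation.
for name, model in models.items():
    clf = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")
```

With real acoustic measurements in place of the synthetic matrix, this loop mirrors the structure of the reported comparison, in which the combined spectral, temporal, and formant features yield the 81% accuracy figure.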
