Understanding Speaking Styles of Internet Speech Data with LSTM and Low-resource Training

Abstract

Speech is widely used to express emotion, intention, desire, and so on in social network communication, producing abundant internet speech data with different speaking styles. Such data is a good resource for social multimedia research. However, because different styles are mixed together in internet speech data, classifying such data remains a challenging problem. Previous work used utterance-level statistics of acoustic features to classify speaking styles, ignoring local context information. Long short-term memory (LSTM) recurrent neural networks (RNNs) have achieved notable success in many research areas, such as speech recognition. An LSTM can retain context information over long time spans, which is important for characterizing speaking styles. Training an LSTM requires a large amount of labeled data, but for internet speech data classification such large-scale labeled data is difficult to obtain. On the other hand, publicly available data exists for other tasks (such as speech emotion recognition), which offers a new possibility for exploiting LSTM in this low-resource task. We adopt a retraining strategy: the network is trained on emotion and speaking style datasets sequentially, without resetting its weights, so that it learns to recognize speaking styles in speech data. Experimental results demonstrate that retraining improves both the training speed and the accuracy of the network in speaking style classification.
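
The abstract's remark that utterance-level statistics ignore local context can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the paper: the per-frame pitch values are hypothetical, and `toy_recurrent_state` is a toy order-sensitive recurrence standing in for a recurrent model, not an LSTM. The sketch shows that a rising and a falling contour built from the same frames give identical utterance-level statistics, while a recurrence that reads the frames in order tells them apart.

```python
from statistics import mean, pstdev


def utterance_stats(frames):
    """Collapse a frame sequence into utterance-level statistics (mean, std)."""
    return (mean(frames), pstdev(frames))


def toy_recurrent_state(frames, decay=0.5):
    """Toy order-sensitive recurrence (NOT an LSTM): h_t = decay * h_{t-1} + x_t."""
    h = 0.0
    for x in frames:
        h = decay * h + x
    return h


# Hypothetical per-frame pitch values in Hz (illustrative, not from the paper).
rising = [100.0, 120.0, 140.0, 160.0]
falling = list(reversed(rising))

# Same frames in a different order: the statistics cannot tell them apart.
print(utterance_stats(rising) == utterance_stats(falling))  # True

# A state updated frame by frame depends on the order of the frames.
print(toy_recurrent_state(rising))   # 272.5
print(toy_recurrent_state(falling))  # 215.0
```

The two contours differ only in temporal order, which is the kind of information a frame-by-frame model can use and a fixed set of utterance-level statistics discards.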

Bibliographic details

Source
International Conference on Affective Computing and Intelligent Interaction | 2015 | 6 pages
Authors
Xixin Wu; Zhiyong Wu; Yishuang Ning; Jia Jia; Lianhong Cai; Helen Meng
Original format
PDF
Chinese Library Classification
TP18-53
Keywords
Speaking style; Long short-term memory; Recurrent neural network; Retraining
