Conference: International Conference on Affective Computing and Intelligent Interaction

Understanding Speaking Styles of Internet Speech Data with LSTM and Low-resource Training


Abstract

Speech is widely used to express emotion, intention, desire, and so on in social network communication, producing abundant internet speech data with different speaking styles. Such data are a good resource for social multimedia research. However, because different styles are mixed together in internet speech data, classifying such data remains a challenging problem. Previous work used utterance-level statistics of acoustic features to classify speaking styles, ignoring local context information. Long short-term memory (LSTM) recurrent neural networks (RNNs) have achieved notable success in many research areas, such as speech recognition. An LSTM can retain context information over long durations, which is important for characterizing speaking styles. Training an LSTM requires a large amount of labeled data, which is difficult to obtain for internet speech data classification. On the other hand, publicly available data for other tasks (such as speech emotion recognition) offer a new possibility for exploiting LSTM in this low-resource task. We adopt a retraining strategy: the network is trained on emotion and speaking style datasets sequentially, without resetting its weights between the two stages, so that it learns to recognize speaking styles in speech data. Experimental results demonstrate that retraining improves both the training speed and the accuracy of the network in speaking style classification.
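The sequential training schedule described in the abstract can be sketched in a few lines. This is only a toy illustration under loud assumptions: a two-feature logistic classifier stands in for the LSTM so the example stays dependency-free, and `source_task` and `target_task` are made-up synthetic points standing in for the emotion and speaking style datasets. The names `train`, `mean_loss`, and the epoch counts and learning rate are hypothetical and are not taken from the paper. The only thing the sketch shows is the schedule itself: weights learned on the first dataset are carried into training on the second instead of being reset.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights, data):
    """Average binary cross-entropy of the toy classifier on `data`."""
    w0, w1, b = weights
    total = 0.0
    for (x0, x1), y in data:
        z = w0 * x0 + w1 * x1 + b
        margin = z if y == 1 else -z
        total += math.log1p(math.exp(-margin))
    return total / len(data)


def train(weights, data, epochs, lr=0.5):
    """Full-batch gradient descent starting from `weights`; returns new weights."""
    w0, w1, b = weights
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in data:
            err = sigmoid(w0 * x0 + w1 * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return (w0, w1, b)


# Synthetic stand-in for the larger, related dataset (assumption, not real data).
source_task = [
    ((x0, x1), 1 if x0 + x1 > 0 else 0)
    for x0 in (-2, -1, 1, 2)
    for x1 in (-2, -1, 1, 2)
    if x0 + x1 != 0
]

# Synthetic stand-in for the small, low-resource target dataset (assumption).
target_task = [
    ((1.0, 0.5), 1),
    ((0.8, 1.0), 1),
    ((-1.0, -0.6), 0),
    ((-0.7, -1.0), 0),
]

init = (0.0, 0.0, 0.0)

# Retraining: train on the source task, then continue on the target task
# from the same weights, with no reset in between.
pretrained = train(init, source_task, epochs=200)
retrained = train(pretrained, target_task, epochs=5)

# Baseline: the same small target budget, but starting from fresh weights.
scratch = train(init, target_task, epochs=5)

warm_loss = mean_loss(retrained, target_task)
cold_loss = mean_loss(scratch, target_task)
print("target loss with retraining:", warm_loss)
print("target loss from scratch:  ", cold_loss)
```

In this toy setup the two tasks share almost the same decision boundary, so the carried-over weights give a lower target loss after the same five epochs. Whether a real LSTM benefits in the same way depends on how closely the emotion and speaking style tasks are related.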
