Journal: Computer Speech and Language

Analysis of gender and identity issues in depression detection on de-identified speech


Abstract

Research on automatic monitoring of emotional state from speech makes it possible to envisage novel applications for the remote monitoring of common mental disorders such as depression. However, these tools raise privacy concerns, since speech is sent by telephone or over the Internet and is then stored or processed on remote servers. Speaker de-identification can be used to protect the privacy of these patients, but the procedure might impair the ability of automatic depression detection approaches to perceive the disease. It is also important that the de-identified speech is of sufficient quality, since practitioners may need to listen to the recordings to assess the patients' state. This paper presents an extensive analysis of depression detection from de-identified speech using different de-identification approaches based on voice conversion. In previous work, a de-identification technique based on pretrained transformation functions was assessed in the context of depression detection. That strategy is speaker-independent (i.e. not speaker-specific) and gender-independent (i.e. the gender of the speaker is not necessarily preserved), which makes it possible to implement it in a real-world scenario where no parallel training data between input and source speakers is required.
This paper analyzes different aspects of that speaker de-identification approach in a depression detection scenario: 1) it compares the performance of the speaker-independent technique with a speaker-dependent setting in which parallel data between input and source speaker are available; 2) it analyzes how the system behaves when the gender of the speaker is preserved, since this might be a desirable feature and has not been addressed in previous work; 3) it assesses the performance of two voice conversion methods when only a limited amount of training data is available, comparing de-identification based on frequency warping and amplitude scaling (FW+AS) with a strategy based on generative adversarial networks (GAN). Experimental validation was carried out in the framework of the Audio/Visual Emotion Challenge 2014. The results suggest that speaker-independent and gender-dependent de-identification is the most suitable option for depression level estimation, since its trade-off between de-identification and depression estimation performance was superior to that of the other alternatives. In addition, the results suggest that the GAN-based de-identification approach achieves better de-identification performance than FW+AS while achieving comparable results for depression detection.
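The abstract names frequency warping and amplitude scaling (FW+AS) but does not describe how the transformation is applied. The following is only a rough sketch of the general idea, assuming a piecewise-linear warping function and per-bin multiplicative gains applied to one sampled spectral envelope. The function names, the knot format, and the gain representation are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Linearly interpolate y at x over sorted knots xs; clamp outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def warp_and_scale(envelope, warp_knots, gain):
    """Apply frequency warping, then amplitude scaling, to one spectral envelope.

    envelope   : magnitudes sampled at normalized frequencies k/(N-1), N >= 2
    warp_knots : (input_freq, output_freq) pairs in [0, 1], strictly increasing,
                 defining an assumed piecewise-linear warping function
    gain       : assumed per-bin multiplicative scaling factors, length N
    """
    n = len(envelope)
    freqs = [k / (n - 1) for k in range(n)]
    src = [a for a, _ in warp_knots]
    dst = [b for _, b in warp_knots]
    out = []
    for k, f in enumerate(freqs):
        # The output bin at frequency f reads the input envelope at the
        # frequency that the warping function maps onto f (inverse warp).
        f_in = interp(f, dst, src)
        out.append(gain[k] * interp(f_in, freqs, envelope))
    return out

# Toy example: a single spectral peak at normalized frequency 0.5 is moved
# to 0.75 and doubled in amplitude.
envelope = [0, 0, 1, 0, 0]
knots = [(0.0, 0.0), (0.5, 0.75), (1.0, 1.0)]
converted = warp_and_scale(envelope, knots, [2.0] * 5)
```

In this toy case the warping shifts the position of the peak and the scaling changes its level. A real system would apply such a transformation frame by frame to features extracted from speech and resynthesize audio afterwards, which this sketch leaves out.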
