Conference on Empirical Methods in Natural Language Processing

Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning

Abstract

Hand-crafted rules and reinforcement learning (RL) are two popular choices for obtaining a dialogue policy. A rule-based policy is usually reliable within its predefined scope but cannot adapt on its own, whereas an RL policy can evolve with data but often suffers from poor initial performance. We employ a companion learning framework that integrates the two approaches for on-line dialogue policy learning: a predefined rule-based policy acts as a teacher and guides a data-driven RL system by providing example actions as well as additional rewards. A novel agent-aware dropout Deep Q-Network (AAD-DQN) is proposed to address the problems of when to consult the teacher and how to learn from the teacher's experiences. AAD-DQN, as a data-driven student policy, provides (1) two separate experience memories, one for the student and one for the teacher, and (2) an uncertainty estimate obtained via dropout that controls the timing of consultation and learning. Simulation experiments showed that the proposed approach significantly improves both the safety and the efficiency of on-line policy optimization compared with other companion learning approaches as well as with supervised pre-training on a static dialogue corpus.
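
The abstract describes two mechanisms inside the student policy: a dropout-based uncertainty estimate that decides when to consult the rule-based teacher, and separate experience memories for student- and teacher-generated transitions. Below is a minimal sketch of how these two ideas could be wired together. It is not the authors' implementation; the network sizes, the variance threshold, and the names `DropoutQNetwork`, `greedy_q_variance`, `select_action`, and `teacher_policy` are illustrative assumptions.

```python
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class DropoutQNetwork(nn.Module):
    """Q-network whose dropout stays active at decision time for MC sampling."""

    def __init__(self, state_dim, num_actions, hidden=128, p_drop=0.2):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden)
        self.fc2 = nn.Linear(hidden, num_actions)
        self.p_drop = p_drop

    def forward(self, state):
        h = F.relu(self.fc1(state))
        # training=True keeps dropout stochastic even outside training,
        # which is what makes a Monte-Carlo uncertainty estimate possible.
        h = F.dropout(h, p=self.p_drop, training=True)
        return self.fc2(h)


def greedy_q_variance(q_net, state, n_samples=10):
    """Variance of the greedy Q-value across repeated dropout forward passes."""
    with torch.no_grad():
        samples = torch.stack([q_net(state).max() for _ in range(n_samples)])
    return samples.var().item()


# Two separate experience memories, as described in the abstract:
# one for transitions where the student acted, one for teacher actions.
student_memory = deque(maxlen=10_000)
teacher_memory = deque(maxlen=10_000)


def select_action(q_net, state, teacher_policy, threshold=0.05):
    """Consult the rule-based teacher only when the student is uncertain.

    Returns the chosen action and the memory the resulting transition
    should be stored in.
    """
    if greedy_q_variance(q_net, state) > threshold:
        # Student is unsure: take the teacher's example action.
        return teacher_policy(state), teacher_memory
    with torch.no_grad():
        action = int(q_net(state).argmax())
    return action, student_memory


if __name__ == "__main__":
    q_net = DropoutQNetwork(state_dim=20, num_actions=5)
    state = torch.randn(20)
    rule_based_teacher = lambda s: 0          # placeholder rule-based policy
    action, memory = select_action(q_net, state, rule_based_teacher)
    memory.append((state, action))            # reward / next state omitted
```

During Q-learning updates, both memories would be sampled and the additional teacher reward mentioned in the abstract would be applied to transitions drawn from the teacher memory; those steps are omitted from this sketch.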
