Conference on Empirical Methods in Natural Language Processing

Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning

Abstract

Hand-crafted rules and reinforcement learning (RL) are two popular choices for obtaining a dialogue policy. A rule-based policy is usually reliable within its predefined scope but cannot adapt on its own, whereas an RL policy can evolve with data but often suffers from poor initial performance. We employ a companion learning framework that integrates the two approaches for on-line dialogue policy learning: a predefined rule-based policy acts as a teacher and guides a data-driven RL system by providing example actions as well as additional rewards. A novel agent-aware dropout Deep Q-Network (AAD-DQN) is proposed to address the problems of when to consult the teacher and how to learn from the teacher's experiences. AAD-DQN, as a data-driven student policy, provides (1) two separate experience memories, one for the student and one for the teacher, and (2) an uncertainty estimate obtained via dropout that controls the timing of consultation and learning. Simulation experiments showed that the proposed approach significantly improves both the safety and the efficiency of on-line policy optimization compared with other companion learning approaches as well as with supervised pre-training on a static dialogue corpus.
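
The abstract describes two mechanisms inside the student policy: a dropout-based uncertainty estimate that decides when to consult the rule-based teacher, and separate experience memories for student- and teacher-generated transitions. Below is a minimal sketch of how these two ideas could be wired together. It is not the authors' implementation; the network sizes, the variance threshold, and the names `DropoutQNetwork`, `greedy_q_variance`, `select_action`, and `teacher_policy` are illustrative assumptions.

```python
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class DropoutQNetwork(nn.Module):
    """Q-network whose dropout stays active at decision time for MC sampling."""

    def __init__(self, state_dim, num_actions, hidden=128, p_drop=0.2):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden)
        self.fc2 = nn.Linear(hidden, num_actions)
        self.p_drop = p_drop

    def forward(self, state):
        h = F.relu(self.fc1(state))
        # training=True keeps dropout stochastic even outside training,
        # which is what makes a Monte-Carlo uncertainty estimate possible.
        h = F.dropout(h, p=self.p_drop, training=True)
        return self.fc2(h)


def greedy_q_variance(q_net, state, n_samples=10):
    """Variance of the greedy Q-value across repeated dropout forward passes."""
    with torch.no_grad():
        samples = torch.stack([q_net(state).max() for _ in range(n_samples)])
    return samples.var().item()


# Two separate experience memories, as described in the abstract:
# one for transitions where the student acted, one for teacher actions.
student_memory = deque(maxlen=10_000)
teacher_memory = deque(maxlen=10_000)


def select_action(q_net, state, teacher_policy, threshold=0.05):
    """Consult the rule-based teacher only when the student is uncertain.

    Returns the chosen action and the memory the resulting transition
    should be stored in.
    """
    if greedy_q_variance(q_net, state) > threshold:
        # Student is unsure: take the teacher's example action.
        return teacher_policy(state), teacher_memory
    with torch.no_grad():
        action = int(q_net(state).argmax())
    return action, student_memory


if __name__ == "__main__":
    q_net = DropoutQNetwork(state_dim=20, num_actions=5)
    state = torch.randn(20)
    rule_based_teacher = lambda s: 0          # placeholder rule-based policy
    action, memory = select_action(q_net, state, rule_based_teacher)
    memory.append((state, action))            # reward / next state omitted
```

During Q-learning updates, both memories would be sampled and the additional teacher reward mentioned in the abstract would be applied to transitions drawn from the teacher memory; those steps are omitted from this sketch.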
