Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog

Abstract

Dialog policy decides what and how a task-oriented dialog system will respond, and plays a vital role in delivering effective conversations. Many studies apply Reinforcement Learning to learn a dialog policy with a reward function, which requires elaborate design and pre-specified user goals. With the growing need to handle complex goals across multiple domains, such manually designed reward functions cannot scale to the complexity of real-world tasks. To this end, we propose Guided Dialog Policy Learning, a novel algorithm based on Adversarial Inverse Reinforcement Learning for joint reward estimation and policy optimization in multi-domain task-oriented dialog. The proposed approach estimates the reward signal and infers the user goal from the dialog sessions. The reward estimator evaluates state-action pairs so that it can guide the dialog policy at each dialog turn. Extensive experiments on a multi-domain dialog dataset show that the dialog policy guided by the learned reward function achieves remarkably higher task success than state-of-the-art baselines.
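The per-turn guidance described above follows the standard AIRL construction, where a learned scoring function f over state-action pairs is combined with the policy's action probability to form a discriminator, and the resulting reward is used at every dialog turn. A minimal sketch of that idea, assuming a hypothetical linear scorer `f_omega` and illustrative state/action dimensions (none of these names or sizes come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 8, 4

# Hypothetical linear reward estimator f_omega(s, a) over a state-action pair.
W = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM,))

def f_omega(state, action_onehot):
    """Score a state-action pair (the learned reward estimator)."""
    return float(W @ np.concatenate([state, action_onehot]))

def airl_discriminator(state, action_onehot, log_pi):
    """AIRL-style discriminator: D = exp(f) / (exp(f) + pi(a|s))."""
    f = f_omega(state, action_onehot)
    return np.exp(f) / (np.exp(f) + np.exp(log_pi))

def turn_reward(state, action_onehot, log_pi):
    """Per-turn reward: log D - log(1 - D), which simplifies to f - log pi."""
    d = airl_discriminator(state, action_onehot, log_pi)
    return np.log(d) - np.log(1.0 - d)

# One simulated dialog turn: the estimator gives the policy a dense,
# per-turn signal instead of a sparse hand-designed task reward.
s = rng.normal(size=STATE_DIM)
a = np.eye(ACTION_DIM)[1]      # one-hot system action
log_pi = np.log(0.25)          # policy probability of that action
r = turn_reward(s, a, log_pi)
```

In this formulation the reward reduces algebraically to f(s, a) - log pi(a|s), so the estimator rewards actions the scorer favors relative to how likely the current policy already is to take them.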

