MAXIMUM ENTROPY REGULARISED MULTI-GOAL REINFORCEMENT LEARNING
Abstract
The present invention relates to a computer-implemented method of training artificial intelligence (AI) systems, or agents, called Maximum Entropy Regularised multi-goal Reinforcement Learning, in particular an AI system/agent for controlling a technical system. A prioritised sampling distribution q(τ^g) is constructed with a higher entropy H_q(T^g) than the distribution p(τ^g) of goal state trajectories τ^g, and the goal state trajectories τ^g are then sampled with the prioritised sampling distribution q(τ^g). In this way the AI system/agent is trained to achieve unseen goals by learning uniformly from diverse achieved goal states.
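The idea of replacing the goal-trajectory distribution p with a higher-entropy sampling distribution q can be illustrated with a small toy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the goal state trajectories are reduced to four discrete bins, the probabilities in `p` are made up, and the tempering rule q ∝ p^alpha (with 0 ≤ alpha < 1) is just one simple way to obtain a distribution whose entropy is at least that of p. The function names `entropy` and `prioritised_distribution` are hypothetical.

```python
import numpy as np


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    dist = np.asarray(dist, dtype=float)
    nonzero = dist[dist > 0]
    return float(-(nonzero * np.log(nonzero)).sum())


def prioritised_distribution(p, alpha=0.5):
    """Assumed rule: temper p toward uniform with q proportional to p**alpha.

    For 0 <= alpha < 1 this flattens p, so rare goal-trajectory bins
    are sampled more often than under p itself.
    """
    p = np.asarray(p, dtype=float)
    weights = np.where(p > 0, p ** alpha, 0.0)
    return weights / weights.sum()


# Made-up distribution over four bins of achieved goal state trajectories.
p = np.array([0.70, 0.20, 0.07, 0.03])
q = prioritised_distribution(p, alpha=0.5)

# Draw a training batch of bin indices from q instead of p.
rng = np.random.default_rng(0)
batch = rng.choice(len(p), size=8, p=q)

print("H(p) =", entropy(p))
print("H(q) =", entropy(q))
print("sampled bins:", batch)
```

With `alpha=1` the sketch returns p unchanged, and with `alpha=0` it samples uniformly over the bins that have non-zero probability, so `alpha` controls how strongly rare goal states are favoured.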