LIFELONG LEARNING WITH A CHANGING ACTION SET
Abstract
Systems and methods are described for a decision-making process that includes an increasing set of actions. The described systems and methods compute a policy function for a Markov decision process (MDP) that models the decision-making process, where the policy function is computed based on a state conditional function mapping states into an embedding space, an inverse dynamics function mapping state transitions into the embedding space, and an action selection function mapping elements of the embedding space to actions. They then identify an additional set of actions in the increasing set of actions, update the inverse dynamics function based at least in part on the additional set of actions, update the policy function based on the updated inverse dynamics function and on parameters learned while computing the policy function, and select an action based on the updated policy function.
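
The three components named in the abstract can be illustrated with a small toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the class name `EmbeddingPolicy`, the linear maps `w_state` and `w_inv`, the nearest-neighbour rule used for action selection, and the averaging of embedded transitions. In particular, the abstract describes updating the inverse dynamics function itself when new actions appear; this sketch simplifies that step to embedding the new actions with a fixed inverse dynamics map, while reusing the state conditional parameters unchanged.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class EmbeddingPolicy:
    """Hypothetical three-part policy; names and linear maps are assumptions."""

    def __init__(self, w_state, w_inv):
        # State conditional function: maps a state into the embedding space.
        self.w_state = w_state
        # Inverse dynamics function: maps a transition (s, s') into the same space.
        self.w_inv = w_inv
        # Embedding of each action currently available to the agent.
        self.action_embeddings = {}

    def state_conditional(self, state):
        return matvec(self.w_state, state)

    def inverse_dynamics(self, state, next_state):
        return matvec(self.w_inv, list(state) + list(next_state))

    def add_actions(self, observed):
        """Register new actions from observed (state, next_state) pairs.

        Simplification: a real system would retrain the inverse dynamics
        function here; this sketch only embeds the new actions with it.
        """
        for action, pairs in observed.items():
            embedded = [self.inverse_dynamics(s, s2) for s, s2 in pairs]
            # Average the embedded transitions, coordinate by coordinate.
            self.action_embeddings[action] = [
                sum(column) / len(embedded) for column in zip(*embedded)
            ]

    def select_action(self, state):
        # Action selection (assumed): nearest action embedding to the target.
        target = self.state_conditional(state)
        return min(
            self.action_embeddings,
            key=lambda a: math.dist(self.action_embeddings[a], target),
        )


# Toy setup: a transition is embedded as the displacement s' - s.
policy = EmbeddingPolicy(
    w_state=[[1, 0], [0, 1]],
    w_inv=[[-1, 0, 1, 0], [0, -1, 0, 1]],
)
origin = [0.0, 0.0]
policy.add_actions({"right": [(origin, [1.0, 0.0])], "up": [(origin, [0.0, 1.0])]})
before = policy.select_action([-0.8, 0.0])

# A new action becomes available; w_state is reused without retraining.
policy.add_actions({"left": [(origin, [-1.0, 0.0])]})
after = policy.select_action([-0.8, 0.0])
```

In this toy setup the same state is mapped to a different action once the action set grows, even though the state conditional parameters never change. That is the kind of parameter reuse the abstract refers to when it mentions parameters learned while computing the policy function.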