NEURAL NETWORK MODEL FOR REACHING A GOAL STATE

Abstract

An object, such as a robot, is located at an initial state in a finite state space area and moves under the control of the unsupervised neural network model of the invention. The network instructs the object to move in one of several directions from the initial state. Upon reaching another state, the model again instructs the object to move in one of several directions. These instructions continue until either: a) the object has completed a cycle by ending up back at a state it has been to previously during this cycle, or b) the object has completed a cycle by reaching the goal state. If the object ends up back at a state it has been to previously during this cycle, the neural network model ends the cycle and immediately begins a new cycle from the present location. When the object reaches the goal state, the neural network model learns that this path is productive towards reaching the goal state, and is given delayed reinforcement in the form of a "reward". Upon reaching a state, the neural network model calculates a level of satisfaction with its progress towards reaching the goal state. If the level of satisfaction is low, the neural network model is more likely to override what has been learned thus far and deviate from a path known to lead to the goal state to experiment with new and possibly better paths. If the level of satisfaction is high, the neural network model is much less likely to experiment with new paths. The object is guaranteed to eventually find the best path to the goal state from any starting location, assuming that the level of satisfaction does not exceed a threshold point where learning ceases.
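
The cycle structure described in the abstract can be illustrated with a small sketch. This is not the patented model: it replaces the neural network with a plain lookup table, and the 3x3 grid, the constants `GAMMA`, `RATE` and `MAX_SAT`, the way satisfaction is computed (a running average of cycle outcomes, updated at the end of each cycle rather than at every state), and the random restart after reaching the goal are all assumptions made for illustration, since the abstract does not specify them.

```python
import random

# All names and constants here are illustrative assumptions, not from the patent.
SIZE = 3                                     # 3x3 grid: a small finite state space
GOAL = (SIZE - 1, SIZE - 1)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA = 0.9      # reward shrinks for steps further from the goal
RATE = 0.1       # how fast satisfaction tracks recent cycle outcomes
MAX_SAT = 0.8    # cap on satisfaction so that exploration never stops entirely


def step(state, move):
    """Apply a move, clamping at the border so the object stays on the grid."""
    row = min(max(state[0] + move[0], 0), SIZE - 1)
    col = min(max(state[1] + move[1], 0), SIZE - 1)
    return (row, col)


def best_move(weights, state):
    """Index of the move with the highest learned weight in this state."""
    return max(range(len(MOVES)), key=lambda i: weights.get((state, i), 0.0))


def random_start(rng):
    """Pick any non-goal state as the next starting location."""
    while True:
        state = (rng.randrange(SIZE), rng.randrange(SIZE))
        if state != GOAL:
            return state


def train(num_cycles=20000, seed=0):
    rng = random.Random(seed)
    weights = {}          # (state, move index) -> learned weight
    satisfaction = 0.0    # 0 = dissatisfied (explores a lot); capped at MAX_SAT
    state = (0, 0)
    for _ in range(num_cycles):
        visited = {state}
        path = []
        while True:
            # Low satisfaction makes overriding the learned move more likely.
            if rng.random() < 1.0 - satisfaction:
                move = rng.randrange(len(MOVES))
            else:
                move = best_move(weights, state)
            path.append((state, move))
            state = step(state, MOVES[move])
            if state == GOAL:
                # Delayed reinforcement: reward every step of the successful
                # path, keeping the best value seen so shorter routes win.
                for k, key in enumerate(reversed(path)):
                    weights[key] = max(weights.get(key, 0.0), GAMMA ** k)
                satisfaction = min(MAX_SAT, satisfaction + RATE * (1.0 - satisfaction))
                state = random_start(rng)
                break
            if state in visited:
                # Back at a state already seen in this cycle: end the cycle
                # without reward; the next cycle starts from the present location.
                satisfaction -= RATE * satisfaction
                break
            visited.add(state)
    return weights


def greedy_path(weights, start):
    """Follow the highest-weight move from each state, with a length guard."""
    path = [start]
    while path[-1] != GOAL and len(path) <= SIZE * SIZE:
        path.append(step(path[-1], MOVES[best_move(weights, path[-1])]))
    return path
```

Calling `greedy_path(train(), (0, 0))` returns the sequence of states the learned table would follow from the top-left corner, which can be compared against the four-move minimum on this grid. Capping satisfaction at `MAX_SAT` is one simple way to mirror the abstract's condition that satisfaction must stay below the point where learning ceases.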

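Bibliographic data

  • Publication number: CA2039366C

  • Patent type:

  • Publication date: 1995-04-04

  • Original format: PDF

  • Applicant/patentee: INTERNATIONAL BUSINESS MACHINES CORPORATION

  • Application number: CA19912039366

  • Inventor: LYNNE KENTON JEROME

  • Filing date: 1991-03-28

  • Classification: G06F9/00; G06F15/20

  • Country: CA

  • Added to database: 2022-08-22 04:16:26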
