Conference: Applications of Artificial Neural Networks IV

Manipulator arm control by neural network with reward/punish learning scheme

Abstract: In this paper, a neural network with a reward/punish learning scheme is used to control manipulator arms. At each discrete point of the workspace, one neuron is assigned to each joint to control the movement of the arm. The inputs to the neuron are the position error and the velocity of the joint. The net input of the neuron, which is the linear combination of the inputs and their weights, is passed through a sigmoid function to generate the final output. This output is the torque required to drive the arm to its desired position. The reward/punish learning mechanism adaptively modifies the weights: they are punished if the previous move was in the wrong direction and rewarded otherwise. By iterating this process, the network learns the inverse dynamics of the manipulator without knowing its model or forward dynamics, and the neurons eventually output appropriate torques to hold the manipulator arm at the desired location. Because the learning algorithm is simple, the network learns the inverse dynamics quickly and can therefore be used in real-time applications. A two-link planar manipulator is used as a demonstration. The position error and the torque generated for each joint are shown graphically. These figures also show that, once the inverse dynamics of the manipulator has been learned, the network returns the arm to its desired position quickly after step disturbances of ±2.5 degrees are injected into the system. Although only a 2-DOF manipulator is illustrated, the concept can be extended to a 6-DOF system.
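
The per-joint neuron described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the update rule, so the multiplicative reward/punish step, the bipolar (sign-preserving) form of the sigmoid, and the names JointNeuron, tau_max, and rate are all assumptions made here for the example.

```python
import math


class JointNeuron:
    """One neuron for one joint: inputs are position error and joint velocity."""

    def __init__(self, w_err=1.0, w_vel=-0.5, tau_max=10.0, rate=0.05):
        # Initial weights, torque limit, and learning rate are hypothetical values.
        self.w = [w_err, w_vel]
        self.tau_max = tau_max
        self.rate = rate

    def torque(self, pos_error, velocity):
        # Net input: linear combination of the inputs and their weights.
        net = self.w[0] * pos_error + self.w[1] * velocity
        # Assumption: a bipolar sigmoid, 2 / (1 + exp(-net)) - 1, so that the
        # torque can take either sign. It equals tanh(net / 2), which is used
        # here because it does not overflow for large |net|.
        return self.tau_max * math.tanh(net / 2.0)

    def learn(self, prev_abs_error, abs_error):
        # Hypothetical reward/punish rule: if the last move reduced the error,
        # scale the weights up (reward); otherwise scale them down (punish).
        if abs_error < prev_abs_error:
            factor = 1.0 + self.rate
        else:
            factor = 1.0 - self.rate
        self.w = [w * factor for w in self.w]


if __name__ == "__main__":
    neuron = JointNeuron()
    print("torque for 0.5 rad error at rest:", neuron.torque(0.5, 0.0))
    neuron.learn(prev_abs_error=0.5, abs_error=0.4)  # error shrank: reward
    print("weights after reward:", neuron.w)
```

In a control loop, torque would be called at every time step and learn once the resulting change in position error has been observed. How "wrong direction" is detected and how strongly the weights are adjusted would have to follow the full paper rather than this sketch.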