Machine learning device, robot system and machine learning method for learning a movement of a robot that is involved in a task jointly performed by a human and a robot
A robot system comprising:
- a machine learning device (2) for learning a movement of a robot (3) that is involved in a task jointly performed by a human (1) and the robot (3), the machine learning device (2) comprising:
  - a state monitoring unit (21) that monitors a state variable indicating the state of the robot (3) when the human (1) and the robot (3) work together and perform a task,
  - a reward calculation unit (22) that calculates a reward based on control data and the state variable for controlling the robot (3) and on an action of the human (1), and
  - a value function updating unit (23) that updates an action value function for controlling a movement of the robot (3) based on the reward and the state variable;
- the robot (3), which performs a task together with the human (1);
- a robot control unit (30) that controls a movement of the robot (3); and
- a task intention recognition unit (51) that receives an output of a camera (44), a force sensor (45, 45a, 45b), a touch sensor (41), a microphone (42) and an input device (43) and recognizes an intention relating to the task,
wherein the machine learning device (2) learns a movement of the robot (3) by analyzing a distribution of feature points or workpieces (W) after the human (1) and the robot (3) have worked together and performed the task,
wherein the state variable input into the state monitoring unit (21) of the machine learning device (2) comprises an output of the task intention recognition unit (51),
wherein the task intention recognition unit (51):
- converts a positive reward based on an action of the human (1) into a state variable established for the positive reward and outputs the state variable to the state monitoring unit (21), and
- converts a negative reward based on an action of the human (1) into a state variable established for the negative reward and outputs the state variable to the state monitoring unit (21),
and wherein the reward calculation unit (22) calculates the reward by adding a second reward, based on the action of the human (1), to a first reward, based on the control data and the state variable.
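
The reward calculation and value-function update described in the claim can be illustrated with a small sketch. This is not the patented implementation: the claim does not name an update rule, so tabular Q-learning is assumed here, and every function name, reward magnitude, and action label (`praise`, `scold`, `effort`, `wait`, `hand_over`) is hypothetical. The sketch only shows the two claimed ideas: the human's reaction is both converted into a state variable and added as a second reward to a first reward derived from control data and state.

```python
from collections import defaultdict

# All names and numeric values below are illustrative assumptions;
# the claim does not specify data types, magnitudes, or the update rule.

def intention_state(human_action):
    """Convert a human reaction into a state variable (assumed encoding:
    +1 for a positive reward, -1 for a negative reward, 0 otherwise)."""
    return {"praise": 1, "scold": -1}.get(human_action, 0)

def first_reward(control_data, state):
    """First reward, from control data and the state variable.
    Assumption: penalise large commanded effort."""
    return -0.01 * abs(control_data.get("effort", 0.0))

def second_reward(human_action):
    """Second reward, from the action of the human (assumed mapping)."""
    return {"praise": 1.0, "scold": -1.0}.get(human_action, 0.0)

def total_reward(control_data, state, human_action):
    """Reward = first reward + second reward, as stated in the claim."""
    return first_reward(control_data, state) + second_reward(human_action)

class ValueFunctionUpdater:
    """Tabular action value function with a Q-learning update (assumed)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.q = defaultdict(float) # Q[(state, action)], defaults to 0.0

    def update(self, state, action, reward, next_state):
        # Best estimated value reachable from the next state.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        # Temporal-difference error for the observed transition.
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        return self.q[(state, action)]

if __name__ == "__main__":
    updater = ValueFunctionUpdater(["wait", "hand_over"])
    # The state variable includes the intention-recognition output.
    state = ("workpiece_left", intention_state(None))
    next_state = ("workpiece_left", intention_state("praise"))
    r = total_reward({"effort": 10.0}, state, "praise")
    print(updater.update(state, "hand_over", r, next_state))
```

Keeping the two reward terms as separate functions mirrors the claim's wording and makes it straightforward to swap in a different first-reward definition without touching how the human's feedback is handled.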