Robotic phone disassembly is difficult in several respects: it demands high precision and high speed, and the approach must generalize to all types of cell phones. Previous work on robot learning from demonstration is hardly applicable because teaching is complex, the amount of data is huge, and generalization is difficult. To tackle these problems, we learn from videos and extract information useful to the robot. To reduce the amount of data to be processed, we generate a mask for the video and observe only the region of interest. Inspired by the idea that a spatio-temporal interest point (STIP) detector may yield meaningful points, such as the contact point between the tool and the part, we design a new method for detecting STIPs based on optical flow. We also design a new descriptor by modifying the histogram of optical flow. Together, the STIP detector and descriptor ensure that the features are invariant to scale, rotation, and noise. Using the modified histogram of optical flow descriptor, we show that good classification results can be achieved even without considering the raw pixels of the original video.
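For readers unfamiliar with the histogram of optical flow, the following is a minimal sketch of a plain, magnitude-weighted orientation histogram. It is not the modified descriptor proposed in this work, whose details are not given here. The function name `hof_descriptor`, the parameter `num_bins`, and the circular shift to the dominant bin are all illustrative assumptions: L1 normalization gives invariance to a uniform scaling of the flow, and the shift gives rotation invariance only up to the bin width.

```python
import math


def hof_descriptor(flow_vectors, num_bins=8, min_magnitude=1e-6):
    """Magnitude-weighted orientation histogram of (dx, dy) flow vectors.

    Hypothetical helper for illustration only; not the author's descriptor.
    """
    hist = [0.0] * num_bins
    bin_width = 2.0 * math.pi / num_bins
    for dx, dy in flow_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude < min_magnitude:
            continue  # ignore near-static points, which carry mostly noise
        angle = math.atan2(dy, dx) % (2.0 * math.pi)  # map to [0, 2*pi)
        index = int(angle / bin_width) % num_bins
        hist[index] += magnitude
    total = sum(hist)
    if total == 0.0:
        return hist  # no motion at all: return the all-zero histogram
    hist = [h / total for h in hist]  # L1 normalization (scale invariance)
    # Assumed rotation normalization: circularly shift so that the
    # dominant orientation bin comes first.
    start = hist.index(max(hist))
    return hist[start:] + hist[:start]


# Example: a small flow field and the same field rotated by 90 degrees.
flow = [(2.0, 0.2), (1.0, 0.1), (-0.1, 1.0)]
rotated = [(-y, x) for x, y in flow]
descriptor = hof_descriptor(flow)
rotated_descriptor = hof_descriptor(rotated)
```

In this example the two descriptors agree because the rotation is a multiple of the bin width; for arbitrary angles, vectors near bin boundaries can move to a neighbouring bin, so the invariance is only approximate.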