The forward modeling approach of M.I. Jordan and D.E. Rumelhart (1990) makes supervised learning methods applicable to reinforcement learning tasks: a learned model of the environment supplies the error gradients that supervised training requires. Because such tasks are also natural candidates for reinforcement learning methods, the relative merits of the two approaches need to be evaluated on reinforcement learning tasks. The author presents one such comparison, on a task that involves learning to control an unstable, nonminimum-phase dynamic system. In this comparison, the reinforcement learning method outperforms the supervised learning method. An examination of the learning behavior of the two methods indicates that the difference in performance stems from the underlying mechanics of the two learning methods, which suggests that similar performance differences can be expected on other reinforcement learning tasks as well.
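To make the forward-modeling idea concrete, the following is a minimal sketch, not the paper's actual experiment: a one-dimensional linear plant (the true coefficients `a_true`, `b_true`, the targets, and the learning rates are all illustrative assumptions) on which a forward model is first fit by supervised regression, and a controller is then trained by backpropagating the distal error through the frozen model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D plant: next_x = a*x + b*u.
# The true coefficients are unknown to the learner.
a_true, b_true = 1.2, 0.5

def plant(x, u):
    return a_true * x + b_true * u

# Stage 1: fit a forward model (a_hat, b_hat) of the plant
# by ordinary supervised regression on observed transitions.
a_hat, b_hat = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    x = rng.uniform(-1, 1)
    u = rng.uniform(-1, 1)
    y = plant(x, u)                # observed next state
    err = (a_hat * x + b_hat * u) - y
    a_hat -= lr * err * x          # LMS gradient step
    b_hat -= lr * err * u

# Stage 2: train a linear controller u = k*x through the frozen
# forward model, backpropagating the distal error (next_x - target)^2
# via the model rather than via the (unknown) plant.
k, target = 0.0, 0.0
for _ in range(2000):
    x = rng.uniform(-1, 1)
    u = k * x
    err = (a_hat * x + b_hat * u) - target
    # d(err^2)/dk = 2 * err * b_hat * x   (gradient through the model)
    k -= lr * 2 * err * b_hat * x

# If the model is accurate, the closed loop |a_true + b_true*k| < 1
# is stabilizing even though the learner never saw a_true or b_true.
```

The two-stage structure is the point of contrast with reinforcement learning: the supervised route needs the intermediate model to convert a distal outcome error into a parameter gradient, whereas a reinforcement learner adjusts the controller directly from an evaluative signal.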