According to some embodiments of the present invention, there is provided a method for determining a control action in a control system using a Markov decision process (MDP). The method comprises receiving measured transition probability values of the MDP and receiving simulated transition probability values generated by performing a control system simulation. A measured data count of some of the sensor measurements and a simulated data count of some of the simulated transition data are calculated. New transition probability values are then computed from a weighted average between the measured transition probability values and the simulated transition probability values, using the measured data count and the simulated data count. A new control action is determined based on one or more of the new transition probability values.
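The count-based weighted average can be illustrated with a minimal Python sketch. The abstract does not state the exact weighting formula, so the sketch assumes each source is weighted in proportion to its data count. The function names, the example actions ("heat", "idle"), the numeric values, and the one-step greedy action choice are all hypothetical and serve only as illustration; they are not taken from the described method.

```python
def blend_transition_probs(p_measured, p_simulated, n_measured, n_simulated):
    """Count-weighted average of two distributions over next states.

    Assumption: weights are proportional to the data counts.
    """
    total = n_measured + n_simulated
    if total == 0:
        raise ValueError("at least one data count must be positive")
    return [
        (n_measured * pm + n_simulated * ps) / total
        for pm, ps in zip(p_measured, p_simulated)
    ]


def pick_action(p_new_by_action, next_state_values):
    """Pick the action with the highest expected next-state value.

    Assumption: a one-step greedy rule stands in for whatever MDP
    solver the control system actually uses.
    """
    def expected(action):
        probs = p_new_by_action[action]
        return sum(p * v for p, v in zip(probs, next_state_values))
    return max(p_new_by_action, key=expected)


# Hypothetical data for one current state with two possible next states.
measured = {"heat": [0.9, 0.1], "idle": [0.4, 0.6]}
simulated = {"heat": [0.7, 0.3], "idle": [0.2, 0.8]}
counts = {"heat": (30, 10), "idle": (5, 15)}  # (measured, simulated)

blended = {
    action: blend_transition_probs(measured[action], simulated[action], *counts[action])
    for action in measured
}
best_action = pick_action(blended, next_state_values=[1.0, 0.0])
```

Under this assumed weighting, a transition backed by many sensor measurements stays close to its measured probability, while a sparsely observed transition leans on the simulation. Because both inputs sum to one, the blended values also form a valid probability distribution.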