Deterministic policies based on maximum regrets in MDPs with imprecise rewards
Leonard de Vinci Pole Univ;
Univ Sorbonne Paris Nord;
Markov Decision Process; minimax regret; unknown rewards; branch-and-bound; deterministic policy; stochastic policy;