This paper deals with a discrete-time consumption-investment problem with an infinite horizon. The problem is formulated as a Markov decision process with the expected total discounted utility as the objective function. The paper presents a procedure for approximating the solution via machine learning, specifically a Q-learning technique. Numerical results for the problem are provided.
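To illustrate the kind of procedure the abstract describes, the following is a minimal sketch of tabular Q-learning on a toy consumption-investment problem. It is not the paper's model: the integer wealth grid, the square-root utility, the two-outcome risky return, and every constant and function name (`W_MAX`, `BETA`, `P_UP`, `actions`, `step`, `q_learning`, `greedy_policy`) are assumptions chosen only to keep the example small and runnable.

```python
import math
import random

# All values below are illustrative assumptions, not taken from the paper.
W_MAX = 10   # largest wealth level on the integer grid
BETA = 0.9   # discount factor of the total discounted utility
P_UP = 0.6   # probability that the risky asset doubles the invested amount


def actions(w):
    """Feasible actions at wealth w: consume c units, invest the rest safely or riskily."""
    return [(c, risky) for c in range(w + 1) for risky in (False, True)]


def step(w, action, rng):
    """Return (utility of consumption, next wealth) for one period."""
    c, risky = action
    invested = w - c
    if risky:
        # Assumed return model: invested wealth doubles or is halved (rounded down).
        invested = invested * 2 if rng.random() < P_UP else invested // 2
    return math.sqrt(c), min(invested, W_MAX)


def q_learning(episodes=2000, horizon=50, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(w, a): 0.0 for w in range(W_MAX + 1) for a in actions(w)}
    for _ in range(episodes):
        w = rng.randint(1, W_MAX)
        for _ in range(horizon):
            acts = actions(w)
            if rng.random() < eps:
                a = rng.choice(acts)                      # explore
            else:
                a = max(acts, key=lambda x: q[(w, x)])    # exploit
            reward, w_next = step(w, a, rng)
            best_next = max(q[(w_next, x)] for x in actions(w_next))
            # Standard Q-learning update toward the one-step bootstrapped target.
            q[(w, a)] += alpha * (reward + BETA * best_next - q[(w, a)])
            w = w_next
            if w == 0:
                break  # zero wealth is absorbing in this toy model
    return q


def greedy_policy(q):
    """For each wealth level, the action with the largest learned Q-value."""
    return {w: max(actions(w), key=lambda a: q[(w, a)]) for w in range(W_MAX + 1)}


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for w, (c, risky) in policy.items():
        print(f"wealth={w:2d}  consume={c:2d}  risky={risky}")
```

The infinite horizon is approximated here by truncating each simulated episode at `horizon` periods, which is a simplification of this sketch; with discount factor `BETA` the contribution of later periods shrinks geometrically.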