Dynamic Programming or Direct Comparison?
Speaker:Prof. Xi-Ren Cao
Chair Professor, Dept. of Electrical and Computer Engineering
Hong Kong University of Science and Technology
Date & Time:05 Oct 2009 (Monday) 16:00 - 17:30
Venue:L105

Abstract

The standard approach to control and optimization of stochastic systems is based on dynamic programming. This approach works backward in time and it treats the infinite-horizon problems as the limiting case with the backward time going to infinity. Optimality equations are first derived and then it is proved that the solutions to the optimality equations indeed lead to optimal policies. When the value functions are not differentiable, the concept of viscosity solutions is introduced.

A sensitivity-based approach has been developed recently to stochastic learning and optimization. The approach was first developed for discrete event dynamic systems and is being extended to continuous-time and continuous state systems. The basic idea is: fundamentally, one can only compare the performance of two policies at a time; and therefore, when developing optimization theories and methodologies, one has to first study the difference of the performance of any two polices. It turns out that many results in optimization can be obtained by a direct comparison of the performance of any two policies based on this performance difference formula. We found that this “direct comparison” method is essential for optimization.

This approach has some advantages over the dynamic programming approach: It is intuitive clear because it is based on a direct comparison of any two policies. Thus, it is easy to verify that the solution to the optimality equation is indeed optimal; viscosity solution is not needed. This approach applies in the same way to different performance criteria, including finite and infinite-horizon problems. Furthermore, the approach brings some new insights that leads to new methods and results in control and optimization, including the event-based optimization and gradient-based learning.