Charles-Albert LEHALLE (CFM)
Reinforcement Learning and some applications in High Frequency Finance
Reinforcement Learning (RL) has recently been applied to financial markets: optimal trading and deep hedging are typical examples. RL is based on a fixed-point method that is generic in machine learning and covers stochastic gradient descent. Benveniste, Métivier and Priouret, and Kushner, identified in the 1980s that, to obtain convergence, the learning rate of such dynamics must satisfy specific assumptions, generally stated as "Robbins-Monro conditions". As noted by Sutton in the late 1990s, in the case of RL, and especially when the dynamics arise from a Markov chain, these conditions have to be stated carefully in dimension larger than one. Nevertheless, the Robbins-Monro conditions (the learning rate must converge to zero fast enough to "kill the variance", yet slowly enough that the fixed point can be attained, so that the whole dynamics converges) are not very restrictive, and thus leave room for smart choices that improve various properties of the convergence. We first recall the most recent convergence results and restate them for RL. We then focus on online RL, for which the exploration-exploitation trade-off is very important; a good example is the optimal placement of a limit order when one wants to adapt the strategy on the fly to the current dynamics of the order book. For such cases we propose an algorithm inspired by line search and well suited to RL. We prove its efficiency in the sense that it converges very fast at the beginning of learning and then recovers the nice standard asymptotic properties guaranteed by the Robbins-Monro conditions. We provide applications to optimal limit orders and optimal liquidation speed in financial markets.
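The Robbins-Monro conditions mentioned in the abstract can be illustrated with a minimal stochastic-approximation sketch (not the speaker's algorithm; the target distribution and step-size choice below are illustrative assumptions). The iteration seeks the fixed point of theta -> E[X] using steps gamma_n = 1/(n+1), for which the sum of gamma_n diverges while the sum of gamma_n^2 converges:

```python
import random

def robbins_monro_mean(sample, n_steps=20000, seed=0):
    """Robbins-Monro iteration theta_{n+1} = theta_n - gamma_n * (theta_n - X_n).

    The step sizes gamma_n = 1/(n+1) satisfy the Robbins-Monro conditions:
    sum(gamma_n) = infinity (slow enough that the fixed point can be reached)
    and sum(gamma_n**2) < infinity (fast enough to "kill the variance").
    The fixed point of theta -> E[X] is the mean of the sampled distribution.
    """
    rng = random.Random(seed)
    theta = 0.0
    for n in range(n_steps):
        gamma = 1.0 / (n + 1)           # Robbins-Monro step size
        x = sample(rng)                 # noisy observation of the target
        theta -= gamma * (theta - x)    # stochastic fixed-point update
    return theta

# Illustrative target: X = 2 + Uniform(0, 1), true mean 2.5
est = robbins_monro_mean(lambda rng: 2.0 + rng.random())
```

With this particular step-size sequence the iterate reduces to the running average of the observations, which makes the almost-sure convergence to the mean easy to see; faster-decaying steps (e.g. gamma_n ~ n^{-2}) would violate the divergence condition and can freeze the iterate away from the fixed point.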