Friday, December 2, 2011

Multiarmed Bandits

I tweeted a little while ago about a post from the Untyped people on applying a multiarmed bandit algorithm to website optimization (in contrast to A/B testing). It turns out that many have considered the multiarmed bandit problem for optimizing market maker profitability, Bandit Market Maker by Penna and Reid [PDF]. The basic idea is to dynamically optimize (i.e., reinforcement/online learning) the use of k-levers each operating under initially unknown and eventually partly known distributions. Presumably, the more one uses the levers, the more one learns about the reward, but there is an exploration-exploitation dilemma -- since we are operating online one has to choose whether to take the empirically revealed best choice thus far or to continue to explore for more rewarding options. The exploration-exploitation dilemma also matters to genetic algorithms and programming, but the multiarmed bandit problem explicitly optimizes for it. The exciting thing is that after decades of research, they have found optimal algorithms for the multiarmed bandit problem. The Untyped post cites one of them, Finite-time Analysis of the Multiarmed Bandit Problem by Auer et al [PDF].

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.