ML: Bayesian Bandit

the problem

Suppose you are at a casino and have a choice between N slot machines. Each of the N slot machines (bandits) has an unknown probability of letting you win. i.e. Bandit 1 may have P(win) = 0.9. Bandit 2 may have P(win) = 0.3. We wish to maximize our winnings by playing the machine which has the highest probability of winning. The problem is determining which machine this is without playing each machine million times so we can maximize the profit!

the general idea

The idea: let’s not pull each arm 1000 times to get an accurate estimate of its probability of winning. Instead, let’s use the data we’ve collected so far to determine which arm to pull. If an arm doesn’t win that often, but we haven’t sampled it too much, then its probability of winning is low, but our confidence in that estimate is also low – so let’s give it a small chance in the future. However, if we’ve pulled an arm many times and it wins often, then its probability of winning is high, and our confidence in that estimate is also high – let’s give that arm a higher chance of being pulled.

the result

~~~

Tags: machine learning

Leave a Reply Cancel reply