Each day, you must make a binary decision: whether to buy or sell a stock, whether to bring an umbrella to work, whether to take surface streets or the freeway on your commute, and so on. Each time you make the wrong decision, you are penalized. Fortunately you have help in the form of $$n$$ “expert” advisers, each of whom suggests a choice for you. However, some experts are more reliable than others, and you do not initially know which is the most reliable. How can you find the most reliable expert while incurring the fewest penalties? How few penalties can you incur compared to the best expert?

To make matters simple, first assume that the best expert never gives the wrong advice. How quickly can we find that expert?

Here is a simple procedure that allows us to find the best expert. Choose an expert $$a$$ arbitrarily on the first day, and follow their advice. Continue following $$a$$’s advice each day until they make a mistake. After your chosen expert errs, choose a different expert and follow their advice until they err. Continue this process indefinitely. Since we’ve assumed that the best expert never errs, you will eventually follow their advice.
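This sequential procedure can be sketched in a few lines of Python. The function name and the encoding (each day's advice as a list of 0/1 predictions, one per expert, alongside the correct 0/1 outcome) are my own illustrative choices, not part of the original description:

```python
def follow_until_error(advice, outcomes):
    """Follow one expert until they err, then move to the next.

    advice[t][i] is expert i's prediction on day t (0 or 1), and
    outcomes[t] is the correct answer for day t. Assumes some
    expert never errs. Returns the number of penalties incurred.
    """
    current = 0      # index of the expert we currently follow
    penalties = 0
    for day_advice, truth in zip(advice, outcomes):
        if day_advice[current] != truth:
            penalties += 1   # we followed bad advice today...
            current += 1     # ...so switch to the next expert
    return penalties
```

Since we switch experts only after a mistake, the penalty count is exactly the number of experts abandoned along the way, which is at most $$n - 1$$.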

Observe that in the above procedure, each day we either get the correct advice, or we eliminate one of the errant experts. Thus, the total number of errors we make before finding the best expert is at most $$n - 1$$. Is it possible to find the best expert with fewer penalties?

Consider the following “majority procedure.” Each day, look at all $$n$$ experts’ advice and take the majority opinion (choosing arbitrarily if there is a tie). At the end of the day, fire all incorrect experts, and continue the next day with the remaining experts. To analyze this procedure, observe that each day, either the majority answer is correct (and you are not penalized) or you fire at least half of the experts. Thus, the number of penalties you incur is at most $$\log n$$ before you find the best expert. That is pretty good!
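Here is a hypothetical sketch of the majority procedure, using the same illustrative 0/1 encoding of advice and outcomes (ties broken in favor of 1):

```python
def majority_procedure(advice, outcomes):
    """Follow the majority of surviving experts, then fire every
    expert who was wrong at the end of the day. Assumes the best
    expert never errs, so at least one expert always survives.
    """
    experts = set(range(len(advice[0])))
    penalties = 0
    for day_advice, truth in zip(advice, outcomes):
        votes_for_1 = sum(day_advice[i] for i in experts)
        prediction = 1 if 2 * votes_for_1 >= len(experts) else 0  # ties -> 1
        if prediction != truth:
            penalties += 1
        # fire all experts who were wrong today
        experts = {i for i in experts if day_advice[i] == truth}
    return penalties
```

Whenever the prediction is wrong, the majority of surviving experts erred, so the firing step removes at least half of them, which is exactly the halving argument above.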

Question 1 Is it possible to find the best expert while incurring fewer than $$\log n$$ penalties?

To this point, things have been relatively easy because we assumed that the best expert never errs. Thus, as soon as an expert makes an incorrect prediction, we know they are not the best expert. In practice, however, it is unlikely that the best expert is infallible. What if we are only guaranteed that the best expert errs at most $$m$$ times?

It is not hard to see that the majority procedure can be generalized to account for an expert that errs at most $$m$$ times. The idea is to follow the majority opinion, but to allow an expert to err $$m$$ times before firing them. We call the resulting procedure the “$$m + 1$$ strikes majority procedure.” Each day, follow the majority opinion (again breaking ties arbitrarily). Every time an expert is wrong, they receive a strike. If an expert ever exceeds $$m$$ strikes, they are immediately fired.
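The strikes procedure is a small modification of the majority sketch. Again this is an illustrative Python rendering with my own names and 0/1 encoding:

```python
def strikes_majority(advice, outcomes, m):
    """The "m + 1 strikes" majority procedure: follow the majority
    of surviving experts; an expert who is wrong receives a strike,
    and anyone exceeding m strikes is fired. Assumes the best
    expert errs at most m times.
    """
    strikes = [0] * len(advice[0])
    penalties = 0
    for day_advice, truth in zip(advice, outcomes):
        alive = [i for i, s in enumerate(strikes) if s <= m]
        votes_for_1 = sum(day_advice[i] for i in alive)
        prediction = 1 if 2 * votes_for_1 >= len(alive) else 0  # ties -> 1
        if prediction != truth:
            penalties += 1
        for i in alive:                  # hand out today's strikes
            if day_advice[i] != truth:
                strikes[i] += 1
    return penalties
```

Note that with $$m = 0$$ this reduces to the plain majority procedure: a single strike fires an expert.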

To analyze the $$m + 1$$ strikes procedure, observe that if the majority opinion is ever wrong, at least half of the remaining experts receive strikes. Since the total number of strikes of all remaining experts cannot exceed $$m n$$, after $$2 m + 1$$ majority errors, we know that at least half of the experts have been fired. Thus, we find the best expert with penalty at most $$(2 m + 1) \log n$$. (I suspect that a more clever analysis would give something like $$(m + 1) \log n$$, thus matching the majority procedure for $$m = 0$$.) Can we do better than the $$m + 1$$ strikes majority procedure?

One objection that one might have with the $$m + 1$$ strikes procedure is that an expert must err $$m + 1$$ times before any action is taken against them. Another reasonable approach would be to have a “confidence” parameter for each expert. Each time the expert errs, their confidence is decreased. When taking a majority decision, each expert’s prediction is then weighted by our confidence in that expert. If we adopt this strategy, we must specify how the confidence is decreased with each error.

Littlestone and Warmuth proposed the following “weighted majority algorithm” to solve this problem. Initially, each expert $$i$$ is assigned a weight $$w_i = 1$$, where a larger weight indicates a greater confidence in expert $$i$$. Each time expert $$i$$ errs, $$w_i$$ is multiplied by $$1/2$$. Each day, we take as our prediction the weighted majority: each expert $$i$$’s prediction carries $$w_i$$ votes, and we choose the option receiving the greater sum of votes.

By way of analysis, consider what happens each time the majority answer is wrong. Let $$W_t$$ denote the total weight of all experts after $$t$$ penalties have been incurred. Observe that initially we have $$W_0 = n$$. When we incur the $$(t + 1)$$-st penalty, at least half of the $$W_t$$ total votes were cast in favor of the wrong answer. Since that errant weight, at least $$W_t / 2$$, is cut in half, the total weight decreases by at least $$W_t / 4$$, so $$W_{t + 1} \leq (3/4) W_t$$. Thus, $$W_t \leq (3/4)^t n$$. On the other hand, we know that some expert $$i$$ errs at most $$m$$ times. Therefore, $$2^{-m} \leq w_i \leq W_t$$ for all $$t$$. Combining this with the previous expression gives
$2^{-m} \leq W_t \leq (3/4)^t n.$
Taking logs of both sides of this expression, we find that
$t \leq \frac{(m + \log n)}{\log(4/3)}.$
It seems incredible to me that this simple and elegant algorithm gives such a powerful result. Even if we initially have no idea which expert is the best, we can still perform nearly as well as the best expert as long as $$m \gg \log n$$. (Note that $$1 / \log(4/3) \approx 2.4$$.) Further, we do not even need to know what $$m$$ is in advance to apply this procedure!
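The whole algorithm fits in a few lines. As before, the 0/1 encoding of advice and outcomes is my own illustrative convention:

```python
def weighted_majority(advice, outcomes):
    """Littlestone-Warmuth weighted majority: every expert starts
    with weight 1, an erring expert's weight is halved, and each
    day we predict the option carrying the larger total weight.
    """
    weights = [1.0] * len(advice[0])
    penalties = 0
    for day_advice, truth in zip(advice, outcomes):
        weight_for_1 = sum(w for w, a in zip(weights, day_advice) if a == 1)
        prediction = 1 if 2 * weight_for_1 >= sum(weights) else 0  # ties -> 1
        if prediction != truth:
            penalties += 1
        # halve the weight of every expert who erred today
        weights = [w / 2 if a != truth else w
                   for w, a in zip(weights, day_advice)]
    return penalties
```

Unlike the strikes procedure, no expert is ever fired; a chronically wrong expert simply ends up with negligible weight, which is why $$m$$ never needs to be known in advance.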

Question 2 Is it possible to improve upon the bound of the weighted majority algorithm?