Bandit Testing: When Should You Use Multi-Armed Bandit Testing?

When people hear the term split testing, most assume it refers to A/B testing, and typically they are correct. Technically, though, A/B testing is just one type of split testing.

Bandit testing is another type of split testing. If you’ve been doing research into conversion rate optimization, split testing, or A/B testing, then you have probably come across bandit testing.

More formally known as multi-armed bandit testing, it is an alternative to A/B testing. Just because bandit testing offers a different approach does not mean you should take it. This article will answer the following questions:

  1. What is multi-armed bandit testing?
  2. How is multi-armed bandit testing different from A/B testing?
  3. What are the benefits of multi-armed bandit testing?
  4. What are the disadvantages to multi-armed bandit testing?

Let’s jump right into things, shall we?

1. What is multi-armed bandit testing?


Not to be confused with our raccoon friend, multi-armed bandit testing takes its name from a gambling scenario. A gambler stands in front of a row of slot machines (aka one-armed bandits, hence the name). The gambler knows that each machine has its own payout rate, and that some machines pay out better than others. The trick is to figure out which machines offer the better rate of return and then make the most of them. The gambler works in two phases: explorative and exploitative.

During the explorative phase, the gambler puts coins into all of the machines equally to test which gives the higher returns (aka conversions). Then, during the exploitative phase, the gambler puts more coins (traffic) into the machines that have the higher rewards.

How bandit testing works with your website

Instead of three slot machines, we have three versions of your website (A, B, and C). Our explorative and exploitative phases run indefinitely (or until we end the test).

For example, we can set the explorative level at 10% and the exploitative level at 90%. (This ratio can be changed as desired). Let’s use 10,000 visitors for an easy number.

This means 10% of traffic (1,000 visitors) will “explore” the variations, evenly distributed across all three options (about 333 visitors each). The other 90% of traffic (9,000 visitors) will be proportionally directed to the versions with the higher conversion rates. In other words, the 90% exploits the results from the 10%.
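To make the 10%/90% example concrete, here is a minimal Python sketch of the routing logic. Treat it as an illustration, not as any particular tool’s implementation. The function names (`split_traffic`, `choose_version`) and the `stats` layout are my own assumptions. It also assumes the textbook form of the rule, where all exploit traffic goes to the current leader; a tool that spreads exploit traffic proportionally across versions would behave a little differently.

```python
import random


def split_traffic(total_visitors, versions, explore_percent=10):
    """Return how many visitors explore (in total and per version) and how many exploit."""
    explore_total = total_visitors * explore_percent // 100
    return {
        "explore_total": explore_total,
        "explore_per_version": explore_total / len(versions),
        "exploit_total": total_visitors - explore_total,
    }


def choose_version(stats, explore_percent=10, rng=random):
    """Pick a version for one visitor.

    stats maps a version name to (conversions, visitors) observed so far.
    """
    names = sorted(stats)
    if rng.random() * 100 < explore_percent:
        # Explore: every version is equally likely.
        return rng.choice(names)

    # Exploit: send the visitor to the current leader (an assumption, see above).
    def observed_rate(name):
        conversions, visitors = stats[name]
        return conversions / visitors if visitors else 0.0

    return max(names, key=observed_rate)


# The 10,000-visitor example: 1,000 explore, 9,000 exploit.
print(split_traffic(10_000, ["A", "B", "C"]))
```

In a live test you would call `choose_version` once per visitor and update `stats` after each visit, so the leader can change as results come in.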

2. How is multi-armed bandit testing different from A/B testing?

If you remember from the A/B testing crash course article, traffic during an A/B test is split evenly between the different variations. In our A/B/C example above, a standard split test would direct roughly a third of traffic (about 33%) to each version.

With A/B testing, versions that aren’t performing as well continue to receive the same amount of traffic as the versions with a higher conversion rate. In a bandit test, the versions with the higher conversion rates receive a higher proportion of traffic.

3. What are the benefits of multi-armed bandit testing?

When time is not on your side: sales or other seasonality.

This is hands down the biggest benefit of running a multi-armed bandit test over an A/B test. An A/B test could run 2 weeks or even 6+ weeks to properly determine a winner (it needs both statistical significance and coverage of multiple business cycles). Then you implement the winner on your site for the foreseeable future.

A/B tests don’t work well if you are having a sale that lasts only 3 weeks. By the time a winner is declared, the result is largely irrelevant. Your goal is optimizing conversions during the sale, and a multi-armed bandit test does this. You don’t care about the foreseeable future because there isn’t a future after the limited window of the sale.

If you are very risk averse when it comes to testing, this could be the test for you.

Honestly, I debated whether to list this as a benefit or not. Here’s why:

If you are very risk averse, you probably aren’t doing any kind of serious testing. Even the best hypothesis must get tested eventually to prove its worth. It could fail, or it could sky-rocket your conversions. You just don’t know until you test it.

However, with bandit testing you get to be greedy. Traffic gets pushed to the higher performing variations. This means you are maximizing your conversions. But this may not bode well for confidently determining the winning version. (I’ll explain this more in the disadvantages section below.)

If maximizing conversions is the name of the game, then…

Why doesn’t everyone use bandit testing instead of A/B testing?

This is a great question. Bandit testing has its select benefits, but it has many disadvantages.

4. What are the disadvantages to multi-armed bandit testing?

Say it with me, “Set it and forget it!”

If you are a lazy tester, then this is a benefit. For the rest of us, it’s a severe disadvantage. As a serious tester, you want to determine a winner as efficiently (and quickly) as possible. Then once you have your winner you can run your next test. Think of it like this:

You test version B against your current version (A). Version B wins. Then you test version C against B (the new original). Version B wins again. Next, you test version D. Version D beats version B. Version D becomes the new standard.

This is an example of the ongoing testing you should be conducting. The more versions you test at once, the longer the test will take to determine a winner. Which is why it is better to limit the number of versions at one time, unless you have a ton of traffic. Also, you may not have thought B would win, or other insights came out of the test. As a result of these insights, version C is born.

By continuously testing, you can continuously make improvements to your website and conversion funnel. This is the only way to continue to maximize conversions, revenue, and profit in the long term. If you “set it and forget it,” you aren’t continuously improving. Instead, you are stuck in a rut and money is left on the table.

Bandit tests are better used as a strategy in a specific setting rather than as a test in the traditional A/B testing sense. Here’s why:

A bandit test can run indefinitely. It will run until the code is deleted, or until the end of the world as we know it. Whichever happens first.

This is great if you want to run a test forever. It is not so great if you want to choose and implement a winning version. Which brings me to my next several points.

The short term conversion benefit can hurt your conversion rate in the long term.

Used in the right situation, a bandit test can be a great tool in your testing tool belt. However, the use of bandit testing should generally be the exception rather than the rule.

In the short term, your conversions are maximized. Most traffic is pushed to the best performing version. In the long term, some percentage of traffic is always directed towards all versions. This includes the versions not performing as well.

The exploration phase never stops during a bandit test. It’s part of the code. The percentage of traffic that makes up the explorative segment can be changed, but it never fully disappears.

In layman’s terms, the losers always get some traffic, unlike with an A/B test, where the winning version is eventually implemented and the losers are retired. As a result, with bandit testing you lose out on a certain number of conversions in the long term.

Maybe you are thinking at this point, “Why can’t I have the best of both A/B testing and bandit testing?” In theory, you run the bandit test to maximize conversions until a winner is determined. Then you delete the bandit testing code and implement the winner. In theory this works, but in practice there are several problems.

Multi-armed bandit testing can take 5+ times longer to determine a winner than A/B testing.

Visual Website Optimizer (VWO) ran a simulated scenario with 3 website versions that had the following conversion rates: A=10%, B=15%, and C=20%. They ran the simulation through the 4 testing algorithms listed below.

  • Simple randomization of A/B testing: RAND
  • Multi-armed bandit (10% exploration, 90% exploitation): MAB-10
  • Multi-armed bandit (50% exploration, 50% exploitation): MAB-50
  • Multi-armed bandit (77% exploration, 23% exploitation): MAB-77

They waited for statistical significance to be found at least 10 times. These were the results:

[Chart: iteration # when statistical significance was achieved]

As you can see, the highest conversion rate happened with the bandit test that had the highest exploitation. However, it took almost 6 times as much traffic to declare significance. This goes back to what I said above about short term gains.

During the test, you optimize your conversions. But none of the tests beat the 20% conversion rate of C. If you want to maximize your conversion rate long term, you want to declare a winner as efficiently as possible and then implement it. In this case, that means version C.
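A bit of back-of-the-envelope arithmetic shows why no setting can reach C’s 20%. The sketch below is my own illustration, not VWO’s simulation or its results. It assumes the best case for the bandit: the algorithm already knows C is the winner, all exploit traffic goes to C, and explore traffic is split evenly across the three versions. The function name `blended_rate` is hypothetical.

```python
def blended_rate(rates, explore_share):
    """Best-case overall conversion rate for a given exploration share.

    Assumes explore traffic is split evenly across versions and
    exploit traffic all goes to the true best version.
    """
    best = max(rates.values())
    average = sum(rates.values()) / len(rates)
    return explore_share * average + (1 - explore_share) * best


# Conversion rates from the scenario above.
rates = {"A": 0.10, "B": 0.15, "C": 0.20}

for label, explore_share in [
    ("MAB-10", 0.10),
    ("MAB-50", 0.50),
    ("MAB-77", 0.77),
    ("RAND", 1.00),
]:
    print(label, round(blended_rate(rates, explore_share), 4))
```

Even in this idealized case, MAB-10 tops out at about 19.5%, because the 10% of traffic that keeps exploring converts at the 15% average. Only implementing C outright gets you the full 20%.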

VWO ran a different simulation to test statistical significance when the conversion rates were minimally different (10% vs 11%). The simulations were run 25 times with 10,000 iterations. Their findings did not bode well for bandit tests.

[Chart: number of times no statistical significance was seen]

The bandit testing simulations needed more traffic to declare statistical significance. And they were more likely not to reach statistical significance at all. Determining whether a test won’t achieve statistical significance, or whether there isn’t a clear conversion rate winner, is crucial. The faster that is determined, the sooner you can run a new test. This is why A/B testing is often the better testing choice over a multi-armed bandit test.
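To get a feel for why a 10% vs 11% difference is so much harder to detect than 10% vs 20%, here is a rough sample-size sketch for an evenly split test. It uses the standard normal-approximation formula for comparing two proportions. The 95% confidence level and 80% power are my assumptions, not figures from the VWO simulation, and `visitors_per_version` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_version(rate_a, rate_b, confidence=0.95, power=0.80):
    """Approximate visitors needed per version for a two-sided z-test."""
    # z-score for the two-sided confidence level (about 1.96 at 95%).
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # z-score for the desired power (about 0.84 at 80%).
    z_power = NormalDist().inv_cdf(power)
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    return ceil((z_alpha + z_power) ** 2 * variance / (rate_a - rate_b) ** 2)


print(visitors_per_version(0.10, 0.11))  # small gap
print(visitors_per_version(0.10, 0.20))  # large gap
```

Under these assumptions the 1-point gap needs on the order of 15,000 visitors per version, versus roughly 200 for the 10-point gap. A bandit that starves the weaker versions of traffic only makes that wait longer.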

Bandit testing may inaccurately allocate traffic to variations that are actually less effective based on business cycles.

All businesses have cycles. Conversion rates often change based on daily, weekly, and even monthly business cycles. Let’s say most of your business during a week happens Wednesday-Friday. You start a bandit test on Monday.

Based on Monday’s and Tuesday’s conversion rates, more traffic is allocated towards version A instead of B or C. However, Monday and Tuesday are not your “real” business days. That means the Monday and Tuesday results indicating that A is the best converter could be a false positive of sorts. Eventually the bandit test will correct itself, but the damage in lost conversions will have been done.

Bandit tests assume that time variables don’t exist. This is generally not the case in the business world. If your business has any kind of cyclical nature to it (most businesses do), it is better to set up an A/B test that runs over the entire cycle. You will receive a better, more holistic view of the different conversion rates.
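Here is a small sketch of how an early-week leader can flip once the busy days arrive. The daily numbers are invented purely for illustration; they are not real data. For simplicity it also gives both versions equal traffic and uses only two versions, which a real bandit would not do. The `leader` function is a hypothetical helper.

```python
# Hypothetical (conversions, visitors) per day. Invented for illustration only.
DAILY = [
    ("Mon", {"A": (6, 100), "B": (4, 100)}),
    ("Tue", {"A": (7, 100), "B": (5, 100)}),
    ("Wed", {"A": (20, 300), "B": (36, 300)}),
    ("Thu", {"A": (22, 300), "B": (39, 300)}),
    ("Fri", {"A": (21, 300), "B": (37, 300)}),
]


def leader(days):
    """Return the version with the best cumulative conversion rate."""
    totals = {}
    for _, results in days:
        for version, (conversions, visitors) in results.items():
            seen_conversions, seen_visitors = totals.get(version, (0, 0))
            totals[version] = (
                seen_conversions + conversions,
                seen_visitors + visitors,
            )
    return max(totals, key=lambda name: totals[name][0] / totals[name][1])


print(leader(DAILY[:2]))  # leader after Monday and Tuesday only
print(leader(DAILY))      # leader over the full Monday-Friday cycle
```

With these made-up numbers, A leads after the two slow days but B wins over the full week. A bandit that started shifting traffic to A on Tuesday night would spend the busiest days correcting itself.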

Users can get confused with different versions and variations.

During an A/B test, users are assigned to a version via a cookie. They will always see the same version until a new (or winning) version is implemented. This is not the case with bandit testing.

Traffic and users are directed to the version with the highest conversion rate. A user could see version A during an exploration phase. Then during the exploitation phase that same user could see version B.

If there is a shift in the conversion rate, or a change in the exploration/exploitation levels, the user could be directed back to version A. Or even to version C, if there is one. Basically, this makes for a confusing user experience.
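For contrast, here is a minimal sketch of the kind of sticky assignment an A/B test relies on. It assumes each visitor carries a stable ID (for example, one stored in a cookie) and hashes that ID to pick a version. The function name `sticky_version` and the `test_name` value are hypothetical, and real testing tools may implement this differently.

```python
import hashlib


def sticky_version(visitor_id, versions, test_name="homepage-test"):
    """Map a visitor ID to a version; the same ID always gets the same version."""
    key = f"{test_name}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # The hash is effectively uniform, so versions get a roughly even split.
    return versions[int(digest, 16) % len(versions)]


versions = ["A", "B", "C"]
first_visit = sticky_version("visitor-123", versions)
return_visit = sticky_version("visitor-123", versions)
print(first_visit, return_visit)  # the two values always match
```

Because the choice depends only on the visitor ID and the test name, a returning visitor sees the same version on every visit. A bandit that re-picks on each visit based on the latest conversion rates gives up that consistency.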



Multi-armed bandit tests are a great strategy when you want to maximize conversions in a short time frame. The main example is a sale.

Generally, bandit tests should not be used as an ongoing strategy for your website. A/B tests are the best testing strategy for efficiently making site choices (e.g., which version of copy goes on your sign-up button, or whether the button should be green or red).

Regardless of whether you are running a multi-armed bandit test or an A/B test, use good testing practices. This means knowing the conditions for stopping the test: length of time and statistical significance level.

Some testing software automatically sets a test length of 7 days and a 95% statistical significance level. This does not mean those are the right conditions for your test.

Now that you’re armed with the knowledge of when to use multi-armed bandit tests, feel free to implement them on your site. Just make sure you’re using them for the right reasons.

If you’d like some help setting up a custom bandit solution, check out our consulting options.

P.S. For those interested in the more technical aspects of split testing: the bandit approach described in this article is a type of greedy algorithm known as an epsilon-greedy strategy. A/B split testing is considered an epsilon-first strategy… Just in case you were curious. :)

P.P.S. There will be another article going into greater depth on all of these technicalities at some point. Join our mailing list to be the first to know about it.


Want to Know More?

If you have any questions or comments, feel free to contact me. I love talking about this stuff, and I make sure to answer every email.
