## And avoiding the common mistakes that derail most test efforts.

This is the third article in my series on A/B testing.

In the first article, I presented the intuition behind A/B testing and the importance of establishing the magnitude of the effect you hope to observe and corresponding sample size.

In the second article, I talked about how product managers can design A/B Tests in a manner that speeds the tests up.

In this third article, I will talk about another aspect of A/B testing: **What factors should you consider when determining the metric for your A/B test?**

## Case Study: VRBO Search Landing Pages

When we design an A/B test, we select a primary metric that we hope to improve (and several secondary metrics) and measure it in both the variant and control groups. *If we don’t choose this metric carefully, we are wasting our time.*

We’ll use an example that I am familiar with: Search Landing Pages on VRBO. VRBO is a two-sided marketplace where homeowners can list their homes for rent, and potential travelers can find the right accommodation for their next trip. The purpose of the Search Landing Page is to receive traffic from Google and convert that traffic into people who perform higher-intent inquiries.

Let’s look at some screenshots, starting with the most common way travelers start their planning process: searching on Google.

**Step 1**: *Thinking about traveling to the Bahamas? Let’s search.*

**Step 2**: *Aha! It looks like VRBO has excellent options. Let’s look there.*

**Step 3**: *Let’s find out what options I have in the Bahamas.*

We built this page for:

- **High booking intent users**. Users who may already have booked their flights or at least have a sense of when they want to travel. We hypothesized that for these users, the page’s job-to-be-done was to find homes that were available for their travel dates.
- **Low booking intent users**. Users who were very early in the planning stage and may not have any sense of when they might be traveling. We hypothesized that for these users, the page’s job-to-be-done was to help them explore the variety of homes available and to influence them to visit the Bahamas.
- **Google Bot**. We wanted Google to index the page for the most relevant user queries.

The whole user journey (from landing on VRBO from Google to booking) looks like this.

There are two specific things to note:

- Between the initial and the final step, there are multiple steps that a user must take, and at each level, some users will drop out.
- Since travel is a considered purchase (vs. an impulse purchase), the time between the initial and the final step may be on the order of weeks.

Now let’s look at the mathematics of this conversion funnel: make some assumptions about the conversion from one step to the next and estimate the overall conversion rate. (**Disclaimer: These numbers are for illustrative purposes only.**)

Finally, you have to get a rough order of magnitude of the traffic. Let’s make some assumptions here. (*Disclaimer: These numbers are for illustrative purposes only.*)

- Total unique visitors per month: 10 million
- Unique new visitors arriving on the search landing page: 30% (3 million users)

Now imagine you are the product manager for the Search Landing Page. Against these base rates, let’s look at a hypothetical A/B test and look at two possible metrics for your experiment.

Test hypothesis: Adding a background “hero” image that is indicative of the destination to the search landing page will reassure users that they are looking at the right destination, leading to more searches and a 2% higher overall conversion rate.

You have two choices of metric: overall conversion, and the percentage of users doing dated searches.
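To get a feel for how different these two metrics are, here is a minimal sketch that counts how many "success" events each one produces per month. It assumes the 3 million monthly landing-page visitors from above and the two illustrative base rates used in the sections that follow (30% for dated searches, 0.225% for overall conversion). The names `MONTHLY_LANDING_VISITORS`, `BASE_RATES`, and `expected_monthly_events` are made up for this example.

```python
# Illustrative numbers taken from the article; they are not real VRBO data.
MONTHLY_LANDING_VISITORS = 3_000_000
BASE_RATES = {
    "dated search": 0.30,            # share of visitors who run a dated search
    "overall conversion": 0.00225,   # share of visitors who end up booking
}


def expected_monthly_events(visitors: int, rate: float) -> int:
    """Expected number of visitors per month who complete the given action."""
    return round(visitors * rate)


for metric, rate in BASE_RATES.items():
    events = expected_monthly_events(MONTHLY_LANDING_VISITORS, rate)
    print(f"{metric}: about {events:,} events per month")
```

The rarer the event, the fewer data points each month of traffic contributes, which is why the choice of metric drives the test duration so strongly.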

## Experiment Design when we choose Overall Conversion as our metric

As a product manager, it’s very tempting to use the overall conversion rate as your metric. After all, you can tell your management that you have increased your revenue by $$$.

If you decide to choose this as your metric, let’s look at the test parameters: test sample size and overall test duration. Let’s plug our base rate of 0.225% and a relative minimum detectable effect (MDE) of 2% into Evan Miller’s Sample Size calculator.

Overall, you will need 34,909,558 samples across your variant and control groups.
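If you want to sanity-check that number without the calculator, the sketch below uses a common normal-approximation formula for comparing two proportions. It is not taken from the calculator itself. It assumes a two-sided significance level of 5%, statistical power of 80%, and a 2% MDE that is relative to the base rate; none of these settings are stated in this article, but with them the result lands close to the figure quoted above. The function name `sample_size_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(base_rate: float, relative_mde: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH group for a two-sided two-proportion test.

    alpha and power are assumed defaults, not values given in the article.
    """
    delta = base_rate * relative_mde           # absolute lift we want to detect
    p1, p2 = base_rate, base_rate + delta      # control rate and hoped-for variant rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sd_null = sqrt(2 * p1 * (1 - p1))              # spread if there is no real effect
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))   # spread if the lift is real
    return ceil(((z_alpha * sd_null + z_beta * sd_alt) / delta) ** 2)


total = 2 * sample_size_per_group(base_rate=0.00225, relative_mde=0.02)
print(f"{total:,} visitors across both groups")
```

Because the absolute lift `delta` appears squared in the denominator, a tiny base rate combined with a small relative lift makes the required sample size explode.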

**With 3 million unique users per month, this will require 11–12 months for your test to complete, if you do this test correctly.** A lot of people make the mistake of seeing some positive results early, getting impatient, and stopping the experiment prematurely. If you do that, you are most likely looking at a false positive.
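To see why stopping early is so risky, here is a small simulation sketch of an A/A test, where there is no real difference between the groups, so every "significant" result is a false positive. It assumes the data arrives in equal-sized batches and models the test statistic with a normal approximation instead of simulating individual visitors; the function name `false_positive_rate` and its defaults are made up for this illustration.

```python
import random
from math import sqrt


def false_positive_rate(num_looks: int, simulations: int = 2000,
                        z_crit: float = 1.96, seed: int = 7) -> float:
    """Share of no-effect experiments declared 'significant' at any look.

    Each look adds one equal-sized batch of data. The experiment stops at the
    first look where |z| exceeds z_crit (1.96 is the two-sided 5% threshold).
    """
    rng = random.Random(seed)
    stopped_early = 0
    for _ in range(simulations):
        running_sum = 0.0
        for look in range(1, num_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)   # batch result under "no true effect"
            z = running_sum / sqrt(look)         # z-score of all data seen so far
            if abs(z) > z_crit:
                stopped_early += 1
                break
    return stopped_early / simulations


print(f"One look at the end: {false_positive_rate(1):.1%}")
print(f"Peeking 20 times:    {false_positive_rate(20):.1%}")
```

With a single look, the rate should stay near the nominal 5%. With 20 peeks it climbs well above that, even though nothing about the product changed.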

## Experiment Design when we choose % of users doing a dated search as the primary metric

If you decide to choose this as your metric, let’s look at the test parameters: test sample size and overall test duration. Let’s plug our base rate of 30% and a relative minimum detectable effect (MDE) of 2% into Evan Miller’s Sample Size calculator.

Overall, you will need 183,450 samples across your variant and control groups. **With 3 million unique users per month, this will require a few days for your test to complete.** (You may want to consider running the test for a whole week to eliminate any chance of a day-of-the-week bias.)

With this approach, you can run tens of experiments in the same amount of time.
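The gap between the two designs follows from a rule of thumb: for a fixed relative MDE, the required sample size is roughly proportional to (1 - p) / p, where p is the base rate. The sketch below applies that approximation to the two base rates and converts the sample sizes quoted above into durations. The helper names `relative_sample_cost` and `months_needed` are hypothetical, and the 30-day month is a simplifying assumption.

```python
MONTHLY_VISITORS = 3_000_000  # landing-page visitors per month, from the article


def relative_sample_cost(p_low: float, p_high: float) -> float:
    """Approximate ratio of samples needed at base rate p_low vs. p_high,
    for the same relative MDE (uses n proportional to (1 - p) / p)."""
    return ((1 - p_low) / p_low) / ((1 - p_high) / p_high)


def months_needed(total_samples: int, monthly_visitors: int = MONTHLY_VISITORS) -> float:
    """Test duration in months if every visitor enters the experiment."""
    return total_samples / monthly_visitors


ratio = relative_sample_cost(p_low=0.00225, p_high=0.30)
print(f"Overall conversion needs roughly {ratio:.0f}x the samples of dated search")
print(f"Overall conversion test: {months_needed(34_909_558):.1f} months")
print(f"Dated-search test: {months_needed(183_450) * 30:.1f} days")  # assumes 30-day months
```

The approximation ignores the small difference between the control and variant variances, so it will not match the calculator exactly, but it is close enough to compare candidate metrics before you open the calculator.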

## Lessons Learned

If the above situation sounds hypothetical, let me assure you that plenty of product managers (including me) have taken the approach of using the overall conversion rate as the primary metric. Here are some of the lessons I learned that I’d like to share more broadly.

- When you design your test, pay sufficient attention to the metric you pick. If the feature you are testing is high up in the funnel and your overall conversion rate is less than 1%, your tests will take months to complete. (Unless you are Facebook, Google, Amazon, Indeed, or a top internet site.)
- When your test takes months to complete, the likelihood of a bug creeping in due to an unintended and unrelated change and corrupting your test results will be extremely high. You may have to restart your test.
- The further your feature is from the overall conversion, the lower the likelihood of your change **causally impacting** the metric.
- The best option is to use a metric that is directly impacted by your change, such as an on-page metric, to measure micro-conversions.
- If you choose an on-page metric like click-through rate, pay attention to unintended consequences by looking at a counterbalancing metric. If we select dated-search as our metric, we will also look at the bounce rate from that page as well as the following page. This technique ensures that the product change is not sending unqualified traffic downstream. (More on this topic in a future article.)
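One way to put the counterbalancing-metric lesson into practice is to run a one-sided two-proportion z-test on the bounce rate, asking whether the variant bounces significantly more often than the control. The sketch below is only one possible approach, not a description of how VRBO did it. The function name `bounce_rate_regressed`, the 5% threshold, and the visit counts in the usage lines are all hypothetical and chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def bounce_rate_regressed(ctrl_bounces: int, ctrl_visits: int,
                          var_bounces: int, var_visits: int,
                          alpha: float = 0.05) -> bool:
    """True if the variant's bounce rate is significantly HIGHER than control's
    (one-sided two-proportion z-test with a pooled standard error)."""
    p_ctrl = ctrl_bounces / ctrl_visits
    p_var = var_bounces / var_visits
    pooled = (ctrl_bounces + var_bounces) / (ctrl_visits + var_visits)
    std_err = sqrt(pooled * (1 - pooled) * (1 / ctrl_visits + 1 / var_visits))
    z = (p_var - p_ctrl) / std_err
    p_value = 1 - NormalDist().cdf(z)   # chance of a z this large with no real change
    return p_value < alpha


# Hypothetical counts: 40.0% bounce in control vs. 41.0% in the variant.
print(bounce_rate_regressed(40_000, 100_000, 41_000, 100_000))
# Hypothetical counts: identical bounce rates in both groups.
print(bounce_rate_regressed(40_000, 100_000, 40_000, 100_000))
```

If the primary metric improves but this check flags a regression, the change may simply be pushing unqualified traffic further down the funnel.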

*If you found this article useful, let me know. If you have any questions or doubts about A/B Testing, drop me a note in the comments and I’ll consider it as a topic for a future post.*

This is the 3rd article in my set of articles on A/B Testing. The other articles in the series are:

- The intuition behind A/B Testing — A Primer for New Product Managers
- How to split the traffic in an A/B Test

I want to give credit to Evan Miller for his excellent sample size calculator and his thought leadership on the topic of A/B testing.

**About Me**: Aditya Rustgi is a product management leader with over 15 years of product and technology leadership experience in multi-sided marketplaces and B2B SaaS business models in the eCommerce and travel industries. Most recently, he was a director of product management at VRBO, an Expedia Company.

