Running multiple experiments at once can sound productive, but most experimentation teams need to focus on quality over quantity. Before scaling up, weigh three factors:
- The noise level
- The data integrity
- The metrics involved
While it’s tempting to follow the lead of sites like Booking.com, which are famous for running many experiments per year, your best bet when you’re new to experimentation is to start small and ramp up.
Tip #1: Consider the “noise level”
“I find a question that asks back is, are you happy to have that noise associated with the volume of the swim lane of tests happening? We've been talking about this internally, visualizing it as well. You could have somebody coming in on a homepage, hitting a homepage test, hitting a product list page test, product page test, cart test, checkout test. And if you're constantly looking at one metric, this is where I think it starts elevating that conversation. If you just talk about order confirmation and you're inferring the change on the homepage is a result of that, it can get a bit noisy,” said Paul.
The best advice: if you’ve got the bandwidth, isolate your tests. Isolation gives you the clearest read on results. There’s a tradeoff in the volume of tests you can run, and where you land ultimately depends on the metrics you want to report on and the degree of certainty you need.
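To make the confounding concrete, here is a minimal simulation sketch. All the numbers are invented for illustration: a 5% baseline conversion rate, a homepage test worth +2pp on its own, and a concurrent checkout test that clashes with it (a negative interaction). The point is that when tests overlap, the measured lift of the test you care about gets diluted by whatever else is running.

```python
import random

random.seed(0)

BASE = 0.05          # hypothetical baseline conversion rate
FOCAL_LIFT = 0.02    # true lift of the homepage test in isolation
OTHER_LIFT = 0.01    # true lift of a concurrent checkout test
INTERACTION = -0.015 # the two changes clash when a visitor sees both

def conversion_rate(focal, other):
    """Conversion probability for a visitor given which variants they see."""
    rate = BASE
    if focal:
        rate += FOCAL_LIFT
    if other:
        rate += OTHER_LIFT
    if focal and other:
        rate += INTERACTION
    return rate

def run(n, concurrent):
    """Estimate the focal test's lift, with or without the other test live."""
    conversions = {True: 0, False: 0}
    visitors = {True: 0, False: 0}
    for _ in range(n):
        focal = random.random() < 0.5                 # 50/50 bucketing
        other = concurrent and random.random() < 0.5  # only if both run
        visitors[focal] += 1
        conversions[focal] += random.random() < conversion_rate(focal, other)
    return conversions[True] / visitors[True] - conversions[False] / visitors[False]

# Isolated, the estimate lands near the true +0.020 lift; with the
# checkout test running concurrently, it is pulled down toward ~+0.0125.
print(f"isolated lift estimate:   {run(200_000, False):+.4f}")
print(f"concurrent lift estimate: {run(200_000, True):+.4f}")
```

Neither estimate is "wrong" in isolation; the concurrent one is simply answering a muddier question (the homepage change averaged over whatever else visitors happened to hit), which is the noise Paul is describing.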
Tip #2: Consider data integrity
“Another side of that too is if you have five tests running concurrently, it's not just necessarily the end result of conversion—whatever metric you're kind of optimizing towards—that could be noisy. I think if you have five tests running at once on like what that's on-page category product checkout, then the tests that we tend to run—a lot of these test to learn tests, you need the data to be as clean as you can, so that the behavior observed in this controlled area is not confounded with other behavior. And the more noise you add, not only do the results get a little bit noisy, but the behavior gets shifted,” Shiva said.
Trying to run as many tests as Booking.com is like telling your child to do everything Michael Jordan does so that they can become the next Michael Jordan.
In reality, companies have to follow a stepwise process, getting better at experimentation over time. You can’t expect to reproduce a robust experimentation program when you don’t have dedicated data scientists on staff to help separate the signal from the noise.

If you’re a one-person experimentation team, you might be shooting yourself in the foot by running a bunch of tests at once, because disentangling the resulting data is hard.
Tip #3: Consider your metrics
“I think it helps in going: where's the closest metric which still adds value? This is where the companies and businesses and stakeholders have to be okay with it not always being, say, revenue per visitor. If you're trying to make a change on a homepage, it might be a more behavioral metric which, provided this still has statistical significance and you're moving in the right direction, is going to be helping you. But the alternative is if you go well, every experiment has to be completely clean, yeah. Then we're going to be queuing up a lot of experiments. We're not going to run as many. And you're going to be effectively optimizing in a slower way,” Paul said.
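Paul’s tradeoff can be quantified with a standard sample-size estimate for a two-proportion test. The baselines and lifts below are hypothetical, chosen only to show the order-of-magnitude gap between a close behavioral metric (say, a 20% click-through baseline) and a distal revenue-style metric (a 3% order conversion baseline), both targeting the same 10% relative lift:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p, lift, alpha=0.05, power=0.8):
    """Rough per-arm sample size for detecting an absolute `lift`
    over baseline rate `p` with a two-proportion z-test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * z**2 * p * (1 - p) / lift**2)

# Hypothetical proximal metric: 20% baseline, +2pp lift -> thousands per arm
print(sample_size_per_arm(0.20, 0.02))
# Hypothetical distal metric: 3% baseline, +0.3pp lift -> tens of thousands per arm
print(sample_size_per_arm(0.03, 0.003))
```

Under these assumed numbers, the distal metric needs roughly eight times the traffic to reach significance, which is why insisting that every test read on revenue per visitor queues up experiments and slows the whole program down.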
Your decision to run concurrent experiments or not comes down to making concessions. There isn't a simple win-win; there are tradeoffs. Running 100 tests instead of four means giving something up to achieve that volume: the clarity of the end numbers.