Author: Martin P.
Title: Content Marketer
EP 27: Strategic Testing Roadmap
Experimentation starts with research.
Hey there.
Martin P. here.
A warm welcome to all new subscribers.
As always, we’re here to discuss the best posts, opinions, trends, and podcasts in the world of experimentation.
This Week in Experimentation:
Blueprint of the week: Strategic Testing Roadmap — a culmination of research and the basis of a great OKR-style action plan for a testing program. Link.
Talk of the week: Heuristic Review of WHOOP — Join Emma Travis, Paul Randall, and Kristel Ots as they run a heuristic analysis of WHOOP.com. Link.
Read #1: The Funnel Technique in Qualitative User Research — asking broad open-ended questions before gradually introducing more narrowly-scoped open-ended questions, as well as closed questions. Link
Read #2: 18 Best A/B Testing Tools for 2023, Reviewed by CRO Experts — Link
Read #3: How to Gather ‘Voice of Customer’ Behavioral Insights — help to overcome visitor confusion, frustrations, and challenges with your website. Link.
Opinion of the week: Deborah O'Malley — Busting the myth of statistically significant double or triple-digit conversion rate increase. Link
Event of the week: Learn how to design, measure, and implement trustworthy A/B tests, live with experimentation expert Ronny Kohavi. Link.
Blueprint of the Week: Strategic Testing Roadmap
A strategic testing roadmap is the culmination of research and the basis of a great OKR-style action plan for a testing program.
- The boxes represent insights coming from triangulated research data (quant and qual).
- Some are strategic, some tactical, but overall it's a punch list: the 'Key Results' part of the OKR.
- The objective part is framed as a powerful 'how might we...' question. This question isn't determined ahead of research; it comes from the research itself, after coding the insights and finding problem/opportunity patterns.
- And look at the KPIs! The specific ones aren't important, but now we can make the goal SMART.
- We go through the punch list of insights and watch the needle move on those goal-associated KPIs. You can even create a problem-theme metric dashboard: all tests related to that theme get reported out via the dashboard (which, by the way, speeds up the reporting/decision part of the flywheel). See the sketch right below for how these pieces could fit together.
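To make the structure concrete, here's a minimal sketch (my own illustration, not Speero's actual template) of how the roadmap's pieces could be captured as data: a post-research 'how might we' objective, insight-driven key results, and the KPIs that make the goal SMART. All field names and example values are hypothetical.

```python
# Hypothetical data shapes for a strategic testing roadmap (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Insight:
    summary: str   # one box on the roadmap
    source: str    # the triangulated evidence behind it (quant and/or qual)
    scope: str     # "strategic" or "tactical"

@dataclass
class TestingRoadmap:
    objective: str                                              # the post-research "how might we..." question
    key_results: list[Insight] = field(default_factory=list)   # the punch list of insights
    kpis: list[str] = field(default_factory=list)              # metrics that make the goal SMART

roadmap = TestingRoadmap(
    objective="How might we reduce checkout hesitation for first-time buyers?",
    key_results=[
        Insight("Shipping costs surprise users at the last step", "session recordings + exit survey", "tactical"),
        Insight("Trust signals are missing on the payment page", "heuristic review + user interviews", "strategic"),
    ],
    kpis=["checkout completion rate", "cart abandonment rate"],
)
```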
Use Cases:
- creating a research-based strategic roadmap for a testing program
- communicating with your team on objectives and key results for a test program
- organizing tactics and strategies against research and metrics
https://speero.com/blueprints/whats-in-a-strategic-testing-roadmap
Talk of the Week: Heuristic Review of WHOOP
Join Emma Travis, Paul Randall, and Kristel Ots as they run a heuristic analysis of WHOOP.com, a fitness watch eCommerce website.
Speero uses a Miro board to run group-based heuristic workshops. This way, we can assess the user experience and gather lots of insights from lots of different people.
Handy color coding makes it easy to label insights pertaining to different heuristics around motivation, relevance, clarity, value, friction, distraction, and trust.
Once coding is complete, the key themes or problem areas are easier to summarize by looking at the overall color mix on the board.
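As a rough illustration (not part of the talk itself), the "color mix" summary step boils down to counting how many workshop insights landed under each heuristic label. The labels below mirror the heuristics listed above; the insight data is made up.

```python
# Tally made-up workshop insights by heuristic label to surface dominant themes.
from collections import Counter

# Each sticky note from the board, reduced to its heuristic label.
insight_labels = [
    "friction", "clarity", "friction", "trust", "relevance",
    "friction", "motivation", "clarity", "trust", "friction",
]

theme_mix = Counter(insight_labels)
for heuristic, count in theme_mix.most_common():
    print(f"{heuristic}: {count} insight(s)")
# The heuristics with the most insights are the key themes to summarize first.
```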
Reads of the Week:
Read #1: The Funnel Technique in Qualitative User Research — Link
The funnel technique has been around since qualitative interviews emerged as a research method.
This technique involves asking broad open-ended questions before gradually introducing more narrowly-scoped open-ended questions, as well as closed questions.
This idea of starting broadly before getting more specific is valuable in other types of studies besides user interviews. This technique can help you organize:
— Interview Questions
— Follow-up questions
— Usability tasks
— Research in a multimethod study
This article from NN Group discusses how the funnel technique can be used in user interviews and in moderated usability tests.
Read #2: 18 Best A/B Testing Tools for 2023, Reviewed by CRO Experts — Link
A/B testing is no longer a new field. Finding a proper A/B testing tool isn’t the problem anymore. Now, the problem is choosing the right one.
You could hire someone full-time to sift through and analyze the pros and cons of each tool, but it’s easier to read a blog post based on the experience of others.
We’ve assembled a list of the most popular A/B testing tools and corresponding reviews from A/B testing experts to help you in your decision-making process.
Read #3: How to Gather ‘Voice of Customer’ Behavioral Insights — Link.
What are the challenges and barriers your visitors experience when they land on your website?
— Do they get confused?
— Do they get stuck?
— Do they find what they need?
By the end of this short read, you should have a structured understanding of the processes involved in gathering a true perspective of the strengths, challenges and opportunities within your user experience journey.
Opinion of the Week:
“Yet another A/B testing myth. Busted! 💥
⛔ Myth: A statistically significant double or triple-digit conversion rate increase is a huge reason to celebrate. 🎉 The bigger the conversion uplift, the better! 🥳
✅ Reality: The bigger the conversion lift, the less likely the result is actually accurate. 🧨 The more likely it is the effect is exaggerated because the test was underpowered. 🧨
💡 Explanation: Results can appear statistically significant even with a very low sample size.
For example, if you run a test with just 10 users, traffic is equally split, 1 user converts on version A, and 4 users convert on version B, the result is statistically significant.
Version B's conversion rate is an incredible 300% higher!
But, every experimenter knows, these findings are not very trustworthy. The sample size is too low to provide meaningful results.
The study is underpowered. It doesn’t contain a large, representative sample of users.
Because the test is underpowered, the results are prone to error.
And it’s not clear if the outcome occurred just by random chance or if one version is truly superior.
While this example is extreme, statistically significant, low-powered tests happen more than most experimenters would like to admit.
In fact, it's estimated 1/20 of tests show statistically significant results when, in fact, there is no real difference between the control and treatment!
On top of this fact, it's also rare to see properly-powered tests that lift conversion by more than a few percentage points.
The outcome is a lot more likely to be the result of a poorly designed, underpowered experiment.
This phenomenon is known as the winner’s curse. 🙇
Statistically significant results, from under-powered experiments, exaggerate the lift, leading experimenters to believe they have outstanding results when they really don't.
If implemented, the test can actually drag down conversions. 🤕
The apparent win is a curse that becomes more worthy of a cry than a celebration. 😢
Remedy:
If you get lifts that look too good to be true, they probably are.
We're naturally biased to want to celebrate positive results and cast off negative ones, but it’s important to run all experiments with healthy skepticism.
To overcome this bias:
- Do sample size and power calculations AHEAD of running the study, based on a reasonable MDE of 2-5%
- Stick to those numbers from start to end to ensure your study is adequately powered
- Replicate the test if results look suspicious
- Don't celebrate wins, brag about results, or implement the test until you know, for certain, results are truly trustworthy”
Link to original LinkedIn post.
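A quick side note from me: here's a minimal sketch of the kind of up-front power calculation the remedy recommends, i.e. the sample size per variant for a two-proportion z-test at a given baseline conversion rate and relative MDE. The 3% baseline rate, 5% alpha, and 80% power are illustrative assumptions, not numbers from the post.

```python
# Illustrative pre-test sample size calculation for an A/B test (assumed inputs).
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate: float,
                            relative_mde: float,
                            alpha: float = 0.05,
                            power: float = 0.80) -> int:
    """Visitors needed in each arm, using a two-sided two-proportion z-test approximation."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)        # e.g. 3% baseline, 5% relative MDE -> 3.15%
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Assumed 3% baseline conversion rate; the 2-5% MDE range comes from the remedy above.
for mde in (0.02, 0.05):
    n = sample_size_per_variant(baseline_rate=0.03, relative_mde=mde)
    print(f"Relative MDE {mde:.0%}: {n:,} visitors per variant")
```

Fixing these numbers before launch, and sticking to them until the test ends, is what keeps a "too good to be true" lift from sneaking through.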
Event of the Week: Accelerating Innovation with A/B Testing
Learn how to design, measure, and implement trustworthy A/B tests, live with world-leading experimentation expert Ronny Kohavi.
— 5 live sessions: Jan 30, 31, Feb 2, 6, 9 (8-10 am PST)
— Lifetime access to session recordings, resources & community
— Expense it to your L&D budget: $1,800 USD per seat (course starts Jan 30th)
Learn:
— The motivation and basics of A/B testing (e.g., causality, surprising examples, metrics, interpreting results, trust and pitfalls, Twyman’s law, A/A tests)
— Cultural challenges, humbling results (e.g., failing often, pivoting, iterating), experimentation platform, institutional memory, meta-analysis, ethics
— Hierarchy of evidence, Expected Value of Information (EVI), complementary techniques, risks in observational causal studies
Another email finished.
See you next week.
If you want me to share your podcast or blog post, just reply to this email or reach out to me personally at Martin@speero.com
PS — Think anyone could use all this info? Share the newsletter with them!