The 3 Methods for Confirming Test Effects Blueprint provides the most common methods of cross-validating your test results. The first is the holdout, often called the gold standard of scientific research.
Note that holdouts can be difficult to maintain, and there can be accuracy and reliability issues with the results. A holdout also sacrifices the solution's benefit while it runs. Continuous holdouts, on the other hand, are easier to maintain, but they can lose attribution of any false positives.
The first alternative is flip tests: implement the winner, then rerun the test with the winner removed. Flip tests are probably the easiest to implement and the most common in use, especially on a test-by-test basis. But they carry a burning question: what if it loses?
Going backward can sometimes erode trust in your program, for example when the rerun comes back flat, loses, or produces a different type of result. But this is part of flip tests. If you have a good program and an experimentation culture that can handle that, you'll be fine.
The last solution is time series and moving averages: implement a test and watch what happens over time. But be careful, as there are lots of confounding variables here. You can try the GA Effect tool, which lets you do this more rigorously.
- Report on the ROI of a test initiative or group of tests.
- Be extra confident in your test result.
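To make the time-series idea concrete, here is a minimal moving-average sketch in Python. The window size and the daily conversion-rate figures are illustrative assumptions, not data from any real test; the point is simply to smooth daily noise so a before/after shift around a launch is easier to see.

```python
# A minimal sketch: smooth a daily conversion-rate series with a
# trailing moving average to eyeball the before/after effect of a
# launch. Window size and sample data are illustrative assumptions.

def moving_average(series, window=7):
    """Return the trailing moving average for each full window."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    averages = []
    for i in range(window - 1, len(series)):
        averages.append(sum(series[i - window + 1 : i + 1]) / window)
    return averages

# Hypothetical daily conversion rates (%); a launch happens mid-series.
daily_cr = [2.1, 2.0, 2.2, 2.1, 2.0, 2.6, 2.7, 2.8, 2.7, 2.9]
smoothed = moving_average(daily_cr, window=3)
```

Plotting the smoothed series against the launch date gives a rough visual read, but remember the confounders mentioned above: a step change in the smoothed line is suggestive, not proof.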
The Problem-Statement Focused Hypothesis Blueprint helps you ground experiment ideas (or solutions) in research, using 'problem statements' as the bridge. It ensures your tests focus on problem statements that are grounded in research, while allowing alternate 'solutions' to be proposed, as long as they share the same hypothesis (and problem statement).
Let’s say you have a concrete, tactical test idea. With this framework, you put that idea into the solution part and then derive your hypothesis and if-statement from it. What do you believe will happen if you implement your idea? Now it’s time to take a step back. What is your problem statement? Where is the evidence that your test idea addresses a real problem? Most of the time, you can back this up with research.
You can also link these problem statements to the business. Use them as an opportunity to understand what your business is trying to prioritize. This way, when you present to leadership, you can collectively agree on which three problems should be addressed first, instead of debating a bunch of solutions backed by hypotheses.
- Prioritize your tests based on business needs.
- Connect your solutions to business problems.
- Get buy-in for experimentation.
- Focus on the most important user problems.
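The relationships the framework describes can be sketched as a small data structure: one researched problem statement backs one hypothesis, which can carry several alternate solutions. This is a minimal illustration; the class and field names, and the checkout example, are assumptions for the sketch, not part of the blueprint itself.

```python
from dataclasses import dataclass, field

# A minimal sketch of the framework's pieces: a researched problem
# statement backs a hypothesis, which can back several candidate
# solutions. All names and example values are illustrative assumptions.

@dataclass
class ProblemStatement:
    statement: str
    evidence: list = field(default_factory=list)  # research backing it

@dataclass
class Hypothesis:
    problem: ProblemStatement
    if_statement: str                               # "If we ..., then ..."
    solutions: list = field(default_factory=list)   # alternate tactics

problem = ProblemStatement(
    statement="Users abandon checkout at the shipping step.",
    evidence=["Session recordings", "Funnel analytics drop-off"],
)
hypothesis = Hypothesis(
    problem=problem,
    if_statement="If we show shipping costs earlier, abandonment will fall.",
    solutions=["Cost estimate on product page", "Free-shipping banner"],
)
```

Structuring test documents this way makes the dependency explicit: you can swap either solution in or out without touching the hypothesis or the research behind it.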
The RXL Blueprint is a research method for identifying the key barriers to conversion and the key customer problems within your UX. It is a really strong foundation for any experimentation program. Whether you’ve been testing for years, have never run a test, or are anywhere in between, RXL gives you a deep understanding of what really matters to your customers and onsite users, so you can design tests with impact.
Perhaps you’re struggling with testing lots of random things, or your stakeholders keep asking you to. ResearchXL helps you move away from this random approach and base your testing decisions on user data. When stakeholders ask for random tests, you now have an alternative grounded in key customer problems. Ultimately, RXL helps you understand your customers far better, with clear benefits for your company.
- Plan UX research.
- Structure your testing and ground it in research.
- Identify and classify the fears, frustrations, and motivations your users experience.
The Multi-Armed VS A/B Testing Blueprint is a guiding tool for deciding when to run a multi-armed bandit (MAB) or a true A/B test. A/B testing allows for a more statistically controlled learning environment, while a MAB focuses on generating a win as quickly as possible (at the sacrifice of understanding 'why'). MABs are good for holiday, short-term, and seasonality testing, while an A/B test provides deeper insight into what went well or badly in your tests.
- Decide if and when to use MAB or AB.
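To illustrate the trade-off, here is a minimal epsilon-greedy bandit sketch, one of the simplest MAB strategies. The conversion rates, epsilon, trial count, and seed are all illustrative assumptions. Notice how it funnels most traffic to whichever arm looks best so far, which is exactly why it converges on a winner quickly but leaves the losing arm with too few samples to explain 'why'.

```python
import random

# A minimal epsilon-greedy multi-armed bandit sketch. All parameters
# (rates, epsilon, trials, seed) are illustrative assumptions.

def epsilon_greedy(true_rates, trials=2000, epsilon=0.1, seed=42):
    """Simulate an epsilon-greedy bandit over Bernoulli arms.

    Returns (pulls per arm, observed reward rate per arm).
    """
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(trials):
        if rng.random() < epsilon or sum(pulls) == 0:
            arm = rng.randrange(len(true_rates))  # explore a random arm
        else:
            # exploit: pick the arm with the best observed rate so far
            arm = max(range(len(true_rates)),
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0)
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    return pulls, [w / p if p else 0.0 for w, p in zip(wins, pulls)]

# Two hypothetical arms where B converts markedly better than A:
pulls, observed = epsilon_greedy([0.1, 0.3])
```

A fixed-split A/B test would instead give each arm ~1000 pulls, sacrificing short-term reward for estimates precise enough to explain the result.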
This is an example of a workflow map for an A/B test: the different steps right before a test goes live, during the test, and afterward. Each step can be customized to your organization's structure and needs and can be made more or less granular.
- Create a workflow map that lays out the rules and steps for setting up a test, the steps while the test is live, stopping decisions, and post-test analysis flows.
- This document would be used across the teams that run A/B tests in your company; if you are an agency, map one out for each client individually.
- It would list out steps, tasks, and decisions, for example: if something is not working, what is the fallback?
Sample Ratio Mismatch Alert! What do I do? You have set the ratio for your test at 50/50, yet your testing tool reports a 70/30 traffic split in your experiment. This data cannot be trusted, and the issue needs to be rectified before the experiment can be analyzed. There are several ways an experiment can end up with a sample ratio mismatch, but the most common is a technical issue with the segmentation itself. Use this framework as a guideline to determine the best next step once you have received a sample ratio mismatch error:
You have received a Sample Ratio Mismatch (SRM) error from your testing tool or analytics team.
The CRO Process Blueprint is your strategic approach for identifying and interpreting relevant data to find possible points of friction in your sales funnel, and ultimately increase the conversion rate. This blueprint shows all the parts your CRO process should have, including which role should be responsible for which part of the process.
- Structure your CRO Process.
- Improve conversion rate.
- Help everyone understand their place.
- Increase website ROI.
When does it make sense to continue iterating because there’s still some juice left to squeeze, and when should you move on? The Iterate Vs Move framework deals with this.
This is a great visualization of how iterations matter to experimentation, but can sometimes be deprioritized when other initiatives take precedence. Remember, testing and iterating are costly. Use this blueprint as a starting point when deciding whether to iterate or move on to the next test, and build it into your own prioritization framework.
- Determine before the test if the iteration costs too much.
- Decide between iterating and moving to a new hypothesis.
Our biggest and boldest blueprint. The Test Phase Gate Blueprint deals with a test’s phases, stages, and the process itself. It helps you ask vital questions of yourself and your team as your experiment moves through different stages. The Test Phase Gate Blueprint is at the heart of experimentation program management.
This is a BEAST of a blueprint, not one to look at every week, but one to help check and balance how your gates are working (or not), what the questions, activities, deliverables, and use cases are for each gate, and more. It puts a LOT of things into context. The blueprint also references the artifacts involved in building a test document and how all of those pieces stack up with each other.
The gates represent experimentation programs or parts of their flywheels which, once you turn them on, become one-way gates with no turning back. Check out the Miro board below in ‘related links’ for more info.
- Manage your experimentation program.
- Determine the cadence and flow of your experimentation flywheel.
- Use as a communication tool to align the team on how things work.
- Understand the artifacts (test documentation) and how they fit in with each other.
- Understand the roles, activities, deliverables, and responsibilities in the flywheel.
Don’t jump straight to yes. Not all changes are good, especially those that involve changing processes for several teams or require serious stakeholder buy-in. The Test Implementation Checklist Blueprint gives you a plan of attack when you have to change ‘business as usual’.
More than anything, this is your ‘flicker’ mechanism: something you check every once in a while, asking yourself a series of questions about the technical feasibility and buy-in you need, even before the test is scoped.
- Have a plan of attack when changing the status quo.
- Guesstimate how much buy-in and technical feasibility you need to implement the test.
- Determine if the test is worth implementing before you present it to the stakeholders.
One of the most challenging questions we hear from companies looking to increase experimentation maturity is how they can better build their teams and distribute their experimentation capabilities within the org. There is obviously no single answer to this question, but the Org Charts Blueprint presents the most common examples we see in real life. The Org Charts Blueprint is inspired by Stefan Thomke’s “Experimentation Works” book, which is a great reference on this topic.
- Structure the experimentation capacity and capability within the organization.
- Structure experimentation teams.
- Establish the responsibilities of the team members for increased efficiency.
- Understand the pros and cons of each org structure.
When you start assembling an experimentation team, whether by hiring new people or asking colleagues already at the company to support your experimentation activities, it is easy to struggle to define who is expected to do what in a specific activity. Who is responsible for this, and who is accountable for that? The idea behind the Experimentation Program RASCI Matrix Blueprint, a well-known tool in the program management world, is to visually represent:
— Who's responsible?
— Who's accountable?
— Who supports?
— Who's consulted?
— Who's informed about each one of these activities?
- Structure experimentation teams.
- Establish the responsibilities of the team members for increased efficiency.
- Structure the experimentation capacity and capability within the organization.
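A RASCI matrix is easy to represent and query programmatically, for example when generating a per-role view of responsibilities. The sketch below is a minimal illustration; the activities, roles, and assignments are hypothetical examples, not a recommended allocation.

```python
# A minimal RASCI matrix sketch: activity -> role -> letter, where
# R=responsible, A=accountable, S=supports, C=consulted, I=informed.
# Activities, roles, and assignments are illustrative assumptions.

RASCI = {
    "Write test hypothesis": {"CRO lead": "A", "Analyst": "R", "Designer": "C"},
    "Build the variant":     {"CRO lead": "A", "Developer": "R", "Analyst": "I"},
    "Post-test analysis":    {"Analyst": "R", "CRO lead": "A", "Developer": "I"},
}

def who_is(letter, activity):
    """List the roles holding a given RASCI letter for an activity."""
    return [role for role, tag in RASCI[activity].items() if tag == letter]

who_is("R", "Build the variant")  # ["Developer"]
```

Keeping the matrix in one structured place like this also makes it trivial to spot gaps, such as an activity with no responsible role assigned.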
When running experiments systematically, it is easy to start testing for the sake of testing and miss the point of the actual actions and changes we want to drive through experimentation. That’s why the Experimentation Decision Matrix Blueprint provides you with an action plan before you start any testing.
Based on your primary success metric, and supported by the secondary ones, what will your action be after concluding the test? This way, you can remind not only yourself and your team, but your stakeholders as well, of the actual actions you plan to take after each experiment concludes. The suggestion here is to implement a set of action tags in your knowledge database so that you can classify and act on these results.
This framework will give you a reference on how to tag and classify your results depending on the type of hypothesis (superiority vs non-inferiority) and the action to take afterward. It will also help you enrich a narrative where “winning” and “losing” is not necessarily what’s impactful for the business.
- Decide whether to implement, iterate or abandon a hypothesis.
- Update your experimentation roadmap based on prior results.
- Classify the experiments in your Knowledge Database.
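One way such action tags could be wired into a knowledge database is a simple lookup keyed on hypothesis type and primary-metric outcome. The mapping below is a hypothetical sketch, not the blueprint's canonical matrix; note how a flat result maps to different actions depending on the hypothesis type, which is the narrative point about 'winning' and 'losing' not being the whole story.

```python
# A minimal sketch of action tags for a knowledge database: the action
# depends on hypothesis type and primary-metric outcome. The tag names
# and the mapping itself are illustrative assumptions.

ACTION_TAGS = {
    ("superiority", "win"):      "implement",
    ("superiority", "flat"):     "iterate",
    ("superiority", "loss"):     "abandon",
    ("non-inferiority", "win"):  "implement",
    ("non-inferiority", "flat"): "implement",  # no harm detected: ship it
    ("non-inferiority", "loss"): "abandon",
}

def action_for(hypothesis_type, outcome):
    """Look up the post-test action tag for a concluded experiment."""
    return ACTION_TAGS[(hypothesis_type, outcome)]

action_for("non-inferiority", "flat")  # "implement"
```

Agreeing on a mapping like this before the test starts is what turns the matrix into an action plan rather than a post-hoc debate.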
The Validation Methods Blueprint helps you decide which validation method to use when making website changes, and gives you visibility into validation tools for copy, functionality, and design. Shipping big changes to the site without prior validation can be very costly and can even create more problems than it solves. Disruptive ideas should be tested or validated with further research before being released to the public.
While A/B testing can give you quantitative data on how your changes perform on the site, sometimes you may want to validate ideas before spending resources on building a new test, especially if you're not confident in the current solution. In some situations, you might not even have the bandwidth or resources to validate the idea with A/B testing alone. Luckily, there are numerous options out there for gathering additional qualitative or quantitative data to validate your ideas.
- Decide which validation method to use when doing site changes
- Get visibility to different validation tools for copy, functionality, design, etc.
Randomized controlled trials are known from medical scientific research, but since 2000 this scientific approach has also been used to improve and learn from web pages. The research is double-blind (the researcher and the visitor don’t know they are in the experiment), and there is a hypothesis that is ‘proven’ until it is falsified.
- Prove that A/B testing is a scientific research method.
- Show the history behind experimentation.
The Program Metrics Blueprint lets you monitor the success of your experimentation program. Why is this important? Besides tracking revenue metrics like the number of wins or losses from experiments, you should also look at program metrics to report on a monthly or quarterly basis.
If you're in a situation where you are in charge of running experiments on the site but are facing some slowdowns or issues in efficiency, you need to think about the relevant metrics that will help you identify the bottlenecks in your program.
For example, if test velocity has been going down, you can identify where the problems lie by reviewing how many ideas are submitted every month, or even how many ideas sit in the backlog. If idea generation is not the problem, you should track how long test creation takes for the team by looking at time spent on sub-tasks.
- Increase your testing velocity.
- Monitor the success of your experimentation program.
- Improve the efficiency and effectiveness of your program.
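Two of the program metrics described above, test velocity and idea-to-launch cycle time, are straightforward to compute from a test log. The sketch below is a minimal illustration with made-up records; the field names are assumptions about how such a log might be structured.

```python
from datetime import date
from statistics import median

# A minimal sketch of two program metrics: monthly test velocity and
# median days from idea to launch. Records and field names are
# illustrative assumptions about a test log's structure.

tests = [
    {"idea": date(2023, 1, 3),  "launched": date(2023, 1, 20)},
    {"idea": date(2023, 1, 10), "launched": date(2023, 2, 1)},
    {"idea": date(2023, 2, 2),  "launched": date(2023, 2, 14)},
]

def monthly_velocity(records):
    """Count launched tests per (year, month)."""
    velocity = {}
    for r in records:
        key = (r["launched"].year, r["launched"].month)
        velocity[key] = velocity.get(key, 0) + 1
    return velocity

def median_cycle_days(records):
    """Median number of days from idea submission to launch."""
    return median((r["launched"] - r["idea"]).days for r in records)
```

Tracked month over month, a falling velocity paired with a rising cycle time points at execution bottlenecks, while a thin backlog points at idea generation instead.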