Is the result you found in a test repeatable?
Repeatability is the gold standard of scientific research. The 3 Methods for Confirming Test Effects Blueprint covers the most common methods of cross-validating a test result.
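The first method is a holdout, where part of your traffic keeps the original experience after the winner ships. Note that holdouts can be difficult to maintain, and the results are only useful if the measurement is accurate and reliable. A holdout also sacrifices the benefit of the solution for the held-out traffic while it is running. Continuous holdouts, on the other hand, are easier to maintain, but you can lose the ability to attribute a false positive to a specific test.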
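To make the holdout comparison concrete, here is a minimal sketch in Python. It assumes you have conversion and visitor counts for the exposed group (which got the winner) and the holdout group (which kept the original). The function name `holdout_lift` and the choice of a pooled two-proportion z-test are assumptions for illustration, not something the Blueprint prescribes.

```python
import math


def holdout_lift(exposed_conv, exposed_n, holdout_conv, holdout_n):
    """Compare the exposed group with the holdout group.

    Returns (relative_lift, z_score, two_sided_p_value).
    Hypothetical helper: the names and the test choice are assumptions.
    """
    p_exposed = exposed_conv / exposed_n
    p_holdout = holdout_conv / holdout_n

    # Relative lift of the exposed group over the holdout baseline.
    relative_lift = (p_exposed - p_holdout) / p_holdout

    # Pooled two-proportion z-test.
    pooled = (exposed_conv + holdout_conv) / (exposed_n + holdout_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / exposed_n + 1 / holdout_n))
    z = (p_exposed - p_holdout) / se

    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return relative_lift, z, p_value


if __name__ == "__main__":
    lift, z, p = holdout_lift(1100, 10000, 100, 1000)
    print(f"lift={lift:.1%} z={z:.2f} p={p:.3f}")
```

Note how a small holdout makes the comparison noisy: in the usage example the exposed group converts one percentage point better, yet the difference is far from significant. That is one reason holdout results can have reliability issues.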
The first alternative is flip tests, where you implement the winner and then rerun the test in reverse, with a variant that removes the winner. Flip tests are probably the easiest to implement and the most commonly used, especially on a test-by-test basis. But they raise a burning question: what if the rerun loses?
Going backward can sometimes erode trust in your program, for example when the flip test comes back flat, loses, or shows a different type of result. But this is part of running flip tests. If you’ve got a good program and an experimentation culture that can handle that, you’ll be fine.
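The three possible outcomes of a flip test can be sketched as a small Python check. This is only an illustration under stated assumptions: the function name `flip_test_verdict`, the labels it returns, and the 1.96 cutoff (roughly 95% two-sided confidence) are all hypothetical choices, and the inputs are the counts from the rerun, where the implemented winner is now the baseline.

```python
import math


def flip_test_verdict(winner_conv, winner_n, removed_conv, removed_n, z_crit=1.96):
    """Classify a flip test in which the winner is the new baseline.

    Returns "confirmed" if removing the winner significantly hurts,
    "reversed" if removing it significantly helps, and "flat" otherwise.
    Labels and threshold are illustrative assumptions.
    """
    p_winner = winner_conv / winner_n
    p_removed = removed_conv / removed_n

    # Pooled two-proportion z-test on (removed - winner).
    pooled = (winner_conv + removed_conv) / (winner_n + removed_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / winner_n + 1 / removed_n))
    z = (p_removed - p_winner) / se

    if z <= -z_crit:
        return "confirmed"  # taking the winner away lowers conversion
    if z >= z_crit:
        return "reversed"   # taking the winner away raises conversion
    return "flat"           # no detectable difference either way


if __name__ == "__main__":
    print(flip_test_verdict(1100, 10000, 1000, 10000))
```

A "flat" or "reversed" verdict is exactly the uncomfortable result discussed above, so it helps to agree in advance on how the program will treat each outcome.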
The last solution is time series and moving averages. The idea is that you implement the winner and watch what happens to the metric over time. But you have to be careful, because there are lots of confounding variables here, and anything else that changes during the same period can move the metric too. You can try the GA effect tool, which lets you do this in a more rigorous, academic way.
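As a rough sketch of the moving-average idea, the Python below smooths a daily metric and compares the smoothed level before and after the launch day. The helper names `moving_average` and `before_after_change`, the 7-day window, and the sample numbers are assumptions for illustration. This simple comparison does nothing to control for confounding variables, so treat its output as a prompt for further investigation, not as proof.

```python
def moving_average(values, window):
    """Simple trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


def before_after_change(daily_values, launch_index, window=7):
    """Difference between the last smoothed value after launch and before it.

    daily_values[launch_index] is the first day with the change live.
    Both sides need at least `window` days of data.
    """
    before = moving_average(daily_values[:launch_index], window)
    after = moving_average(daily_values[launch_index:], window)
    return after[-1] - before[-1]


if __name__ == "__main__":
    # Hypothetical daily conversions: 7 days before launch, 7 days after.
    daily = [100, 98, 103, 101, 99, 102, 97, 118, 121, 119, 122, 120, 117, 123]
    print(before_after_change(daily, launch_index=7))
```

A 7-day window is used so that each average covers one full week, which smooths out day-of-week swings. A different window may suit a different traffic pattern.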
Use Cases:
- Report on the ROI of a test initiative or group of tests.
- Be extra confident in your test result.