No shared definition of what counts as a hypothesis worth testing — anything with a hunch becomes an experiment.
Run fewer experiments. Learn more. Decide faster.
The problem with most growth experimentation isn't volume — it's that the team is running 12 tests at once with no decision rules, no statistical discipline, and no clean handoff into the operating system.
What this problem looks like
If you recognise three of these, this page is for you.
- The team runs experiments but rarely scales or kills any of them — every result becomes "interesting".
- There's no central backlog; experiments live in five Notion docs and one Google Sheet.
- Tests run for 5-7 days then get called regardless of significance, because the team needs to ship the next thing.
- Nobody can confidently name the three biggest learnings from the last quarter.
- The same hypothesis gets re-tested every 6 months because nothing was documented.
Why it usually happens
The root cause is rarely what the team thinks it is.
No prioritisation framework, so the loudest voice or the easiest test wins.
No decision rule before the test starts — "we'll see what the data says" makes every result ambiguous after the fact.
How I diagnose it
A focused diagnostic, not a six-week consultancy review.
- 01 Audit the last 90 days of experiments: which had hypotheses, which had pre-set decision rules, which were actually statistically valid.
- 02 Read the team's experiment doc / Notion / sheet for evidence of what was scaled, killed, or shelved, and how the call was made.
- 03 Score each test on whether it produced a decision the team acted on (the only metric that matters).
- 04 Find the three biggest unanswered questions about acquisition or activation that no current experiment is addressing.
- 05 Prioritise the next 6-8 experiments using ICE or similar, but force the team to write the decision rule before launch.
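A minimal sketch of the prioritisation step, assuming ICE means Impact, Confidence, and Ease scored 1-10 and multiplied together (averaging the three is another common convention). The `Candidate` class, the field names, and the sample backlog entries are hypothetical, for illustration only. The one rule it enforces is the one named above: a test with no written decision rule does not make the list.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    impact: int        # 1-10: how much the metric could move
    confidence: int    # 1-10: how strong the supporting evidence is
    ease: int          # 1-10: how cheap the test is to run
    decision_rule: str = ""  # written before launch; empty means not ready

    @property
    def ice(self) -> int:
        # Assumption: ICE as a product of the three scores.
        return self.impact * self.confidence * self.ease


def prioritise(candidates, limit=8):
    """Rank candidates by ICE, skipping any without a decision rule."""
    ready = [c for c in candidates if c.decision_rule.strip()]
    return sorted(ready, key=lambda c: c.ice, reverse=True)[:limit]


# Hypothetical backlog entries.
backlog = [
    Candidate("Shorter signup form", 7, 6, 8,
              "Scale at +10% relative signup rate; kill if flat at full sample"),
    Candidate("New pricing page hero", 8, 4, 5),  # no rule yet, so excluded
    Candidate("Annual-plan nudge", 6, 7, 4,
              "Scale at +5% annual share; kill if refunds rise"),
]

ranked = prioritise(backlog)
for c in ranked:
    print(f"{c.ice:>4}  {c.name}")
```

The scores themselves are judgment calls. The value is in forcing a comparable number and a written rule onto every candidate before anything launches.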
How I fix it
Build the system, then transfer it.
- 01 Rewrite hypotheses in a strict format: "If we change X for users in segment Y, conversion will move by Z, because [insight]."
- 02 Set decision rules before launch: what result scales it, what kills it, what triggers another test.
- 03 Build one experiment backlog (not five) with status, owner, success criteria, and outcome.
- 04 Set a weekly experiment review: 30 minutes, decisions captured, no debates about "interesting" results without action.
- 05 Document the win and the kill. Both feed the team's institutional memory and stop re-testing dead ideas.
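The steps above can be sketched as a single backlog record. This is an illustration under stated assumptions, not a prescribed schema: the `DecisionRule` and `Experiment` classes, their field names, and the use of relative lift as the deciding metric are all hypothetical. The point it demonstrates is that the thresholds are fixed when the record is created, so the weekly review can only return scale, kill, iterate, or "not enough data yet".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SCALE = "scale"
    KILL = "kill"
    ITERATE = "iterate"


@dataclass(frozen=True)
class DecisionRule:
    scale_at: float  # relative lift at or above which the test is scaled
    kill_at: float   # relative lift at or below which the test is killed

    def __post_init__(self):
        if self.scale_at <= self.kill_at:
            raise ValueError("scale_at must be greater than kill_at")

    def verdict(self, observed_lift: float, reached_sample: bool) -> Optional[Verdict]:
        if not reached_sample:
            return None  # no call until the planned sample size is reached
        if observed_lift >= self.scale_at:
            return Verdict.SCALE
        if observed_lift <= self.kill_at:
            return Verdict.KILL
        return Verdict.ITERATE


@dataclass
class Experiment:
    change: str
    segment: str
    expected_move: str
    insight: str
    owner: str
    rule: DecisionRule
    outcome: Optional[Verdict] = None

    def hypothesis(self) -> str:
        return (f"If we change {self.change} for users in segment {self.segment}, "
                f"conversion will move by {self.expected_move}, because {self.insight}.")

    def review(self, observed_lift: float, reached_sample: bool) -> Optional[Verdict]:
        # Record the verdict so wins and kills both end up in the log.
        self.outcome = self.rule.verdict(observed_lift, reached_sample)
        return self.outcome
```

A record like this works equally well as a row in a spreadsheet. What matters is that `scale_at` and `kill_at` are filled in before launch and not edited afterwards.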
Example deliverables
What you actually leave with.
Hypothesis template (one-pager) the team uses for every test
ICE-scored experiment backlog with named owners
Pre-set decision rules per test (scale / kill / iterate)
Weekly experiment review format and decision log
Quarterly learnings doc — wins, kills, abandoned hypotheses
Sample-size calculator (or Optimizely / VWO setup if needed)
Mini example · B2B Fintech · paid acquisition
- Problem: Team was running 14 paid landing-page variants with no clear hypothesis, no decision rule, and no documented winner. Two new tests went live each week and nothing got scaled.
- Action: Cut the backlog to 4 prioritised tests with pre-set decision rules. Standardised the hypothesis format. Added a weekly review where every test got a verdict.
- Result: CAC dropped 35% in 8 weeks because the team finally scaled the winners and killed the losers. Fewer tests, dramatically more decisions.
Who this is for
Best fit if any of these apply.
- Post-PMF teams running real experimentation but feeling like the learning isn't compounding.
- Marketing or product leaders who want a defensible quarterly learnings narrative for the board.
- Founders who suspect the team is busy but not learning fast enough.
Common mistakes
What teams get wrong before they call.
- Running A/B tests on traffic too low for statistical significance, then making decisions on noise.
- Calling tests early because the team needs to ship, then re-running the same test six months later.
- Confusing dashboards with learning — vanity metrics climb, decisions don't.
- No documentation of kills, so the same dead hypothesis gets retested annually.
FAQ
Common questions before booking.
Do we need a stats tool?
Eventually. For most Seed to Series A teams, a simple sample-size calculator and a disciplined hypothesis template do 80% of the work. Stats tools are a conversation for month six, not week one.
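For a sense of how simple that calculator can be, here is a sketch using the standard two-proportion z-test approximation, with only the Python standard library. The function names, the default 5% significance level and 80% power, and the even two-way traffic split are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect the lift (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def days_to_run(n_per_arm, daily_visitors, arms=2):
    """Days needed if all daily traffic is split evenly across the arms."""
    return math.ceil(n_per_arm * arms / daily_visitors)


n = sample_size_per_arm(baseline_rate=0.05, relative_lift=0.20)
print(n, "visitors per arm,", days_to_run(n, daily_visitors=1000), "days")
```

With a 5% baseline and a 20% relative lift, this comes out at roughly 8,000 visitors per variant, or about two and a half weeks at 1,000 visitors a day. That is why a test called after 5-7 days on modest traffic is usually a decision made on noise.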
What if our traffic is too low for statistical significance?
Run sequential or Bayesian-style tests, qualitative interviews, or in-funnel event analysis. Sample-size constraints are real, and pretending otherwise is the bigger problem.
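As one illustration of the Bayesian-style option (sequential testing is a separate technique and is not shown here), the sketch below estimates the probability that variant B converts better than variant A. It assumes a uniform Beta(1, 1) prior on each conversion rate and uses Monte Carlo draws from the standard library. The function name and the example counts are hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) from Beta posteriors via sampling."""
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    wins = 0
    for _ in range(draws):
        # Posterior under a Beta(1, 1) prior: Beta(1 + conversions, 1 + misses)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 20/400 conversions on A, 35/400 on B.
print(round(prob_b_beats_a(20, 400, 35, 400), 3))
```

This changes how the result is read, not the need for a pre-set rule. The threshold that counts as "scale" (for example, a probability of 0.95) still has to be written down before launch.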
How is this different from CRO?
CRO is one input. This covers the whole experimentation system: hypothesis quality, prioritisation, decision rules, and institutional memory across product, marketing, and growth.
Should we hire a CRO specialist?
Maybe — but later. The system needs to exist before a specialist plugs in. Otherwise the specialist runs experiments inside the same broken framework.
Diagnose this in 20 minutes.
Bring the current state of your experimentation. We'll diagnose the constraint and decide if working together makes sense — or where else to go if it doesn't.
Last updated: 11 May 2026