Growth Experiment Designer
Report
Turns a funnel problem into a run-ready A/B experiment card
About the skill
What it does
It takes a funnel bottleneck and converts it into an A/B (or multi-arm) experiment card a growth/CRO team can run directly. The output is not a list of ideas; it is a single, runnable design with a measurable hypothesis, variants, sample/duration, OEC/guardrails and stopping rules.
The flow is anchored to named canon, not intuition:
- AARRR (Pirate Metrics, Dave McClure) places the problem on the correct stage first — separating whether "low conversion" is an Acquisition-quality or an Activation-design issue, so the wrong stage isn't optimized.
- North Star + Input/Output metric split (Reforge/Amplitude) selects the experiment's OEC (Overall Evaluation Criterion) from an input metric measurable within the run window; the lagging output is watched as a guardrail.
- Opportunity Solution Tree (Teresa Torres) ties every hypothesis to a behavioral cause (friction, low motivation, delayed value moment, trust gap) — rejecting the "let's change the button color" blind shot.
- ICE / RICE / PIE scoring ranks multiple hypotheses and picks one, because parallel overlapping tests contaminate each other's results.
- Kohavi/Tang/Xu, Trustworthy Online Controlled Experiments: the statistics derive MDE, power (0.80), α (0.05), sample size and run duration. The rule-of-thumb approximation n ≈ 16·p̄(1−p̄)/δ² (users per variant, with p̄ the expected conversion rate and δ the absolute MDE) answers up front whether the experiment has enough power to detect the effect. Under-powered designs are rejected.
- Guardrail + OEC + SRM (χ²) checks catch local-win/global-loss outcomes like "conversion went up but revenue/user dropped."
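The sample-size approximation in the list above can be sketched as follows. This is a minimal illustration, not the skill's own script: the function names, the choice of the midpoint between control and treatment rates for p̄, and the example traffic figures are all assumptions.

```python
import math


def sample_size_per_arm(baseline: float, relative_lift: float) -> int:
    """Rule-of-thumb users per arm for alpha = 0.05 (two-sided), power = 0.80.

    Implements n ~= 16 * p_bar * (1 - p_bar) / delta**2.
    """
    delta = baseline * relative_lift      # absolute MDE derived from the relative lift
    p_bar = baseline + delta / 2          # assumption: midpoint of control and treatment rates
    return math.ceil(16 * p_bar * (1 - p_bar) / delta ** 2)


def run_days(n_per_arm: int, arms: int, daily_eligible_users: int) -> int:
    """Days needed to fill every arm, given users entering the test per day."""
    return math.ceil(n_per_arm * arms / daily_eligible_users)


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline activation, 10% relative lift, 2,000 users/day.
    n = sample_size_per_arm(baseline=0.10, relative_lift=0.10)
    print(n, "users per arm;", run_days(n, arms=2, daily_eligible_users=2000), "days")
```

With these hypothetical inputs the design needs roughly 15,000 users per arm. If the available traffic cannot reach that within an acceptable window, the design is under-powered and the MDE or the metric has to change.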
The design is finally scored 0–100 on an Impact/Consistency Score (scoring design quality, not the result — the test hasn't run yet) and given a verdict.
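The hypothesis prioritization step listed above can be illustrated with RICE (Reach × Impact × Confidence ÷ Effort). This is a sketch only: the `Hypothesis` class, the field scales and the three example hypotheses are hypothetical, and the skill's actual scoring rubric is not shown in this description.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    reach: float       # users reaching the funnel step per period
    impact: float      # expected effect per user, on whatever scale the team agrees on
    confidence: float  # 0..1, how well the behavioral cause is evidenced
    effort: float      # person-weeks, must be > 0

    @property
    def rice(self) -> float:
        return self.reach * self.impact * self.confidence / self.effort


def pick_one(hypotheses: list[Hypothesis]) -> Hypothesis:
    """Return the single highest-scoring hypothesis to test."""
    return max(hypotheses, key=lambda h: h.rice)


if __name__ == "__main__":
    # Hypothetical candidates for an onboarding drop-off.
    candidates = [
        Hypothesis("Shorten signup form", reach=4000, impact=2, confidence=0.8, effort=2),
        Hypothesis("Add trust badges", reach=9000, impact=0.5, confidence=0.5, effort=1),
        Hypothesis("Guided first project", reach=1500, impact=3, confidence=0.5, effort=4),
    ]
    print(pick_one(candidates).name)
```

Picking exactly one winner, rather than launching the top few together, keeps the test free of overlapping changes.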
When to use
Use it for situations like "activation is low / cart abandonment is high / trial→paid isn't converting / users drop in onboarding / which variant should I test / is this experiment statistically significant / how many days should I run." Input is typically a funnel step plus the observed loss; sometimes raw metrics, sometimes just a verbal complaint.
Method / frameworks
- AARRR — Pirate Metrics (McClure): stage placement.
- North Star Metric + Input/Output (Reforge / Amplitude): OEC selection.
- ICE / RICE / PIE (Sean Ellis "Hacking Growth"; WiderFunnel PIE): prioritization.
- Trustworthy Online Controlled Experiments (Kohavi, Tang, Xu, 2020): MDE, power, α, sample, guardrails, SRM, Twyman's Law.
- Opportunity Solution Tree (Teresa Torres): behavioral-cause discipline.
- Evan Miller — Sample Size Calculator / How Not to Run an A/B Test: peeking, fixed-horizon power (production-time verification tool).
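The SRM (sample ratio mismatch) check named in the list above can be sketched for a two-arm test as a χ² goodness-of-fit test with one degree of freedom. The function name, the two-arm restriction and the 0.001 alarm threshold are assumptions for illustration; the p-value uses the identity that the χ² survival function with one degree of freedom equals `erfc(sqrt(χ²/2))`, so no external library is needed.

```python
import math


def srm_p_value(control_n: int, treatment_n: int,
                expected_control_share: float = 0.5) -> float:
    """P-value of a chi-square test that the observed split matches the design."""
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # Chi-square with 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts from a test designed as a 50/50 split.
    p = srm_p_value(control_n=50_000, treatment_n=51_500)
    # Assumed alarm threshold; teams choose their own.
    print("SRM suspected" if p < 0.001 else "split looks consistent")
```

A very small p-value means the assignment or logging is likely broken, so the lift estimate should not be trusted until the cause is found.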
Industry grounding: SaaS self-serve trial→paid typically ~3–12% (B2B median ~18–25%); e-commerce cart abandonment ~70%. Benchmarks are context, not a target — relative lift is set against the user's own baseline.
How do I use this skill?
Upload the growth-deney-tasarimci.zip you downloaded as-is — no packaging needed, the format is already correct (folder at root).
- Open Settings → Customize → Skills
- Upload → select the growth-deney-tasarimci.zip you downloaded
- Claude reads SKILL.md; the name + description appear. Ready ✅
Scripts run in Anthropic's code-execution environment (sandbox) — not on your machine.