Growth Experiment Designer

Turns a funnel problem into a run-ready A/B experiment card

About the skill

What it does

It takes a funnel bottleneck and converts it into an A/B (or multi-arm) experiment card a growth/CRO team can run directly. The output is not a list of ideas; it is a single, runnable design with a measurable hypothesis, variants, sample/duration, OEC/guardrails and stopping rules.

The flow is anchored to established, named frameworks rather than intuition:

  • AARRR (Pirate Metrics, Dave McClure) places the problem on the correct stage first — separating whether "low conversion" is an Acquisition-quality or an Activation-design issue, so the wrong stage isn't optimized.
  • North Star + Input/Output metric split (Reforge/Amplitude) selects the experiment's OEC (Overall Evaluation Criterion) from an input metric measurable within the run window; the lagging output is watched as a guardrail.
  • Opportunity Solution Tree (Teresa Torres) ties every hypothesis to a behavioral cause (friction, low motivation, delayed value moment, trust gap) — rejecting the "let's change the button color" blind shot.
  • ICE / RICE / PIE scores the competing hypotheses and picks one to test, because parallel overlapping tests contaminate each other's results.
  • Trustworthy Online Controlled Experiments (Kohavi/Tang/Xu) supplies the statistics: MDE, power (0.80), α (0.05), sample size and run duration. The approximation n ≈ 16·p̄(1−p̄)/δ² per variant, with δ the absolute minimum detectable effect, answers up front whether the experiment has enough detection power. Under-powered designs are rejected.
  • Guardrail and OEC checks catch local-win/global-loss outcomes like "conversion went up but revenue/user dropped"; an SRM (sample ratio mismatch, χ²) check flags a broken traffic split before the result is trusted.
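
As an illustration of the SRM (χ²) check named in the last bullet, here is a minimal sketch using only the Python standard library. The function name `srm_check`, the 50/50 default split and the 0.001 alert threshold are assumptions made for this example, not details taken from the skill itself.

```python
import math

def srm_check(control_n, treatment_n, expected_ratio=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test for sample ratio mismatch (2 arms).

    expected_ratio is the share of traffic the control arm was meant to get.
    alpha is an assumed alert threshold, not a value from the skill.
    """
    total = control_n + treatment_n
    expected_control = total * expected_ratio
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha

# Hypothetical counts: a split this uneven on a 50/50 design is suspicious.
chi2, p_value, mismatch = srm_check(50_700, 49_300)
print(f"chi2={chi2:.2f} p={p_value:.2g} mismatch={mismatch}")
```

If the check reports a mismatch, the usual reading is that assignment or logging is broken, so the lift estimate should not be interpreted until the cause is found.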

The design is finally scored 0–100 on an Impact/Consistency Score (scoring design quality, not the result — the test hasn't run yet) and given a verdict.

When to use

Use it for situations like "activation is low / cart abandonment is high / trial→paid isn't converting / users drop in onboarding / which variant should I test / is this experiment statistically significant / how many days should I run." The input is typically a funnel step plus the observed loss; sometimes it is raw metrics, sometimes just a verbal complaint.

Method / frameworks

  • AARRR — Pirate Metrics (McClure): stage placement.
  • North Star Metric + Input/Output (Reforge / Amplitude): OEC selection.
  • ICE / RICE / PIE (Sean Ellis "Hacking Growth"; WiderFunnel PIE): prioritization.
  • Trustworthy Online Controlled Experiments (Kohavi, Tang, Xu, 2020): MDE, power, α, sample, guardrails, SRM, Twyman's Law.
  • Opportunity Solution Tree (Teresa Torres): behavioral-cause discipline.
  • Evan Miller — Sample Size Calculator / How Not to Run an A/B Test: peeking, fixed-horizon power (production-time verification tool).
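
The prioritization entry in the list above (ICE) can be sketched as follows. This is a minimal example that assumes each hypothesis is rated 1 to 10 on impact, confidence and ease and that the three ratings are averaged; some teams multiply them instead. The `Hypothesis` class and the backlog entries are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    name: str
    impact: int      # 1-10: expected effect on the OEC
    confidence: int  # 1-10: strength of the supporting evidence
    ease: int        # 1-10: how cheap the variant is to build and ship

    @property
    def ice(self) -> float:
        # Assumed convention: simple average of the three ratings.
        return (self.impact + self.confidence + self.ease) / 3

def pick_one(backlog):
    """Return the single highest-scoring hypothesis to test next."""
    return max(backlog, key=lambda h: h.ice)

# Hypothetical backlog for an onboarding drop-off.
backlog = [
    Hypothesis("Shorten signup form", impact=6, confidence=7, ease=8),
    Hypothesis("Move value moment before signup", impact=9, confidence=5, ease=3),
    Hypothesis("Add trust badges at checkout", impact=4, confidence=6, ease=9),
]
winner = pick_one(backlog)
print(winner.name, round(winner.ice, 2))
```

Picking exactly one winner reflects the rule stated earlier that overlapping tests should not run in parallel.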

Industry grounding: SaaS self-serve trial→paid typically ~3–12% (B2B median ~18–25%); e-commerce cart abandonment ~70%. Benchmarks are context, not a target — relative lift is set against the user's own baseline.
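
To make "relative lift against the user's own baseline" concrete, here is a small sketch that applies the n ≈ 16·p̄(1−p̄)/δ² approximation quoted under "What it does". It assumes δ is the absolute difference between the baseline and target rates and p̄ is their average; the function names and the traffic figure are hypothetical.

```python
import math

def sample_size_per_variant(baseline, relative_lift):
    """Rule-of-thumb sample size per arm (power 0.80, alpha 0.05)."""
    target = baseline * (1 + relative_lift)
    p_bar = (baseline + target) / 2   # assumed: average of the two rates
    delta = target - baseline         # absolute minimum detectable effect
    return math.ceil(16 * p_bar * (1 - p_bar) / delta ** 2)

def run_days(n_per_variant, daily_visitors, arms=2):
    """Days needed to fill every arm, given eligible daily traffic."""
    return math.ceil(n_per_variant * arms / daily_visitors)

# Hypothetical inputs: 10% baseline conversion, 10% relative lift,
# 2,000 eligible visitors per day split across two arms.
n = sample_size_per_variant(0.10, 0.10)
print(n, run_days(n, daily_visitors=2_000))
```

With these inputs the requirement comes out at roughly 15,000 users per variant. Halving the relative lift roughly quadruples it, which is why an under-powered design is worth rejecting before launch.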

How do I use this skill?

You don't "run" a skill. After installing it, you just describe your task to the agent (for example, ask it to design an experiment for a funnel drop-off), and the skill activates on its own when the request matches its description.

Upload the growth-deney-tasarimci.zip you downloaded as-is; no repackaging is needed because the format is already correct (folder at root).

  1. Open Settings → Customize → Skills
  2. Upload → select the growth-deney-tasarimci.zip you downloaded
  3. Claude reads SKILL.md; the name + description appear. Ready ✅

Scripts run in Anthropic's code-execution environment (sandbox) — not on your machine.