Ashley Stirrup

image published 2026-02-25

I just watched a terrific presentation by Kelli H. of Khan Academy at Experimentation Island yesterday. They are doing amazing things with GrowthBook to A/B test their AI tutor, and the results are remarkable.

Khan Academy's Khanmigo is an AI-powered tutor used by millions of students. The mission: maximize real learning, not just engagement.

But here's the problem with optimizing an AI tutor: the same prompt can produce a dozen different answers. A tiny change to a system instruction can send outputs haywire. You can't ship and hope.

So they turned to A/B testing with GrowthBook. They now run continuous experiments on Khanmigo, testing prompts, model changes, and new features, all wired directly into their data warehouse. And the results have been game-changing.

One example: they gave Khanmigo a calculator to help with math. Accuracy improved, but latency ballooned. So they started testing their way through it:

→ Remove the calculator: math errors doubled. Roll it back.
→ Try GPT-5: math errors still doubled. Roll it back.
→ Tighten the agent's instructions: 3 seconds of latency saved. Accuracy stable.
→ Faster model: 300ms more. Still stable.
→ Time-box execution: even better.

Kelli's term for it: "hill climbing." You don't know how far you have to go. You just keep testing your way to better.

She also talked about a cultural shift. Product teams used to see A/B testing as an obstacle to shipping. Now, with AI features that can break unpredictably, GrowthBook is their safety net. Teams are asking for it.

This is how you responsibly optimize AI at scale.

Kelly Wortham, Ton Wesseling

