GrowthBook

text published 2026-03-24

Khan Academy gave their AI tutor a calculator to improve math accuracy. It worked, but it made responses painfully slow for students.

So they ran five sequential A/B tests:

✅ Removed the calculator (math errors doubled)
✅ Switched to GPT-5 (accuracy still suffered)
✅ Tightened the agent's prompts (latency dropped 3 seconds)
✅ Upgraded the agent's model (another 300ms off)
✅ Time-boxed execution (more gains, accuracy stable)

Without experiments, they might have shipped the first iteration and unknowingly made tutoring worse. That's the whole case for A/B testing AI features in one example.

Kelli H., Senior Director of Data Insights at Khan Academy, shared this and more at Experimentation Island. She's joining us for a live webinar on April 16 to go deeper.

Register for the April 16 webinar with Kelli: https://lnkd.in/giD2HuCN
Blog recap: https://lnkd.in/gcxx2Tdt

Likes

Comments

Impressions: 1,041 (from LinkedIn export)

Clicks

Rates: 2.69% engagement · 1.63% CTR

Engagement over time

Only one snapshot so far — the engagement-over-time curve appears once the daily scrape has captured this post at least twice.