
GrowthBook

Text post published 2026-03-24

Khan Academy gave their AI tutor a calculator to improve math accuracy. It worked, but it made responses painfully slow for students.

So they ran five sequential A/B tests:

✅ Removed the calculator (math errors doubled)
✅ Switched to GPT-5 (accuracy still suffered)
✅ Tightened the agent's prompts (latency dropped 3 seconds)
✅ Upgraded the agent's model (another 300ms off)
✅ Time-boxed execution (more gains, accuracy stable)

Without experiments, they might have shipped the first iteration and unknowingly made tutoring worse. That's the whole case for A/B testing AI features in one example.

Kelli H., Senior Director of Data Insights at Khan Academy, shared this and more at Experimentation Island. She's joining us for a live webinar on April 16 to go deeper.

Register for the April 16 webinar with Kelli: https://lnkd.in/giD2HuCN
Blog recap: https://lnkd.in/gcxx2Tdt

Likes: 10
Comments: 1
Shares: 0
Impressions: 1,041 (from LinkedIn export)
Clicks: 17
Rates: 2.69% engagement · 1.63% CTR
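
For readers who want to check the rates, here is a minimal sketch of the arithmetic. It assumes engagement is counted as likes + comments + shares + clicks divided by impressions, and CTR as clicks divided by impressions. That definition is not stated on this page, but it reproduces both figures. The `rate` helper and the variable names are illustrative only.

```python
def rate(events: int, impressions: int) -> float:
    """Return events as a percentage of impressions, rounded to two decimals."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return round(100 * events / impressions, 2)


# Counts taken from the metrics listed for this post.
likes, comments, shares, clicks, impressions = 10, 1, 0, 17, 1041

# Assumption: engagement = likes + comments + shares + clicks.
engagement_rate = rate(likes + comments + shares + clicks, impressions)
ctr = rate(clicks, impressions)

print(f"{engagement_rate}% engagement, {ctr}% CTR")
# prints "2.69% engagement, 1.63% CTR"
```

With only 1,041 impressions, a single extra click moves the CTR by roughly 0.1 percentage points, so these rates are best read as rough indicators.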
