GrowthBook

Text post, published 2026-01-27

AI evals tell you if the model works. A/B tests tell you if users care.

These answer different questions. Evals measure competence: can it do the job? A/B tests measure value: do users actually behave differently?

DoorDash saw a model that looked great in testing drop 4.3% in accuracy with real users. The test data was too clean.

The teams shipping GenAI reliably treat these as stages: evals catch regressions early, shadow mode tests real traffic, A/B tests prove business impact.

You can't A/B test a broken model. You can't eval your way to product-market fit.

Wrote up the full pipeline: https://lnkd.in/gc_yVfYw
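To make the staging concrete, here is a minimal sketch of how the three gates could be chained. It is an illustration under stated assumptions, not GrowthBook's or DoorDash's implementation: the `Candidate` fields, the `next_stage` function, and both thresholds are hypothetical, and the accuracy drop is treated as absolute percentage points.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds chosen for illustration; they do not come from the post.
MIN_EVAL_ACCURACY = 0.90  # offline eval bar a model must clear
MAX_SHADOW_DROP = 0.02    # tolerated accuracy gap between evals and live traffic


@dataclass
class Candidate:
    eval_accuracy: float                     # accuracy on the offline eval set
    shadow_accuracy: Optional[float] = None  # accuracy on mirrored real traffic
    ab_lift: Optional[float] = None          # relative lift on the business metric
    ab_significant: bool = False             # did the A/B test reach significance?


def next_stage(c: Candidate) -> str:
    """Return the next step for a candidate model in the staged pipeline."""
    if c.eval_accuracy < MIN_EVAL_ACCURACY:
        return "fix model"            # evals catch the regression early
    if c.shadow_accuracy is None:
        return "run shadow mode"      # competent offline, untested on real traffic
    if c.eval_accuracy - c.shadow_accuracy > MAX_SHADOW_DROP:
        return "fix eval data"        # the offline data was cleaner than reality
    if c.ab_lift is None:
        return "run A/B test"         # works on real traffic, value unproven
    if c.ab_significant and c.ab_lift > 0:
        return "ship"                 # users measurably behave differently
    return "do not ship"


if __name__ == "__main__":
    # A model that looks great offline but loses 4.3 points on real traffic.
    print(next_stage(Candidate(eval_accuracy=0.95, shadow_accuracy=0.907)))
```

Each gate only runs once the cheaper one before it has passed, which is the point of treating evals, shadow mode, and A/B tests as stages rather than alternatives.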

Impressions: 540 (from LinkedIn export)

Rates: 4.63% engagement · 3.33% CTR
