Product managers shipping AI features write PRDs against non-deterministic systems, brief customer-facing teams on what "good" means when the model varies, and own the decision to ship or hold a release. This track is about product thinking. It ships templates (a PRD section, OKR examples, customer FAQ snippets), worked examples of real quality bars, and the vocabulary you need with both customers and engineers. It is light on math and heavy on conversation patterns.
The first three steps build a shared language with customers, engineering, and the domain expert. Steps four through six cover the planning artifacts (acceptance criteria, eval reports, OKRs). The last three are operational (beta, ship/hold, reading list).
The track
- PRD section template: AI quality bar (12 min). The standard PRD section for AI features. Acceptance criteria expressed as evals, not as feature checklists. The argument for why this section is non-negotiable 1.
- Talking to customers about non-deterministic AI (12 min). The vocabulary your customers can use. What "good" means when the model varies. The honest explanation of why two identical questions can produce two slightly different answers.
- Working with your domain expert (14 min). The principal domain expert pattern. How to recruit, brief, and run weekly sessions. Notion AI's "AI Data Specialist" role is a worked example 2.
- Writing scenario-level acceptance criteria (12 min). Feature × scenario × persona. The dimensional coverage framework that turns "the bot should be helpful" into a checkable list.
- Reading an eval report as a PM (14 min). What the columns mean: true positive rate (TPR), true negative rate (TNR), error bars, segment slicing. Where to push back on the engineering team's headline number.
- Setting AI OKRs that aren't gameable (12 min). Quality OKRs anchored to user outcomes, not model metrics. Hamel's revenge-of-the-data-scientist argument applied to roadmap design.
- Beta program design for AI features (14 min). Sampling, instrumented onboarding, feedback loop, holdback group. The flywheel framing for the rollout 3.
- Quality regressions: when to ship vs hold (12 min). The decision framework. The role of error bars in the call. Why "trending down" by two points may be inside the noise band.
- PM reading list (13 min). Three blogs (Hamel, Eugene Yan, Applied LLMs) and one paper (Anthropic's Adding Error Bars). The bookmarks bar starter set 4.
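
The ship-vs-hold step claims that a two-point drop may sit inside the noise band. A minimal sketch of the arithmetic follows. It assumes two independent eval runs and uses the plain normal approximation for a pass rate, which is a simplification and not necessarily the method used in the cited paper. The function names and the pass counts are hypothetical, chosen only to illustrate the point.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% interval


def difference_interval(passes_a: int, total_a: int,
                        passes_b: int, total_b: int,
                        z: float = Z_95) -> tuple[float, float, float]:
    """Return (diff, low, high) for rate_b - rate_a.

    Assumes the two runs are independent and large enough for the
    normal approximation to the binomial to be reasonable.
    """
    rate_a = passes_a / total_a
    rate_b = passes_b / total_b
    # Standard error of the difference between two independent proportions.
    stderr = math.sqrt(rate_a * (1 - rate_a) / total_a
                       + rate_b * (1 - rate_b) / total_b)
    diff = rate_b - rate_a
    return diff, diff - z * stderr, diff + z * stderr


def drop_is_inside_noise(passes_a: int, total_a: int,
                         passes_b: int, total_b: int) -> bool:
    """True when the interval for the difference still contains zero."""
    _, low, high = difference_interval(passes_a, total_a, passes_b, total_b)
    return low <= 0.0 <= high


if __name__ == "__main__":
    # Hypothetical numbers: 86% -> 84% on a 200-example eval set.
    small = difference_interval(172, 200, 168, 200)
    print("200 examples:", [round(x, 3) for x in small],
          "inside noise:", drop_is_inside_noise(172, 200, 168, 200))

    # The same two-point drop on a 20,000-example eval set.
    large = difference_interval(17200, 20000, 16800, 20000)
    print("20,000 examples:", [round(x, 3) for x in large],
          "inside noise:", drop_is_inside_noise(17200, 20000, 16800, 20000))
```

With 200 examples per run, the interval for the difference spans zero, so the drop alone does not justify holding the release. With 20,000 examples per run, the same two-point drop falls clearly outside the noise. The question to bring to engineering is therefore how many examples sit behind the headline number.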
What comes after the track
Once you have read the nine pages, the natural next moves are to write the AI section of your next PRD, recruit a domain expert and book the weekly session, and draft the beta program for the next feature. The Hamel field guide is the single best deep-read for a PM on what good looks like on a real team 5. The LLM-judge essay is the right read on critique shadowing 6.
TIP
The PRD template (step 1) and the domain expert pattern (step 3) are the two highest-impact artifacts a PM brings to an AI feature. Everything else builds on them.