Engineering managers shipping AI products own decisions that engineers do not: headcount, vendor selection, the postmortem template, the hiring rubric. This track is strategy-first. It is heavy on case studies (NurtureBoss, Notion AI, Replit), light on code, and assumes technical literacy without requiring you to read implementations line by line. The eight steps move from the conceptual ladder (where is your team?) to the operational artifacts you ship (hiring rubric, vendor checklist, postmortem template).
The first three steps are the case for the program. Read them before staffing the next quarter. Steps four through six are the operational practice. The last two are about scaling the team beyond the founders.
The track
- The eval maturity model (14 min). Five stages from vibe-checking to continuous quality. Self-assessment checklist. The Shankar flywheel framing for the highest stage [1].
- The error-analysis ritual (16 min). Open coding for AI traces, the three-stage process, choosing the principal domain expert. The single weekly meeting that catches everything [2].
- Team shape and headcount patterns (12 min). Who actually owns the eval. The argument for product and domain experts in the loop [3]. Notion AI's "AI Data Specialist" role is the worked example [4].
- Buy vs build eval platforms (14 min). The three reference architectures (fully OSS, fully hosted, hybrid). The decision flowchart by team size and stack.
- Postmortem template for AI incidents (12 min). What changes when the regression is non-deterministic. The eval-gap framing in the writeup.
- Quarterly quality review structure (14 min). The flywheel cadence: dataset health, judge calibration drift, trace sample, decisions made [1].
- Vendor evaluation checklist (16 min). Specific criteria: data residency, pricing per trace, multi-judge support, dataset versioning, OTel-native vs SDK-bound.
- Hiring rubric for eval-literate engineers (12 min). What to screen for: data fluency, statistical literacy, willingness to read 100 traces a week. The Applied LLMs essay is the field manual [5].
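For readers who want something concrete before reaching the vendor step, the checklist criteria listed in the track can be turned into a simple weighted scorecard. The sketch below is illustrative only: the criterion names come from the list in this track, but the weights, the 0 to 5 rating scale, the `VendorScore` type, and the vendor names are all hypothetical assumptions, not part of the track's actual checklist.

```python
from dataclasses import dataclass

# Criteria mirror the vendor evaluation checklist step.
# The weights are HYPOTHETICAL; replace them with your own priorities.
WEIGHTS = {
    "data_residency": 0.30,
    "pricing_per_trace": 0.20,
    "multi_judge_support": 0.15,
    "dataset_versioning": 0.15,
    "otel_native": 0.20,
}


@dataclass
class VendorScore:
    """Ratings for one vendor; each criterion is rated 0 (absent) to 5 (excellent)."""
    name: str
    ratings: dict


def weighted_score(vendor, weights=WEIGHTS):
    """Return the weighted average rating on the 0-5 scale."""
    missing = set(weights) - set(vendor.ratings)
    if missing:
        raise ValueError(f"{vendor.name} is missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * vendor.ratings[c] for c in weights) / total_weight


def rank(vendors, weights=WEIGHTS):
    """Sort vendors from highest to lowest weighted score."""
    return sorted(vendors, key=lambda v: weighted_score(v, weights), reverse=True)


if __name__ == "__main__":
    # Hypothetical vendors and ratings, for illustration only.
    candidates = [
        VendorScore("Vendor A", {"data_residency": 5, "pricing_per_trace": 2,
                                 "multi_judge_support": 3, "dataset_versioning": 4,
                                 "otel_native": 1}),
        VendorScore("Vendor B", {"data_residency": 3, "pricing_per_trace": 4,
                                 "multi_judge_support": 4, "dataset_versioning": 3,
                                 "otel_native": 5}),
    ]
    for v in rank(candidates):
        print(f"{v.name}: {weighted_score(v):.2f}")
```

A scorecard like this does not make the decision for you. Its use is to force the team to write down its weights before the vendor demos start, so the ranking reflects stated priorities rather than the most recent pitch.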
What comes after the track
Once you have worked through all eight steps, the natural next moves are: build the case-study deck (NurtureBoss, Replit's 90% cost reduction via decision-time guidance [6]); write the eval-program charter for the next quarter; pick a vendor and pilot it with one team.
TIP
The hiring rubric matters most. A team of strong engineers who refuse to look at traces will not ship a reliable eval program; a team of mid-level engineers who read 100 traces a week will.