CTOs and directors of technology own the risk story, not the implementation detail. This track is executive-grade: it is dense, but every page leads with a one-line TL;DR, every regulatory citation has an effective date, and every claim is sourced. The ten steps move from the board-level case for the program to the board-readout template you can reuse.
The first five steps build the strategic posture. The last five give you the artifacts your team produces to support it. Read in order on the first pass; come back to specific steps when the board asks a question.
The track
- Executive overview: why evals are board-level (12 min). The case for the eval program as defensible risk posture, not engineering hygiene. The non-determinism, subjectivity, and drift arguments framed for non-engineers 1.
- EU AI Act timeline (16 min). GPAI obligations have applied since August 2, 2025; general applicability follows on August 2, 2026. The document you actually file 2.
- NIST AI RMF mapped to eval activities (16 min). The four functions (Govern, Map, Measure, Manage) cross-walked to specific eval evidence. What goes in an audit binder 3.
- OWASP LLM Top 10: your exposure (14 min). Each item: what it is, how evals detect it, what to file in the risk register 4.
- What frontier-lab evals look like (14 min). Anthropic's Responsible Scaling Policy 5 and OpenAI's Preparedness Framework 6 are the vocabulary your peers use and the standard your investors and customers will reference.
- Building an AI risk register (12 min). The template, scoring methodology, and how each row links to a product-risk artifact.
- Customer-facing trust artifacts (12 min). Trust portal page content, model card framing, what to share publicly and what to gate behind an NDA.
- Red-team program design (14 min). Internal vs external, cadence, scope. What goes in the contract with an external red-team vendor.
- Eval spend benchmarks (14 min). What teams of your size spend on eval headcount, vendor licenses, and compute. The three reference architectures (OSS, hosted, hybrid) with cost ranges.
- Talking to your board about AI risk (16 min). The board-readout template: one-pager, risk matrix, mitigation-status table. What to leave out.
What comes after the track
Once you have read the ten pages, the natural next moves are to write the AI risk register for the current quarter, pick the trust artifact rollout date, and commission an external red team for the next major release. Anthropic's writeup on the challenges of evaluating AI systems is a useful supplement on what frontier-lab honesty looks like 7.
CAUTION
The EU AI Act timelines are real. If your product touches EU users, the GPAI obligations are live and the full-applicability date is firm. Steps 2 and 3 are not optional; they are the audit-binder material your legal team will request.