Error Analysis

Open coding, dimensional sampling, and the 60-80% rule.

Error analysis is the meta-loop, not a stage. Rubrics and judges are downstream of it. Before you measure anything reliably, you read traces and write down what is wrong; that reading is what tells you what is worth measuring, and the read happens on a cadence, not once. Hamel's "look at your data, then look again" credo is the entire job description; everything else on this site is in service of it.

The chapters here cover the practice. The 60-80% rule names the prior that makes the work tractable: three failure modes usually account for most of your bugs. Open coding is the method that finds those modes. Dimensional sampling is the way you build inputs that cover the corners. Failure-mode taxonomies are the structured outputs of open coding. The trace viewer is the single highest-return tool you build to support the loop, and the NurtureBoss case study shows the loop closing in production. For a single-page operational summary, see error analysis (open coding).
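As a rough illustration of the prior behind the 60-80% rule, the sketch below tallies open-coding labels and reports how much of the failure set the three most common modes cover. The label names and counts are invented for the example, and `top_mode_share` is a hypothetical helper, not something defined in these chapters.

```python
from collections import Counter


def top_mode_share(labels, k=3):
    """Return the k most common failure modes and the share of failures they cover."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    return top, covered / total


# Hypothetical labels, one per failing trace, as they might come out of open coding.
labels = [
    "date_parsing", "date_parsing", "date_parsing", "date_parsing",
    "handoff_missed", "handoff_missed", "handoff_missed",
    "wrong_tone", "wrong_tone",
    "tool_timeout",
]

top, share = top_mode_share(labels)
for mode, n in top:
    print(f"{mode}: {n}")
print(f"top-3 share: {share:.0%}")
```

If the top three modes cover only a small share of your own labels, that may mean the labels are too fine-grained and need merging, or that you have not yet read enough traces.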

Chapters:

The 60-80% rule. Why three failure modes usually account for most of your bugs, and how to find yours this week.
Open coding for AI traces. The three-stage process (open, axial, structured) for turning notes into a labeled dataset.
Dimensional sampling. Constructing synthetic inputs by crossing features, scenarios, and personas.
Failure-mode taxonomies. Why bottom-up coding beats top-down templates, and why generic labels mislead.
Build your trace viewer in an afternoon. The single highest-return tool for an eval program. Streamlit and Next.js variants.
Case study: NurtureBoss. One team's path from 66% date errors to 5% in one error-analysis cycle.
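To make the dimensional sampling entry above concrete, here is a minimal sketch that crosses three dimensions into a grid of input briefs. The dimension values and the `build_grid` helper are assumptions made up for illustration; the chapter itself may structure the dimensions differently.

```python
from itertools import product

# Hypothetical dimension values; replace them with ones drawn from your own product.
features = ["schedule_tour", "reschedule_tour"]
scenarios = ["clear_date", "ambiguous_date", "past_date"]
personas = ["first_time_renter", "impatient_prospect"]


def build_grid(features, scenarios, personas):
    """Cross the three dimensions into one brief per combination."""
    return [
        {"feature": f, "scenario": s, "persona": p}
        for f, s, p in product(features, scenarios, personas)
    ]


grid = build_grid(features, scenarios, personas)
print(len(grid), "combinations")
print(grid[0])
```

Each brief is only a specification. A person or a model still has to turn it into a concrete input, and in practice you would likely sample from the grid rather than run every cell.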