AI Evals
Concept and tutorial chapters, ordered by reader progression.
The most useful, opinionated reference for AI evals.
AI Evals is a curated technical reference for evaluating large language model systems in production. The site covers error analysis, LLM-as-judge calibration, RAG evaluation, agentic evaluation, statistical rigor, and twenty task-specific eval playbooks. Every claim is cited.