AI Evals
Methodology pages, deeper than the learn chapters.
The most useful, opinionated reference for AI evals.
AI Evals is a curated technical reference for evaluating large language model systems in production. The site covers error analysis, LLM-as-judge calibration, RAG evaluation, agentic evaluation, statistical rigor, and twenty task-specific evaluation playbooks. Every claim is cited.