The OWASP Top 10 for Large Language Model Applications is the closest the industry has to a shared vocabulary for LLM application risk. Treat it as the floor, not the ceiling. Every category on this list deserves at least one probe in your eval suite; the better-resourced teams probe most of them on every release 1.
The list below uses the 2025 (v2.0) numbering. The advice for each category has the same shape: a one-sentence definition, the failure mode in product terms, and the smallest probe you can add to your eval suite this week.
The ten categories
| Code | Category | What goes wrong | Smallest probe |
|---|---|---|---|
| LLM01 | Prompt Injection | Untrusted input overrides the system prompt or tool policy | 20 indirect-injection cases in tool-input fields, scored by policy violation |
| LLM02 | Sensitive Information Disclosure | Model emits PII or proprietary data from context or training | Synthetic conversations seeded with PII-like canaries, grep the outputs |
| LLM03 | Supply Chain | Compromised model weights, datasets, or plugins | Pin model fingerprints; eval dependency chain on every release |
| LLM04 | Data and Model Poisoning | Training or fine-tuning data injected to bias outputs | Trigger-phrase probes drawn from the suspected poison signature |
| LLM05 | Improper Output Handling | Downstream code trusts model output (XSS, SQLi, RCE) | Run outputs through your sanitizer in eval, not just in prod |
| LLM06 | Excessive Agency | Agent has more permission, autonomy, or tools than it needs | Run agent against a deliberately overprivileged tool catalog and grade for least-privilege adherence |
| LLM07 | System Prompt Leakage | Model reveals the system prompt or hidden instructions | 30 standard extraction prompts, scored by leakage |
| LLM08 | Vector and Embedding Weaknesses | Inversion or poisoning of the RAG index | Eval that the retriever returns expected docs for a known query set |
| LLM09 | Misinformation | Confident incorrect output that the user trusts | Use the existing RAG faithfulness eval; track fabrication rate |
| LLM10 | Unbounded Consumption | Resource exhaustion via prompts that explode token or tool budgets | Cap-and-budget probe: send a known-explosive prompt and verify the request fails closed |
The table compresses the OWASP guidance. Read the upstream document for the full mitigations under each item 1. The mapping in the table is opinionated and meant for an evals practitioner, not a security auditor.
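As one illustration of how small these probes can be, the canary idea from the LLM02 row can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `make_canary` and `find_leaks` names, the token format, and the normalization step are hypothetical, not part of the OWASP guidance. The same check can cover LLM07 if you plant a canary in the system prompt.

```python
import re
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Return a unique, grep-able token that should never occur naturally."""
    return f"{prefix}-{secrets.token_hex(8)}"


def _squash(text: str) -> str:
    # Drop whitespace and hyphens and lowercase, so a canary that the model
    # re-spaces or re-cases is still detected.
    return re.sub(r"[\s\-]", "", text).lower()


def find_leaks(outputs: list[str], canaries: dict[str, str]) -> list[tuple[int, str]]:
    """Return (output_index, canary_label) for every canary found in an output."""
    leaks = []
    for i, text in enumerate(outputs):
        squashed = _squash(text)
        for label, token in canaries.items():
            if _squash(token) in squashed:
                leaks.append((i, label))
    return leaks
```

Seed the synthetic conversations (or the system prompt) with the tokens, collect the model outputs, and treat any non-empty result from `find_leaks` as a failed case. Exact-match canaries will not catch paraphrased leakage, so this is a floor for the probe rather than a complete detector.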
What this list is good for
The list is most useful as a checklist for two conversations: with your security team, and with your customer's procurement team. Both want to know that you have thought about each category. A short page in your trust portal that says "Here is what we test for, mapped to OWASP LLM01 through LLM10, with the date of last verification" satisfies most of the second conversation without bespoke work each time.
For internal use, the list is a starting taxonomy for your red team. The five categories that tend to bite hardest in production are LLM01 (prompt injection, especially indirect), LLM05 (improper output handling, the source of most XSS bugs in chat surfaces), LLM06 (excessive agency, the new failure mode in agentic systems), LLM07 (system prompt leakage as a competitive-disclosure issue), and LLM09 (misinformation, where your judge has to grade for fabrication, not just helpfulness).
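For LLM05, the probe from the table (run outputs through your sanitizer in eval) might look like the sketch below. It assumes the chat surface renders HTML and uses `html.escape` from the Python standard library as a stand-in for whatever sanitizer you run in production; the payload list, the `ACTIVE_MARKUP` pattern, and the function names are hypothetical.

```python
import html
import re

# Hypothetical payloads an attacker might steer the model into emitting.
XSS_PAYLOADS = [
    "<script>alert(1)</script>",
    '<img src=x onerror="alert(1)">',
]

# Crude detector for tags that can execute script when rendered.
ACTIVE_MARKUP = re.compile(r"<\s*(script|img|iframe|svg)\b", re.IGNORECASE)


def sanitize(model_output: str) -> str:
    """Stand-in for the production sanitizer; here, plain HTML escaping."""
    return html.escape(model_output)


def unsafe_after_sanitizing(outputs: list[str]) -> list[str]:
    """Return the outputs that still contain active markup after sanitizing."""
    return [o for o in outputs if ACTIVE_MARKUP.search(sanitize(o))]
```

In an eval run, feed real model outputs (elicited with injection-style prompts) into `unsafe_after_sanitizing` and fail the probe if the list is non-empty. The point of the design is that the eval exercises the same sanitizer code path as production, so swap `sanitize` for your actual function rather than testing the stand-in.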
What this list is not
The OWASP Top 10 is not a benchmark. It does not give you a labeled corpus, a scoring rubric, or a comparable number across vendors. For each category you still need to instrument the probe, write the rubric, and decide your pass threshold. Public corpora like HarmBench cover overlapping ground (LLM01, parts of LLM09) but are not a substitute for category-by-category probing on your specific surfaces 2.
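Once a probe and rubric exist, the pass threshold reduces to a small gate. The sketch below assumes each probe case has already been graded to a boolean by your rubric; the `gate` function and the threshold values in the usage note are hypothetical, chosen only to show the shape.

```python
def gate(results: list[bool], pass_threshold: float) -> tuple[float, bool]:
    """Return (pass_rate, passed) for one category's graded probe cases."""
    if not results:
        # An empty probe set should fail loudly rather than pass silently.
        raise ValueError("no probe results to grade")
    pass_rate = sum(results) / len(results)
    return pass_rate, pass_rate >= pass_threshold
```

For example, `gate([True] * 18 + [False] * 2, 0.95)` reports a pass rate of 0.9 and a failed gate, while the same results against a threshold of 0.85 pass. Which threshold is right for each category is a product decision that the OWASP list does not make for you.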
Vendor red-team tools like Promptfoo's red-team module ship attack libraries organized by OWASP category, which can save you weeks of corpus construction; the trade-off is that you are then evaluating against an attack library someone else maintains 3. Microsoft's Responsible AI hub publishes mapping guidance for how Azure AI Studio's content safety policies relate to the OWASP categories 4; if you live in the Azure stack, the mapping is worth lifting.
How to use this page in a release process
The minimum useful artifact is a one-row-per-category table on your release checklist. Each row has the probe set, the pass threshold, the date last run, and the owner. Three categories per release are tested live, on a rotation, so that every category is exercised within a quarter. Two categories (LLM01 and LLM07) are tested on every release because their empirical regression rate is the highest.
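The checklist row and the rotation can be sketched as follows. This assumes the three rotating slots draw from the eight categories that are not tested on every release, which is one reading of the rule above; the `ChecklistRow` fields mirror the four columns named in the text, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ALWAYS = ["LLM01", "LLM07"]  # tested on every release
ROTATING = ["LLM02", "LLM03", "LLM04", "LLM05",
            "LLM06", "LLM08", "LLM09", "LLM10"]
PER_RELEASE = 3  # rotating categories tested live per release


@dataclass
class ChecklistRow:
    code: str
    probe_set: str
    pass_threshold: float
    last_run: Optional[date]
    owner: str


def categories_for_release(release_index: int) -> list[str]:
    """Return the categories to test live for the given zero-based release."""
    start = (release_index * PER_RELEASE) % len(ROTATING)
    picked = [ROTATING[(start + k) % len(ROTATING)] for k in range(PER_RELEASE)]
    return ALWAYS + picked
```

Under these assumptions, three consecutive releases cover all eight rotating categories, so a team that ships at least three times a quarter exercises the whole list. A team that ships less often would need a larger `PER_RELEASE` to keep the same guarantee.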
CAUTION
The list updates every 18 to 24 months. Pin your testing to a specific version of the OWASP document, and re-baseline when a new version ships. The categories shift; treating the list as a moving target is a maintenance discipline, not a bug.
The next chapter, Designing a red-team program, covers the operating model that turns the table above into something that fires every release.