Walking the OWASP Top 10 for LLM apps

The ten failure categories every LLM application owner should be probing, with the smallest useful test for each.

The OWASP Top 10 for Large Language Model Applications is the closest the industry has to a shared vocabulary for LLM application risk. Treat it as the floor, not the ceiling. Every category on this list deserves at least one probe in your eval suite; the better-resourced teams probe most of them on every release ¹.

The list below uses the 2025 (v2.0) numbering. The advice for each category is the same shape: a one-sentence definition, the failure mode in product terms, the smallest probe you can add to your eval suite this week.

The ten categories

Code	Category	What goes wrong	Smallest probe
LLM01	Prompt Injection	Untrusted input overrides the system prompt or tool policy	20 indirect-injection cases in tool-input fields, scored by policy violation
LLM02	Sensitive Information Disclosure	Model emits PII or proprietary data from context or training	Synthetic conversations seeded with PII-like canaries, grep the outputs
LLM03	Supply Chain	Compromised model weights, datasets, or plugins	Pin model fingerprints; eval dependency chain on every release
LLM04	Data and Model Poisoning	Training or fine-tuning data injected to bias outputs	Trigger-phrase probes drawn from the suspected poison signature
LLM05	Improper Output Handling	Downstream code trusts model output (XSS, SQLi, RCE)	Run outputs through your sanitizer in eval, not just in prod
LLM06	Excessive Agency	Agent has more permission, autonomy, or tools than it needs	Run agent against a deliberately overprivileged tool catalog and grade for least-privilege adherence
LLM07	System Prompt Leakage	Model reveals the system prompt or hidden instructions	30 standard extraction prompts, scored by leakage
LLM08	Vector and Embedding Weaknesses	Inversion or poisoning of the RAG index	Eval that the retriever returns expected docs for a known query set
LLM09	Misinformation	Confident incorrect output that the user trusts	Use the existing RAG faithfulness eval; track fabrication rate
LLM10	Unbounded Consumption	Resource exhaustion via prompts that explode token or tool budgets	Cap-and-budget probe: send a known-explosive prompt and verify the request fails closed

The table compresses the OWASP guidance. Read the upstream document for the full mitigations under each item ¹. The mapping below is opinionated and meant for an evals practitioner, not a security auditor.

What this list is good for

The list is most useful as a checklist for two conversations: with your security team, and with your customer's procurement team. Both want to know that you have thought about each category. A short page in your trust portal that says "Here is what we test for, mapped to OWASP LLM01 through LLM10, with the date of last verification" satisfies most of the second conversation without bespoke work each time.

For internal use, the list is a starting taxonomy for your red team. The five categories that tend to bite hardest in production are LLM01 (prompt injection, especially indirect), LLM05 (improper output handling, the source of most XSS bugs in chat surfaces), LLM06 (excessive agency, the new failure mode in agentic systems), LLM07 (system prompt leakage as a competitive-disclosure issue), and LLM09 (misinformation, where your judge has to grade for fabrication, not just helpfulness).

What this list is not

The OWASP Top 10 is not a benchmark. It does not give you a labeled corpus, a scoring rubric, or a comparable number across vendors. For each category you still need to instrument the probe, write the rubric, and decide your pass threshold. Public corpora like HarmBench cover overlapping ground (LLM01, parts of LLM09) but are not a substitute for category-by-category probing on your specific surfaces ².

Vendor red-team tools like Promptfoo's red-team module ship attack libraries organized by OWASP category, which can save you weeks of corpus construction; the trade is that you are then evaluating against an attack library someone else maintains ³. Microsoft's Responsible AI hub publishes mapping guidance for how Azure AI Studio's content safety policies relate to the OWASP categories ⁴; if you live in the Azure stack, the mapping is worth lifting.

How to use this page in a release process

The minimum useful artifact is a one-row-per-category table on your release checklist. Each row has the probe set, the pass threshold, the date last run, and the owner. Three categories per release have to be tested live, on a rotation, so that everything is exercised across a quarter. Two categories (LLM01 and LLM07) are tested on every release because the empirical regression rate on those is the highest.

CAUTION

The list updates every 18 to 24 months. Pin your testing to a specific version of the OWASP document, and re-baseline when a new version ships. The categories shift; treating the list as a moving target is a maintenance discipline, not a bug.

The next chapter, Designing a red-team program, covers the operating model that turns the table above into something that fires every release.

OWASP, "Top 10 for LLM Applications," current version. https://owasp.org/www-project-top-10-for-large-language-model-applications/ ↩ ↩²
Mazeika et al., "HarmBench" (2024). https://arxiv.org/abs/2402.04249 ↩
Promptfoo, "Red-team documentation." https://www.promptfoo.dev/docs/red-team/ ↩
Microsoft, "Responsible AI hub." https://www.microsoft.com/en-us/ai/responsible-ai ↩