Anthropic Responsible Scaling Policy

The capability-threshold logic, the AI Safety Level scheme, and what an internal team can copy from the public document.

Anthropic's Responsible Scaling Policy (RSP) is one of the two public scaling policies practitioners point at when designing internal AI governance. The Policy describes capability thresholds, defines AI Safety Levels (ASL), and commits the company to specific evaluation and deployment requirements as model capabilities cross those thresholds ¹. The document is worth reading once end to end if you have any role in AI governance; this page is the operating summary plus the parts an enterprise team can reasonably copy.

The core structure

The RSP defines a sequence of AI Safety Levels (ASL-1, ASL-2, ASL-3, ASL-4) corresponding to escalating capabilities and escalating safeguards. Levels are not assigned to models by year; they are assigned by capability evaluations that test for specific risks. As long as a model's capability evaluations stay below the next threshold, the safeguards required for that next level do not have to be in place. Crossing a threshold triggers the next set of safeguards as a precondition for deployment.

The structure has three operational properties worth lifting.

First, the trigger is capability, not size or compute. A small efficient model that crossed a biological-uplift threshold would trigger the same safeguards as a frontier model crossing the same threshold. This is the right shape for a risk-based regime; model size is a poor proxy for capability in 2026.

Second, the safeguards are pre-deployment, not post-incident. The Policy commits to having the next-level safeguards in place before the model is deployed to users at the next capability level. The argument is that catching a failure after deployment is the failure; the work is to make deployment conditional on prior safeguards.

Third, the Policy includes an if-then commitment to pause or restrict deployment if safeguards cannot be implemented in time. This is the part that distinguishes a scaling policy from a marketing document.
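The three properties can be sketched as a small gate function. This is a minimal illustration under stated assumptions, not how the Policy itself is implemented: the eval names, numeric trigger scores, and baseline level are all invented here, and the published Policy does not express its thresholds as single numeric scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    eval_name: str        # hypothetical capability evaluation
    trigger_score: float  # score at or above which the threshold counts as crossed
    required_level: int   # safeguard level that must be in place before deployment


# Invented values for illustration; not taken from the published Policy.
BASELINE_LEVEL = 2
THRESHOLDS = [
    Threshold("bio_uplift_eval", 0.50, 3),
    Threshold("cyber_uplift_eval", 0.60, 3),
]


def required_safeguard_level(scores: dict[str, float]) -> int:
    """Return the highest safeguard level triggered by the eval scores."""
    level = BASELINE_LEVEL
    for t in THRESHOLDS:
        if t.eval_name not in scores:
            # Fail closed: a model with a missing eval cannot be assessed.
            raise ValueError(f"missing evaluation result: {t.eval_name}")
        if scores[t.eval_name] >= t.trigger_score:
            level = max(level, t.required_level)
    return level


def deployment_decision(scores: dict[str, float], implemented_level: int) -> str:
    """Deploy only if safeguards already meet the level the scores require."""
    needed = required_safeguard_level(scores)
    return "deploy" if implemented_level >= needed else "pause"


# The decision depends only on eval scores, never on model size or compute.
small_model_scores = {"bio_uplift_eval": 0.70, "cyber_uplift_eval": 0.20}
print(deployment_decision(small_model_scores, implemented_level=2))  # pause
print(deployment_decision(small_model_scores, implemented_level=3))  # deploy
```

Note that the function takes no parameter count or compute figure as input, that the check runs before deployment rather than after an incident, and that the default outcome when safeguards lag is "pause".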

What the levels cover

The published ASL tiers correspond, roughly, to:

Level	Capability bar	Safeguard family
ASL-1	Systems with no meaningful catastrophic risk	Standard responsible deployment
ASL-2	Systems showing early signs of dangerous capabilities; current frontier as of mid-2024	Use policies, model cards, structured red-teaming, secure model weights
ASL-3	Substantial increase in misuse risk; biological, chemical, cyber, or autonomous-action uplift	Hardened security for weights, enhanced misuse mitigations, deployment restrictions, incident response
ASL-4	Qualitatively novel risk profile; not yet defined in detail	To be specified before threshold is crossed

The exact threshold definitions and required safeguards are set out in the published Policy, which is revised over time; treat the document, not this table, as authoritative ¹.

What an internal team can lift

Most companies do not need an RSP at this scale. Most companies do need a smaller version of the structure: a written commitment that specific capabilities, once your model or product reaches them, trigger specific additional safeguards.

The lift list:

The if-then structure. Write commitments as "if our model passes the following eval, then we will have implemented the following safeguards before deployment." The form forces the safeguards to be specific and the triggers to be testable.

The capability evaluation discipline. The Policy is anchored in evaluations. Yours should be too. The eval is the trigger; the eval is what an external auditor will look at. Anthropic's "Challenges in Evaluating AI Systems" piece is honest about how hard the eval work is to do right, which is itself a useful reference for setting expectations internally ².

The pause-or-restrict commitment. The Policy says, in essence, that the company will not ship if safeguards cannot be in place. This is the part that customers and regulators want to see in writing. The internal version is a release-gate policy that lists the conditions under which a release is held back, and who has the authority to make that call.

The public document. Publishing the Policy commits the organization to it. Most internal teams will not publish their own version, but an internal version that the engineering and product leadership have signed is the operational equivalent.
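For an internal team, the lift list above might translate into a release gate along these lines. This is a sketch only: the class names, eval name, safeguard names, and the decision-owner role are hypothetical placeholders, and a real gate would read eval results and safeguard status from your own systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    trigger: str                # eval that, once passed, activates the commitment
    safeguards: frozenset[str]  # safeguards required before deployment


@dataclass
class GateResult:
    release_allowed: bool
    missing: dict[str, set[str]]  # trigger -> safeguards not yet implemented
    decision_owner: str           # who has written authority to hold the release


def release_gate(
    commitments: list[Commitment],
    evals_passed: set[str],
    implemented: set[str],
    decision_owner: str,
) -> GateResult:
    """Hold the release if any triggered commitment has unmet safeguards."""
    missing: dict[str, set[str]] = {}
    for c in commitments:
        if c.trigger in evals_passed:
            gap = set(c.safeguards) - implemented
            if gap:
                missing[c.trigger] = gap
    return GateResult(not missing, missing, decision_owner)


# Hypothetical commitment: "if the model passes autonomous_task_eval, then
# human approval and audit logging are implemented before deployment."
commitments = [
    Commitment("autonomous_task_eval",
               frozenset({"human_approval_step", "audit_logging"})),
]

result = release_gate(
    commitments,
    evals_passed={"autonomous_task_eval"},
    implemented={"audit_logging"},
    decision_owner="head_of_engineering",
)
print(result.release_allowed)  # False
print(result.missing)          # {'autonomous_task_eval': {'human_approval_step'}}
```

Keeping the decision owner in the result is a deliberate choice in this sketch: a held release should always name the person with authority to make the call, which is the written-authority point above.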

The differences from a risk register

A risk register (see the next chapter) is a list of risks with current mitigations and residual risk. The RSP is a forward-looking commitment about what will happen as capabilities increase. The two complement each other: the register documents current state; the Policy documents future thresholds. A working governance program has both.

Mapping to RMF and the EU AI Act

The RSP-style structure maps to the Govern function of the NIST AI Risk Management Framework (RMF) ³. The capability evaluations are Measure activities; the safeguards are Manage activities. For EU AI Act compliance of a general-purpose AI (GPAI) model with systemic risk, an internal RSP-equivalent is a useful (not sufficient) input to the systemic-risk assessment and mitigation obligation.

What to do this quarter

Read the published RSP end to end. It is the shortest path to seeing how the if-then structure works.
Identify the two or three capabilities of your product that would, if substantially increased, change your safeguard requirements. Write an if-then for each.
Decide who has the authority to invoke the pause or restrict clause. Get that authority in writing.

The point is not to build a scaling policy at the frontier-model scale. The point is to internalize the structure: capability triggers, written commitments, deployment gates. The structure works at smaller scales too, and most enterprise teams need a version of it before any external audit conversation will be useful.

The next chapter, OpenAI Preparedness, covers the parallel public document from the other major frontier lab and where it differs from the Anthropic Policy.

¹ Anthropic, "Responsible Scaling Policy." https://www.anthropic.com/responsible-scaling-policy
² Anthropic, "Challenges in Evaluating AI Systems." https://www.anthropic.com/news/evaluating-ai-systems
³ NIST, "AI Risk Management Framework." https://www.nist.gov/itl/ai-risk-management-framework