NIST AI RMF mapped to eval activities

Govern, Map, Measure, Manage as a cross-walk to the eval artifacts your team already produces or needs to build.

The NIST AI Risk Management Framework is the most widely adopted voluntary AI governance reference in the United States and a common pointer in vendor agreements outside it. It is not a checklist; it is a structured set of outcomes organized into four functions (Govern, Map, Measure, Manage), with seven cross-cutting characteristics of trustworthy AI ¹. This page exists because the framework's vocabulary is the lingua franca of AI governance conversations in 2026, which makes a cross-walk from RMF outcomes to your existing eval activities the most useful artifact your team can produce.

The one-line version of that cross-walk: each function reduces to a small bundle of eval artifacts.

flowchart LR
    G[Govern] --> GA["Eval ownership + release-gate policy"]
    M[Map] --> MA["Use-case inventory + risk register"]
    ME[Measure] --> MEA["Standing eval suite + safety benchmarks"]
    MG[Manage] --> MGA["Incident response + post-market monitoring"]

The four functions, with eval cross-walk

The mapping below uses NIST's RMF 1.0 function names. The "Eval activity" column maps each function to the work your team is likely already doing or planning, with the page on this site that covers it where applicable.

Function	What NIST says	Eval activity
Govern	Policies, accountability, oversight	Anthropic RSP or OpenAI Preparedness as a reference; written eval ownership; release-gate policy
Map	Context, use cases, stakeholders	Persona and use-case inventory; system-level risk identification; risk register entries
Measure	Quantitative and qualitative evaluation	Standing eval suite (offline and online), error analysis, LLM-as-Judge calibration, safety benchmarks (see HarmBench, AILuminate)
Manage	Risk prioritization, response, communication	Incident response process; post-market monitoring; board readouts; model and system cards

The function names are deliberately broad. The cross-walk is what makes them actionable.

The seven trustworthiness characteristics

NIST also defines seven characteristics that a trustworthy AI system should exhibit. Each maps to specific eval activities, and the cross-walk below is the kind of artifact auditors commonly accept as evidence.

Characteristic	Eval activity that demonstrates it
Valid and reliable	Accuracy benchmarks against held-out test set, confidence intervals on metrics, statistical rigor
Safe	Adversarial testing, red-team program, public safety benchmark grades (HarmBench, AILuminate)
Secure and resilient	Input-filter coverage, output-filter coverage, robustness under prompt perturbation
Accountable and transparent	Ownership documented per system, model card published, change log of model and prompt updates
Explainable and interpretable	Trace logging, decision rationale capture, judge critiques alongside scores
Privacy-enhanced	PII-canary probes in eval set, differential-privacy or redaction in prompts where applicable, data-retention policy documented
Fair (with harmful bias managed)	Demographic-slice testing on benchmarks where labels exist; disparity reporting per slice
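
Two rows of the table above, "Valid and reliable" and "Fair", come down to simple statistics. The following is a minimal Python sketch, not a method prescribed by NIST: it computes a Wilson score interval (one common choice for a confidence interval on accuracy) for each demographic slice and reports the largest accuracy gap between slices. The function names, the record format, and the 95% level are assumptions made for illustration.

```python
import math
from collections import defaultdict


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def slice_report(records):
    """Summarise accuracy per slice.

    records: iterable of (slice_label, is_correct) pairs (assumed format).
    """
    counts = defaultdict(lambda: [0, 0])  # slice -> [correct, total]
    for label, is_correct in records:
        counts[label][0] += int(is_correct)
        counts[label][1] += 1
    report = {}
    for label, (correct, total) in counts.items():
        low, high = wilson_interval(correct, total)
        report[label] = {
            "n": total,
            "accuracy": correct / total,
            "ci_low": low,
            "ci_high": high,
        }
    return report


def max_disparity(report) -> float:
    """Largest gap in point-estimate accuracy between any two slices."""
    accuracies = [row["accuracy"] for row in report.values()]
    return max(accuracies) - min(accuracies)
```

The Wilson interval is used here because it behaves better than the plain normal approximation when a slice is small or accuracy is close to 0 or 1, which is common for minority slices. A disparity number is only as trustworthy as the narrowest slice behind it, so report `n` and the interval alongside it.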

A working audit binder maps each row to specific eval artifacts with dates and owners. The artifact is one page per row; the binder is fewer than twenty pages for a serious organization. If yours is approaching a hundred, you are producing process documentation, not evidence.
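
One way to keep the binder honest is to store its index as data and check it mechanically. The sketch below assumes a hypothetical `BinderEntry` record with the fields the paragraph above names (artifact, owner, date) and flags entries that have no owner or have not been updated recently. The 90-day staleness threshold is an arbitrary illustrative default, not an RMF requirement.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BinderEntry:
    """One row of the audit binder index (hypothetical schema)."""
    characteristic: str   # e.g. "Safe", "Privacy-enhanced"
    artifact: str         # name or link of the eval artifact
    owner: str            # empty string means nobody is assigned
    last_updated: date


def stale_or_unowned(entries, today: date, max_age_days: int = 90):
    """Return (characteristic, problem) pairs for entries needing attention.

    max_age_days is an assumed threshold; pick one that matches your release cadence.
    """
    problems = []
    for entry in entries:
        if not entry.owner.strip():
            problems.append((entry.characteristic, "no owner"))
        if (today - entry.last_updated).days > max_age_days:
            problems.append((entry.characteristic, "stale"))
    return problems
```

Running a check like this before each release review keeps the binder a record of evidence, with the dates and owners doing the work, instead of a document that is only opened when an audit is scheduled.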

What the RMF buys you

Three concrete things, regardless of jurisdiction.

First, a vocabulary. When a customer's procurement team or a regulator asks how you manage AI risk, "we follow NIST AI RMF" is an answer that often closes the conversation on its own. Microsoft, Google, IBM, and most major model providers publish RMF alignment claims in their trust portals ². Saying yes is table stakes.

Second, a checklist of your own. The RMF itself is not a checklist, but the cross-walk above is the smallest useful one you can derive from it. The longer version, kept as a working document, becomes the index for your audit binder.

Third, a forcing function. Mapping your existing eval activities to RMF outcomes surfaces gaps. The most common gap is on the Govern side: teams have measurement and management but no documented policy for who decides what to ship and why. The second most common is on the Manage side: incidents happen and get fixed, but no post-mortem feeds back into the eval suite. Both are cheap to close once named.
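
Both gaps can be surfaced mechanically once the cross-walk exists as data. The following is a minimal sketch under assumed names: `crosswalk` is a hypothetical dictionary from RMF function to the artifacts you can actually point to, and `Incident` is a hypothetical record whose `regression_case` field holds the ID of the eval case added after the post-mortem, if any.

```python
from dataclasses import dataclass
from typing import Optional

RMF_FUNCTIONS = ("Govern", "Map", "Measure", "Manage")


@dataclass
class Incident:
    """A production incident (hypothetical schema)."""
    incident_id: str
    regression_case: Optional[str] = None  # eval case added after the post-mortem


def uncovered_functions(crosswalk: dict[str, list[str]]) -> list[str]:
    """RMF functions with no artifact listed, in canonical order."""
    return [fn for fn in RMF_FUNCTIONS if not crosswalk.get(fn)]


def incidents_missing_regression(incidents: list[Incident]) -> list[str]:
    """Incidents that were closed without feeding a case back into the eval suite."""
    return [i.incident_id for i in incidents if i.regression_case is None]
```

An empty list for Govern is the first gap described above; a non-empty result from `incidents_missing_regression` is the second.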

What the RMF does not buy you

The RMF is voluntary and outcome-based, not prescriptive. It does not name specific benchmarks. It does not set numerical thresholds. It does not certify your program. Several of these gaps are now being filled in by the EU AI Act (which does point at testing protocols for high-risk systems), by sector-specific regulators (financial services, healthcare), and by the GPAI Code of Practice ³. Adopt the RMF as the structuring document; layer the specifics on top.

Mapping to EU AI Act obligations

If your eval program is RMF-aligned, the EU AI Act mapping is short. The Measure and Manage functions correspond directly to the Act's accuracy testing (Art. 15) and post-market monitoring (Art. 72) obligations, respectively ³. The Govern function corresponds to the risk-management system (Art. 9) and the quality management system (Art. 17). The Map function corresponds to the intended-purpose specification and conformity assessment scoping.

The shortest defensible audit answer is a one-page exhibit that lists each Act obligation, the RMF function it maps to, and the specific eval artifact that demonstrates it. That exhibit, with citations to the underlying documents (your risk register row, your model card section, your post-market report), is what auditors look for.
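
As a rough illustration, the exhibit can be generated from a small table kept under version control. The sketch below uses the article-to-function mapping from this section; the evidence strings are hypothetical placeholders for your own risk register rows, model card sections, and reports, and the tab-separated output format is just one convenient choice.

```python
import csv
import io

# (article, obligation, RMF function, evidence). Evidence values are
# hypothetical placeholders; replace them with pointers to real artifacts.
EXHIBIT_ROWS = [
    ("Art. 9", "Risk-management system", "Govern", "risk register, row R-12"),
    ("Art. 15", "Accuracy testing", "Measure", "model card, section 4"),
    ("Art. 17", "Quality management system", "Govern", "release-gate policy"),
    ("Art. 72", "Post-market monitoring", "Manage", "post-market report, Q1"),
]


def render_exhibit(rows) -> str:
    """Render the exhibit as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Article", "Obligation", "RMF function", "Evidence"])
    writer.writerows(rows)
    return buffer.getvalue()


def rows_without_evidence(rows) -> list[str]:
    """Articles whose evidence column is blank, i.e. claims with nothing behind them."""
    return [article for article, _, _, evidence in rows if not evidence.strip()]
```

Keeping the rows as data means the exhibit can be regenerated whenever an artifact moves, and a blank evidence cell shows up as a failed check instead of a surprise during the audit.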

A 30-day starter

Week	Action
1	Pick one product. List the seven RMF characteristics on a single page. Note one eval activity per characteristic.
2	Fill the gaps. The most common holes are documented ownership (Govern) and demographic-slice testing (Fair).
3	Stand up the model card. Anthropic's framing in their "Challenges in Evaluating AI Systems" piece is a useful reference for what to include and what to leave out ⁴.
4	Wire the audit binder index. One page per characteristic; one row per Act obligation; the cross-walk above as the master table.

The cross-walk produced this month is the same document you will hand to a customer's procurement team, an internal compliance team, a regulator, or an acquirer. Build it once.

The next chapters cover the two major vendor scaling policies that practitioners reference when designing internal versions: the Anthropic Responsible Scaling Policy and the OpenAI Preparedness Framework.

¹ NIST, "AI Risk Management Framework." https://www.nist.gov/itl/ai-risk-management-framework
² Microsoft, "Responsible AI hub." https://www.microsoft.com/en-us/ai/responsible-ai
³ European Commission, "Regulatory framework on AI." https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
⁴ Anthropic, "Challenges in Evaluating AI Systems." https://www.anthropic.com/news/evaluating-ai-systems