
“We have a security program. It doesn’t mention AI. We have 47 AI systems in production.”

TL;DR

Most organizations have no AI security program, just scattered controls. Building one requires: AI asset inventory, AI-specific risk register, control framework mapping (NIST AI RMF + OWASP + ISO 42001), assessment cadence matching your risk, metrics proving improvement, and buy-in from engineering and leadership. The EU AI Act makes this mandatory for high-risk systems by August 2026. For the technical controls that populate the program, see Defense-in-depth for LLM applications.


[Image: electronic components progressing from scattered chaos to an organized grid on an anti-static mat]

Step 1: What goes in the AI asset inventory?

You can’t secure what you can’t see. Most organizations can’t answer “how many AI systems do we have?”

The inventory captures every AI system, model, and data pipeline in the organization. For each system:

  • What model(s) does it use? Vendor, version, fine-tuning details
  • What data does it access? Customer data, internal documents, external APIs
  • What actions can it take? Read-only, write, execute, send communications
  • Who owns it? Engineering team, business owner, security contact
  • What risk tier? Based on data sensitivity and action scope
  • When was it last assessed? Date of most recent security evaluation
  • What’s the deployment context? Internal tool, customer-facing, agent with tool access

The inventory needs to be living, not a one-time spreadsheet. AI systems proliferate faster than traditional applications. Teams deploy LLM-powered features without security review because the feature is “just a chatbot.” That chatbot has access to the customer database. The inventory catches these before the red team does.
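One lightweight way to keep the inventory living rather than a one-time spreadsheet is to store it as structured data that CI can validate on every deployment. A minimal sketch in Python; the field names are illustrative and mirror the list above:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class RiskTier(Enum):
    LOW = "low"        # internal tool, non-sensitive data, read-only
    MEDIUM = "medium"  # internal data access or limited write actions
    HIGH = "high"      # customer data, external exposure, or tool-using agent


@dataclass
class AIAssetRecord:
    """One row in the AI asset inventory; fields mirror the list above."""
    system_name: str
    models: list[str]            # vendor, version, fine-tuning details
    data_access: list[str]       # e.g. ["customer_db (PII)", "internal wiki"]
    action_scope: list[str]      # e.g. ["read", "write", "send_email"]
    owners: dict[str, str]       # engineering, business, and security contacts
    deployment_context: str      # "internal", "customer-facing", or "agent"
    risk_tier: RiskTier = RiskTier.HIGH  # default to high until triaged
    last_assessed: date | None = None    # None means never assessed
```

Defaulting the risk tier to high until someone triages it means the "just a chatbot" deployments show up as high-risk gaps instead of disappearing quietly.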


Step 2: What does the AI risk register look like?

Traditional risk registers don’t have categories for prompt injection, model memorization, or adversarial audio. The AI risk register adds these.

For each risk entry:

  • Risk category: OWASP LLM Top 10 or MITRE ATLAS category
  • Description: AI-specific threat scenario
  • Affected systems: which inventory items are exposed
  • Likelihood: based on attack complexity and attacker motivation
  • Impact: confidentiality, integrity, availability, financial, reputational
  • Current controls: what mitigations exist
  • Residual risk: what remains after controls
  • Risk owner: who is accountable

Map each entry to at least one OWASP LLM Top 10 category and one MITRE ATLAS technique. This dual mapping ensures coverage across both application-level and adversarial ML threats.

AI-specific risk categories that traditional registers miss:

  • Prompt injection (direct and indirect) leading to data exfiltration or unauthorized actions
  • Training data memorization exposing PII or proprietary content
  • Model supply chain compromise through poisoned weights or malicious plugins
  • Excessive agency where agents take unauthorized actions
  • Adversarial attacks on ML pipeline components (ASR, classification, retrieval)
  • Shadow AI where unauthorized AI deployments bypass security controls
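Here is what one register entry might look like as structured data. A minimal sketch: the field names are illustrative, and the ATLAS technique ID should be verified against the current ATLAS matrix.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    """One AI risk register entry with the dual OWASP/ATLAS mapping."""
    description: str
    owasp_llm: str        # OWASP LLM Top 10 ID, e.g. "LLM01: Prompt Injection"
    atlas_technique: str  # MITRE ATLAS technique ID
    affected_systems: list[str]  # system_name values from the inventory
    likelihood: str       # "low" | "medium" | "high"
    impact: str           # "low" | "medium" | "high"
    current_controls: list[str]
    residual_risk: str
    risk_owner: str


indirect_injection = RiskEntry(
    description="Indirect prompt injection via retrieved documents "
                "exfiltrates customer records through the support chatbot.",
    owasp_llm="LLM01: Prompt Injection",
    atlas_technique="AML.T0051",  # LLM Prompt Injection (check current matrix)
    affected_systems=["support-chatbot"],
    likelihood="high",
    impact="high",
    current_controls=["input filtering", "output DLP scan"],
    residual_risk="medium",
    risk_owner="platform-security",
)
```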

Step 3: Which frameworks do you map controls to?

Three frameworks cover the territory. Use all three in combination.

NIST AI RMF provides the governance structure. Four functions:

  • GOVERN: Establish AI risk culture, policies, accountability. Link AI risk to strategic goals. Cover the full AI lifecycle plus third-party risks.
  • MAP: Identify and assess AI-specific risks. Catalog threats, map attack surfaces, assess likelihood and impact.
  • MEASURE: Quantify risk through testing. Run adversarial evaluations, measure control effectiveness, benchmark against standards.
  • MANAGE: Implement and monitor controls. Deploy mitigations, operate monitoring, respond to incidents.

NIST’s Control Overlays for Securing AI Systems (COSAIS, in development) will provide granular control selection based on NIST SP 800-53, the U.S. federal security control standard. COSAIS bridges the gap between NIST AI RMF’s high-level guidance and specific implementable controls.

OWASP LLM Top 10 provides the application-security checklist. Ten categories of AI-specific vulnerabilities, each with testing guidance and remediation recommendations. Use this as the red team and assessment scope.

ISO 42001 provides the management system standard for AI governance. If your organization needs certification (increasingly required by enterprise customers and regulators), ISO 42001 defines the management system requirements. It complements NIST AI RMF with a certifiable framework.
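One way to keep the triple mapping honest is to record it per control and flag gaps automatically. A minimal sketch: the control names and ISO 42001 clause labels are placeholders, not authoritative citations, and the OWASP IDs follow the 2023 numbering.

```python
# Each deployed control maps to the frameworks it satisfies.
control_map: dict[str, dict[str, list[str]]] = {
    "llm-output-filtering": {
        "nist_ai_rmf": ["MANAGE"],
        "owasp_llm": ["LLM01", "LLM02"],  # prompt injection, insecure output
        "iso_42001": ["Annex A (operational controls)"],
    },
    "model-provenance-checks": {
        "nist_ai_rmf": ["MAP", "MANAGE"],
        "owasp_llm": ["LLM05"],  # supply chain vulnerabilities
        "iso_42001": [],  # not yet mapped -- flagged below
    },
}

# Gap check: flag controls that don't touch all three frameworks.
for control, mapping in control_map.items():
    missing = [fw for fw, ids in mapping.items() if not ids]
    if missing:
        print(f"{control}: no mapping for {', '.join(missing)}")
```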


Step 4: What assessment cadence matches your risk?

Three tiers of assessment, each serving a different purpose.

Continuous (automated, every deployment). Run Garak or Promptfoo as CI/CD gates. Test for regression against known vulnerability patterns. Alert when attack success rates exceed thresholds. This prevents shipping regressions and maintains the security baseline between deeper assessments.
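As a sketch of what that CI gate might look like: the script below assumes a prior pipeline step ran the scanner (e.g. `garak` with a report prefix, or `promptfoo eval -o`) and normalized its results into a small JSON summary. The summary shape and threshold values here are assumptions; adapt the parsing to the tool's actual report format.

```python
"""CI gate: fail the deployment when attack success rates regress."""
import json
import sys

# Max tolerated attack success rate per category (tune to your baseline).
THRESHOLDS = {
    "prompt_injection": 0.05,
    "jailbreak": 0.02,
    "data_leakage": 0.00,
}


def main(path: str = "scan_summary.json") -> int:
    # Assumed summary shape, produced by an earlier pipeline step:
    # {"prompt_injection": {"attempts": 200, "successes": 3}, ...}
    with open(path) as f:
        results = json.load(f)

    failed = False
    for category, limit in THRESHOLDS.items():
        stats = results.get(category, {"attempts": 0, "successes": 0})
        rate = stats["successes"] / stats["attempts"] if stats["attempts"] else 0.0
        status = "FAIL" if rate > limit else "ok"
        print(f"{status:>4}  {category}: {rate:.1%} (limit {limit:.1%})")
        failed = failed or rate > limit
    return 1 if failed else 0  # nonzero exit blocks the deployment


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```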

Quarterly (human-led, full OWASP coverage). Human red team exercises covering all applicable OWASP LLM Top 10 categories. Multi-turn attack campaigns with PyRIT. Business logic testing specific to your applications. This discovers new vulnerabilities that automated scanning can’t find.

Annual (comprehensive, full program review). Supply chain assessment: model provenance, plugin security, dependency audits. Governance review: policy effectiveness, program maturity, gap analysis. Control framework mapping: coverage against NIST AI RMF, OWASP, and ISO 42001. Third-party assessment for high-risk systems.

Supplemental assessments:

  • Post-incident: focused testing of the exploited vulnerability class
  • Post-major-update: when model versions change, system architectures shift, or new tools are integrated
  • Pre-regulatory deadline: compliance-focused assessment against EU AI Act requirements

Step 5: What metrics matter?

Track trends, not snapshots. A single measurement tells you where you are. A trend tells you whether you’re improving.

Coverage percentage. What fraction of OWASP LLM Top 10 categories have been tested for each AI system? What fraction of AI systems in the inventory have current assessments? Target: 100% of high-risk systems assessed quarterly, 80%+ OWASP category coverage.
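Both coverage numbers fall out of the inventory with a few lines of Python. A minimal sketch, assuming the OWASP category IDs LLM01 through LLM10 and a 90-day currency window:

```python
from datetime import date, timedelta

OWASP_CATEGORIES = {f"LLM{i:02d}" for i in range(1, 11)}  # LLM01..LLM10


def owasp_coverage(tested: set[str]) -> float:
    """Fraction of the ten OWASP LLM Top 10 categories tested for one system."""
    return len(tested & OWASP_CATEGORIES) / len(OWASP_CATEGORIES)


def assessment_currency(last_assessed: dict[str, date | None],
                        max_age_days: int = 90) -> float:
    """Fraction of inventoried systems assessed within the last quarter."""
    cutoff = date.today() - timedelta(days=max_age_days)
    current = [s for s, d in last_assessed.items() if d and d >= cutoff]
    return len(current) / len(last_assessed) if last_assessed else 0.0


print(f"{owasp_coverage({'LLM01', 'LLM02', 'LLM06'}):.0%}")  # 30%
```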

Mean time to detect (MTTD). How quickly do you identify AI-specific security incidents? Prompt injection attacks, jailbreak bypasses, data exfiltration through AI systems. Track this separately from traditional MTTD because the detection mechanisms are different.

Red team success rate trends. For each OWASP category, track the attack success rate over time across assessments. The trend should be downward. If jailbreak success rates are rising quarter over quarter, the remediation approach isn’t working.
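A simple least-squares slope over the per-assessment rates is enough to make the trend explicit. A minimal sketch; a real program would weight each assessment by its number of attack attempts:

```python
def trend(rates: list[float]) -> float:
    """Least-squares slope of attack success rate across assessments.

    Negative slope = improving.
    """
    n = len(rates)
    if n < 2:
        return 0.0
    mean_x, mean_y = (n - 1) / 2, sum(rates) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(rates))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Quarterly jailbreak success rates: 22% -> 15% -> 11% -> 9%
print(f"slope per quarter: {trend([0.22, 0.15, 0.11, 0.09]):+.3f}")  # -0.043
```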

Time-to-remediation. Track separately by remediation pathway: policy changes (days), model-level fixes (weeks), architecture changes (months). High TTR on architecture fixes is expected. High TTR on policy changes is a process failure.

Regression rate. Previously fixed vulnerabilities that reappear in subsequent assessments. High regression rates indicate fixes aren’t durable, typically because they were guardrail patches rather than architectural fixes.

Asset inventory completeness. What percentage of AI systems in the organization are captured in the inventory? Shadow AI is a growing problem. Track discovery of previously unknown AI deployments as a metric.


Step 6: How do you get buy-in?

The AI security program needs support from engineering teams (who implement controls) and leadership (who fund the program). Different audiences need different messages.

For engineering: Frame AI security as engineering quality, not bureaucratic overhead. “We test code for bugs. We test AI for vulnerabilities. Same principle.” Integrate into existing workflows (CI/CD gates, code review processes, deployment checklists) rather than creating parallel processes.

For leadership: Frame in business risk terms. “88% of organizations report AI security incidents. Our 47 AI systems access customer data. EU AI Act fines reach up to 7% of global annual revenue.” Quantify the exposure. Show the assessment gap.

For both: Start small and demonstrate value. Pick the highest-risk AI system, assess it, present findings with business impact, remediate, and show the improvement. One successful assessment builds more credibility than any presentation about frameworks.


Key takeaways

  • Most organizations have no AI security program. They have scattered controls that don’t form a coherent defense.
  • Six components: AI asset inventory, risk register, control framework mapping, assessment cadence, metrics, and organizational buy-in
  • Three frameworks in combination: NIST AI RMF (governance), OWASP LLM Top 10 (application security), ISO 42001 (certification)
  • Assessment cadence: continuous automated (every deployment), quarterly human (full OWASP), annual comprehensive (governance + supply chain)
  • Metrics that matter: coverage percentage, MTTD, red team trends, time-to-remediation, regression rate
  • EU AI Act makes this mandatory for high-risk systems by August 2026. Start now.

FAQ

What goes in an AI asset inventory?

Every AI system with: model details, data access, action scope, owner, risk tier, and last assessment date. Most organizations can’t answer “how many AI systems do we have?” The inventory is the foundation everything else builds on.

How is an AI risk register different?

Adds AI-specific categories: prompt injection, memorization, model supply chain, excessive agency, adversarial attacks, shadow AI. Each entry maps to OWASP LLM Top 10 and MITRE ATLAS.

Which frameworks?

Three in combination. NIST AI RMF (governance structure), OWASP LLM Top 10 (application security checklist), ISO 42001 (certifiable management system). Map controls to all three.

What assessment cadence?

Continuous automated (every deployment), quarterly human-led (full OWASP categories), annual comprehensive (governance + supply chain). Plus post-incident and pre-regulatory assessments.

What metrics demonstrate effectiveness?

Coverage percentage, MTTD for AI incidents, red team success rate trends, time-to-remediation by category, regression rate. Track trends over time, not snapshots.

Want to work together?

I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.

Get in touch