The Problem

Why current AI assurance approaches leave critical gaps

Every AI system and autonomous agent in production faces a growing landscape of risks — new vulnerabilities, shifting benchmarks, models that degrade in new contexts, and supply chains that propagate risk silently. The information to identify these threats exists. The problem is that nobody is systematically watching for it.

Three Approaches That Aren't Enough

Point-in-time testing validates what you already know to ask about. But emerging vulnerabilities, new adversarial techniques, and context shifts don't wait for your next scheduled review.

Runtime guardrails — prompt injection detectors, toxicity filters — rely on static pattern libraries. They catch yesterday's attacks, not tomorrow's.

Manual horizon scanning depends on individual analysts tracking publications and feeds. It doesn't scale, it has gaps, and there's no systematic correlation to your actual deployed systems.

Scenarios That Keep Happening

The redeployment blind spot: An LLM-powered analysis system moves from one operating environment to another. Standard tests pass. But nobody flagged that a known vulnerability was published three months earlier, training data had gaps in the new region, and a benchmark showed degraded performance on the relevant language pair.

The silent vulnerability: A critical CVE is published affecting a foundation model family used across 12 production systems. Who correlates that CVE to every downstream deployment? Who generates the test requirements? Today, nobody — systematically.
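Systematic correlation is, at its core, an index from model families to the deployments that use them. A minimal sketch of that lookup (all names — `ModelInventory`, the system IDs, the family strings — are illustrative assumptions, not AICAP's actual API):

```python
from collections import defaultdict

class ModelInventory:
    """Toy index of which deployed systems use which model family."""

    def __init__(self):
        self._by_family = defaultdict(set)  # model family -> system IDs

    def register(self, system_id: str, model_family: str) -> None:
        self._by_family[model_family].add(system_id)

    def affected_by(self, cve_model_families: set) -> set:
        """Every deployment that uses any family named in a CVE."""
        hit = set()
        for family in cve_model_families:
            hit |= self._by_family.get(family, set())
        return hit

inv = ModelInventory()
inv.register("fraud-triage", "llama-3-8b")
inv.register("doc-summary", "llama-3-8b")
inv.register("chat-support", "mistral-7b")

# A CVE lands against the llama-3 family:
print(sorted(inv.affected_by({"llama-3-8b"})))  # ['doc-summary', 'fraud-triage']
```

The hard part in practice is keeping the inventory complete and current, not the query itself — which is exactly the gap the scenario describes.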

The agent chain reaction: An autonomous logistics agent orchestrates three models, accesses procurement APIs, and makes supply decisions. A jailbreak vulnerability in the base model allows privilege escalation through chained tool use. Each model was tested individually — but nobody assessed how their vulnerabilities compound in an agentic pipeline.

The inherited risk: Your team fine-tunes an open-weight model for a specialist task. Six months later, a training data contamination issue is discovered in the base model. There's no vendor to push a patch. No notification system. Every fine-tuned derivative is potentially affected — but who's tracking the lineage?
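Tracking that lineage amounts to maintaining a derivation graph and walking it when a base model is compromised. A minimal sketch, with illustrative model names and no claim about how any real registry stores this:

```python
from collections import defaultdict

lineage = defaultdict(list)  # base model -> direct fine-tuned derivatives

def derive(base: str, child: str) -> None:
    lineage[base].append(child)

def affected_derivatives(base: str) -> list:
    """Depth-first walk: every transitive descendant of `base`."""
    found, stack = [], list(lineage[base])
    while stack:
        model = stack.pop()
        found.append(model)
        stack.extend(lineage[model])  # derivatives of derivatives
    return sorted(found)

derive("open-weights-7b", "legal-ft-v1")
derive("open-weights-7b", "medical-ft-v1")
derive("legal-ft-v1", "legal-ft-v2")

# Contamination discovered in the base model:
print(affected_derivatives("open-weights-7b"))
# ['legal-ft-v1', 'legal-ft-v2', 'medical-ft-v1']
```

Note that the second-generation fine-tune (`legal-ft-v2`) is flagged too — the inherited risk propagates through every hop, which is why ad-hoc tracking breaks down.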

The ungoverned supplier: Every developer using Copilot, Cursor, or Claude Code is introducing an ungoverned supplier into their software supply chain. AI-generated code reproduces vulnerable patterns from training data, creates phantom dependencies invisible to SBOM scanners, and makes insecure cryptographic choices that traditional SAST tools weren't designed to catch. The code passes every test — because the tests don't know what to look for.

This isn't hypothetical. In March 2026, LiteLLM — a widely used LLM proxy integrated into agent frameworks and orchestration tools — suffered a supply chain attack. Malicious PyPI packages stole environment variables, cloud credentials, and Kubernetes tokens. Organisations with unpinned transitive dependencies through agent frameworks were exposed without ever directly installing the compromised packages. The AI supply chain is already under attack.

What Good Looks Like

With AICAP, the intelligence exists before the problem materialises. When a vulnerability is published, it gets correlated to affected systems — and every agent that depends on them — within minutes. When a supply chain compromise is detected, every downstream deployment gets flagged. When a benchmark reveals degraded performance, test requirements are generated automatically. When a system is redeployed, the intelligence cycle evaluates the new context immediately.

You're also ahead of the regulatory curve. The EU AI Act, UK AI Safety Institute guidance, and the Cyber Security and Resilience Bill are all converging on continuous assurance. AICAP is designed to map evidence across frameworks from day one.

The difference isn't better testing — it's knowing what to test.