
Veracode analyzed millions of lines of AI-generated code and found that nearly 36% contained exploitable vulnerabilities — many of them the same classes of flaws that SAST tools have flagged for a decade. JetBrains reports that developers are now using AI coding assistants for the majority of their day-to-day output. Do the math.
We're shipping more code, faster, with no meaningful improvement in the security of what gets shipped.
The standard response from security teams is to scan harder. More SAST. More SCA. More pipeline gates. And all of that will catch what it has always caught — known vulnerability patterns in known code constructs. What it won't catch is the thing AI-generated code fails at most reliably: security decisions that require understanding the context the code is running in.
That's the problem. And most teams have not looked it squarely in the face yet.
SAST tools operate on syntax. They look for patterns — a strcpy here, an unsanitized input there, a hardcoded credential in a config block. These are real problems. Worth catching. But they represent the floor of what can go wrong, not the ceiling.
AI-generated code introduces a different failure class. The code isn't syntactically wrong. It's contextually wrong.
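To make the syntax-versus-context distinction concrete, here is a deliberately tiny sketch of pattern-based scanning. The rule names, regexes, and sample snippets are all hypothetical and far simpler than any real SAST engine; the point is only that a handler with no ownership check contains nothing for a pattern to match.

```python
import re

# Hypothetical, minimal rule set for illustration; real SAST engines are far richer.
RULES = {
    "unsafe-strcpy": re.compile(r"\bstrcpy\s*\("),
    "hardcoded-credential": re.compile(
        r"""\b(password|secret|api_key)\s*=\s*["'][^"']+["']""", re.IGNORECASE
    ),
}


def scan(source: str) -> list[str]:
    """Return the names of the rules whose pattern appears in the source text."""
    return [name for name, pattern in RULES.items() if pattern.search(source)]


LEGACY_C = "strcpy(buf, user_input);"
CONFIG = 'password = "hunter2"'

# An update handler that never checks who is calling. It is contextually wrong,
# but nothing in it matches a known-bad pattern.
HANDLER = '''
def update_settings(request, user_id):
    settings = db.load(user_id)
    settings.update(request.json)
    db.save(user_id, settings)
    return {"status": "ok"}
'''

for label, snippet in [("legacy C", LEGACY_C), ("config", CONFIG), ("handler", HANDLER)]:
    print(label, scan(snippet))
```

The first two snippets trip a rule. The handler comes back clean, even though it may be the most dangerous of the three in a system where callers must only touch their own data.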
Consider a developer who prompts an LLM to generate an API endpoint for updating user account settings. The model produces something that compiles cleanly, handles errors, and would pass most linting rules. But it misses an authorization check that your threat model flagged as critical, because the model had no visibility into your threat model. It doesn't know that this endpoint sits on a path an unauthenticated attacker can reach through lateral movement. It doesn't know the trust boundaries your architect drew six months ago. It knows the prompt it was given, and nothing else.
That gap — between what the model knows and what the system requires — is where the real risk lives.
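A minimal sketch of that gap, using an in-memory store instead of a real API framework. The `Caller` type, the function names, and the ownership rule ("a caller may only update their own settings unless they are an admin") are assumptions made for illustration; your threat model would supply the actual rule.

```python
from dataclasses import dataclass


@dataclass
class Caller:
    user_id: str
    is_admin: bool = False


# In-memory stand-in for a settings store (hypothetical data).
SETTINGS = {
    "alice": {"email": "alice@example.com"},
    "bob": {"email": "bob@example.com"},
}


def update_settings_generic(caller: Caller, target_id: str, changes: dict) -> dict:
    """Roughly what a context-free prompt yields: handles errors, never asks who is calling."""
    if target_id not in SETTINGS:
        raise KeyError(f"unknown user: {target_id}")
    SETTINGS[target_id].update(changes)
    return SETTINGS[target_id]


def update_settings_with_context(caller: Caller, target_id: str, changes: dict) -> dict:
    """Same logic, plus the ownership rule assumed to come from the threat model."""
    if caller.user_id != target_id and not caller.is_admin:
        raise PermissionError("callers may only update their own settings")
    return update_settings_generic(caller, target_id, changes)


mallory = Caller(user_id="mallory")

# The generic version lets mallory rewrite alice's email without complaint.
update_settings_generic(mallory, "alice", {"email": "mallory@example.com"})

# The context-aware version refuses the same request.
try:
    update_settings_with_context(mallory, "bob", {"email": "mallory@example.com"})
except PermissionError as exc:
    print("blocked:", exc)
```

Both functions are correct in isolation. Only one is correct for a system where accounts belong to different people, and only the threat model says which system you are building.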
Apiiro's research on AI-assisted development found that a significant percentage of high-severity findings in AI-generated code were business-logic or design flaws — the kind that scanners can't see because they require understanding the application's intended behavior, not just its syntax. SAST has no concept of "intended." It only knows "pattern."
My team and I have run more than 1,000 threat models across financial services, healthcare, SaaS, and infrastructure companies. In that work, one thing holds constant: the most dangerous vulnerabilities aren't the ones that look dangerous. They're the ones that look correct, because they are correct in isolation. The flaw is in the relationship between components, not in any individual component.
AI-generated code is overwhelmingly correct in isolation. That's what makes it useful and what makes it risky.
When a developer asks a model to write an authentication handler, the model draws on an enormous corpus of authentication code. It knows the patterns. It will produce something that handles the happy path, manages sessions, and probably even includes error handling. What it won't do — can't do — is reason about whether this handler sits behind a WAF, how it integrates with your identity provider's token validation logic, or whether the session lifetime it chose matches the regulatory requirements your legal team surfaced last quarter.
Context-free code generation produces code that is correct relative to a generic best practice, and potentially wrong relative to your specific threat model.
The NIST Secure Software Development Framework has always placed threat modeling upstream, at the design phase, before a line of code is written, precisely because fixing context failures in production is far more expensive than not introducing them. Widely cited figures attributed to NIST and IBM's Systems Sciences Institute put the cost of fixing a defect in production at roughly 30x the cost of fixing it at the design stage. With AI multiplying code production velocity, every additional context failure that ships carries that multiplier with it. You're not just fixing more bugs faster; you're embedding context failures at a rate no downstream scanner will keep up with.
Context-aware security for AI-generated code is not a product category. It's a practice — and right now, it's an immature one.
The core idea is straightforward: if AI generates code from a prompt, and the prompt carries no security context, the generated code will have no security context. The fix is to change what goes into the prompt, and to enforce that discipline systematically.
In practice, this means three things.
First, threat context at the prompt layer. Before a developer invokes an LLM to generate a security-sensitive component, the relevant threat model outputs should be available to that prompt — either injected automatically by a developer toolchain integration or explicitly required as part of the workflow. This is not about pasting your entire architecture diagram into a chat window. It's about distilling the specific threat context relevant to the component being generated: the trust boundaries it crosses, the assets it touches, the attacker capabilities it needs to resist.
My team has been building harnesses that do exactly this — taking structured threat model outputs (we use PWNISMS: Product, Workload, Network, IAM, Secrets, Monitoring, Supply Chain) and feeding the relevant slice of that model into the code generation context. The difference in output quality is not marginal. Structured threat context surfaces authorization requirements, input validation scope, and secrets handling constraints the model would otherwise have no reason to include.
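As a rough sketch of what "feeding the relevant slice" can look like, the snippet below filters structured threat entries by component and PWNISMS category and renders them into the prompt. This is not our harness; the `Threat` fields, function names, and sample requirements are hypothetical, and only the seven category names come from the framework itself.

```python
from dataclasses import dataclass

PWNISMS = ("Product", "Workload", "Network", "IAM", "Secrets", "Monitoring", "Supply Chain")


@dataclass(frozen=True)
class Threat:
    category: str     # one of the PWNISMS categories
    component: str    # hypothetical component name
    requirement: str  # what the generated code must do about the threat

    def __post_init__(self):
        if self.category not in PWNISMS:
            raise ValueError(f"unknown PWNISMS category: {self.category}")


def threat_slice(threats, component, categories):
    """Keep only the entries relevant to one component and the chosen categories."""
    return [t for t in threats if t.component == component and t.category in categories]


def build_prompt(task, threats):
    """Append the distilled threat context to the developer's task description."""
    lines = [task, "", "Security requirements from the threat model:"]
    lines += [f"- [{t.category}] {t.requirement}" for t in threats]
    return "\n".join(lines)


MODEL = [
    Threat("IAM", "settings-api", "Reject updates unless the caller owns the account or is an admin."),
    Threat("Secrets", "settings-api", "Never log session tokens or password fields."),
    Threat("Network", "billing-worker", "Only accept connections from the internal queue."),
]

relevant = threat_slice(MODEL, "settings-api", {"IAM", "Secrets"})
print(build_prompt("Write an endpoint that updates user account settings.", relevant))
```

The filtering step matters as much as the rendering step: the model gets the two requirements that apply to the component, not the whole threat model.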
Second, security-gated task planning. Before a developer breaks ground on a component, the planning artifact — the ticket, the technical design, the story — should carry explicit security acceptance criteria derived from the threat model. Not a generic checklist. The specific properties this component must satisfy: the authorization rules it must enforce, the trust boundaries it must respect, the failure modes it must handle gracefully.
This is where SecurityReviewAI's approach pays off. When security context is embedded in the plan — before the prompt is written, before the model generates a single line — the AI has something to build toward. It's not generating against a vacuum. It's generating against documented, threat-model-derived requirements. Reviewers aren't being asked to reconstruct context that should have been there from the start; they're verifying against criteria that were defined before the work began. The security decision moves from "catch it if we can" to "specified before it was built."
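One way to picture a planning gate, sketched under assumptions: a ticket carries acceptance criteria, each traced to a threat model entry, and a security-sensitive ticket is not ready until it has them. The `Ticket` and `Criterion` shapes and the threat ID format are hypothetical and do not describe any particular product or ticketing system.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    threat_id: str  # hypothetical ID of the threat model entry this criterion derives from
    statement: str  # the specific property the component must satisfy


@dataclass
class Ticket:
    title: str
    security_sensitive: bool
    criteria: list[Criterion] = field(default_factory=list)


def planning_gaps(ticket: Ticket) -> list[str]:
    """Return the reasons a ticket is not ready; an empty list means work can start."""
    if not ticket.security_sensitive:
        return []
    if not ticket.criteria:
        return ["no security acceptance criteria"]
    # A criterion with no threat reference is a generic checklist item, not a derived one.
    return [
        f"criterion not traced to a threat: {c.statement!r}"
        for c in ticket.criteria
        if not c.threat_id.strip()
    ]


ticket = Ticket(
    title="Account settings update endpoint",
    security_sensitive=True,
    criteria=[
        Criterion("T-12", "Only the account owner or an admin may update settings."),
        Criterion("", "Follow security best practices."),
    ],
)
print(planning_gaps(ticket))
```

The second criterion is flagged because it cannot be traced back to the threat model, which is the difference between a derived requirement and a generic checklist.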
Third, specification-level security requirements. The furthest upstream fix is to ensure that the specifications developers use to prompt AI tools carry security requirements embedded in them — not as an appendix, but as first-class criteria. CISA's Secure by Design pledge calls on software manufacturers to take ownership of customer security outcomes at the product level. The equivalent principle for AI-assisted development is: own the security outcomes of AI-generated code at the specification level, not the scanning level.
If your user stories and technical specs include the specific security properties a component must satisfy — authentication requirements, authorization rules, data handling constraints, failure mode behavior — those properties flow into the prompt. The model generates code that attempts to satisfy them. You can then verify against the spec rather than against a generic vulnerability pattern library.
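A small sketch of a spec doing double duty: the same security properties are rendered into the prompt and then used as the review checklist. The property IDs, the four kinds (taken from the list above), and the shape of the evidence dictionary are assumptions for illustration.

```python
from dataclasses import dataclass

KINDS = ("authentication", "authorization", "data handling", "failure mode")


@dataclass(frozen=True)
class SecurityProperty:
    prop_id: str    # hypothetical identifier used to track the property through review
    kind: str       # one of KINDS
    statement: str  # what the component must do


def spec_to_prompt(feature: str, props) -> str:
    """Render the feature and its security properties as first-class prompt content."""
    lines = [f"Implement: {feature}", "The implementation must satisfy:"]
    lines += [f"- {p.prop_id} ({p.kind}): {p.statement}" for p in props]
    return "\n".join(lines)


def unverified(props, evidence: dict[str, bool]) -> list[str]:
    """Return the IDs of properties that have no passing verification recorded."""
    return [p.prop_id for p in props if not evidence.get(p.prop_id, False)]


SPEC = [
    SecurityProperty("SP-1", "authentication", "Requests without a valid session are rejected."),
    SecurityProperty("SP-2", "authorization", "Only the account owner or an admin may update settings."),
    SecurityProperty("SP-3", "failure mode", "Errors never echo stored settings back to the caller."),
]

print(spec_to_prompt("account settings update endpoint", SPEC))

# Evidence would come from tests or reviewer sign-off; here it is hand-written.
print(unverified(SPEC, {"SP-1": True, "SP-2": True}))
```

Review then asks a narrower question than "is this code vulnerable?": which of the specified properties still lacks evidence.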
A financial services team I worked with last year had a practical implementation of this, built incrementally. Their threat modeling process produced structured outputs — not prose documents, but machine-readable artifacts: threat scenarios keyed to system components, with mitigations and residual risks annotated. When developers opened a ticket to build a new service, the ticket template required them to link the relevant threat model artifact before they could start.
When they used AI code generation, the linked artifact was part of the prompt context. When the generated code came into review, the reviewer had the artifact open alongside the diff.
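The sketch below is not that team's implementation. It is a hypothetical reconstruction of the workflow's shape: a machine-readable artifact with scenarios keyed to components, a ticket rule that blocks work until an artifact is linked, and one rendering function whose output can serve both the prompt and the reviewer. The field names, artifact ID, and sample scenario are invented.

```python
import json

# Hypothetical artifact: threat scenarios keyed to components, with
# mitigations and residual risks annotated.
ARTIFACT = json.loads("""
{
  "artifact_id": "tm-payments-001",
  "scenarios": [
    {
      "component": "transfer-service",
      "threat": "Caller initiates a transfer from an account they do not own",
      "mitigations": ["Verify account ownership on every transfer request"],
      "residual_risk": "Compromised session of the legitimate owner"
    },
    {
      "component": "statement-export",
      "threat": "Export job writes statements to a world-readable location",
      "mitigations": ["Write exports to a per-customer private bucket"],
      "residual_risk": "Misconfigured bucket policy"
    }
  ]
}
""")


def can_start(ticket: dict) -> bool:
    """Mirror the ticket template rule: no linked artifact, no start."""
    return bool(ticket.get("threat_model_artifact"))


def render_context(artifact: dict, component: str) -> str:
    """Render one component's scenarios for use in a prompt or beside a diff."""
    lines = [f"Threat model {artifact['artifact_id']} for {component}:"]
    for scenario in artifact["scenarios"]:
        if scenario["component"] != component:
            continue
        lines.append(f"- Threat: {scenario['threat']}")
        lines += [f"  Mitigation: {m}" for m in scenario["mitigations"]]
        lines.append(f"  Residual risk: {scenario['residual_risk']}")
    return "\n".join(lines)


ticket = {"title": "Build transfer service", "threat_model_artifact": "tm-payments-001"}
if can_start(ticket):
    print(render_context(ARTIFACT, "transfer-service"))
```

Because the prompt and the review draw on the same rendered text, the reviewer checks the diff against exactly the context the model was given.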
Their escape rate for authorization-related findings dropped. More importantly, the type of finding that escaped changed — toward lower-severity issues that scanners caught, away from design-level flaws that scanners missed. That's the shift context-aware security produces. You move from catching vulnerabilities after the fact to not generating them in the first place.
AI is not going to generate secure code by default. The models are trained on the internet's code, and the internet's code has a vulnerability rate that no fine-tuning pass is going to fully correct. The model doesn't know your threat model. It doesn't know your regulatory environment. It doesn't know the four threat scenarios your architect documented last sprint.
You have to tell it. Systematically, not ad hoc.
The teams that figure this out first — that invest in structured threat modeling outputs, in tooling that feeds those outputs into the generation and review workflow, in specifications that carry security requirements as first-class content — those teams will ship secure AI-generated code. Everyone else will add scanners, find the same OWASP Top 10 patterns they've always found, and miss the authorization flaw that their AI helpfully generated last Tuesday because nobody told it about the trust boundary.
Context-aware security for AI-generated code isn't a feature you buy. It's a practice you build, starting upstream, before the first prompt is written.
That's where security has always belonged.
Context-aware security means giving AI coding tools the specific threat context for the system they're contributing to — trust boundaries, authorization requirements, attacker capabilities — before they generate a line of code. Without that context, AI produces code that is syntactically correct but potentially wrong for your specific threat environment. Scanners can't close that gap. Context, fed upstream, can.
SAST operates on syntax — it matches known vulnerability patterns in code constructs. AI-generated code frequently fails not because it looks broken, but because it's missing security logic that only makes sense in the context of your system's threat model: an authorization check your architecture requires, a session lifetime your regulatory environment mandates, a trust boundary your architect defined. SAST has no concept of "intended behavior." It can only see "pattern."
PWNISMS is a threat generation framework covering seven system layers: Product, Workload, Network, IAM, Secrets, Monitoring, and Supply Chain. When feeding threat context into an AI code generation prompt, PWNISMS provides a structured way to distill the relevant slice of your threat model — so the model isn't generating against a vacuum, but against specific, documented security requirements for the component it's building.
Security-gated task planning means embedding security acceptance criteria, derived from your threat model, into the ticket, story, or technical design before development starts. Before a developer writes a single prompt, the planning artifact should specify the authorization rules the component must enforce, the trust boundaries it must respect, and the failure modes it must handle. That context flows into the prompt. The AI generates toward requirements, not around them.
Context-aware security for AI-generated code is the application of Secure by Design principles to AI-assisted development. CISA's Secure by Design pledge asks manufacturers to own security outcomes at the product level. Context-aware security operationalizes that at the workflow level: security requirements live in the specification, not the scanner. The design decision happens before the code exists, which is where NIST has always said it should happen.