AI Security
Threat Modeling

How to Stop Your Shiny New LLM from Becoming a Dumpster Fire

PUBLISHED:
September 17, 2025
BY:
Debarshi Das

Your new AI is a toddler with a supercomputer. It's brilliant, has no common sense, and is dangerously easy to trick.

Remember Microsoft's Tay? In 2016, it went from a friendly chatbot to a racist PR disaster in less than 24 hours, all because Twitter users manipulated it. That wasn't a bug in the code, but a failure to imagine what could go wrong with a system designed to learn from public input.

This is your guide to threat modeling for AI. We'll use a classic security framework to explore the unique ways AI can be abused so your shiny new tool doesn't become the next digital dumpster fire.

Table of Contents

  1. Threat Modeling 101
  2. STRIDE for Skynet: A Threat-by-Threat Breakdown for AI
  3. Okay, I'm terrified. What do I actually do?
  4. Don't be the reason we have AI safety PSAs

Threat Modeling 101

Threat modeling is just structured paranoia, but the kind that actually lets you sleep at night. It boils down to four questions:

  1. What are we working on? (The cool new AI feature).
  2. What can go wrong? (The fun part).
  3. What are we going to do about it? (The work part).
  4. Did we do a good enough job? (The part you do before it's all over the news).

We’ll use the STRIDE framework to figure out what can go wrong. It’s a simple mnemonic for categorizing threats and works surprisingly well for our new robot friends.

STRIDE for Skynet: A Threat-by-Threat Breakdown for AI

S is for Spoofing (Your AI's senses)

In AI, spoofing is all about faking reality. An attacker tricks the model's senses into seeing something that isn't there. This is called an Adversarial Evasion Attack.

Neural networks can be weirdly linear. A bunch of tiny, imperceptible changes to an image can make a model see something completely different. This is how a picture of a panda, with a bit of specially crafted noise, gets classified as a "gibbon" with 99.3% confidence.

This has real-world consequences. Researchers have put stickers on a stop sign and fooled an autonomous vehicle's vision system into thinking it was a "Speed Limit 45 MPH" sign.
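
To make the mechanics concrete, here's a minimal sketch of the fast gradient sign method (FGSM), the classic recipe behind that panda-to-gibbon trick. It assumes a PyTorch classifier; the model, image tensor, and label are placeholders, not a drop-in attack tool.

```python
import torch

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Craft an adversarial example by nudging each pixel slightly
    in the direction that increases the model's loss (FGSM)."""
    image = image.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # A tiny, sign-only step per pixel: imperceptible to a human,
    # often enough to flip the model's prediction.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```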

T is for Tampering (With your AI's brain)

Tampering with AI means corrupting its education. This is Data Poisoning, and it happens during the model's training process.

An attacker injects malicious examples into the training data. Since models learn from this data, they learn the attacker's bad patterns. This is a huge risk when you train models on massive datasets scraped from the web. An attacker only needs to compromise a tiny fraction of the training data to cause major problems.

This is a supply chain attack. The vulnerability isn't in your running application, but in the data you used to build it. The attacker could be anyone who can edit a Wikipedia page or a GitHub repo that you later scrape for training.
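
As a hypothetical illustration of how little it takes, here's a sketch of a label-flipping attack on a scraped spam dataset. The trigger phrase, labels, and data format are all made up for the example.

```python
import random

def poison_training_set(examples, trigger="cheap meds", target_label="ham", fraction=0.01):
    """Hypothetical label-flipping attack: mislabel a small fraction of
    spam examples containing a trigger phrase, so the trained filter
    learns to let that phrase through."""
    examples = list(examples)  # each item is (text, label)
    spam_with_trigger = [i for i, (text, label) in enumerate(examples)
                         if label == "spam" and trigger in text]
    budget = max(1, int(fraction * len(examples)))
    for i in random.sample(spam_with_trigger, min(budget, len(spam_with_trigger))):
        text, _ = examples[i]
        examples[i] = (text, target_label)  # flip the label
    return examples
```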

R is for Repudiation (And the black box problem)

Repudiation is when a user can deny doing something because the system can't prove it. With AI, this is tied to explainability.

Many models are black boxes, meaning their internal logic is too complex for humans to understand. So when your AI hiring tool rejects a candidate, can you prove why? If you can't, you can't refute claims of bias. Amazon learned this the hard way when their experimental hiring tool penalized resumes containing the word "women's" because it learned from biased historical data. If a model can't explain itself, your organization can't account for its actions.
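
One practical counter is an audit trail: record enough about every decision to reconstruct later what the model saw and what it said. A minimal sketch, assuming a generic scikit-learn-style `predict` interface; the field names and log format are illustrative.

```python
import hashlib
import json
import time

def predict_with_audit(model, features, model_version, log_path="decisions.jsonl"):
    """Wrap a prediction with an append-only audit record so the
    organization can later account for (and contest claims about) a decision."""
    prediction = model.predict([features])[0]
    record = {
        "timestamp": time.time(),
        "model_version": model_version,
        # Hash the input rather than storing raw PII in the log.
        "input_hash": hashlib.sha256(
            json.dumps(features, sort_keys=True).encode()
        ).hexdigest(),
        "prediction": str(prediction),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return prediction
```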

I is for Information Disclosure (When your AI can't keep a secret)

AI models can accidentally memorize and leak parts of their training data. Attackers have two main ways to exploit this.

  1. Model Inversion: An attacker can repeatedly query a model to reverse-engineer its private training data. Think of it like a game of 20 Questions to reconstruct a person's face from a facial recognition model.
  2. Membership Inference: This is more subtle. The attacker doesn't steal the data, but confirms if a specific person's data was in the training set. For example, they could confirm a specific person was in a dataset from a cancer hospital, revealing a medical condition without ever seeing the record itself.
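
The intuition behind the simplest membership inference attacks is that models are suspiciously confident on data they memorized during training. A minimal loss-threshold sketch, assuming a scikit-learn-style `predict_proba` interface and an arbitrary threshold:

```python
import numpy as np

def likely_in_training_set(model, x, true_label, threshold=0.1):
    """Loss-threshold membership inference: if the model's loss on this
    record is suspiciously low, it was probably in the training data."""
    probs = model.predict_proba([x])[0]        # class probabilities
    loss = -np.log(probs[true_label] + 1e-12)  # cross-entropy for the true class
    return loss < threshold
```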

D is for Denial of Service (Death by a thousand cuts)

A Denial of Service attack on an AI doesn't just mean flooding a server. It can mean rendering the model itself useless.

An attacker can do this by feeding the model a stream of adversarial inputs to tank its confidence, or by spamming it with junk data that degrades its performance. Imagine a spam filter that's been fed so many cleverly disguised bad emails that it starts letting everything through. The service is technically online, but it’s no longer doing its job.
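
A first line of defense is boring but effective: cap how much any one client can throw at the model. A minimal sketch of a per-client sliding-window guard, with made-up limits:

```python
import time
from collections import defaultdict, deque

class QueryGuard:
    """Per-client sliding-window rate limit plus a basic input-size cap,
    so a single source can't flood the model with junk queries."""

    def __init__(self, max_requests=100, window_seconds=60, max_input_bytes=10_000):
        self.max_requests = max_requests
        self.window = window_seconds
        self.max_input_bytes = max_input_bytes
        self.history = defaultdict(deque)  # client_id -> recent request times

    def allow(self, client_id, payload: bytes) -> bool:
        if len(payload) > self.max_input_bytes:
            return False
        now = time.time()
        recent = self.history[client_id]
        while recent and now - recent[0] > self.window:
            recent.popleft()                 # drop requests outside the window
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```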

E is for Elevation of Privilege (Through the supply chain)

Elevation of Privilege (EoP) happens when an attacker gets permissions they shouldn't have. For AI, the machine learning supply chain is a massive vector for this.

Most companies don't train models from scratch. They use a pre-trained base model from a public "Model Zoo" and fine-tune it. If an attacker can poison a popular base model, they can compromise every single downstream application that uses it. 

A simple pip install on the wrong package can lead to a massive supply chain compromise.
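
One cheap defense is to pin and verify the exact artifacts you depend on. For Python packages, pip's --require-hashes mode refuses anything that doesn't match a pinned hash; for downloaded model weights, a sketch like the following (with a placeholder hash) does the same job before the file is ever loaded.

```python
import hashlib

# Pin the hash of the exact model artifact you reviewed; a silent swap
# upstream will fail this check before the weights are loaded.
PINNED_SHA256 = "0" * 64  # placeholder, not a real hash

def verify_model_artifact(path, expected_sha256=PINNED_SHA256):
    """Refuse to use a model file whose SHA-256 doesn't match the pinned value."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise RuntimeError(f"Model artifact {path} does not match the pinned hash")
```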

Okay, I'm terrified. What do I actually do?

Don't panic. Securing AI requires a change in mindset, not magic.

  • Treat your data like nuclear waste: Assume all data is potentially compromised. Know where your data comes from, validate it, and continuously monitor it for anything that looks suspicious. Sanitize all user-supplied inputs before they ever touch your training set.
  • Fight fire with fire (Adversarial Training): The best defense against adversarial attacks is to expose your model to them during training. Generate your own adversarial examples and teach the model to classify them correctly. It’s not a perfect fix, but it makes an attacker's job much harder.
  • Build a control room (Continuous Monitoring): Monitor your model in production. Track its prediction confidence, watch for drift in its outputs, and analyze incoming queries. This telemetry is the only way to prove your model is behaving as expected (see the sketch after this list).
  • Don't outsource your brain (Be careful with third parties): If a model is business-critical, train it in-house. If you must use a third-party service or an open-source model, vet it obsessively.
  • Actually do threat modeling: Make these STRIDE discussions a required step in your development process. Have the "What can go wrong?" conversation before you write the first line of code, not after you're trending on Twitter for the wrong reasons.
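
Here's the monitoring idea from the list above as a minimal sketch: track average prediction confidence over a sliding window and alert when it sags. The baseline and tolerance values are assumptions you'd tune to your own model.

```python
from collections import deque

class ConfidenceMonitor:
    """Track mean prediction confidence over a sliding window and flag
    sustained drops that could indicate drift or an adversarial campaign."""

    def __init__(self, window=1000, baseline=0.90, tolerance=0.10):
        self.scores = deque(maxlen=window)
        self.baseline = baseline    # assumed normal mean confidence
        self.tolerance = tolerance  # how far below baseline before alerting

    def record(self, confidence: float) -> bool:
        """Return True when the window is full and its average confidence
        has sagged below the alert line."""
        self.scores.append(confidence)
        mean = sum(self.scores) / len(self.scores)
        return len(self.scores) == self.scores.maxlen and mean < self.baseline - self.tolerance
```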

Don't be the reason we have AI safety PSAs

Look, you're not going to build a perfectly secure AI. The National Institute of Standards and Technology (NIST) will tell you there's no silver bullet for many of these attacks.

The real goal is risk management. It’s about understanding the weird new ways your model can be broken so you can make smart, informed decisions. These threats are not theoretical.

So before you ship that next AI feature, ask one simple question: What can go wrong?

Because if you don't ask it now, your users will give you the answer later. And you probably won't like it.

SecurityReview.ai makes this process practical at scale. The platform uses AI to automate threat modeling, analyzing system design, code, and documentation to quickly identify critical security risks. It’s built for organizations that want to replace manual reviews and maintain real-time visibility into their ever-changing risk landscape.

FAQ

What is AI threat modeling and why is it important?

AI threat modeling is the process of systematically identifying and assessing potential vulnerabilities, risks, and attack scenarios specific to artificial intelligence and machine learning systems. This helps organizations anticipate what could go wrong before deploying AI, allowing them to implement controls and reduce the risk of security incidents or reputation damage.

What are the biggest threats to large language models (LLMs) like ChatGPT and other enterprise AI?

Major threats include spoofing (adversarial input and evasion attacks), tampering (data poisoning and supply chain risk), repudiation (lack of explainability or audit trails), information disclosure (data leakage through model inversion or membership inference), denial of service (overloading AI with malicious inputs), and elevation of privilege (compromised third-party or open-source components).

How does the STRIDE framework apply to AI systems?

The STRIDE framework breaks down threats into Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. Each of these categories takes on unique forms when applied to AI, such as adversarial evasion (Spoofing), data poisoning (Tampering), and model inversion (Information Disclosure).

What is data poisoning in AI and how can it be prevented?

Data poisoning refers to the manipulation or contamination of training data to influence a model’s behavior and inject vulnerabilities. Prevention measures include validating and sanitizing data sources, continuous monitoring, adversarial training, and treating all external data with high scrutiny.

Can attackers reverse-engineer or leak the data used to train an AI model?

Yes, through techniques such as model inversion and membership inference, attackers can extract or infer sensitive details about the training data, potentially exposing personal or confidential information.

What is adversarial training for AI models?

Adversarial training is a technique where models are deliberately exposed to malicious or tricky inputs during training. This helps improve resilience and teaches the AI to better recognize and handle adversarial attacks in real-world scenarios.

Why is continuous monitoring essential for AI security?

Continuous monitoring allows teams to detect unexpected behavior, track prediction confidence, watch for drift in outputs, and identify suspicious query patterns in real time. This is crucial for early threat detection and maintaining system integrity after deployment.

What are the risks of using third-party or open-source AI models?

Third-party or open-source models may contain hidden vulnerabilities, supply chain compromises, or intentional backdoors left by attackers. Vetting, reviewing, and ideally training core models in-house are important best practices for enterprise AI security.

Is perfect security for AI models possible?

No, perfect security is not feasible. The goal is risk management through understanding unique vulnerabilities, regular threat modeling, and building layers of defensive strategies to reduce exposure and minimize impact.

How can organizations get started with threat modeling for AI?

Organizations should adopt threat modeling as a regular step in their AI development lifecycle. This involves using frameworks like STRIDE to periodically assess what could go wrong, implementing technical controls, and performing ongoing reviews and audits before product release.

Debarshi Das

Re-searcher. Sometimes I write code, other times tragedy.