Threat Modeling Insider – February 2026

Threat Modeling Insider Newsletter

51st Edition – February 2026

Welcome!

Welcome to this month’s edition of Threat Modeling Insider! In this edition, Dimitri Van Landuyt shares his view on using LLM tools for complete threat models.

Next, on the Toreon Blog, Georges Bolssens shares his take on “Managing Unknowns and Assumptions in Threat Modeling”. A very interesting read on hidden risks.

There’s plenty of other actionable insight ahead, so settle in and let’s get started!


In this edition

Guest Article
Dimitri Van Landuyt shares his take on the usage of LLMs when generating complete threat models.

Toreon Blog
The Hidden Risk in Your Security Design: Managing Unknowns and Assumptions in Threat Modeling, by Georges Bolssens

Curated content
Securing Agentic AI: New Risks Require New Safeguards

Tips & tricks
The C4 model for diagramming: aiming for the right abstraction level

Training update
An update on our training sessions.

Guest Article

Should we generate complete threat models with LLM tools?

In a relatively short time frame, Large Language Model (LLM) technology has emerged, matured, and proven itself not only to be applicable but actually transformational for a wide range of practices. The core capability to ingest large volumes of information, to reason, and to generate structured or unstructured outputs on demand, at volume, and at velocity provides a unique and unprecedented value proposition. There is no doubt today: LLMs represent a paradigm shift.

On paper, this value proposition translates well to the specific context of threat modeling, which starts from collecting and reconciling different information elements about the system under design to construct a comprehensive system model. In the threat elicitation step, concrete potential threats –situations in which the security or privacy of the system may be harmed– are actively envisioned, articulated, and evaluated. This specific step, perhaps the actual heart of the overall threat modeling process, requires a unique blend of expertise, experience, system and domain knowledge and creativity. For these reasons, threat elicitation in practice remains a costly and time-consuming activity.

We have witnessed integrations of LLM capabilities into diverse threat modeling methods and tools, in a variety of initiatives, some academic and some commercial in nature.

The growing landscape of LLM-based tools and methods

The well-known 4Q-model divides the overall threat modeling process into four steps, centered around four key questions:

  1. What are we working on? (System description)
  2. What can go wrong? (Threat elicitation, and threat prioritization)
  3. What will we do about it? (Threat mitigation)
  4. Did we do a good job?

The table below provides an overview of the identified threat modeling tools that currently integrate LLM capabilities in support of the overall threat modeling process. The table distinguishes which steps of the threat modeling process are targeted with LLM automation, and furthermore indicates whether the tool provides complete automation rather than LLM-based assistance to the human threat modeler.

Tool                      | System description | Threat elicitation | Threat prioritization | Threat mitigation
Auspex                    | Assistant          | Full               | Not supported         | Assistant
IriusRisk Jeff            | Assistant          | Assistant          | Assistant             | Assistant
Threatmodeler WingMan     | Assistant          | Assistant          | Assistant             | Assistant
ThreatModelCompanion      | Full               | Full               | Full                  | Full
SecureFlag ThreatCanvas   | Not supported      | Not supported      | Not supported         | Assistant
StrideGPT                 | Full*              | Full               | Not supported         | Full
Fabric-Miessler           | Not supported      | Full               | Not supported         | Not supported
Dragon-GPT                | Not supported      | Full               | Not supported         | Not supported
PILLAR (Simple, Go, Pro)  | Full*              | Full               | Full                  | Full
ThreatModelinLLM          | Full               | Assistant          | Not supported         | Assistant
AWS threat-designer       | Assistant          | Assistant          | Assistant             | Not supported

As shown, many of these integrations focus on supporting the threat elicitation activity, either by automating the process altogether or by providing the threat modeler with access to a supportive generative sparring partner. A number of these currently-available tools implement and provide complete automation of the threat elicitation step.

But are these LLM-based threat elicitation tools legitimate replacements for human threat modelers in performing threat elicitation?

A scientific evaluation of the automated threat elicitation capability

Or, the same question phrased differently: do the LLM capabilities to incorporate, process, evaluate, summarize, and reason lead to sufficient results to actually take over the job of the human threat modeler?

To address this question, we have conducted a scientific evaluation study involving the publicly-available tools that automate the threat elicitation step: StrideGPT, PILLAR-Go and PILLAR-Simple, Fabric-Miessler, Dragon-GPT, and ThreatModelCompanion.

The study is based on two complex and contemporary application cases that involve the use of biometry in support of recognition and authentication, once in the context of a personal device (face-based phone unlock), and once in a shared environment (access control in a shared building such as an apartment complex or office building). The data flow diagram (DFD) for the phone unlock system is shown below.

[Figure: Data flow diagram (DFD) of the face-based phone unlock system]

For each of these cases, we use the different LLM-based threat elicitation tools and collect their outputs. We furthermore explore the effect of using different LLMs, a mix of local and hosted models, and in total process 56 distinct threat models generated by automated tools.

We compare these outputs against a set of ground truth baselines, created by human threat modelers. In these baselines, we further distinguish between threats identified by novices and experts, and we further take into account the amount of agreement among the human threat modelers about the identified threats as a weight or penalty factor. This allows for in-depth evaluation of the generated results against human threat analysis.

We use NLP technology –sentence transformers– to perform threat mapping, i.e. to decide whether a generated threat indeed refers to the same issue as one of the threats documented by the human threat modelers. More specifically, the ATTACK-BERT sentence transformer is used, which was trained specifically on security terminology and, not surprisingly, proved most capable for this purpose. We calculate semantic similarity, which expresses the extent to which two threat descriptions refer to the same core issue, regardless of phrasing. Based on that, we quantify performance, expressed as precision, recall, and F1-score.
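The mapping-and-scoring step can be sketched in a few lines of Python. This is a toy illustration, not the study’s actual benchmark code: the threat texts and three-dimensional vectors are made up and merely stand in for real sentence-transformer embeddings. The matching logic, however, mirrors the described approach: each generated threat is mapped to its best cosine-similarity match in the baseline, a similarity threshold decides equivalence, and precision, recall, and F1 follow from the matches.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Made-up embeddings standing in for sentence-transformer output.
generated = {
    "Attacker spoofs the face recognition sensor":   [0.9, 0.1, 0.2],
    "Logs are stored without integrity protection":  [0.1, 0.8, 0.3],
    "UI uses an outdated color scheme":              [0.2, 0.2, 0.9],
}
ground_truth = {
    "Presentation attack against the biometric sensor": [0.88, 0.15, 0.25],
    "Tampering with unprotected audit logs":            [0.12, 0.82, 0.28],
}

THRESHOLD = 0.85  # equivalence cut-off, as in the study

# Map each generated threat to its best-matching baseline threat.
matched_truth = set()
true_positives = 0
for g_text, g_vec in generated.items():
    best_t, best_sim = max(
        ((t, cosine(g_vec, t_vec)) for t, t_vec in ground_truth.items()),
        key=lambda pair: pair[1],
    )
    if best_sim >= THRESHOLD:
        true_positives += 1
        matched_truth.add(best_t)

precision = true_positives / len(generated)          # generated threats that match
recall = len(matched_truth) / len(ground_truth)      # baseline threats covered
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these toy vectors, the first two generated threats clear the threshold while the third (an irrelevant finding) does not, yielding a precision of 0.67, a recall of 1.00, and an F1 of 0.80, illustrating how a tool that pads its output with less applicable threats loses precision even when its relevant threats match well.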

Results and findings

  • In the above graph, the red line represents the semantic similarity threshold (0.85) used as the criterion for considering a generated threat equivalent to one of the ground truth threats. As shown, the majority of threats generated per tool fall below this threshold line, and fit within the 0.5-0.75 similarity range. Close, but no cigar. While the LLM tools generate threats that look similar to the relevant threats, they are not quite at the same level as the threats identified by human threat modelers (yet). Despite the tool-generated threats being more similar than dissimilar to the relevant threats, overall performance is low.
  • For over 60% of the threats generated by the LLM-based threat elicitation tools, their closest baseline match was identified by novice threat modelers. Threat recall is very low for the threats uniquely elicited by the experts. LLM tools are not (yet) replacing threat modeling experts.
  • Overall, and in large part due to generating large volumes of less applicable threats, the absolute precision and recall of relevant threats remain lower than those of human threat modelers.
  • Tools with more sophisticated prompting techniques score better on the more complex distributed building access control case, but such advantages are not seen in the simpler application case. This suggests that the appropriate prompting technique may depend highly on the specifics of the application case: there is no one-size-fits-all way to instruct LLMs for threat modeling.
  • While there are notable differences between the tools (which can mainly be considered to be a set of LLM prompt instructions and configuration settings such as LLM temperature), a similar effect on performance results can be seen when changing the actual LLM models being used. The choice of underlying LLM model is of equal importance to picking the right tool. In our study, the best results were consistently attained with Google’s gemini-2.0-flash model.

Want to know more?

Complete details can be found in the scientific publication: A comparative benchmark study of LLM-based threat elicitation tools, Dimitri Van Landuyt, Majid Mollaeefar, Mario Raciti, Stef Verreydt, Abdulaziz Kalash, Andrea Bissoli, Davy Preuveneers, Giampaolo Bella, Silvio Ranise, Future Generation Computer Systems (special issue on Generative AI in Cybersecurity), Volume 177, 2026, ISSN 0167-739X, https://doi.org/10.1016/j.future.2025.108243.

(https://www.sciencedirect.com/science/article/pii/S0167739X25005370)

Additional materials such as case description, benchmark code, detailed results, and more can be found in the supporting materials package: https://zenodo.org/records/17305023

Learn to integrate AI into your threat modeling process.

Handpicked for you

Toreon Blog: The Hidden Risk in Your Security Design: Managing Unknowns and Assumptions in Threat Modeling

In the world of cybersecurity, what you know can protect you, but what you assume can destroy you. Threat modeling is a foundational pillar of secure design, yet even the most rigorous models are often built on a shaky foundation of “unknowns” and “unspoken assumptions”.

Whether you are “shifting left” leveraging modern DevOps processes or managing a legacy monolith, understanding how to document and validate these gaps is the difference between a resilient system and one waiting for a breach.

 

Most security teams follow the industry-standard “DICE framework” for threat modeling. However, this approach by no means prevents assumptions from being made.



Curated Content

Securing Agentic AI: New Risks Require New Safeguards

Autonomous AI agents require a shift in threat modeling: treat them as “digital employees” requiring behavioral boundaries, not just traditional software.

Key Attack Vectors & Mitigations

  • Unbounded Autonomy: Mitigate rogue actions with human-in-the-loop (HITL) oversight, utilizing dynamic escalation or mandatory approvals for high-risk operations.
  • Compromised Agent Blast Radius: Contain lateral movement and unauthorized access using application sandboxing, Just-in-Time (JIT) provisioning, and strict least-privilege enforcement.
  • Training Data Poisoning: Implement rigorous data validation. The threshold for compromise is alarmingly low: injecting just 5 malicious texts into a dataset of millions can achieve a 90% manipulation success rate.
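The human-in-the-loop oversight described above can be sketched as a simple approval gate: an agent action executes automatically only when its risk score stays below a policy threshold, and anything riskier is escalated for mandatory human approval. Everything here is illustrative: the threshold value, the scoring weights, and the function names are all invented for this example, not taken from any particular product.

```python
RISK_THRESHOLD = 0.7  # assumed policy value; tune per deployment

def risk_score(action: dict) -> float:
    """Toy scoring: a static weight per action type plus a scope penalty."""
    weights = {"read": 0.1, "write": 0.5, "delete": 0.9}
    base = weights.get(action["type"], 1.0)  # unknown actions are max risk
    return min(1.0, base + 0.1 * action.get("scope", 0))

def dispatch(action: dict, approved_by_human: bool = False) -> str:
    """Execute low-risk actions autonomously; escalate everything else."""
    if risk_score(action) < RISK_THRESHOLD or approved_by_human:
        return "executed"
    return "escalated"  # queued for mandatory human approval

print(dispatch({"type": "read", "scope": 1}))          # low risk: executed
print(dispatch({"type": "delete", "scope": 2}))        # high risk: escalated
print(dispatch({"type": "delete", "scope": 2}, True))  # approved: executed
```

The same gate structure supports dynamic escalation: instead of a fixed threshold, the policy can tighten it based on context (e.g. the agent’s recent behavior or the sensitivity of the target resource), which is the behavioral-boundary framing the article argues for.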

Agent Skills for Continuous Threat Modeling

This MIT-licensed repository provides agent-agnostic tools to automate Continuous Threat Modeling within developer workflows, integrating directly with AI coding assistants like Claude Code and OpenAI Codex.

Available Core Skills:

  • pytm: Automatically generates comprehensive threat models, outputting both structural diagrams and actionable security findings.
  • ctm: Monitors code commits to evaluate and flag security-notable changes in real-time.
  • 4qpytm: Facilitates an interactive, four-question threat modeling session guided by user input.


STRIDE-GPT As an MCP Server: A Composable Tool That AI Agents Can Use Autonomously

STRIDE-GPT has been released as an MCP server, enabling AI agents to autonomously perform full threat modeling on a codebase or architecture while fitting seamlessly into agent-based workflows. It delivers STRIDE-based threat analysis, risk scoring, mitigations, and executive-ready reports, all composable with other MCP tools like GitHub and Terraform.

Key Takeaways are:

  • STRIDE-GPT turns threat modeling into an autonomous, composable AI workflow, integrating code, infrastructure, and security analysis in a single session

  • It provides comprehensive coverage, including all six STRIDE categories, DREAD risk scoring, attack trees, and OWASP LLM Top 10 (2025) AI/ML threats

  • The tool is free, open source, and flexible (usable interactively or fully autonomously), while supporting modern domains beyond web apps, such as cloud, APIs, IoT, and mobile

Global Risks Report 2026 - World Economic Forum​

The Global Risks Report 2026 frames “Uncertainty” as the defining condition of a new Age of Competition, marked by weakening cooperation, rising multipolar rivalry, and eroding trust. The outlook is starkly pessimistic, with escalating geoeconomic conflict, accelerating AI risks, and long-term environmental threats reshaping how global and systemic risks must be understood.

Key Takeaways are:

  • Geoeconomic confrontation is now the top short-term global risk, pushing threat modeling to include infrastructure sabotage, supply-chain weaponization, and cyber-physical attacks on critical systems

  • AI has shifted from a tool-level risk to a systemic one, with concerns ranging from misinformation and deepfakes to adversarial data poisoning, automated escalation, and long-term governance failures

  • Cryptographic complacency, information warfare, and fragmented supply chains demand forward-looking models that account for quantum “harvest now, decrypt later” threats and the erosion of digital trust and strategic dependencies

TIPS & TRICKS

The C4 model for diagramming: aiming for the right abstraction level

The “C4 model” offers a structured and hierarchical way to visualize software architecture, which greatly facilitates threat modeling by clarifying system context and allowing you to “zoom in” to increasing levels of detail when needed. This clear depiction helps identify interactions and potential attack surfaces, forming a solid foundation for pinpointing (and mitigating) security risks.

Inspired by the C4 model’s philosophy, the diagram-as-code framework “Structurizr” was developed to accompany it, featuring native export to other formats to avoid any type of tool lock-in.
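As a flavor of what diagram-as-code looks like, here is a minimal Structurizr DSL workspace sketch. The element names (a resident, a biometric reader, an access service) are invented for illustration; the structure shows the C4 idea of zooming from system context down to containers.

```
workspace {
    model {
        user = person "Resident"
        acs = softwareSystem "Building Access Control" {
            reader = container "Biometric Reader"
            backend = container "Access Service"
        }
        user -> reader "Presents face"
        reader -> backend "Sends biometric template"
    }
    views {
        systemContext acs {
            include *
            autoLayout lr
        }
        container acs {
            include *
            autoLayout lr
        }
    }
}
```

Because the model is text, it can live in version control next to the code it describes, and the same model renders both the context-level and container-level views used during a threat modeling session.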

Book a seat in our upcoming trainings & events

Our trainings & events for 2026

Threat Modeling Practitioner training, hybrid online, hosted by DPI, US Cohort

April 2026

Advanced Whiteboard Hacking – aka Hands-on Threat Modeling, NorthSec Training, Montreal

May 10-11 2026

3-Day Training: AI Whiteboard Hacking aka Hands-on Threat Modeling Training, in-person, OWASP Global AppSec EU, Vienna Austria

22-24 June 2026

Threat Modeling Practitioner training, hybrid online, hosted by DPI, US Cohort

March 2026

Threat Modeling Practitioner training, hybrid online, hosted by DPI, Europe Cohort

September 2026


Upcoming Events/Webinars

Supernova/Cybernova

You’ll be able to find us at both SuperNova on March 25th and 26th, and CyberNova on March 24th. SuperNova is an internationally renowned event for tech and innovation, while this year marks the first edition of CyberNova, which focuses on all things cybersecurity. Moreover, we will also be present at the Agoria booth space.

Webinar Toreon x IriusRisk

This session is the first in our two-part series with IriusRisk and takes a maturity-based look at how threat modeling evolves as organizations scale from startup to enterprise.

You’ll learn how high-performing teams maintain consistent security standards while avoiding common pitfalls such as fragmented processes, inconsistent risk coverage, and duplicated or missed controls.

We’ll explore how the right mix of services, tooling, and best practices helps organizations build scalable, sustainable threat modeling programs, ensuring security keeps pace with business growth.

