Guest article
LLMs are the future of Threat Modeling, right?
Written by Felix Viktor Jedrzejewski, Ph.D. student at Blekinge Institute of Technology (BTH) and Co-Founder of Gaetir, Davide Fucci, Docent at BTH, and Oleksandr Adamov, Senior Lecturer at BTH
Is ChatGPT—or any other Large Language Model (LLM)—a suitable partner for building a threat model? Could we see bots that eventually replace threat modeling experts? Not quite. While LLMs look promising, we are still far from such a reality.
Threat modeling remains one of the most powerful ways to systematically identify which threats a system faces and how to mitigate them. But the process is resource-intensive: it requires security and domain expertise, multiple stakeholders, and a significant time investment, while often struggling to deliver measurable value.
Semi-automating selected threat modeling sub-steps with LLMs could lower the barrier to entry and spread the practice more widely. To achieve this, however, we need clear metrics to evaluate the quality of both human- and machine-generated threat models.
In this article, we share insights from ThreMoLIA, an ongoing research project building such a tool.
Moreover, we, the researchers involved in this project, ask for your input on defining the quality metrics of the future.
Why Threat Modeling Needs Innovation
Our idea for LLM-based threat modeling is a co-production between practitioners and an LLM-based chatbot that supports them across all threat modeling phases. In our position paper[1], we presented a workflow concept for an LLM-based threat modeling tool, which is shown below.
[1] https://arxiv.org/pdf/2504.18369
A retrieval-augmented generation (RAG) pipeline builds the context input from a variety of system-specific data. The tool then generates a threat model and simultaneously calculates a health score based on a set of empirically evaluated metrics we are currently collecting in our study.
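To make this concrete, here is a minimal sketch of the workflow just described. Everything in it, the function names, the placeholder retrieval and generation logic, and the toy health score, is our own illustration of the idea, not the actual ThreMoLIA implementation.

```python
# Illustrative sketch only: placeholder logic stands in for the real RAG
# retrieval, the LLM call, and the empirically grounded health score.
from dataclasses import dataclass, field

@dataclass
class ThreatModel:
    threats: list[str] = field(default_factory=list)

def retrieve_context(system_docs: list[str], query: str) -> str:
    # Stand-in for the RAG step: a real pipeline would embed the
    # system-specific data (architecture docs, data-flow diagrams, tickets)
    # and return the chunks most relevant to the query.
    words = query.lower().split()
    relevant = [d for d in system_docs if any(w in d.lower() for w in words)]
    return "\n".join(relevant)

def generate_threat_model(context: str) -> ThreatModel:
    # Stand-in for the LLM call that drafts threats from the retrieved
    # context; a trivial rule replaces the model here.
    threats = [f"Spoofing risk in: {line}" for line in context.splitlines()]
    return ThreatModel(threats=threats)

def health_score(model: ThreatModel, expected_components: int) -> float:
    # Toy health score: fraction of system components with at least one
    # identified threat. The real score would combine several metrics.
    if expected_components == 0:
        return 0.0
    return min(len(model.threats) / expected_components, 1.0)

docs = ["Login service exposes a public REST API",
        "Payment service stores card tokens"]
ctx = retrieve_context(docs, "service API")
tm = generate_threat_model(ctx)
print(tm.threats, health_score(tm, expected_components=2))
```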
Such metrics address one of the elephants in the room. Applying LLMs, in any capacity, poses a privacy and security concern for the input data. LLMs also tend to be very context-sensitive, a consequence of the large variety of data commercial models are trained on. This creates reliability issues as well: hallucinations can drive wrong security decisions that damage a company's security posture.
All in all, our proposed workflow and the accompanying tool will help non-experts accelerate threat modeling sessions, enabling almost continuous threat modeling.
Research Project Insights
Measuring the Quality of Threat Models
The core challenge of ThreMoLIA is the lack of metrics to assess the quality of a given threat model. The goal of our ongoing study is to establish these metrics as a co-production between academia and industry.
We began by reviewing academic literature to extract metrics researchers have proposed for evaluating threat models. Next, we interviewed practitioners to test these metrics against their real-world experiences, as we believe such insights are essential to ensure our tool reflects industrial reality.
So far, our study points to three broad categories of quality metrics, with a small sketch after the list illustrating how they might combine:
- Understandability – Can the entire threat model be read and comprehended with reasonable effort?
- Coverage – Does the threat model capture all relevant threats?
- Correctness – Are the identified threats valid and accurate?
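As a thought experiment, the sketch below shows one way these three categories could feed a single quality score. The weights and the linear combination are purely hypothetical, and they are exactly the kind of design choice our study aims to ground empirically.

```python
# Illustrative only: hypothetical weights, not results from our study.
def threat_model_quality(understandability: float,
                         coverage: float,
                         correctness: float,
                         weights: tuple[float, float, float] = (0.2, 0.4, 0.4)) -> float:
    """Combine three normalized category scores (each in [0, 1]) into one."""
    scores = (understandability, coverage, correctness)
    assert all(0.0 <= s <= 1.0 for s in scores), "scores must be normalized"
    return sum(w * s for w, s in zip(weights, scores))

# Example: a readable model that misses threats and contains some invalid
# ones still scores poorly overall.
print(threat_model_quality(understandability=0.9, coverage=0.5, correctness=0.6))
```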
We want your perspective: do these categories reflect your experience, or is something important missing?
Call to Action: Participate in the Survey
Your perspective will directly shape how future threat modeling tools are built and evaluated. We invite you to share your opinion in a short, anonymous, 15-minute survey:
Your input will ground academic findings in practice, helping to define how to evaluate the next generation of threat models created by hybrid human-AI teams.
Conclusion
Threat modeling is a cornerstone of secure software engineering, but it is often slow, costly, and inconsistent. By automating some of its most resource-intensive phases, we can make it faster, cheaper, and more accessible, paving the way for broader adoption across industry.
The future of threat modeling will not be defined only by fancy new AI-based tools but by how we measure their effectiveness. If we succeed in building those metrics together, we will empower the next generation of threat modelers to work efficiently with their AI counterparts.