At WithSecure Consulting, we distinguish ourselves through our research-led approach. Our consultants dedicate part of their time to research – exploring the security of emerging technologies, such as GenAI. This approach allows us to stay ahead of new threats and provide real-world, actionable insights to our clients.
Our focus on GenAI security stems from the increasing adoption of large language models (LLMs) – like ChatGPT and Google Gemini – in enterprise environments. These systems, while powerful, introduce new and complex security risks, particularly around prompt injection attacks.
With our ongoing research on GenAI, we are looking to continuously deepen the understanding and raise awareness of these vulnerabilities to help organizations defend against potential exploits and ensure that they can safely leverage AI technologies.
Explore our GenAI security
research and experiments
Prompt Injection in JetBrains Rider AI Assistant
This advisory explore security issues that could arise when using GenAI assistants integrated within software development IDEs.
Specifically, we demonstrate how prompt injection injection can be leveraged by an attacker to exfiltrate confidential information from a development environment. If an untrusted code snippet containing a malicious payload is passed to the AI Assistant (for example, when asking for an explanation), the injected instructions will be executed by the underlying LLM.
We also provide recommendations to developers of such GenAI code assistants on how to mitigate these issues and reduce their impact.
Should you let ChatGPT control your browser?
This research investigates the security risks of granting LLMs control over web browsers.
We explore scenarios where attackers exploit prompt injection vulnerabilities to hijack browser agents, leading to sensitive data exfiltration or unauthorized actions, such as merging malicious code into repositories.
Our findings are relevant to developers and organizations that integrate LLMs into browser environments, highlighting the critical security measures needed to protect users and systems from such attacks.
Fine-tuning LLMs to resist indirect prompt injection attacks
We fine-tuned Llama3-8B to enhance its resilience against indirect prompt injection attacks, where malicious inputs can manipulate a language model’s behavior in tasks like summarizing emails or processing documents.
Building on insights from Microsoft and OpenAI, our approach focused on using specialized markers to separate trusted instructions from user data. This exploration allowed us to understand the effectiveness of these methods in real-world scenarios.
Organizations and developers can access our model and fine-tuning scripts on Hugging Face and Ollama to test and extend our findings.
Read on WithSecure LabsWhen your AI assistant has an evil twin (Google Gemini prompt injection)
In this piece of research, we demonstrate how Google Gemini Advanced can be manipulated through prompt injection attacks to perform social engineering. By sending a single email to the target user, attackers can trick LLMs into misleading users into revealing sensitive information.
This research is crucial for enterprises using LLMs in environments that handle confidential data, as it exposes how easily attackers can compromise the integrity of the AI’s output.
We also discuss the current limitations in defending against these attacks and emphasize the need for caution when using AI-powered assistants.
Read on WithSecure LabsSynthetic recollections (prompt injection in ReAct agents)
This research focuses on how prompt injection can be used to manipulate ReAct (Reason and Act) LLM agents by injecting false observations or commands. Such attacks can lead the LLM to make incorrect decisions or take unintended actions, which could benefit attackers.
This type of vulnerability is particularly concerning for organizations using LLM agents in decision-making processes.
We outline two main categories of attacks, discuss their potential impact on operations, and suggest strategies to mitigate the risks associated with these manipulations.
Read on WithSecure LabsDomain-specific prompt injection detection with BERT classifier
We explore how domain-specific prompt injection attacks can be detected using a fine-tuned BERT classifier. This research provides a practical method for identifying malicious prompts by training a small model on domain-specific data, allowing it to distinguish between legitimate and malicious inputs.
It’s aimed at security teams and developers working with AI models who need reliable mechanisms to detect prompt injections.
We present our findings on the effectiveness of this approach and how it can be implemented in various environments.
Read on WithSecure LabsJailbreak/prompt injection security canvas
This one-page canvas offers a structured framework of security controls designed to protect LLM applications from prompt injection and jailbreak attacks.
Based on our client engagements, the canvas outlines the essential steps needed to secure inputs and outputs in LLM workflows.
It’s a practical tool for developers and security teams looking to safeguard their AI systems by implementing tested controls that reduce the likelihood of prompt-based attacks and mitigate their impact.
Read on WithSecure LabsGenerative AI – An Attacker’s View
This blog delves into the use of GenAI (Generative Artificial Intelligence) by threat actors and what we can do to defend ourselves.
Topics covered include: social engineering, phishing, recon and malware generation.
Read on WithSecure LabsGenAI security challenges
To foster hands-on learning, we’ve released two public Capture the Flag (CTF) challenges focused on LLM security.
The challenges are designed for security researchers and practitioners who want practical experience with LLM security issues.
MyLLMBank
This challenge allows you to experiment with jailbreaks/prompt injection against LLM chat agents that use ReAct to call tools.
MyLLMDoc
This is an advanced challenge focusing on multi-chain prompt injection scenarios.