Published:
October 18, 2024January 27, 2025

At WithSecure Consulting, we distinguish ourselves through our research-led approach. Our consultants dedicate part of their time to research – exploring the security of emerging technologies, such as GenAI. This approach allows us to stay ahead of new threats and provide real-world, actionable insights to our clients.

Our focus on GenAI security stems from the increasing adoption of large language models (LLMs) – like ChatGPT and Google Gemini – in enterprise environments. These systems, while powerful, introduce new and complex security risks, particularly around prompt injection attacks.

With our ongoing research on GenAI, we are looking to continuously deepen the understanding and raise awareness of these vulnerabilities to help organizations defend against potential exploits and ensure that they can safely leverage AI technologies.

Explore our GenAI security
research and experiments

Spikee – Simple Prompt Injection Kit for Evaluation and Exploitation

Spikee is an open-source toolkit that we created to help security practitioners and developers test LLM applications for prompt injection attacks that can lead to exploitation – examples of which include data exfiltration, XSS (Cross-Site Scripting), and resource exhaustion. A key feature of spikee is that it makes it easy to create custom datasets tailored to specific use cases, rather than flooding applications with many generic jailbreak attacks.

Check out the code: https://github.com/WithSecureLabs/spikee
Tutorial: https://labs.withsecure.com/tools/spikee

Visit spikee’s website

Multi-Chain Prompt Injection Attacks

We introduce multi-chain prompt injection, an exploitation technique targeting applications that chain multiple LLM calls to process and refine tasks sequentially.

Current testing methods for jailbreak and prompt injection vulnerabilities fall short in multi-chain scenarios, where queries are rewritten, passed through plugins, and formatted (e.g., XML/JSON), obscuring attack success. Multi-chain prompt injection exploits interactions between chains, bypassing intermediate processing and propagating adversarial prompts to achieve malicious objectives.

Explore the technique: Try our sample app or the related Colab Notebook. We also have a public CTF challenge to experiment hands-on.

Read on WithSecure Labs

Prompt Injection in JetBrains Rider AI Assistant

This advisory explore security issues that could arise when using GenAI assistants integrated within software development IDEs.

Specifically, we demonstrate how prompt injection injection can be leveraged by an attacker to exfiltrate confidential information from a development environment. If an untrusted code snippet containing a malicious payload is passed to the AI Assistant (for example, when asking for an explanation), the injected instructions will be executed by the underlying LLM.

We also provide recommendations to developers of such GenAI code assistants on how to mitigate these issues and reduce their impact.

Read on WithSecure Labs

Should you let ChatGPT control your browser?

This research investigates the security risks of granting LLMs control over web browsers.

We explore scenarios where attackers exploit prompt injection vulnerabilities to hijack browser agents, leading to sensitive data exfiltration or unauthorized actions, such as merging malicious code into repositories.

Our findings are relevant to developers and organizations that integrate LLMs into browser environments, highlighting the critical security measures needed to protect users and systems from such attacks.

Read on WithSecure Labs

Fine-tuning LLMs to resist indirect prompt injection attacks

We fine-tuned Llama3-8B to enhance its resilience against indirect prompt injection attacks, where malicious inputs can manipulate a language model’s behavior in tasks like summarizing emails or processing documents.

Building on insights from Microsoft and OpenAI, our approach focused on using specialized markers to separate trusted instructions from user data. This exploration allowed us to understand the effectiveness of these methods in real-world scenarios.

Organizations and developers can access our model and fine-tuning scripts on Hugging Face and Ollama to test and extend our findings.

Read on WithSecure Labs

When your AI assistant has an evil twin (Google Gemini prompt injectio n)

In this piece of research, we demonstrate how Google Gemini Advanced can be manipulated through prompt injection attacks to perform social engineering. By sending a single email to the target user, attackers can trick LLMs into misleading users into revealing sensitive information.

This research is crucial for enterprises using LLMs in environments that handle confidential data, as it exposes how easily attackers can compromise the integrity of the AI’s output.

We also discuss the current limitations in defending against these attacks and emphasize the need for caution when using AI-powered assistants.

Read on WithSecure Labs

Synthetic recollections (prompt injection in ReAct agents)

This research focuses on how prompt injection can be used to manipulate ReAct (Reason and Act) LLM agents by injecting false observations or commands. Such attacks can lead the LLM to make incorrect decisions or take unintended actions, which could benefit attackers.

This type of vulnerability is particularly concerning for organizations using LLM agents in decision-making processes.

We outline two main categories of attacks, discuss their potential impact on operations, and suggest strategies to mitigate the risks associated with these manipulations.

Read on WithSecure Labs

Domain-specific prompt injection detection with BERT classifie r

We explore how domain-specific prompt injection attacks can be detected using a fine-tuned BERT classifier. This research provides a practical method for identifying malicious prompts by training a small model on domain-specific data, allowing it to distinguish between legitimate and malicious inputs.

It’s aimed at security teams and developers working with AI models who need reliable mechanisms to detect prompt injections.

We present our findings on the effectiveness of this approach and how it can be implemented in various environments.

Read on WithSecure Labs

Jailbreak/prompt injection security canvas

This one-page canvas offers a structured framework of security controls designed to protect LLM applications from prompt injection and jailbreak attacks.

Based on our client engagements, the canvas outlines the essential steps needed to secure inputs and outputs in LLM workflows.

It’s a practical tool for developers and security teams looking to safeguard their AI systems by implementing tested controls that reduce the likelihood of prompt-based attacks and mitigate their impact.

Read on WithSecure Labs