Generative AI security: Findings from our research

Share

At WithSecure Consulting, we distinguish ourselves through our research-led approach. Our consultants dedicate part of their time to research – exploring the security of emerging technologies, such as GenAI. This approach allows us to stay ahead of new threats and provide real-world, actionable insights to our clients.  

Our focus on GenAI security stems from the increasing adoption of large language models (LLMs) – like ChatGPT and Google Gemini – in enterprise environments. These systems, while powerful, introduce new and complex security risks, particularly around prompt injection attacks. 

With our ongoing research on GenAI, we are looking to continuously deepen the understanding and raise awareness of these vulnerabilities to help organizations defend against potential exploits and ensure that they can safely leverage AI technologies.

Explore our GenAI security
research and experiments

Multi-Chain Prompt Injection Attacks

We introduce multi-chain prompt injection, an exploitation technique targeting applications that chain multiple LLM calls to process and refine tasks sequentially.

Current testing methods for jailbreak and prompt injection vulnerabilities fall short in multi-chain scenarios, where queries are rewritten, passed through plugins, and formatted (e.g., XML/JSON), obscuring attack success. Multi-chain prompt injection exploits interactions between chains, bypassing intermediate processing and propagating adversarial prompts to achieve malicious objectives.

Explore the technique: Try our sample app or the related Colab Notebook. We also have a public CTF challenge to experiment hands-on.

Read on WithSecure Labs
Prompt Injection in JetBrains Rider AI Assistant

This advisory explore security issues that could arise when using GenAI assistants integrated within software development IDEs.

Specifically, we demonstrate how prompt injection injection can be leveraged by an attacker to exfiltrate confidential information from a development environment. If an untrusted code snippet containing a malicious payload is passed to the AI Assistant (for example, when asking for an explanation), the injected instructions will be executed by the underlying LLM.  

We also provide recommendations to developers of such GenAI code assistants on how to mitigate these issues and reduce their impact.

Read on WithSecure Labs
Should you let ChatGPT control your browser?

This research investigates the security risks of granting LLMs control over web browsers.  

We explore scenarios where attackers exploit prompt injection vulnerabilities to hijack browser agents, leading to sensitive data exfiltration or unauthorized actions, such as merging malicious code into repositories.  

Our findings are relevant to developers and organizations that integrate LLMs into browser environments, highlighting the critical security measures needed to protect users and systems from such attacks.

Read on WithSecure Labs
Fine-tuning LLMs to resist indirect prompt injection attacks

We fine-tuned Llama3-8B to enhance its resilience against indirect prompt injection attacks, where malicious inputs can manipulate a language model’s behavior in tasks like summarizing emails or processing documents.

Building on insights from Microsoft and OpenAI, our approach focused on using specialized markers to separate trusted instructions from user data. This exploration allowed us to understand the effectiveness of these methods in real-world scenarios.

Organizations and developers can access our model and fine-tuning scripts on Hugging Face and Ollama to test and extend our findings.

Read on WithSecure Labs

In this piece of research, we demonstrate how Google Gemini Advanced can be manipulated through prompt injection attacks to perform social engineering. By sending a single email to the target user, attackers can trick LLMs into misleading users into revealing sensitive information.  

This research is crucial for enterprises using LLMs in environments that handle confidential data, as it exposes how easily attackers can compromise the integrity of the AI’s output.  

We also discuss the current limitations in defending against these attacks and emphasize the need for caution when using AI-powered assistants. 

Read on WithSecure Labs

This research focuses on how prompt injection can be used to manipulate ReAct (Reason and Act) LLM agents by injecting false observations or commands. Such attacks can lead the LLM to make incorrect decisions or take unintended actions, which could benefit attackers.  

This type of vulnerability is particularly concerning for organizations using LLM agents in decision-making processes.  

We outline two main categories of attacks, discuss their potential impact on operations, and suggest strategies to mitigate the risks associated with these manipulations. 

Read on WithSecure Labs

We explore how domain-specific prompt injection attacks can be detected using a fine-tuned BERT classifier. This research provides a practical method for identifying malicious prompts by training a small model on domain-specific data, allowing it to distinguish between legitimate and malicious inputs.

It’s aimed at security teams and developers working with AI models who need reliable mechanisms to detect prompt injections.

We present our findings on the effectiveness of this approach and how it can be implemented in various environments. 

Read on WithSecure Labs

This one-page canvas offers a structured framework of security controls designed to protect LLM applications from prompt injection and jailbreak attacks.  

Based on our client engagements, the canvas outlines the essential steps needed to secure inputs and outputs in LLM workflows.  

It’s a practical tool for developers and security teams looking to safeguard their AI systems by implementing tested controls that reduce the likelihood of prompt-based attacks and mitigate their impact.

Read on WithSecure Labs
Generative AI – An Attacker’s View

This blog delves into the use of GenAI (Generative Artificial Intelligence) by threat actors and what we can do to defend ourselves.

Topics covered include: social engineering, phishing, recon and malware generation.

Read on WithSecure Labs

GenAI security challenges

To foster hands-on learning, we’ve released two public Capture the Flag (CTF) challenges focused on LLM security.  

The challenges are designed for security researchers and practitioners who want practical experience with LLM security issues.

MyLLMBank

This challenge allows you to experiment with jailbreaks/prompt injection against LLM chat agents that use ReAct to call tools.

MyLLMDoc

This is an advanced challenge focusing on multi-chain prompt injection scenarios.

Related content

December 24, 2021 Our thinking

(Deep)Learning to trust

This article looks at the risks of putting too much trust in strong test performance, and how to improve the assessment of deep learning projects’ usefulness in a business context.

Read more
September 11, 2024

Generative AI Security

Ensure the security of your GenAI powered integrations and solutions

Read more
April 12, 2024 Webinars

Building secure LLM apps into your business

Gain practical understanding of the vulnerabilities of LLM agents and learn about essential tools and techniques to secure your LLM-based apps. Our host Janne Kauhanen is joined by Donato Capitella, Principal Security Consultant at WithSecure™.

Read more
May 17, 2024 Our thinking

Prompt injections could confuse AI-powered agents

We wanted to explore how attackers could potentially compromise large language model (LLM) powered AI applications.

Read more

Check out our latest research on WithSecure Labs

For techies, by techies – we share knowledge and research for public use within the security community. We offer up-to-date research, quick updates, and useful tools.

Go to WithSecure Labs

Our accreditations and certificates

Contact us!

Our team of dedicated experts can help guide you in finding the right solution for your unique issues. Complete the form and we are happy to reach out as soon as possible to discuss more.