Prompt injections could confuse AI-powered agents

Share

Everyone knows about SQL injections, but what about prompt injections? What do they mean for AI?

AI in general, and large language models (LLMs) in particular, have exploded in popularity. And this momentum is likely to continue or even increase as companies look into using LLMs to power AI applications capable of interacting with real people and take actions that affect the world around them.

WithSecure’s Donato Capitella, a security consultant and researcher, wanted to explore how attackers could potentially compromise these agents. And prompt injection techniques gave him his answer.

The Echoes of SQL Vulnerabilities

Prompt injection techniques are specifically crafted inputs that attackers feed to LLMs as part of a prompt to manipulate responses. In a sense, they’re similar to the SQL injections that attackers have been using to attack databases for years. Injections are basically just commands attackers input to a vulnerable, external system.

Whereas SQL injections affect databases, prompt injections impact LLMs. Sometimes, a successful prompt injection might not have much of an impact. As Donato points out in his research (available here), in situations where the LLM is isolated from other users or systems, an injection probably won’t be able to do much damage.

However, companies aren’t building LLM applications to work in isolation, so they should understand what risks they’re exposing themselves if they neglect to secure these AI deployments.

ReAct Agents

One potential innovation where LLMs could play a key role is in the creation of AI agents—or ReAct (reasoning plus action) agents if you want to be specific. These agents are essentially programs that use LLMs (like GPT-4) to accept input, and then use logical reasoning to decide and execute a specific course of action according to its programming.

The way these agents use reasoning to make decisions involves a thought/observation loop. Specifics are available in Donato’s research on WithSecure Labs (we highly recommend reading it for a more detailed explanation). Basically, the agent provides thoughts it has about a particular prompt it’s been given. That output is then checked to see if it contains an action that requires the agent to access a particular tool it’s programmed to use.

If the thought requires the agent to take an action, the result of the action becomes an observation. The observation is then incorporated into the output, which is then fed back into the thought/observation loop and repeated until the agent has addressed the initial prompt from the user.

To illustrate this process, and learn how to compromise it, Donato created a chatbot for a fictional book-selling website that can help customers request information on recent orders or ask for refunds.

Prompt injections reduce AI to confused deputies

The chatbot, powered by GPT-4, could access order data for users and determine refund eligibility for orders that were not delivered within the website’s two-week delivery timeframe (as per its policy).

Donato found that he could use several different prompt injection techniques to trick the agent into processing refunds for orders that should have been ineligible. Specifics are available in his blog, but he essentially tricked the agent into thinking that it had already checked for information from its system that he actually provided to it via prompts—information like fake order dates. Since the agent thought it recalled the fake dates from the appropriate system (rather than via Donato’s prompts), it didn’t realize the information was fake, and that it was being tricked.

Here’s a video showing one of the techniques Donato used:

Securing AI agents

Pointing to the work from the OWASP Top Ten for LLMs, Donato’s research identifies several ways an attacker could compromise an LLM ReAct agent. And while it’s a proof-of-concept, it does illustrate the kind of work that organizations need to do to secure these types of AI applications, and what the cyber security industry is doing to help.

There’s two distinct yet related mitigation strategies.

The first is to limit the potential damage a successful injection attack can cause. Specific recommendations based on Donato’s research include:

  • Enforcing stringent privilege controls to ensure LLMs can access only the essentials, minimizing potential breach points.
  • Incorporating human oversight for critical operations to add a layer of validation, acting as a safeguard against unintended LLM actions.
  • Adopting solutions such as OpenAI Chat Markup Language (ChatML) that attempt to segregate genuine user prompts from other content. These are not perfect but diminish the influence of external or manipulated inputs.
  • Treating the LLMs as untrusted, always maintaining external control in decision-making and being vigilant of potentially untrustworthy LLM responses.

The second is to secure any tools or systems that the agent may have access to, as compromises in those will inevitably lead to the agent making bad decisions—possibly in service of an attacker.

Read more research articles on securing AI agents on our Labs:

Related content

April 12, 2024 Webinars

Building secure LLM apps into your business

Gain practical understanding of the vulnerabilities of LLM agents and learn about essential tools and techniques to secure your LLM-based apps. Our host Janne Kauhanen is joined by Donato Capitella, Principal Security Consultant at WithSecure™.

Read more
April 16, 2024 Our thinking

Striking the balance – EU AI Act and its impact on cyber security

Balancing innovation and regulation is a delicate task, especially in the realm of cyber security where the implications of AI are profound.

Read more
April 17, 2024 Our thinking

What is Attack Path Mapping

Attack path mapping involves the identification and analysis of potential routes that a cyber attacker could take to infiltrate a target system or network.

Read more

Check out our latest research on WithSecure Labs

For techies, by techies – we share knowledge and research for public use within the security community. We offer up-to-date research, quick updates, and useful tools.

Go to WithSecure Labs

Our accreditations and certificates

Contact us!

Our team of dedicated experts can help guide you in finding the right solution for your unique issues. Complete the form and we are happy to reach out as soon as possible to discuss more.