Skip to main content
Cybersecurity News Kinetic Potential

‘ZombieAgent’ Attack Let Researchers Take Over ChatGPT

ChatGPT vulnerabilities could be exploited to exfiltrate user data and modify the agent’s long-term memory for persistence, web security firm Radware reports.

Widely adopted across enterprises worldwide, ChatGPT has broad access to internal applications, such as Gmail, GitHub, Jira, and Teams, and by default stores user conversations and sensitive information.

It also includes built-in functionality to browse the web, analyze files, and more, making it convenient and powerful, but also expanding the risks associated with its malicious use.

On Thursday, Radware disclosed a new indirect prompt injection technique that exploits ChatGPT vulnerabilities to exfiltrate user data and turn the AI agent into a persistent spy tool for attackers.

Called ZombieAgent, the attack relies on malicious emails and files to bypass OpenAI’s protections and exfiltrate data from the victim’s inbox and email address book, without user interaction.

In the first scenario detailed by Radware, the attacker exfiltrates sensitive user data via OpenAI’s private servers by sending an email containing malicious instructions for ChatGPT.

When the user asks the AI agent to perform a Gmail action, it reads the instructions in the attacker’s email and exfiltrates the data “before the user ever sees the content”, Radware says.

The email contains a list of pre-constructed URLs for each letter and digit, and a special token for spaces, and instructs ChatGPT to search for sensitive information, normalize it, and exfiltrate it character by character using the provided URLs.

ChatGPT cannot modify provided URLs to prevent the leakage of data by appending it as parameters to an attacker-provided link, but Radware’s attack makes the protection ineffective as the agent does not modify the pre-provided URLs.

For the attack to successfully exfiltrate sensitive information, “no user action is required beyond normal conversation with ChatGPT,” the security firm explains.

Radware’s second attack scenario relies on malicious instructions contained in a file that the user shares with ChatGPT. Based on these instructions, the agent exfiltrates data, both via OpenAI’s servers and via Markdown image rendering.

Propagation and persistence

The third attack scenario presented by the security firm is like the first but targets the recent email addresses in the victim’s inbox. After receiving the addresses, the attacker sends the malicious payload to them, propagating the attack.

In the fourth attack scenario, the attacker establishes persistence by sending a malicious file containing instructions to modify the agent’s long-term memory with attacker-created rules.

When the user shares the file with ChatGPT, the agent reads the instructions and sets the memory-modification rules.

Based on these rules, ChatGPT reads an attacker email and executes the instructions it contains every time the user sends a message, and saves to memory sensitive information whenever the user shares it.

Normally, when using the Connectors feature (which gives it access to enterprise applications), ChatGPT cannot use the Memory feature (where it saves users’ sensitive information) in the same chat.

However, the attacker’s memory-modification rules result in the agent always reading Memory first, executing the attacker’s malicious instructions, and only then responding to the user.

According to Radware, this persistence mechanism can be abused for data manipulation or for performing more harmful actions.

Furthermore, the security firm says, the attacks could target not only email, but any other enterprise application connected to ChatGPT, either for data harvesting or for delivering malicious instructions to the agent.

“In practice, any resource that ChatGPT can read via Connectors (emails, documents, tickets, repositories, shared folders, etc.) can potentially be abused to host attacker-controlled instructions that will later be executed by ChatGPT,” Radware notes.

An attacker could hide the malicious instructions in the content of any email or file, either by making the text white or by including them in a document’s disclaimers or footers, which are typically ignored by users.

“From the user’s perspective, the email or document appears benign and readable. From ChatGPT’s perspective, however, the full hidden prompt is visible in plain text and will be processed just like any other instruction,” the security firm says.

Radware reported the issues to OpenAI via BugCrowd in September. A fix was released on December 16.

This article was published by Security Week. Please check their website for the original content.

Add new comment

Plain text

  • No HTML tags allowed.
  • Lines and paragraphs break automatically.
  • Web page addresses and email addresses turn into links automatically.
CAPTCHA This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
2 + 2 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.