Attackers can use indirect prompt injections to trick Anthropic’s Claude into exfiltrating data the AI model’s users have access to, a security researcher has discovered.
The attack, Johann Rehberger of Embrace The Red explains, abuses Claude’s Files API and is only possible if the AI model has network access, a feature that is enabled by default on certain plans and is meant to let Claude reach certain resources, such as code repositories and Anthropic APIs.
The attack is relatively straightforward: an indirect prompt injection payload instructs Claude to read user data and store it in a file in the Claude Code Interpreter’s sandbox, and then tricks the model into interacting with the Anthropic API using a key provided by the attacker.
The code in the payload instructs Claude to upload the file from the Code Interpreter sandbox but, because the attacker’s API key is used, the file is uploaded to the attacker’s account rather than the victim’s.