Google Battles Indirect Prompt Injection in AI

Google DeepMind has rolled out new defenses to tackle a growing threat to AI systems—indirect prompt injection (IPI) attacks. These attacks are especially dangerous for agentic AI, which learns autonomously by interacting with external tools and user data.

Unlike traditional cyberattacks, IPI doesn’t require access to the AI model’s internal systems or training data. Instead, attackers embed malicious prompts into content—like an email or webpage—that the AI interacts with during its operations. Once exposed, the model can adopt these malicious instructions and behave in ways that benefit the attacker.

For example, an attacker might hide commands inside a user email that tells the model to leak private information. When the model reads the email, it unknowingly follows the embedded prompt. The attacker doesn’t need insider knowledge—just a clever, indirect message.

DeepMind’s Iterative Defense Approach for Gemini 2.5

To defend against this subtle but powerful threat, DeepMind has created a dynamic, evolving defense strategy for Gemini 2.5. This strategy includes continuous red teaming and real-time fine-tuning, allowing the system to recognize and reject hidden malicious instructions before they affect performance.

In a new white paper titled Lessons from Defending Gemini Against Indirect Prompt Injections, the DeepMind team explains how it trained Gemini on a broad range of realistic IPI scenarios. Using an automated red teaming (ART) process, DeepMind generated countless adaptive attacks. These were used to teach the model to stay focused on genuine user input while ignoring deceptive triggers.

Gemini 2.5 now performs significantly better in defending against IPI threats. In one email-based test using the “TAP” (Tree of Attacks with Pruning) method—a sophisticated form of adaptive prompt injection—Gemini 2.0 had a 99.8% failure rate. With the new defenses in Gemini 2.5, that number dropped to 53.6%. It’s not perfect, but it’s a major leap in making such attacks more difficult and expensive for adversaries to execute.

However, DeepMind notes that there’s no silver bullet. Attackers continue to evolve, so defenses must do the same. As one method is blocked, a new one emerges. That’s why DeepMind combines its adversarial training with previously effective system-level protections like the “Warning” defense, which tells the AI not to respond to untrusted data.

Together, these layers of defense make it far harder for attackers to succeed. The result is a model that’s not only smarter but also safer.

DeepMind acknowledges that no system is unbreakable. But the goal of cybersecurity isn’t perfection—it’s resistance. And Gemini 2.5 now represents the strongest line of defense the company has built to date.

Share with others