OpenAI Updates ChatGPT Atlas Browser Against Prompt Injection Attacks

Image Credit: Jacky Lee

OpenAI says it has shipped a security update for the browser agent in ChatGPT Atlas, aimed at reducing the risk of prompt injection attacks that can manipulate an AI agent into taking unintended actions. The company described the update on December 22, 2025, and said it was triggered by a new class of prompt injection attacks found through internal automated red teaming.

What Happened

In a security post, OpenAI said it shipped an update to Atlas’s browser agent that includes a newly adversarially trained model and strengthened safeguards around it.

OpenAI said the work is part of a broader effort to proactively discover and patch real world agent exploits before they are weaponised in the wild.

What Is ChatGPT Atlas?

ChatGPT Atlas is OpenAI’s web browser with ChatGPT built in, launched on October 21, 2025. OpenAI says the browser agent can view webpages and take actions, including clicks and keystrokes, inside the browser.

OpenAI also acknowledges that giving an agent the ability to act inside a browser creates new risks, including hidden malicious instructions embedded in content like webpages or emails.

OpenAI says it designed Atlas agent safeguards to reduce what the agent can do if it is misled. In its product launch post, OpenAI said the agent cannot run code in the browser, download files, or install extensions, and it cannot access other apps on your computer or your file system.

What Is Prompt Injection?

OpenAI describes prompt injection as a type of social engineering attack aimed at AI systems, where malicious instructions are inserted into the content an AI processes in order to override the user’s intent. OpenAI compares the idea to phishing and scams that try to trick people into doing something they did not intend.

For browser agents, OpenAI argues this is a distinct threat vector because the attacker is not just targeting a human user or exploiting a software bug. Instead, the attacker is attempting to influence the agent’s decision making directly, using content the agent reads.

How OpenAI Says It Found The Issue and Improved Defences

OpenAI’s post focuses on a security workflow it calls a proactive rapid response loop. The idea is to continuously discover new attacks internally, then retrain and harden defences quickly.

OpenAI says it built an LLM based automated attacker and trained it end to end with reinforcement learning to hunt for successful prompt injection attacks against a browser agent. It says the attacker can propose a candidate injection, test it in an external simulator, and learn from successes and failures.

OpenAI also says its RL trained attacker can steer an agent into long horizon harmful workflows that unfold over tens or even hundreds of steps, which OpenAI presents as closer to real world agent misuse than one step failures.

When the automated attacker finds a new class of successful attacks, OpenAI says it uses that result in two ways:

  • Adversarial training of updated agent models to better ignore adversarial instructions and stay aligned with the user’s intent

  • Using attack traces to improve the wider defence stack, including monitoring, safety instructions placed in the model context, and system level safeguards

OpenAI says a newly adversarially trained browser agent checkpoint produced through this cycle has already been rolled out to all ChatGPT Atlas users.

Limitations

OpenAI explicitly frames prompt injection as a long term security challenge, comparing it to evolving online scams, and says deterministic security guarantees are difficult in this domain.

In a separate OpenAI explainer, the company also describes prompt injection as a hard, open problem in machine learning robustness that it expects to evolve over time.

OpenAI is not claiming prompt injection is solved. It is investing in a continuous discovery and mitigation loop to reduce real world risk.

Comparing to Other Agent Security Approaches

OpenAI’s stance is broadly aligned with what other AI and browser teams have been publishing: layered defences and continuous testing, rather than a one off fix.

Anthropic, for example, describes prompt injection as one of the most significant security challenges for browser based AI agents, and says it uses reinforcement learning for robustness, scanning untrusted content with classifiers, and human red teaming. Anthropic also cautions that no browser agent is immune to prompt injection.

Perplexity has taken a similar framing for its Comet browser, describing malicious prompt injection as a frontier security problem and outlining a defence in depth design, including real time classification as one layer.

Brave, meanwhile, has published security research arguing that indirect prompt injection is a systemic issue for AI powered browsers, and it has positioned its own agent style browsing feature as an opt in experiment in Brave Nightly, including use of an isolated browsing profile and built in restrictions and controls.

What Users Can Do Right Now

OpenAI recommends practical steps to reduce exposure when using agent features, especially where logged in sessions could amplify harm:

  • Use logged out mode when the agent does not need account access

  • Watch and review the agent’s actions, especially on sensitive sites like financial institutions

  • Keep tasks specific rather than asking the agent to take broad, open ended actions across accounts

The security story around agentic browsing is likely to keep moving quickly, for a simple reason: the web is an adversarial environment, and agents expand the ways untrusted content can influence real actions. The key development to track is not just new features, but how teams shorten the time between discovering new attack patterns and shipping mitigations, including model updates and system level guardrails.

License This Article

3% Cover the Fee
TheDayAfterAI News

We are a leading AI-focused digital news platform, combining AI-generated reporting with human editorial oversight. By aggregating and synthesizing the latest developments in AI — spanning innovation, technology, ethics, policy and business — we deliver timely, accurate and thought-provoking content.

Next
Next

Australia’s Social Media Ban: High Court to Hear Reddit and Constitutional Challenges