OpenAI Launches ChatGPT Agent to Automate Online Tasks and Enhance Productivity
OpenAI has introduced ChatGPT agent, a new feature enabling its chatbot to execute multi-step online tasks by combining web browsing, research, and conversational abilities, advancing AI's role in practical applications.
Launched on July 17, the tool is available to subscribers on Pro, Plus, and Team plans, with Pro users gaining access first and full rollout completed by July 25. It handles tasks like summarizing calendar meetings, planning meals with online purchases, or generating spreadsheets from data. Operating in a virtual environment, the agent uses a browser for navigation, a terminal for coding, and APIs for integrations with services like Gmail or GitHub. OpenAI requires user approval for actions with real-world impact and allows task interruptions.
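To make the approval and interruption behaviour concrete, the following is a minimal sketch of how such a gate could work in principle. It is not OpenAI's implementation: the `Action` class, the `consequential` flag, and the `approve` and `interrupted` callbacks are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    description: str
    consequential: bool  # hypothetical flag: does this step have real-world impact?


def run_task(
    actions: List[Action],
    approve: Callable[[Action], bool],
    interrupted: Callable[[], bool] = lambda: False,
) -> List[str]:
    """Run steps in order, pausing for approval on consequential ones."""
    log = []
    for action in actions:
        if interrupted():
            # The user stopped the task; no further steps run.
            log.append("interrupted")
            break
        if action.consequential and not approve(action):
            # Approval was refused, so the step is skipped rather than executed.
            log.append(f"skipped: {action.description}")
            continue
        log.append(f"done: {action.description}")
    return log


# Example: a meal-planning task where the user declines the purchase.
plan = [
    Action("search for recipes", consequential=False),
    Action("place grocery order", consequential=True),
]
result = run_task(plan, approve=lambda a: False)
```

In this sketch the browsing step runs without a prompt, while the purchase waits on the user and is skipped when approval is refused.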
The agent integrates three prior systems: Operator for web tasks, deep research for information synthesis, and core ChatGPT for dialogue. Usage is capped at 400 prompts monthly for Pro users and 40 for Plus and Team users, with concurrency limits applied to maintain performance. Enterprise and Education plan access is slated for coming weeks, with no timeline for free users. Task duration varies depending on complexity; most take 5 to 30 minutes, though some may take longer.
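As a rough illustration of what these caps mean in practice, the short calculation below converts the monthly prompt limits into hours of agent time. It assumes that each prompt starts one task and that every task falls in the typical 5 to 30 minute range; both are simplifying assumptions, and concurrency limits are ignored.

```python
# Figures from the article: monthly prompt caps and the typical task duration.
MONTHLY_CAPS = {"Pro": 400, "Plus": 40, "Team": 40}
TYPICAL_MINUTES = (5, 30)


def monthly_hours(cap, minutes_range=TYPICAL_MINUTES):
    """Return (low, high) hours of agent time if every prompt is one task."""
    low, high = minutes_range
    return (cap * low / 60, cap * high / 60)


budget = {plan: monthly_hours(cap) for plan, cap in MONTHLY_CAPS.items()}

for plan, (low, high) in budget.items():
    print(f"{plan}: {low:.1f} to {high:.1f} hours of agent time per month")
```

Under those assumptions a Pro subscriber could spend up to about 200 hours of agent time per month, against about 20 hours for Plus and Team subscribers.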
Background and Development
The launch builds on OpenAI's 2025 releases of Operator and deep research, aligning with industry efforts to develop AI that acts independently. Competitors like Google and Anthropic are pursuing similar tools, driven by demand for efficiency in sectors like finance and consumer services. Built on the o3 model series, the agent uses reinforcement learning to select tools like browsers or terminals within a unified virtual machine.
Development addressed shortcomings in prior systems, such as Operator's struggles with lengthy content and deep research's limitations on interactive sites. OpenAI CEO Sam Altman, in a July 17 post on X, urged caution due to potential risks. The company classifies the agent as "high capability" with respect to potential misuse in biological and chemical domains. It employs automated oversight and has disabled memory features to reduce risks such as prompt injection attacks.
Benchmark and Performance
Recent benchmark tests illustrate the agent's advancements compared to prior models and human performance.
DSBench evaluates realistic data science tasks, such as analyzing and modeling data; here, ChatGPT agent achieved an 85.5% relative performance gain, outperforming AutoGen with GPT-4o at 45.5%, humans at 65.0%, and OpenAI o3 at 77.1%.
SpreadsheetBench assesses the ability to edit spreadsheets based on real-world scenarios; the agent scored 35.3% accuracy without direct file editing and 45.5% with .xlsx access, surpassing OpenAI o3 at 23.3% and other models like GPT-4o at around 17-20%, though below human levels at 71.3%.
In Investment Banking Modelling Tasks, which measure skills like building financial models for companies, the agent reached 71.3% overall accuracy (with a mean of 41.0%), better than deep research at 48.6% (mean 19.7%) and OpenAI o3 at 55.9% (mean 27.5%).
FrontierMath tests expert-level math problems that can take professionals hours to solve; the agent hit 27.4% accuracy using browser and terminal tools, exceeding OpenAI o3 at 10.3% and o4-mini at 19.3%.
BrowseComp gauges finding hard-to-find information online; the agent scored 68.9%, ahead of deep research at 51.5% and OpenAI o3 at 49.7%.
WebArena examines completing everyday web tasks; the agent achieved 65.4% accuracy, improving on CUA o3 at 62.9% and CUA 4o at 58.1%, and approaching human performance at 78.2%.
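The comparisons above are easier to weigh side by side. The sketch below tabulates four of the reported accuracy scores and computes the agent's lead over the best earlier model listed and, where a human figure is given, the remaining gap to human performance. The dictionary layout and the `best_prior` label are choices made for this illustration; DSBench and the investment banking tasks are left out because they are reported on different metrics.

```python
# Accuracy scores (percent) as reported in the benchmark list above.
SCORES = {
    "SpreadsheetBench (.xlsx)": {"agent": 45.5, "best_prior": 23.3, "human": 71.3},
    "FrontierMath": {"agent": 27.4, "best_prior": 19.3, "human": None},
    "BrowseComp": {"agent": 68.9, "best_prior": 51.5, "human": None},
    "WebArena": {"agent": 65.4, "best_prior": 62.9, "human": 78.2},
}


def summarize(scores):
    """Return lead over the best prior model and gap to humans, in points."""
    summary = {}
    for name, s in scores.items():
        lead = round(s["agent"] - s["best_prior"], 1)
        gap = None if s["human"] is None else round(s["human"] - s["agent"], 1)
        summary[name] = {"lead_over_prior": lead, "gap_to_human": gap}
    return summary


summary = summarize(SCORES)

for name, row in summary.items():
    gap = "n/a" if row["gap_to_human"] is None else f'{row["gap_to_human"]} pts'
    print(f'{name}: +{row["lead_over_prior"]} pts over prior, human gap {gap}')
```

On these figures the largest jump over an earlier model is on SpreadsheetBench, while WebArena shows the smallest lead and a double-digit gap to human performance.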
Pros, Cons, and Impact
ChatGPT agent streamlines routine tasks, producing outputs like spreadsheets or beta-stage presentations that are natively editable after export, potentially reducing reliance on external software. For businesses, it supports data-intensive workflows, enhancing productivity.
Limitations include occasional inaccuracies in transcription or task execution, processing delays, and subscription-based access. Privacy concerns arise from data access during integrations, though mitigated by session deletion and restricted connectors. Early tests suggest unreliability for high-stakes tasks, with safeguards sometimes blocking valid actions.
Taken together, the benchmark results show clear gains over earlier OpenAI models but not parity with people on every task. On DSBench data modelling, ChatGPT agent achieved an 85.5% relative performance gain, ahead of AutoGen with GPT-4o (45.5%), humans (65.0%), and OpenAI o3 (77.1%). On SpreadsheetBench, it scored 35.3% without direct file editing and 45.5% with .xlsx access, ahead of OpenAI o3 (23.3%) but below humans (71.3%). On FrontierMath Tier 1-3 expert-level math tasks, it reached 27.4% accuracy with browser and terminal tools, exceeding OpenAI o3 (10.3%) and o4-mini (19.3%). On BrowseComp it scored 68.9%, ahead of deep research (51.5%) and OpenAI o3 (49.7%). On WebArena it scored 65.4%, ahead of the CUA models (62.9% and 58.1%) and approaching the human level of 78.2%.
Analysts see the tool advancing AI's shift toward actionable roles, impacting knowledge-based sectors. However, it raises concerns about job displacement in administrative roles and ethical challenges, echoing issues with earlier AI tools that have amplified biases or misinformation.
Future Trends
OpenAI plans updates to improve speed and expand features like image generation for outputs. The industry anticipates further agent development, with a focus on reliability and safety amid growing regulatory attention. While broader access could drive adoption, challenges in accuracy and misuse prevention may limit short-term impact.
Sources: OpenAI, Reuters, TechCrunch, Tom’s Guide, TechRadar
