Anthropic Launches Claude Opus 4.5: 80.9% Coding Score & New US$5 Pricing


Artificial intelligence firm Anthropic has introduced Claude Opus 4.5, its latest frontier model, describing it as its strongest system for coding, agentic workflows and computer-use tasks. Announced on 24 November, the launch completes the company’s 4.5 generation of models and follows recent flagship releases from rivals OpenAI and Google. The model is available immediately through Anthropic’s Claude applications, API and major cloud platforms, including AWS, Google Cloud and Microsoft’s Foundry environment.

Anthropic, founded in 2021 by former OpenAI employees including chief executive Dario Amodei, has positioned itself around AI safety and interpretability. The company is backed by investors such as Amazon, Google and Menlo Ventures, and Reuters recently reported that its annual revenue run rate is close to US$7 billion, with about 80 per cent of income coming from corporate clients and more than 300,000 enterprise customers.

Model Development and Key Features

Claude Opus 4.5 is presented as a refinement of Opus 4.1, released in August, with a focus on deeper reasoning, greater token efficiency and improved robustness against adversarial prompts. According to Anthropic’s technical notes, the model can reach the same or better scores on key coding benchmarks while using substantially fewer tokens than Claude Sonnet 4.5 when configured at comparable “Effort” levels. In high-effort configurations on SWE-Bench Verified, Opus 4.5 uses up to 48 per cent fewer tokens than Sonnet 4.5 while improving accuracy, and in medium-effort settings token usage can fall by more than three quarters for the same score.

A major user-facing change is an “endless chat” experience in Anthropic’s apps. Instead of halting long conversations when context limits are reached, Claude now summarises earlier parts of the discussion so that sessions can continue, with the company saying this behaviour applies across current Claude models in the apps, not only Opus 4.5.
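
Anthropic has not detailed the exact mechanism, but the behaviour matches a familiar context-compaction pattern: once the history approaches the context limit, older turns are condensed into a summary and only recent turns are kept verbatim. The sketch below illustrates that general pattern in Python; the threshold, the number of retained turns and the summary format are all illustrative assumptions, not Anthropic's implementation.

```python
# Minimal sketch of threshold-based context compaction, the general pattern
# behind "endless chat". The numbers and summary format are assumptions;
# Anthropic has not published its actual mechanism.
from typing import Callable

def compact_history(
    messages: list[dict],
    summarize: Callable[[list[dict]], str],
    max_messages: int = 50,
    keep_recent: int = 10,
) -> list[dict]:
    """Replace the oldest turns with a single summary once the history
    grows past max_messages, keeping the most recent turns verbatim."""
    if len(messages) <= max_messages:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # e.g. a cheap LLM call that condenses old turns
    return [{"role": "user", "content": f"Summary of earlier conversation: {summary}"}] + recent
```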

For developers, Opus 4.5 underpins updated tools such as Claude Code and Anthropic’s “computer use” capabilities. The Effort parameter allows teams to trade latency and cost against depth of reasoning on a per-task basis, while an enhanced planning mode in Claude Code supports longer, multi-step coding sessions.
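
As a rough illustration, a request that selects Opus 4.5 and dials down effort might look like the sketch below, written against the Anthropic Python SDK. The model identifier is the one Anthropic has published; the `effort` field name and its placement in the request body are assumptions based on the company's description of the parameter, so the current API reference should be consulted before relying on it.

```python
# Sketch: calling Opus 4.5 with a reduced-effort setting via the Anthropic
# Python SDK. The "effort" field is an assumption drawn from Anthropic's
# description of the parameter; check the API reference for the real syntax.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Summarise the failing tests in this log: ..."}],
    extra_body={"effort": "medium"},  # assumed: trades reasoning depth for cost/latency
)
print(response.content[0].text)
```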

The model also powers integrations including Claude for Chrome and Claude for Excel. These extensions enable the system to navigate web pages, interact with on-screen elements, and work inside spreadsheets — filling cells, restructuring data and carrying out multi-step workflows under user instruction.

Safety Evaluations and Alignment

Anthropic has published a system card for Opus 4.5 describing its safety testing and mitigations. It reports that, in an agentic evaluation of 150 malicious coding tasks designed to probe whether the model would assist with clearly harmful requests, Opus 4.5 refused all of them. The company also says the model exhibits lower rates of sycophancy and deceptive behaviour than earlier Claude versions.

The model is governed under Anthropic’s AI Safety Level 3 (ASL-3) framework, which includes specialised filters for areas such as cyber misuse and chemical, biological, radiological and nuclear (CBRN) risks. At the same time, the system card notes that Opus 4.5, like other frontier models, remains vulnerable to sophisticated prompt-injection attacks and requires careful deployment when used as an autonomous agent.

Performance Benchmarks and Comparative Analysis

On external benchmarks, Claude Opus 4.5’s strongest results are in software engineering and computer-use tasks. Anthropic reports an 80.9 per cent score on SWE-Bench Verified, a benchmark based on real GitHub issues, making Opus 4.5 the first model to surpass the 80 per cent threshold. This places it ahead of OpenAI’s GPT-5.1 Codex Max at 77.9 per cent and Google’s Gemini 3 Pro at 76.2 per cent in the same evaluation.

In computer navigation, Opus 4.5 records 66.3 per cent on OSWorld, Anthropic’s highest score on that benchmark to date. On ARC-AGI 2, which tests generalisation to novel reasoning problems, the model scores 37.6 per cent, and it achieves strong results on tool-use benchmarks such as Tau-bench variants, where it ranks among the top reported systems.

Anthropic also highlights internal testing. On a two-hour take-home engineering assessment used for prospective hires, Opus 4.5—when allowed to use parallel test-time compute and run inside Claude Code—outperformed all human candidates who had taken the timed test and matched the best unlimited-time human score on the same tasks.

Independent practitioners report similar experiences in extended real-world trials. Developer and author Simon Willison, for example, wrote that Opus 4.5 effectively handled a two-day refactor of his sqlite-utils project, generating most of the changes across dozens of files while he supervised and corrected edge cases.

At the same time, comparative analyses suggest that other frontier models retain advantages in some areas. Reviews of benchmark tables compiled from vendor and third-party evaluations show Google’s Gemini 3 Pro leading on high-end science and reasoning tests such as GPQA Diamond and certain adversarial math suites, while OpenAI’s GPT-5.1 performs strongly on multimodal and visual reasoning tasks. In contrast, Opus 4.5’s main edge appears in long-horizon coding, tool use and operating-system control.

Pricing underscores Anthropic’s positioning. The company has cut Opus pricing to US$5 per million input tokens and US$25 per million output tokens, down from US$15 and US$75 respectively for Opus 4.1. Claude Sonnet 4.5, released in late September, is priced at US$3/US$15, while Haiku 4.5, launched in October, sits at roughly US$1/US$5. Together, the three models form a tiered stack that targets different combinations of cost, latency and capability.
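
For a sense of what the cut means in practice, the back-of-the-envelope calculation below applies the published per-million-token rates to a hypothetical monthly workload; the workload figures are illustrative, not Anthropic's.

```python
# Back-of-the-envelope cost comparison using the published per-million-token
# rates. The example workload is hypothetical.
PRICES = {  # model: (input, output) rates in US$ per million tokens
    "opus-4.1":   (15.00, 75.00),
    "opus-4.5":   (5.00, 25.00),
    "sonnet-4.5": (3.00, 15.00),
    "haiku-4.5":  (1.00, 5.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """US$ cost for the given token volumes on the given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical workload: 40M input and 8M output tokens per month.
for model in PRICES:
    print(f"{model:>10}: ${monthly_cost(model, 40_000_000, 8_000_000):,.2f}")
# Opus falls from $1,200 to $400 per month at the new rates for this workload.
```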

Implications for AI Adoption and Industry Dynamics

Opus 4.5 arrives as Anthropic’s share of enterprise AI deployments continues to grow. A mid-year report from Menlo Ventures estimated that the company accounts for about 32 per cent of enterprise large language model usage by volume, ahead of OpenAI’s 25 per cent and Google’s roughly 20 per cent. Analysts attribute this shift in part to demand for coding and workflow automation tools that can be integrated into existing business systems.

The new model is aimed squarely at those use cases. Through integrations with Chrome, Excel and partner platforms such as GitHub, Palantir and Snowflake, Anthropic is targeting sectors including finance, consulting and software development, where long-running agents and code-aware assistants are increasingly being trialled. Such deployments often default to Sonnet 4.5 for cost reasons, with Opus 4.5 reserved for more demanding tasks.

Financially, the launch continues a period of rapid expansion. Alongside the near-US$7 billion revenue run rate reported by Reuters, Business Insider has separately reported that Anthropic is raising around US$5 billion at a valuation of roughly US$170 billion, highlighting the level of capital now flowing into frontier AI development and infrastructure.

Cybersecurity specialists and risk researchers describe Opus 4.5-style agentic models as a double-edged development. On one hand, stronger defensive coding capabilities and better refusal behaviour for harmful tasks can help reduce vulnerabilities in software systems. On the other, more capable autonomous agents may increase the potential impact of misuse if safety controls are bypassed or if models are chained together in insecure ways.

Future Trajectories in AI Evolution

The Opus 4.5 release offers a snapshot of how frontier AI models are evolving: toward more autonomous, tool-using systems that operate across extended time horizons, often with reduced token and compute costs compared with earlier generations. Anthropic has adopted a rapid release cycle for the 4.5 family — Haiku, Sonnet and now Opus within a span of months — allowing feedback from real-world deployments to be folded quickly into subsequent updates.

Regulators and standards bodies are watching this shift closely. Governments in the US, UK and elsewhere have set up AI safety institutes to evaluate frontier systems, and external researchers are beginning to use tools from Anthropic and its competitors to run independent red-teaming and robustness studies. Early work suggests that while alignment techniques and safety filters have improved, questions remain around long-term reliability, transparency of training data, and the governance of autonomous agents in sensitive settings.

For now, developers and enterprises can access Claude Opus 4.5 through claude.ai and via the API model identifier claude-opus-4-5-20251101. How it performs outside benchmark suites — across months of production use in complex organisations — will play a significant role in determining whether Anthropic’s bet on agentic, cost-optimised frontier models continues to pay off in an increasingly competitive market.
