Claude Opus 4.5 Launched: Infinite Chats, $5/M Input, & 80.9% SWE-Bench Win

Illustrative Image Only (Credit: Jacky Lee)

Anthropic has introduced Claude Opus 4.5, the newest and most capable model in its Claude 4.5 family, with a particular focus on coding, long-horizon agents and office automation. The release coincides with a major update to Claude’s context handling that lets users run “infinite-length” conversations in the company’s apps by automatically summarising older messages when chats approach the context limit.

The launch follows earlier 4.5-series releases: Claude Sonnet 4.5 in September and Claude Haiku 4.5 in October, rounding out a three-tier line-up of small, mid-range and flagship models.

Anthropic describes Opus 4.5 as its strongest model for coding and computer use, positioning it against rival systems such as OpenAI’s GPT-5.1 and Google’s Gemini 3 Pro.

Model Specifications, Pricing and Safety Classification

According to Anthropic’s model documentation and independent technical write-ups, Claude Opus 4.5 offers:

  • Context window: 200,000 tokens, with up to 64,000 output tokens in a single response.

  • Knowledge and training cut-offs: a “reliable knowledge” cut-off around March 2025, with training data extending into mid-2025.

  • Availability: Anthropic’s own apps (claude.ai, Claude Code), its API, and cloud platforms including Amazon Bedrock, Google Cloud Vertex AI and Microsoft Azure.

  • Pricing: US$5 per million input tokens and US$25 per million output tokens via the Claude API, a substantial reduction from the earlier Opus 4 / 4.1 pricing of US$15 / US$75.
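
At those rates, per-request costs are easy to estimate. The sketch below compares the new Opus 4.5 pricing with the earlier Opus 4 / 4.1 rates; the token counts used are illustrative examples, not figures from Anthropic.

```python
# Illustrative cost comparison at the published per-million-token rates.
# The token counts below are hypothetical, not Anthropic figures.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Return the USD cost of one request at per-million-token rates."""
    return ((input_tokens / 1_000_000) * input_rate
            + (output_tokens / 1_000_000) * output_rate)

# Example: a coding-agent turn with 60K tokens of context and a 4K-token reply.
opus_45 = request_cost(60_000, 4_000, input_rate=5.0, output_rate=25.0)
opus_41 = request_cost(60_000, 4_000, input_rate=15.0, output_rate=75.0)

print(f"Opus 4.5: ${opus_45:.2f} per request")  # ~$0.40
print(f"Opus 4.1: ${opus_41:.2f} per request")  # ~$1.20
```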

Opus 4.5 is deployed under Anthropic’s AI Safety Level 3 (ASL-3) classification, the company’s second-highest risk tier. ASL-3 models undergo targeted testing for risks such as advanced cyber-capabilities, deception and autonomous behaviour.

Anthropic’s system card and related safety materials say Opus 4.5 includes strengthened defences against prompt-injection attacks, particularly in coding and “computer use” settings where the model can operate software tools on a user’s behalf. These defences reduce, but do not eliminate, the risk of the model following malicious instructions embedded in documents, websites or tool outputs.

Dynamic Context Management and “Infinite-Length” Conversations

The most visible change for everyday Claude users is context window compaction, introduced in the November 24 release notes alongside Opus 4.5.

Claude’s help centre explains that:

  • All Claude models in the apps now work with a 200K-token context window for paid plans, with an extended 500K context option for Sonnet 4.5 on certain enterprise tiers.

  • When a conversation approaches this limit, the system automatically summarises earlier messages so the chat can continue without a hard cut-off.

  • The full chat transcript remains accessible to the user, and the model continues to see a compressed representation of prior exchanges as it answers new prompts.

  • Users may occasionally see messages indicating Claude is “organising its thoughts” while this automatic context management runs.

Anthropic describes this as enabling “infinite-length conversations (with some exceptions)” in Claude’s apps, by trading raw history for structured summaries once the context is nearly full.

The approach builds on research the company published in September on “effective context engineering for AI agents,” which advocates periodic compaction — turning long interaction logs into concise state summaries — to keep multi-step agents reliable over many iterations.
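
Anthropic has not published the implementation details of the in-app compaction, but the general pattern it describes can be sketched as follows. This is a minimal illustration assuming a hypothetical summarise() call and a simple token-count threshold; it is not Anthropic's actual mechanism.

```python
# Minimal sketch of context compaction for a chat agent.
# The summarise() callable, the threshold and the token heuristic are
# illustrative assumptions, not Anthropic's published implementation.

CONTEXT_LIMIT = 200_000        # tokens available to the model
COMPACTION_TRIGGER = 0.85      # start compacting at ~85% of the window
KEEP_RECENT = 20               # most recent messages kept verbatim

def count_tokens(messages) -> int:
    # Placeholder heuristic; in practice use the provider's token counter.
    return sum(len(m["content"]) // 4 for m in messages)

def compact(messages, summarise):
    """Replace older messages with a single structured summary,
    keeping the most recent turns verbatim."""
    if count_tokens(messages) < CONTEXT_LIMIT * COMPACTION_TRIGGER:
        return messages                      # nothing to do yet
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = summarise(older)               # model-written state summary
    summary_msg = {"role": "user",
                   "content": f"[Summary of earlier conversation]\n{summary}"}
    return [summary_msg] + recent            # full transcript stays visible in the UI
```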

For users, the main practical effect is that very long chats, which previously stopped hard at the context limit, now continue with far fewer length-limit errors, though later answers rely on the model's own summaries of the earlier stages of the discussion.

Coding, Agents and Office Automation

Anthropic is pitching Opus 4.5 primarily as a model for complex software and office workflows rather than casual chat. On established benchmarks, the model posts state-of-the-art results for code-focused tasks:

  • SWE-bench Verified: 80.9% task success, outperforming Claude Sonnet 4.5 and third-party rivals such as GPT-5.1 Codex Max and Gemini 3 Pro, according to independent evaluations by Artificial Analysis, Vellum and Cline.

  • Terminal-Bench 2.0: around 59% on command-line computer-use tasks, ahead of key competitors on the same benchmark setup.

Anthropic and external testers highlight several product-level features:

  • An “effort” parameter in the API lets developers trade off speed and cost against more intensive step-by-step reasoning (a request sketch follows this list).

  • Integration with Claude Code and third-party tools such as Cursor, Cline and various AI IDEs is aimed at long-horizon coding tasks, including refactors across many files and multi-agent workflows.

  • In office environments, Opus 4.5 powers a new Claude for Excel beta and automates PowerPoint slide creation, with Anthropic-shared internal evaluations suggesting material gains in accuracy and efficiency for spreadsheet tasks.
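
The “effort” setting is exposed as a request-level control in the API. The snippet below sketches how such a request might look with Anthropic's official Python SDK; the parameter name, its placement via extra_body and the model ID shown are illustrative assumptions and should be checked against Anthropic's current API reference.

```python
# Hypothetical request using the Anthropic Python SDK.
# The "effort" field name, its values and the model ID are assumptions
# for illustration; consult Anthropic's API reference for exact details.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",            # model ID shown is indicative only
    max_tokens=4096,
    messages=[{"role": "user",
               "content": "Refactor this module and explain the changes."}],
    extra_body={"effort": "high"},      # assumed name for the effort control
)
print(response.content[0].text)
```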

Business Insider reports that Opus 4.5 also outperformed all human candidates on Anthropic’s own two-hour engineering take-home test, though the company allowed the model multiple attempts and selected its best answers — details that make direct comparison with a single human sitting less straightforward.

For long-form writing, independent reviewers note that the model can sustain coherent 10–15-page chapters or technical documents when combined with the new context compaction, particularly in workflows that mix drafting, outlining and revision in a single extended session.

Safety, Limitations and Ongoing Risks

Anthropic’s Opus 4.5 system card emphasises expanded safety testing around prompt injection, malware and misuse in agentic settings. The Verge’s reporting, summarising the card, notes that the model refused all 150 malicious coding requests in an internal “agentic coding evaluation”, while refusal rates were lower, though still high, for broader categories of harmful tasks.

Despite these safeguards, both Anthropic and external analysts stress that Opus 4.5 is not immune to misuse. As with prior frontier models, residual risks include:

  • Following cleverly hidden instructions in documents or web pages the model is asked to process.

  • Generating code or content that could be repurposed for harmful ends if user-side filters and policies are weak.

Automatic summarisation also introduces subtle trade-offs. While it reduces context limit errors and keeps long chats usable, it inevitably compresses nuance from earlier conversation stages; how often this materially changes outcomes in real-world workflows is likely to remain an active area of evaluation for enterprise users.

Competitive Landscape and Market Context

Claude Opus 4.5 enters an increasingly crowded high-end market:

  • OpenAI’s GPT-5 and 5.1 models offer up to a 400,000-token context window in the API, with output limits of up to 128K tokens depending on the variant.

  • Google’s Gemini 3 Pro advertises a context window of around 1 million tokens for some modalities, targeting very large multimodal workloads.

  • xAI’s Grok 4.1 Fast positions itself as a tool-calling specialist with a 2-million-token context window for agentic workflows tied to real-time X (Twitter) data.

On the enterprise side, Menlo Ventures’ mid-2025 survey of more than 150 technical leaders found Anthropic accounting for 32% of enterprise LLM usage, ahead of OpenAI at 25%, Google at 20% and Meta at 9%. Those figures, which refer to usage share rather than revenue, underpin claims in recent coverage that Claude has become a default choice for many coding and agentic workflows.

With Opus 4.5, Anthropic is attempting to consolidate that position at the top end of the market while narrowing the price gap with mid-range models. Analyst commentary and early technical reviews broadly agree that Opus 4.5 now sits among the strongest general-purpose coding and reasoning systems available, with competitive pricing but still higher costs than most mid-tier offerings.

Memory-Aware Models and Long-Horizon Agents

The Opus 4.5 launch and the simultaneous context window compaction update point to a broader trend toward “memory-aware” AI assistants. Rather than simply increasing raw context limits, Anthropic is betting on systems that actively manage their own working memory by compressing older interactions into structured summaries.

For developers and knowledge workers, the practical question will be how reliably these summaries preserve intent and detail over hours- or days-long sessions. For the wider industry, the release raises the competitive bar on three fronts:

  • Agentic performance: maintaining coherent plans and tool use across many steps.

  • Context strategy: combining large windows with automatic compaction rather than relying purely on fixed limits.

  • Safety at scale: hardening models against prompt injection and misuse while still allowing deep system access in enterprise environments.
