Claude Sonnet 4.5

Claude Sonnet 4.5 is a coding-focused model from Anthropic. It scores 77.2% on SWE-bench Verified and 61.4% on OSWorld for computer use, sustains agentic coding sessions of more than 30 hours, and delivers substantial gains across coding, reasoning, math, and domain-specific expertise.

File Input, Reasoning, Tool Use, Vision (Image), Explicit Caching

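index.ts

import { streamText } from 'ai'

// Request a streamed completion from Claude Sonnet 4.5.
const result = streamText({
  model: 'anthropic/claude-sonnet-4.5',
  prompt: 'Why is the sky blue?'
})

// Print the response as it streams in.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}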


Playground

Try out Claude Sonnet 4.5 by Anthropic. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
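A provider preference like the one described above can be expressed as request options. The following is a minimal sketch, assuming the AI SDK accepts an ordered preference under `providerOptions.gateway.order`; the helper `withProviderOrder` is hypothetical, and the slugs `anthropic` and `bedrock` are placeholders, since this page does not show the actual slugs.

```typescript
// Shape of the options this sketch builds. It mirrors the arguments passed to
// streamText in index.ts, plus a provider preference.
type GatewayRequest = {
  model: string
  prompt: string
  providerOptions: { gateway: { order: string[] } }
}

// Hypothetical helper (not part of the AI SDK): attaches an ordered provider
// preference to a request for Claude Sonnet 4.5.
function withProviderOrder(prompt: string, order: string[]): GatewayRequest {
  if (order.length === 0) {
    throw new Error('order must contain at least one provider slug')
  }
  return {
    model: 'anthropic/claude-sonnet-4.5',
    prompt,
    // Assumption: the gateway tries providers in this order.
    providerOptions: { gateway: { order: [...order] } },
  }
}

// Placeholder slugs; replace them with the ones copied from the Providers table.
const request = withProviderOrder('Why is the sky blue?', ['anthropic', 'bedrock'])
console.log(JSON.stringify(request, null, 2))
```

If the installed AI SDK version supports this option, the resulting object can be passed to `streamText(request)`. Confirm the option name and the available slugs in the gateway docs before relying on it.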

Provider	Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	ZDR	No Training	Release Date
Legal: Terms • Privacy		0.8s	47tps	$3.00/M	$15.00/M	Read: $0.3/M, Write: $3.75/M	$10.00/K + input costs	—				09/29/2025
Legal: Terms • Privacy		1.5s	55tps	$3.00/M	$15.00/M	Read: $0.3/M, Write: $3.75/M		—				09/29/2025
Legal: Terms • Privacy		0.7s	48tps	$3.00/M	$15.00/M	Read: $0.3/M, Write: $3.75/M	$10.00/K + input costs	—				09/29/2025
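The listed rates can be turned into a rough per-request cost estimate. This is a sketch under stated assumptions: `/M` is read as "per million tokens", `/K` as "per thousand web search queries", and any tokens returned by a web search are assumed to be already counted in `inputTokens` (the "+ input costs" note). The names `RATES`, `Usage`, and `estimateCostUsd` are illustrative only.

```typescript
// Rates copied from the Providers table above, in USD.
const RATES = {
  inputPerM: 3.0,
  outputPerM: 15.0,
  cacheReadPerM: 0.3,
  cacheWritePerM: 3.75,
  webSearchPerK: 10.0, // assumption: price per 1,000 web search queries
}

type Usage = {
  inputTokens: number // uncached input tokens
  outputTokens: number
  cacheReadTokens?: number
  cacheWriteTokens?: number
  webSearches?: number
}

// Sums each usage component multiplied by its listed rate.
function estimateCostUsd(u: Usage): number {
  const M = 1e6
  return (
    (u.inputTokens / M) * RATES.inputPerM +
    (u.outputTokens / M) * RATES.outputPerM +
    ((u.cacheReadTokens ?? 0) / M) * RATES.cacheReadPerM +
    ((u.cacheWriteTokens ?? 0) / M) * RATES.cacheWritePerM +
    ((u.webSearches ?? 0) / 1000) * RATES.webSearchPerK
  )
}

// Example: 200K uncached input tokens, 50K output tokens, 1M cached tokens read.
const cost = estimateCostUsd({
  inputTokens: 200000,
  outputTokens: 50000,
  cacheReadTokens: 1000000,
})
console.log(cost.toFixed(2))
```

In this example the three components are $0.60 of input, $0.75 of output, and $0.30 of cache reads. Rates may differ by provider, so treat the constants as a snapshot of this page.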

More models by Anthropic

Model	Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	Providers	ZDR	No Training	Release Date
		0.8s	97tps	$5.00/M	$25.00/M	Read: $0.5/M, Write: $6.25/M	$10/K + input costs	—					04/16/2026
		0.6s	52tps	$3.00/M	$15.00/M	Read: $0.3/M, Write: $3.75/M	$10/K + input costs	—					02/17/2026
		0.8s	52tps	$5.00/M	$25.00/M	Read: $0.5/M, Write: $6.25/M	$10/K + input costs	—					02/05/2026
	200K	0.4s	115tps	$1.00/M	$5.00/M	Read: $0.1/M, Write: $1.25/M	$10.00/K + input costs	—					10/15/2025
		0.7s	75tps	$3.00/M	$15.00/M	Read: $0.3/M, Write: $3.75/M	$10.00/K + input costs	—					05/22/2025
	200K	0.7s	48tps	$5.00/M	$25.00/M	Read: $0.5/M, Write: $6.25/M	$10.00/K + input costs	—					11/24/2024
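The latency and throughput columns can be combined into a rough estimate of how long one streamed response takes. This sketch assumes "Latency" means time to first token and "Throughput" means output tokens per second; the page does not define either column, and `estimateSeconds` is an illustrative name.

```typescript
// Rough wall-clock estimate for one streamed response, in seconds:
// time to first token plus the time needed to generate the output tokens.
function estimateSeconds(
  latencySec: number,
  tokensPerSec: number,
  outputTokens: number,
): number {
  if (tokensPerSec <= 0) throw new Error('tokensPerSec must be positive')
  return latencySec + outputTokens / tokensPerSec
}

// Two rows from the table above: 0.4s at 115tps, and 0.7s at 48tps,
// each generating a 1,000-token response.
const faster = estimateSeconds(0.4, 115, 1000)
const slower = estimateSeconds(0.7, 48, 1000)
console.log(faster.toFixed(1), slower.toFixed(1))
```

For long outputs, throughput dominates the estimate; for short ones, latency does. The figures on this page are point-in-time measurements, so the result is an order-of-magnitude guide at best.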

About Claude Sonnet 4.5

Claude Sonnet 4.5 launched on September 29, 2025. The OSWorld result backed Anthropic's computer use claims directly: 61.4%, up from Sonnet 4's 42.2% just four months earlier. On SWE-bench Verified, the model scored 77.2%. It also maintained focus for more than 30 hours on complex multi-step tasks, a duration that changes what's architecturally feasible for autonomous engineering work.

Domain expert evaluation reinforced the benchmark numbers. Finance, law, medicine, and STEM specialists found substantially better domain-specific knowledge and reasoning compared to older models, including Opus 4.1. For Devin, planning performance rose by 18% and end-to-end scores by 12%, the biggest jump since Claude Sonnet 3.6. Cursor, GitHub Copilot, and Figma Make reported significant gains in their specific domains. Claude Code shipped checkpoints and rollback, a native VS Code extension, and a refreshed terminal interface alongside this model.

Sonnet 4.5 shipped with substantial alignment improvements over prior Claude models: reduced sycophancy, deception, power-seeking, and tendency to encourage delusional thinking. Prompt injection defenses for computer use and agentic capabilities also improved considerably. Anthropic released the model under ASL-3 (AI Safety Level 3) protections, with CBRN (chemical, biological, radiological, and nuclear) classifiers active.

The Claude Agent SDK launched alongside Sonnet 4.5, giving you access to the same infrastructure that powers Claude Code: memory management, permission systems, and subagent coordination for building custom agents.

What To Consider When Choosing a Provider

Configuration: Sonnet 4.5 runs under ASL-3 (AI Safety Level 3) safeguards, which include classifiers that screen inputs and outputs for potentially dangerous content. These classifiers may occasionally flag normal content. Anthropic has reduced false positive rates by a factor of 10 since the classifiers were first deployed.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
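The two authentication paths can be sketched as a small credential lookup. This is illustrative only: `resolveCredential` is a hypothetical helper, the variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` are assumptions to confirm in the AI Gateway docs, and preferring the API key over the OIDC token is a choice made for this sketch.

```typescript
type GatewayCredential =
  | { kind: 'api-key'; token: string }
  | { kind: 'oidc'; token: string }

// Hypothetical helper: picks a credential from environment-style variables.
// The variable names are assumptions, not confirmed by this page.
function resolveCredential(
  env: Record<string, string | undefined>,
): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY']
  if (apiKey) return { kind: 'api-key', token: apiKey }

  const oidc = env['VERCEL_OIDC_TOKEN']
  if (oidc) return { kind: 'oidc', token: oidc }

  throw new Error('No AI Gateway credential found: set an API key or an OIDC token')
}

// Example with an inline object; in Node.js you would pass process.env instead.
const credential = resolveCredential({ AI_GATEWAY_API_KEY: 'example-key' })
console.log(credential.kind)
```

Failing fast when neither value is set tends to produce a clearer error than a rejected request later on.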

When to Use Claude Sonnet 4.5

Best For

Computer use and real-world browser/software automation: 61.4% on OSWorld, a strong result among models evaluated at release
Extended autonomous coding sessions: Documented 30+ hour capability for complex multi-step engineering tasks
Complex agent workflows: Anthropic explicitly positioned it for agent workloads at release
Domain-specific applications in finance, law, medicine, and STEM: Expert evaluation showed substantial gains in domain knowledge and reasoning compared to Opus 4.1
Production deployments requiring strong alignment properties: Reduced sycophancy and deception compared to earlier Claude releases

Consider Alternatives When

Cost is the primary constraint: Haiku 4.5 may offer sufficient capability-per-cost for lighter workloads
Simple latency-sensitive tasks: Sonnet 4.5's capability depth comes with a higher per-token cost than lighter models
Sonnet-tier large context: Check whether Claude Sonnet 4.6 covers both the 1M-token window and Sonnet pricing
Earlier-model parity: Earlier models handle some specific computer use or coding tasks equivalently
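The guidance in the two lists above can be condensed into a simple routing rule. This is a sketch, not a recommendation engine: the `Workload` fields and the `pickModel` function are hypothetical, and the ID `anthropic/claude-haiku-4.5` is an assumed slug that this page does not confirm.

```typescript
type Workload = {
  needsComputerUse: boolean // browser or desktop automation
  longRunningAgent: boolean // multi-hour autonomous coding or agent sessions
  costSensitive: boolean // cost is the primary constraint
}

// Hypothetical routing rule derived from the "Best For" and
// "Consider Alternatives When" lists.
function pickModel(w: Workload): string {
  // Capability-heavy workloads stay on Sonnet 4.5 regardless of cost.
  if (w.needsComputerUse || w.longRunningAgent) {
    return 'anthropic/claude-sonnet-4.5'
  }
  // Lighter, cost-driven workloads may be served by a smaller model.
  if (w.costSensitive) {
    return 'anthropic/claude-haiku-4.5' // assumed slug
  }
  return 'anthropic/claude-sonnet-4.5'
}

const choice = pickModel({
  needsComputerUse: false,
  longRunningAgent: false,
  costSensitive: true,
})
console.log(choice)
```

A real deployment would likely evaluate both models on its own tasks before settling on a rule like this.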

Conclusion

Claude Sonnet 4.5 represents a generational step in several capability areas at once: computer use, agentic duration, domain expertise, and safety alignment all advanced in the same release. For teams building agents that do real work in real software environments over extended periods, this is the model where those capabilities came together.

Frequently Asked Questions

What was Claude Sonnet 4.5's OSWorld score and why does it matter?

61.4%, up from Sonnet 4's 42.2% four months earlier. OSWorld measures AI performance on real-world computer tasks: navigating software, filling forms, and clicking UI elements. It focuses on operational computer-use scenarios rather than abstract reasoning alone.

How long can Claude Sonnet 4.5 maintain focus on a single agentic coding task?

More than 30 hours on complex, multi-step tasks. Anthropic noted this duration changes what's architecturally feasible for autonomous engineering work. Individual results vary by task structure.

What is ASL-3 and why does it apply to Sonnet 4.5?

ASL-3 (AI Safety Level 3) is the level in Anthropic's safety framework for models requiring additional safeguards. Sonnet 4.5 was released under ASL-3 protections, which include classifiers screening inputs and outputs for CBRN-related content. False positive rates have decreased by a factor of 10 since initial deployment.

What is the Claude Agent SDK and how does it relate to this model?

The Claude Agent SDK launched alongside Sonnet 4.5. It gives you access to the same agent infrastructure that powers Claude Code: memory management across long tasks, permission systems, and subagent coordination. Use it to build custom agents on the same foundation.

What alignment improvements came with Sonnet 4.5?

Substantial reductions in sycophancy, deception, power-seeking, encouragement of delusional thinking, and compliance with harmful system prompts, measured via an automated behavioral auditor. The model also improved defenses against prompt injection attacks for computer use and agentic capabilities.

Why did specialists in finance, law, medicine, and STEM find Sonnet 4.5 significantly better than previous models?

Professionals assessed domain-specific knowledge and reasoning in Anthropic's expert evaluations. Results showed substantially better performance compared to older models, including Opus 4.1. The intelligence improvements extend beyond coding benchmarks.

Is Sonnet 4.5 priced differently from Sonnet 4?

Anthropic announced Sonnet 4.5 at the same price as Sonnet 4. On this page, Sonnet 4.5 is listed at $3.00 per million input tokens and $15.00 per million output tokens for every provider shown. AI Gateway routes across providers, and rates may vary by provider, so check the Providers table for current pricing.
