Claude 4.6 Sonnet
Anthropic's most capable Sonnet model yet, delivering near-Opus-level intelligence for coding, agents, and computer use with a 1M token context window.
Frontier coding and agents with 1M context
Claude Sonnet 4.6 is a text generation model developed by Anthropic, released in February 2026 as an upgrade to the Sonnet line of mid-tier models. It features a 1 million token context window in beta, allowing it to process entire codebases, lengthy legal documents, or large collections of research papers within a single request. The model is designed for coding, agentic workflows, computer use, and professional knowledge work at scale.
Sonnet 4.6 is particularly suited for developers and enterprises running high-volume workloads that require consistent instruction following, accurate tool selection, and reliable error correction across long sessions. It includes improved computer use capabilities, enabling it to navigate browsers, fill multi-step web forms, and automate desktop workflows. Anthropic's safety evaluations found it to be as safe as or safer than other recent Claude models, with noted resistance to prompt injection attacks.
What Claude 4.6 Sonnet supports
1M Token Context
Accepts up to 1 million tokens in a single request (beta), enabling reasoning across entire codebases, lengthy contracts, or dozens of documents at once.
Advanced Coding
Supports the full software development lifecycle including planning, implementation, debugging, and large-scale refactors across multiple files.
Agentic Workflows
Handles long-running, multi-step autonomous tasks with improved instruction following, tool selection, and error correction over extended sessions.
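The error-correction pattern described above can be sketched as a tool loop that reports failures back to the model instead of aborting, letting it choose a different action. This is a minimal illustration, not Anthropic's implementation; `run_tool` and the `read_file` tool are hypothetical stand-ins.

```python
# Sketch of the retry/error-correction pattern used in long agent sessions:
# a failed tool call is packaged as an error result and returned to the
# model rather than crashing the loop. `run_tool` is a hypothetical stand-in.

def run_tool(name: str, args: dict) -> str:
    if name == "read_file" and args["path"] == "missing.txt":
        raise FileNotFoundError(args["path"])
    return "file contents"

def agent_step(tool_call: dict) -> dict:
    try:
        output = run_tool(tool_call["name"], tool_call["input"])
        return {"is_error": False, "content": output}
    except Exception as exc:
        # Surface the failure to the model so it can self-correct on the
        # next turn instead of ending the session.
        return {"is_error": True, "content": f"{type(exc).__name__}: {exc}"}

ok = agent_step({"name": "read_file", "input": {"path": "notes.txt"}})
bad = agent_step({"name": "read_file", "input": {"path": "missing.txt"}})
print(ok["is_error"], bad["is_error"])
```

In practice the error result would be appended to the conversation as a tool result with an error flag, prompting the model to retry or change approach.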
Computer Use
Controls browsers and desktop software to navigate complex spreadsheets, fill multi-step web forms, and automate workflows that previously required human intervention.
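On the client side, computer use means executing the actions the model emits against a real display. The sketch below assumes the action format of Anthropic's computer tool (an action name plus parameters such as coordinates); the `screen` dict is a stand-in for a real display driver.

```python
# Sketch of dispatching computer-use actions emitted by the model.
# Action names ("screenshot", "left_click", "type") follow Anthropic's
# computer tool; the screen handling here is illustrative only.

def handle_action(action: dict, screen: dict) -> str:
    kind = action.get("action")
    if kind == "screenshot":
        # Return the current frame so the model can see the screen state.
        return screen["image"]
    if kind == "left_click":
        x, y = action["coordinate"]
        return f"clicked ({x}, {y})"
    if kind == "type":
        return f"typed {action['text']!r}"
    raise ValueError(f"unsupported action: {kind}")

screen = {"image": "<base64 png>"}
print(handle_action({"action": "left_click", "coordinate": [640, 360]}, screen))
```

A full agent loop alternates these actions with screenshots so the model can verify the effect of each step before continuing.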
Tool Use
Supports structured tool calling, allowing the model to invoke external functions and APIs as part of a reasoning or task-completion workflow.
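A minimal sketch of what structured tool calling looks like from the caller's side, assuming the Anthropic Messages API shapes (a JSON Schema tool definition, a `tool_use` content block from the model, a `tool_result` block back); the `get_weather` tool and the stubbed model response are hypothetical.

```python
# Hypothetical tool the model may invoke.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather API call

# JSON Schema tool definition, as passed in the `tools` parameter.
weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Stubbed `tool_use` content block of the kind the model returns when it
# decides to call the tool (a live request would produce this).
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_01",
    "name": "get_weather",
    "input": {"city": "Paris"},
}

# Dispatch the call and package the output as a `tool_result` block for
# the follow-up request.
handlers = {"get_weather": get_weather}
result = handlers[tool_use_block["name"]](**tool_use_block["input"])
tool_result = {
    "type": "tool_result",
    "tool_use_id": tool_use_block["id"],
    "content": result,
}
print(tool_result["content"])
```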
MCP Integration
Compatible with Model Context Protocol (MCP) servers, enabling connection to external data sources and services through a standardized interface.
Reasoning
Applies multi-step reasoning to complex professional tasks including financial analysis, research synthesis, and frontend code generation.
Safety Guardrails
Includes Anthropic's safety evaluations with documented resistance to prompt injection attacks, rated as safe as or safer than other recent Claude models.
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Standard | Extended Thinking |
|---|---|---|---|
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 79.9% | 87.5% |
| HLE | Humanity's Last Exam: expert-written questions across many domains that challenge frontier models | 13.2% | 30.0% |
| SciCode | Scientific research coding and numerical methods | 46.9% | 46.8% |
| IFBench | Instruction following accuracy | 41.2% | 56.6% |
| Long Context Reasoning | Reasoning across long documents and contexts | 57.7% | 70.7% |
| TerminalBench Hard | Agentic coding and terminal command tasks | 46.2% | 53.0% |
| τ²-Bench | Agentic tool use in realistic scenarios | 79.5% | 75.7% |
| SWE-bench Verified | Real GitHub issues requiring multi-file code fixes | 79.6% | — |
| OSWorld-Verified | Autonomous computer use and desktop tasks | 72.5% | — |
| Terminal-Bench 2.0 | Agentic coding and terminal command tasks | 59.1% | — |
| ARC-AGI-2 | Novel abstract reasoning and pattern recognition | 58.3% | — |
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 79.1% | — |
| MATH-500 | Undergraduate and competition-level math problems | 97.8% | — |
| MMMB | Multilingual and multimodal understanding | 76.1% | — |
| Finance Agent | Financial analysis and decision-making tasks | 63.3% | — |
| τ²-bench Retail | Agentic tool use in retail scenarios | 91.7% | — |
| τ²-bench Telecom | Agentic tool use in telecom scenarios | 97.9% | — |
| MCP-Atlas Tool Use | Structured tool use via Model Context Protocol | 61.3% | — |
Common questions about Claude 4.6 Sonnet
What is the context window for Claude Sonnet 4.6?
Claude Sonnet 4.6 supports a 1 million token context window, currently available in beta. This allows it to process large inputs such as entire codebases or lengthy document collections in a single request.
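Before sending a very large request, it helps to estimate whether the input fits the window. The sketch below uses a rough 4-characters-per-token heuristic, which is not the model's actual tokenizer; a production system should use a real token counter such as the API's token-counting endpoint.

```python
# Rough pre-flight check against the 1M-token context window.
CONTEXT_LIMIT = 1_000_000

def estimate_tokens(text: str) -> int:
    # Heuristic: English text averages roughly 4 characters per token.
    return max(1, len(text) // 4)

def fits_in_context(documents: list[str], reserve_for_output: int = 8_000) -> bool:
    # Leave headroom for the model's response tokens.
    used = sum(estimate_tokens(d) for d in documents)
    return used + reserve_for_output <= CONTEXT_LIMIT

docs = ["word " * 50_000, "word " * 50_000]  # ~125k estimated tokens total
print(fits_in_context(docs))
```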
What is the training data cutoff for Claude Sonnet 4.6?
The metadata for Claude Sonnet 4.6 lists February 2026, which corresponds to its release date rather than a confirmed training data cutoff. Consult Anthropic's official model documentation for the exact cutoff.
What types of tasks is Claude Sonnet 4.6 best suited for?
Claude Sonnet 4.6 is designed for coding, agentic workflows, computer use, and enterprise knowledge work. It is particularly well-suited for high-volume deployments requiring consistent instruction following and long-session reliability.
Does Claude Sonnet 4.6 support tool use and MCP servers?
Yes. Claude Sonnet 4.6 supports structured tool calling and is compatible with Model Context Protocol (MCP) servers, making it suitable for integration with external APIs and data sources.
How does Claude Sonnet 4.6 handle safety and security?
Anthropic's safety evaluations found Claude Sonnet 4.6 to be as safe as or safer than other recent Claude models. It has documented resistance to prompt injection attacks, which is relevant for agentic and computer use deployments.
What people think about Claude 4.6 Sonnet
Community discussions mentioning Claude Sonnet 4.6 appear in the context of broader model comparison threads, where users are evaluating coding performance across multiple AI models. Sentiment in coding-focused threads suggests interest in how Sonnet 4.6 performs on real-world software tasks relative to other available models.
Some threads note regressions in general benchmarks for competing models even when agentic coding scores improve, reflecting a common concern about uneven capability trade-offs across model updates. The LocalLLaMA coding comparison thread is the most directly relevant, with users sharing results from testing models on TypeScript projects in practical development scenarios.
Referenced community threads:
- Gemini 3.1 livebench results
- Livebench just dropped their run of codex 5.3. New SOTA for agentic coding, but regression overall
- I compared 8 AI coding models on the same real-world feature in an open-source TypeScript project. Here are the results
Documentation & links
Parameters & options
Extended Thinking: when enabled, the model works through its reasoning step by step before producing a final answer. This can help users understand how it reached its conclusions, though responses may be longer. The model dynamically decides when and how much to think.
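A minimal sketch of a Messages API request body with extended thinking enabled, following the `thinking` parameter shape in Anthropic's API documentation; the model id string and token budget used here are illustrative assumptions.

```python
# Request body with extended thinking enabled. The thinking budget caps how
# many tokens the model may spend reasoning before it answers.
request_body = {
    "model": "claude-sonnet-4-6",  # assumed model id
    "max_tokens": 16_000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10_000,
    },
    "messages": [
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
}

# The thinking budget must stay below max_tokens, since thinking tokens
# count toward the overall output limit.
assert request_body["thinking"]["budget_tokens"] < request_body["max_tokens"]
print(request_body["thinking"]["type"])
```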
Explore similar models
Start building with Claude 4.6 Sonnet
No API keys required. Create AI-powered workflows with Claude 4.6 Sonnet in minutes — free.