Claude Opus 4.7 Is Here: Benchmarks, Pricing, and What Actually Changes for Developers
Anthropic released Claude Opus 4.7 on April 16, 2026 - and this one is not a cosmetic bump.
If you ship code, run agents, or process documents at scale, Opus 4.7 moves the ceiling meaningfully. Same pricing. Better vision. New effort levels. A new /ultrareview command in Claude Code. And a benchmark sheet that reframes the competitive picture.
This post is the developer's-eye view. Numbers first, marketing last. Here is what to care about.
TL;DR for Busy Developers
- Model ID: `claude-opus-4-7`
- Pricing: $5/M input, $25/M output (unchanged from 4.6)
- Availability: GA on Claude.ai, API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry
- Vision: images up to 2,576px on the long edge (~3.75 MP), 3x prior models
- New effort level: `xhigh` for deeper reasoning when latency is not the constraint
- New Claude Code command: `/ultrareview` for dedicated bug-hunting passes
- Tokenizer: updated; expect a 1.0-1.35x token usage shift on existing prompts
The Benchmark Table (As Published by Anthropic)
Anthropic published a head-to-head across Opus 4.7, the outgoing Opus 4.6, GPT-5.4, Gemini 3.1 Pro, and the research-only Mythos Preview.
The single most important read of this table: Opus 4.7 is the strongest generally-available model for agentic coding and tool use, while Mythos Preview shows where the next jump is heading.
| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro | Mythos Preview |
|---|---|---|---|---|---|
| Agentic coding - SWE-bench Pro | **64.3%** | 53.4% | 57.7% | 54.2% | 77.8% |
| Agentic coding - SWE-bench Verified | **87.6%** | 80.8% | - | 80.6% | 93.9% |
| Agentic terminal coding - Terminal-Bench 2.0 | 69.4% | 65.4% | **75.1%**¹ | 68.5% | 82.0% |
| Multidisciplinary reasoning - HLE (no tools) | **46.9%** | 40.0% | 42.7%² | 44.4% | 56.8% |
| Multidisciplinary reasoning - HLE (with tools) | 54.7% | 53.3% | **58.7%**² | 51.4% | 64.7% |
| Agentic search - BrowseComp | 79.3% | 83.7% | **89.3%**² | 85.9% | 86.9% |
| Scaled tool use - MCP-Atlas | **77.3%** | 75.8% | 68.1% | 73.9% | - |
| Agentic computer use - OSWorld-Verified | **78.0%** | 72.7% | 75.0% | - | 79.6% |
| Agentic financial analysis - Finance Agent v1.1 | **64.4%** | 60.1% | 61.5%² | 59.7% | - |
| Cybersecurity vuln reproduction - CyberGym | 73.1% | **73.8%** | 66.3% | - | 83.1% |
| Graduate-level reasoning - GPQA Diamond | 94.2% | 91.3% | **94.4%**² | 94.3% | 94.6% |
| Visual reasoning - CharXiv (no tools) | **82.1%** | 69.1% | - | - | 86.1% |
| Visual reasoning - CharXiv (with tools) | **91.0%** | 84.7% | - | - | 93.2% |
| Multilingual Q&A - MMMLU | 91.5% | 91.1% | - | **92.6%** | - |
¹ Self-reported harness, not a standardized run. ² "Pro" tier variant of GPT-5.4. Bold = best generally-available result per row (Mythos Preview is research-only).
Where Opus 4.7 Moves the Needle
1. Agentic Coding (The Big One)
This is the lede.
- SWE-bench Verified: 87.6%, up from 80.8% on Opus 4.6 - a +6.8 point jump that matters in practice
- SWE-bench Pro: 64.3% vs 53.4% - +10.9 points, the largest single jump in the table
- Rakuten-SWE-Bench: 3x more production tasks resolved
- CursorBench: 70% pass rate vs 58% on Opus 4.6
In plain terms: you can hand off harder tickets with less supervision. Where 4.6 would stall on cross-file reasoning, 4.7 finishes the job.
2. Vision Upgrade That Actually Matters
Opus 4.7 processes images up to 2,576 pixels on the long edge (~3.75 MP) - more than 3x prior Claude models.
Why developers care:
- Dense screenshots (dashboards, admin panels) finally resolve cleanly without downscaling artifacts
- Complex diagrams (architecture, flow, UML) parse with fewer hallucinations
- Chemical structures and scientific figures are legible without pre-processing
- Pixel-perfect computer use - clicking the right button in a 1920x1080 screenshot just works
If you had to chunk, crop, or pre-process images before - stop. Re-test at native resolution.
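The new cap is easy to encode as a gate in your upload path. A minimal sketch, assuming only the 2,576px long-edge figure stated above - the helper names are mine, not SDK APIs:

```typescript
// Opus 4.7's long-edge limit, per the release notes above.
const LONG_EDGE_LIMIT = 2576;

// True if the image can be sent at native resolution.
function fitsNativeResolution(width: number, height: number): boolean {
  return Math.max(width, height) <= LONG_EDGE_LIMIT;
}

// Scale factor needed to fit the cap (1 means no resize needed).
function downscaleFactor(width: number, height: number): number {
  const longEdge = Math.max(width, height);
  return longEdge <= LONG_EDGE_LIMIT ? 1 : LONG_EDGE_LIMIT / longEdge;
}

// A 1920x1080 screenshot now fits without any preprocessing:
console.log(fitsNativeResolution(1920, 1080)); // true
// A 4K frame would still need scaling down to ~0.671 of its size:
console.log(downscaleFactor(3840, 2160));
```

In practice this means the resize branch simply never fires for the dashboard and admin-panel screenshots that used to need it.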
3. The New xhigh Effort Level
Effort levels are Anthropic's knob for trading latency vs reasoning depth. Opus 4.7 adds xhigh - a tier above high.
| Effort | Use When |
|---|---|
| `low` | Simple classification, extraction, simple chat |
| `medium` | Standard coding, Q&A, most API calls |
| `high` | Multi-step agents, deep reasoning, complex tool chains |
| `xhigh` | Hard debugging, novel problem-solving, long-context synthesis |
xhigh costs more output tokens but unlocks the top of the model. Reserve for the hard 5% of requests, not as a default.
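One way to keep `xhigh` off the default path is to route by task type before the request is built. A hypothetical sketch - the task labels and routing table are illustrative, not part of any Anthropic API:

```typescript
type Effort = "low" | "medium" | "high" | "xhigh";

// Map a task category to an effort level. Categories are made up for
// this example; the point is that "medium" is the default and "xhigh"
// is opt-in for the hard tail only.
function pickEffort(task: string): Effort {
  switch (task) {
    case "classify":
    case "extract":
      return "low";
    case "agent":
    case "tool-chain":
      return "high";
    case "hard-debug":
    case "long-context-synthesis":
      return "xhigh"; // the expensive tier: reserve for the hard ~5%
    default:
      return "medium"; // standard coding, Q&A, most API calls
  }
}

console.log(pickEffort("hard-debug")); // "xhigh"
console.log(pickEffort("code-review")); // "medium"
```

Centralizing the decision in one function also gives you a single place to log effort usage and catch cost drift.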
4. /ultrareview in Claude Code
New slash command: /ultrareview.
It runs a dedicated bug-detection pass over your changes - not a general "review," but a focused hunt for:
- Logic errors and edge cases
- Concurrency issues
- Input validation gaps
- Hidden state bugs
Think of it as your senior engineer reviewing on a good day. Run it before merging high-stakes PRs.
5. Long Context + Error Recovery
Quieter but real:
- Better instruction-following on long prompts
- Improved recovery when an agent hits a transient error
- More consistent behavior across multi-hour agentic workflows
- 21% fewer errors than Opus 4.6 on OfficeQA Pro document reasoning
If you run long-running agents, this is where your reliability budget shows up.
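Improved recovery or not, a retry-with-backoff wrapper around agent steps is still cheap insurance. A minimal sketch with illustrative defaults - tune the attempt count and delays for your own workload:

```typescript
// Retry an async step up to maxAttempts times with exponential
// backoff (baseDelayMs, 2x, 4x, ...). Rethrows the last error if
// every attempt fails.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break; // no sleep after the last try
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
  throw lastError;
}

// Simulated flaky step: fails twice, then succeeds.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw new Error("transient");
  return "ok";
};
withRetries(flaky, 5, 1).then((result) => console.log(result)); // "ok"
```

In a real agent loop you would also distinguish retryable errors (rate limits, timeouts) from permanent ones before retrying.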
Pricing: Same Price, Better Model
| Tier | Price |
|---|---|
| Input tokens | $5 per million |
| Output tokens | $25 per million |
| Cached input (read) | $0.50 per million |
| Cached input (write) | $6.25 per million |
Unchanged from Opus 4.6.
That is the rare case where a generational upgrade is genuinely free - you swap the model ID and get better outputs at the same bill.
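At these rates, a per-request cost estimate is simple arithmetic. A sketch using the prices from the table above; the token counts in the example are hypothetical:

```typescript
// Published Opus 4.7 prices, USD per million tokens (from the table above).
const PRICE_PER_M = {
  input: 5.0,
  output: 25.0,
  cacheRead: 0.5,
  cacheWrite: 6.25,
};

// Cost of one request in USD, given token counts per category.
function requestCostUSD(tokens: {
  input: number;
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
}): number {
  return (
    (tokens.input * PRICE_PER_M.input +
      tokens.output * PRICE_PER_M.output +
      (tokens.cacheRead ?? 0) * PRICE_PER_M.cacheRead +
      (tokens.cacheWrite ?? 0) * PRICE_PER_M.cacheWrite) /
    1_000_000
  );
}

// 10k input / 2k output: $0.05 + $0.05 = $0.10 per request.
console.log(requestCostUSD({ input: 10_000, output: 2_000 }));
```

Note how aggressively cache reads change the math: cached input is a tenth of the fresh-input price.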
How Opus 4.7 Compares to the Field
A pragmatic read of the table:
Use Opus 4.7 when:
- You build a coding agent (SWE-bench leadership)
- You orchestrate many tools via MCP (77.3% on MCP-Atlas, the best)
- You process high-res images or screenshots
- You do financial document analysis (Finance Agent v1.1 leader)
Consider GPT-5.4 Pro when:
- Your workload is search-heavy (89.3% BrowseComp)
- You already have heavy OpenAI tooling investment
Consider Gemini 3.1 Pro when:
- Multilingual Q&A is your primary workload (92.6% MMMLU)
- You need the Google Cloud integration story
Watch Mythos Preview for:
- Where the frontier is headed next - 93.9% on SWE-bench Verified is the signal
Migration Playbook: 4.6 → 4.7
If you run Opus 4.6 in production today, here is a one-day migration:
Step 1: Swap the Model ID
```typescript
// Before
const response = await client.messages.create({
  model: "claude-opus-4-6",
  // ...
});

// After
const response = await client.messages.create({
  model: "claude-opus-4-7",
  // ...
});
```
Step 2: Re-tune Your Prompts
4.7 follows instructions more strictly than 4.6. Some prompts that worked on 4.6 via ambiguity will now be followed literally.
- Read your system prompts for accidental contradictions
- Tighten underspecified instructions
- Drop cargo-culted "please" and "you must" filler that no longer helps
Step 3: Audit Token Costs
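A minimal sketch of the comparison step, assuming you have already collected counts under both tokenizers (the prompt data below is made up; the 1.35x ceiling is the expected upper end of the shift):

```typescript
interface PromptAudit {
  name: string;
  tokens46: number; // baseline count under Opus 4.6
  tokens47: number; // count under the updated 4.7 tokenizer
}

// Return the names of prompts whose token count grew beyond the
// expected tokenizer shift, so budget alerts can be adjusted.
function overBudget(audits: PromptAudit[], maxRatio = 1.35): string[] {
  return audits
    .filter((a) => a.tokens47 / a.tokens46 > maxRatio)
    .map((a) => a.name);
}

const audits: PromptAudit[] = [
  { name: "triage", tokens46: 1200, tokens47: 1290 }, // 1.075x: fine
  { name: "summarize", tokens46: 800, tokens47: 1160 }, // 1.45x: flag
];
console.log(overBudget(audits)); // ["summarize"]
```

Flagged prompts are usually the ones leaning on long few-shot examples; trim those first.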
```shell
# Count tokens with the new tokenizer on your top 20 prompts
# Compare to 4.6 baseline
# Adjust budget alerts
```
Step 4: Turn On xhigh Selectively
Do not flip the default. Pick your hardest 2-3 workflows (debugging, deep research, cross-file refactors) and enable xhigh for those.
Step 5: Wire /ultrareview into Your PR Workflow
If your team uses Claude Code, add this to CLAUDE.md:
```markdown
## Before merging to main
- Run `/ultrareview` on any PR that touches payments, auth, or data migrations
- Attach the output as a PR comment
```
What This Means for SaaS Builders
If you embed Claude in a product:
- Your coding-agent features just got ~10% better for free - update marketing copy if you benchmark publicly
- Vision features unlock new UX - users can now upload real screenshots, not "cropped-and-resized" screenshots
- Long-running agents become more reliable - the 21% error reduction on OfficeQA Pro translates into fewer support tickets
- Keep an eye on the Mythos Preview trajectory - the gap between it and 4.7 shows where the next release is likely to land
The decision is not "should I adopt 4.7." The decision is how fast you can ship the upgrade and what new features the capability increase unlocks.
Safety and Trust Notes
Worth knowing:
- Similar safety profile to Opus 4.6 - low deception, low sycophancy
- Improved resistance to prompt injection (important if you run agents on untrusted web content)
- Slightly weaker on controlled-substance harm-reduction guidance (known tradeoff)
- Cybersecurity safeguards: automatic detection and blocking of high-risk requests
- Cyber Verification Program available for legitimate security researchers who need fewer guardrails
If you build in a regulated space, the safety card is worth a full read.
The Quick Checklist for This Week
- Swap `claude-opus-4-6` → `claude-opus-4-7` in a staging branch
- Run your eval suite, log deltas
- Check token count shift on top 20 prompts
- Identify one workflow for `xhigh`
- Add `/ultrareview` to your PR playbook in `CLAUDE.md`
- Re-test any vision workflows at native resolution
- Ship to production once evals are green
One afternoon of work. The upside is measurable, the downside is capped by your test harness.
Bottom Line
Opus 4.7 is the first release in a while where the price/performance curve moves without the price moving. Same $5 in, $25 out. Better agentic coding, better vision, better long-running reliability.
Two things are true at once:
- You should upgrade this week.
- You should watch Mythos Preview - that is the next ceiling.
The gap between "generally available" and "frontier research" is what tells you where to invest for the next 90 days. Opus 4.7 is the answer for production today. The Mythos column tells you what your architecture should be ready for.
Build accordingly.
Running Claude in production and want help shipping the Opus 4.7 upgrade cleanly? We run migration audits, eval harness reviews, and prompt retunes for SaaS teams. Contact Websyro Agency for a free consultation.
