UltraThink Is Deprecated (And What Replaced It)
The keyword era ends when “thinking” becomes a normal runtime setting. Here’s the practical model: defaults, overrides, and when to dial it up or down.
ultrathink used to be a cheat code: sprinkle a keyword into the prompt, get a bigger reasoning budget.
In newer Claude Code bundles, that approach is going away. “Thinking” is treated like a normal runtime setting instead of a magic word.
This post is about the practical replacement: what knobs exist, what they do, and how to use them without turning your CLI into sludge.
What You’ll Learn
- Why keyword-based “modes” don’t scale.
- The two useful controls: enable/disable thinking, and cap its budget.
- When extra thinking helps and when it’s wasted.
Why the Keyword Era Ends
Magic keywords are a UI hack. They work until:
- users forget them
- prompts get refactored
- subagents don’t inherit them consistently
- it becomes impossible to explain “why did this run slow”
A serious agent needs explicit settings:
- visible in status
- logged in telemetry
- consistent across entrypoints (interactive,
--print, teammates)
The Replacement Model
In practice, “extended thinking” is just two decisions made by the runtime:
- Should thinking be enabled for this request?
- If yes, what’s the maximum budget?
Those decisions are typically driven by:
- model capability (some models support it, some don’t)
- a user or org default
- per-session overrides (environment variables, CLI flags, or settings)
The important part is not the exact defaults. It’s that the decision is explicit and repeatable.
How to Control It (Practical)
There are two controls you want.
1. Cap thinking budget
Recent bundles expose an environment-based cap. Conceptually:
# Example: cap thinking budget for one run
MAX_THINKING_TOKENS=20000 claude
If you set it to 0, you effectively disable extended thinking for that run.
2. Turn “always thinking” on/off
There’s also typically a settings-level default (conceptually):
{
"alwaysThinkingEnabled": false
}
Names and exact locations can change. The key idea is that this should live in settings, not in your prompt text.
When to Dial It Up (And When Not To)
Extra thinking is worth it when the cost of a mistake is high:
- complex multi-file refactors
- tricky debugging with many hypotheses
- design decisions that need careful tradeoffs
It’s usually wasted on:
- simple edits
- straightforward “find and replace”
- mechanical code formatting
The trade-offs are predictable:
- more thinking usually means higher latency
- it also means higher cost
So the best default is: use thinking, but keep it capped.
Related
/model-routing//tool-registry-and-execution//telemetry-first-architecture/