Why Claude Gets “Dumber” the More You Use It: The Hidden Price of “Saving Money” Is a 100x API Bill
A few days ago, AMD AI director Stella Laurenzo published a sharply technical issue in the official Claude Code repository: “Claude Code is unusable for complex engineering tasks with the Feb updates.” This wasn’t a vibes-only complaint. It was a quantitative postmortem built from 6,852 sessions, 17,871 thinking blocks, and 234,760 tool calls gathered across real-world workflows. You can read the original report here: GitHub issue #42796.
If you build in crypto, you should care—because “complex engineering” is basically the default setting in Web3: smart contracts are immutable, attack surfaces are composable, and a single hallucinated change can become an exploit. What looks like an AI product hiccup is, in practice, a software supply-chain risk—and a cost trap.
1) The uncomfortable data: quality down, cost up (way up)
The report ties a visible quality regression to server-side configuration changes around extended thinking and thinking redaction (notably the rollout labeled redact-thinking-2026-02-12). The key claim is not just “outputs got worse,” but that the model’s behavior measurably shifted from research-first to edit-first—exactly the wrong direction for high-stakes engineering.
For a simplified snapshot of the metrics, see the original telemetry and cost appendix in the GitHub issue.
The most crypto-relevant lesson is counterintuitive: rationing reasoning doesn’t always reduce spend. In long-running tasks, a weaker agent can trigger more retries, corrections, and tool calls—pushing your bill up by over 100x while still delivering worse reliability.
2) Why this hits blockchain teams harder than typical software teams
Smart contracts don’t tolerate “close enough”
In Web2, a regression can be patched and redeployed. In Web3, a bad assumption can be immortal.
Ethereum’s own documentation is blunt: deployed code is hard to change, and losses are often irrecoverable—see Ethereum smart contract security documentation and the broader security guidelines.
Now connect that to the Claude Code telemetry: fewer file reads, more eager edits, more premature stopping. That is exactly the pattern that produces:
- incomplete checks (authorization, replay protection, domain separation)
- broken invariants across modules
- missing edge-case handling around token decimals, fee-on-transfer, rounding
- unsafe external calls or poorly-placed state updates
In DeFi and onchain infrastructure, “almost correct” is often the same as exploitable.
2025–2026 complexity trends amplify the blast radius
Two industry shifts make the “AI agent regression” story more dangerous in crypto than it looks:
- Account abstraction and smart accounts are mainstreaming, increasing the amount of security-critical logic that lives in contracts rather than in EOAs. If your product touches AA, start with ERC-4337 and the practical ecosystem docs at ERC-4337 Documentation.
- AI-assisted scams and social engineering are scaling. Chainalysis notes that scams linked to AI vendors extract materially more per operation on average; see their write-up on scams in the 2026 Crypto Crime Report. When end-users increasingly ask AI “Is this safe to sign?”, model reliability becomes a consumer protection issue, not just an engineering preference.
3) The real takeaway: LLMs are now a production dependency—treat them like one
Crypto teams already learned (the hard way) to version critical dependencies: compiler versions, RPC providers, custody modules, signing libraries. LLM agents now belong in that same category.
A practical Web3 playbook:
A) Build “LLM regression tests” like you build protocol test suites
- Capture representative tasks: contract upgrade flows, cross-chain messaging, indexer backfills, fee math refactors.
- Run the same prompts weekly; diff outcomes.
- Gate merges on deterministic checks: unit tests, invariants, simulation, and static analysis.
If you deploy Solidity, Ethereum’s guideline page explicitly references tooling such as Slither and Echidna for static analysis and fuzzing; start from the Smart contract security guidelines.
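The weekly run-and-diff loop above can be sketched in a few dozen lines. This is a minimal illustration, not a real harness: `GOLDEN_TASKS`, `stub_agent`, and the check predicates are all hypothetical stand-ins for your own prompts, a real agent session, and deterministic gates like unit tests and invariant checks.

```python
import hashlib
import json

# Hypothetical golden tasks: each pairs a representative prompt with
# deterministic checks. In a real suite the checks would run unit tests,
# invariants, and static analysis; here they are plain predicates.
GOLDEN_TASKS = [
    {
        "name": "fee-math-refactor",
        "prompt": "Refactor fee math to basis points without changing totals.",
        "checks": [
            lambda out: "read" in out["actions"],  # researched before editing
            lambda out: out["tests_passed"],        # deterministic gate
        ],
    },
]

def fingerprint(output: dict) -> str:
    """Stable hash of the agent's behavior, used to diff week over week."""
    return hashlib.sha256(
        json.dumps(output, sort_keys=True).encode()
    ).hexdigest()[:12]

def run_regression(agent, baseline: dict) -> dict:
    """Run every golden task, gate on checks, and diff against last week."""
    report = {}
    for task in GOLDEN_TASKS:
        out = agent(task["prompt"])
        report[task["name"]] = {
            "passed": all(check(out) for check in task["checks"]),
            "fingerprint": fingerprint(out),
            # Drift: we have a prior fingerprint and it no longer matches.
            "drifted": baseline.get(task["name"]) not in (None, fingerprint(out)),
        }
    return report

# Stub standing in for a real Claude Code session.
def stub_agent(prompt: str) -> dict:
    return {"actions": ["read", "edit"], "tests_passed": True}

report = run_regression(stub_agent, baseline={"fee-math-refactor": "old_fp"})
```

The point of the fingerprint is that even when all checks pass, a changed fingerprint tells you the agent’s behavior shifted under the same prompt, which is exactly the regression signal the AMD report had to reconstruct after the fact.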
B) Remove “auto-accept edits” from critical repositories
The issue report notes workflows where changes were auto-accepted. That’s a productivity win—until an agent silently shifts from careful to reckless.
For smart contracts, treat AI like a junior contributor:
- require human code review
- require passing tests and local simulation
- require explicit sign-off for permission changes, new external calls, or storage layout changes
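One way to enforce the explicit sign-off rule is a merge gate that scans the added lines of a diff for risky patterns. A sketch under heavy assumptions: the regexes below are hypothetical heuristics for Solidity diffs, and a production gate would use an AST-based tool rather than pattern matching.

```python
import re

# Hypothetical risk patterns for added lines in a Solidity diff.
RISK_PATTERNS = {
    "external-call": re.compile(r"\.(call|delegatecall|transfer|send)\s*[({]"),
    "permission-change": re.compile(r"\b(onlyOwner|AccessControl|grantRole)\b"),
    "storage-layout": re.compile(r"^\+\s*(uint|int|address|mapping|bool)\b"),
}

def required_signoffs(diff: str) -> set:
    """Return the risk categories in a diff that need human sign-off."""
    flags = set()
    for line in diff.splitlines():
        if not line.startswith("+"):
            continue  # only inspect lines the agent added
        for name, pattern in RISK_PATTERNS.items():
            if pattern.search(line):
                flags.add(name)
    return flags

def gate(diff: str, approvals: set) -> bool:
    """Block the merge unless every flagged risk has an explicit approval."""
    return required_signoffs(diff) <= approvals
```

For example, a diff line like `+ (bool ok, ) = target.call{value: amount}("");` would flag `external-call` and block the merge until a human records that approval.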
C) Put a hard ceiling on thrash (cost control is a security control)
When quality drops, the agent compensates by doing more: more tool calls, more retries, more token burn. You need circuit breakers:
- max retries per task
- max tool calls per session
- max context growth per session
- alerting on “cost per merged PR” or “cost per resolved ticket”
This is how you prevent “saving compute” from turning into a 100x invoice surprise.
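The ceilings above fit naturally into a circuit-breaker object that the agent loop consults on every action. This is an illustrative sketch; the class name, method names, and default limits are all hypothetical, not tuned values.

```python
class AgentBudgetExceeded(RuntimeError):
    """Raised when a session blows through a hard ceiling."""

class CircuitBreaker:
    """Hard ceilings on retries, tool calls, and context growth.

    Illustrative sketch: the default limits are made-up placeholders.
    """

    def __init__(self, max_retries=3, max_tool_calls=200,
                 max_context_tokens=150_000):
        self.max_retries = max_retries
        self.max_tool_calls = max_tool_calls
        self.max_context_tokens = max_context_tokens
        self.retries = 0
        self.tool_calls = 0
        self.context_tokens = 0

    def record_retry(self):
        """Count a retry; trip the breaker past the ceiling."""
        self.retries += 1
        if self.retries > self.max_retries:
            raise AgentBudgetExceeded(f"retries > {self.max_retries}")

    def record_tool_call(self, added_context_tokens: int):
        """Count a tool call and its context growth; trip on either limit."""
        self.tool_calls += 1
        self.context_tokens += added_context_tokens
        if self.tool_calls > self.max_tool_calls:
            raise AgentBudgetExceeded(f"tool calls > {self.max_tool_calls}")
        if self.context_tokens > self.max_context_tokens:
            raise AgentBudgetExceeded(
                f"context > {self.max_context_tokens} tokens")
```

Tripping the breaker should kill the session and page a human; a degraded model that is quietly looping on retries is precisely how a “cheap” configuration becomes a 100x invoice.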
D) Use an LLM threat model, not just a prompt template
If you’re building agents that touch production keys, RPC endpoints, or signing flows, align with security frameworks such as the OWASP Top 10 for Large Language Model Applications, and treat prompt injection / tool misuse as first-class risks.
4) For everyday users: AI can help you understand crypto, but it should not control your keys
As AI assistants become the default interface for wallets, trading, and customer support, the most likely failure mode is not “bad code generation,” but bad signing decisions—especially under phishing pressure.
Two non-negotiables:
- Never paste seed phrases into any AI chat, “support bot,” or browser form.
- Separate “advice” from “authorization”: let AI summarize, but require physical confirmation to move funds.
That separation is exactly where a hardware wallet earns its keep.
5) Where OneKey fits: make AI optional, make signing explicit
If your workflow (or your users) increasingly relies on AI—whether for transaction explanations, contract interactions, or onchain “agent” automation—the safest architecture is:
- AI can propose
- your app can simulate
- your hardware wallet must approve
OneKey’s practical value in an AI-saturated crypto stack is simple: it helps keep private keys offline and forces an explicit signing step, reducing the chance that a degraded model, a poisoned prompt, or a convincing deepfake “support message” turns into an irreversible onchain loss.
Closing thought: “Cheaper reasoning” is not cheaper—especially in crypto
The AMD report is a rare gift: it turns a hand-wavy fear (“the model feels worse lately”) into measurable system behavior and a hard cost curve. In blockchain, where correctness is money and mistakes are permanent, the lesson is straightforward:
Don’t optimize for token cost per request. Optimize for correctness per decision.



