April 17, 2026 · 10 min read

Is Claude AI Getting Nerfed? What the Data Shows

claude-ai · anthropic · claude-opus · performance · prompt-engineering · claude-code

Introduction

Over the past few weeks, a storm has been brewing in the Claude AI community. Power users, developers, and enterprise teams have been raising increasingly loud alarms: Claude feels different. Tasks that used to produce meticulous, deeply reasoned outputs now seem to return shallower, shorter, and less careful results. The word on everyone’s lips — and in every subreddit thread — is nerfing.

But is Claude actually getting worse, or is something more nuanced happening? In this article, we dig into the data behind the controversy, examine what Anthropic has actually changed, break down the company’s official response, and — most importantly — provide concrete strategies to ensure you are still getting Claude’s best work in every conversation.

The Timeline: How the Controversy Unfolded

The complaints did not appear overnight. They built gradually through February and March 2026, then reached a crescendo in early April. Here is how the situation evolved.

In February 2026, Anthropic introduced adaptive thinking as a default behavior. This feature allows Claude to dynamically decide how much internal reasoning (extended thinking) to allocate to a given task based on perceived complexity. At the same time, the default effort level in Claude Code was shifted from "high" to "medium" for certain user tiers.

By early March, scattered reports started appearing on Reddit’s r/ClaudeAI and developer forums. Users described Claude as "lazier," noting that complex coding tasks required significantly more back-and-forth than before. Some developers reported that Claude Code was reading fewer files before making edits and producing less thorough solutions.

The turning point came on April 2, 2026, when Stella Laurenzo — identified as Senior Director in AMD’s AI group — filed a detailed GitHub issue on the Claude Code repository. Unlike most complaints, hers was backed by hard numbers: an analysis of 6,852 Claude Code sessions, 17,871 thinking blocks, and 234,760 tool calls. This was not anecdotal frustration. It was a data-driven indictment.

From that point forward, the controversy dominated tech media coverage. Fortune, Axios, VentureBeat, The Register, and others all picked up the story, and Anthropic found itself in the unusual position of having to defend the quality of its flagship product.

What the Data Actually Shows

The Laurenzo analysis, along with corroborating reports from other researchers, paints a specific picture of how Claude’s behavior has changed. Understanding these metrics is essential before jumping to conclusions.

Thinking Length Collapse

One of the most striking findings is the decline in median visible thinking length. According to the analysis, median thinking output dropped from approximately 2,200 characters in January to roughly 600 characters in March. That is a reduction of more than 70 percent. For tasks that require deep reasoning — multi-step debugging, architectural decisions, complex refactoring — shorter thinking often correlates with shallower output.

Fewer Files Read Before Editing

In coding contexts, Claude Code historically read an average of 6.6 files before making an edit, ensuring it understood the broader codebase context. By March, this number had fallen to approximately 2.0 files. For developers relying on Claude to make informed changes across interconnected modules, this reduction has real consequences. Edits made without full context are more likely to introduce regressions or miss dependencies.

Increased Retry Rates

Perhaps the most operationally impactful metric is the increase in API call retries. The analysis reported that some task categories saw up to 80 times more retries between February and March. More retries mean more tokens consumed, longer completion times, and — for API users — higher costs for equivalent work.

Independent Benchmark Corroboration

Outside of the Laurenzo analysis, BridgeMind published benchmark results showing Claude Opus 4.6 falling from 83.3 percent accuracy to 68.3 percent on their BridgeBench hallucination benchmark. While benchmark methodologies vary and a single benchmark should not be taken as gospel, it provided additional ammunition for those arguing that something fundamental had changed.
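The magnitudes in these reports are easy to sanity-check. A quick calculation over the figures cited above confirms the headline percentages:

```python
# Sanity check on the magnitudes cited above (figures from the article).
thinking_before, thinking_after = 2200, 600   # median thinking chars, Jan vs. Mar
files_before, files_after = 6.6, 2.0          # avg files read before an edit
bench_before, bench_after = 83.3, 68.3        # BridgeBench accuracy, percent

thinking_drop = (thinking_before - thinking_after) / thinking_before
files_drop = (files_before - files_after) / files_before
bench_drop = bench_before - bench_after       # percentage-point drop

print(f"Thinking length: -{thinking_drop:.0%}")      # -73%
print(f"Files read:      -{files_drop:.0%}")         # -70%
print(f"Benchmark:       -{bench_drop:.1f} points")  # -15.0 points
```

Both behavioral metrics dropped by roughly 70 percent in the same window, which is why the reports read as a pattern rather than noise.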

What Anthropic Actually Changed

To understand the situation clearly, it helps to separate what Anthropic deliberately modified from what users are perceiving. Between February and March 2026, three confirmed changes were introduced.

Adaptive Thinking by Default

Previously, Claude’s extended thinking feature operated at a relatively consistent depth. The introduction of adaptive thinking meant that Claude now makes a judgment call about how much reasoning a query deserves. Simple questions get quick, lightweight processing. Complex questions are supposed to receive deeper analysis. The problem, according to critics, is that Claude’s assessment of what qualifies as "simple" versus "complex" does not always match the user’s expectations.

Effort Level Reduction

For Claude Code specifically, the default effort level was lowered from "high" to "medium." This change was documented in the Claude Code changelog on April 7, when Anthropic reversed course and moved the default back to "high" for API-key users and several enterprise tiers. The fact that this reversal happened suggests Anthropic acknowledged the impact of the original reduction.

UI-Only Thinking Redaction

A change labeled "redact-thinking-2026-02-12" was introduced to hide the visible thinking output in the interface. According to Claude Code lead Boris Cherny, this was purely a UI change that reduces latency by not rendering the thinking text, but does not affect the actual reasoning process. Critics have been skeptical of this claim, arguing that the correlation between the redaction change and perceived quality drops is too strong to be coincidental.

Anthropic’s Official Response

Anthropic’s response has been measured but firm. In a pinned follow-up on the original GitHub issue, Boris Cherny thanked Laurenzo for the depth of her analysis but disputed the main conclusion. His key points were as follows.

First, Anthropic maintains that the underlying models (Opus 4.6 and the newly released Opus 4.7) have not been degraded. The weights are the same. The intelligence is the same. What changed are the parameters around how that intelligence is deployed — specifically, how much effort is allocated by default.

Second, Anthropic pointed out that the thinking redaction is a display-layer change. The model still performs extended thinking internally; users simply do not see it rendered in the interface. Anthropic argues this was done to improve responsiveness and reduce visual clutter.

Third, and most practically, Anthropic acknowledged that the effort level change had a meaningful impact on Claude Code users and moved to restore higher defaults for professional tiers.

The Compute Capacity Question

Beneath the surface-level debate about features and defaults lies a more uncomfortable question: does Anthropic have enough compute capacity to serve its rapidly growing user base at full quality?

Anthropic’s adoption has surged in recent months. The company reportedly crossed a $30 billion annualized revenue run rate, and its consumer products have seen explosive growth in app store subscribers. Meanwhile, compared to competitors like Microsoft-backed OpenAI or Google DeepMind, Anthropic has announced fewer multibillion-dollar data center deals.

Critics speculate that the effort level reductions and adaptive thinking defaults are not just product decisions — they are resource management strategies. By having Claude do less work per request by default, Anthropic can serve more users on the same infrastructure. Anthropic has not confirmed this interpretation, but the timing of the changes coinciding with the growth surge has fueled the theory.

This is not unique to Anthropic. Every AI company faces the tension between scaling access and maintaining per-request quality. The difference is that Claude’s power-user community is particularly vocal and technically sophisticated, making it harder to make changes quietly.

Practical Strategies to Maintain Claude Quality

Regardless of what is happening behind the scenes, there are concrete steps you can take right now to ensure Claude gives you its best work. These strategies work whether the underlying cause is effort level defaults, adaptive thinking miscalibration, or anything else.

Set Effort Level Explicitly

If you are using Claude Code or the API, do not rely on the default effort level. Set it explicitly to "high" for any task that requires careful, thorough output. In Claude Code, you can configure this in your settings or pass it as a parameter. For API users, the effort parameter can be specified in the request body. This single change addresses the most significant confirmed cause of quality reduction.
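Since the exact request shape depends on your client, here is a minimal sketch of pinning effort in a request body. The top-level `effort` field and the model id follow the article's description and are assumptions; verify both against Anthropic's current API documentation before relying on them:

```python
# Sketch: pin effort explicitly instead of trusting the default.
# The "effort" field name and model id are assumptions based on the
# article's description, not a confirmed API shape.

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a request body that sets effort explicitly."""
    return {
        "model": "claude-opus-4-7",  # hypothetical model id
        "max_tokens": 4096,
        "effort": effort,            # explicit, never left to the default
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Refactor the session cache to be thread-safe.")
assert body["effort"] == "high"
```

The point is not the exact field name but the habit: every request that matters should carry an explicit effort setting, so a silent default change cannot degrade your results.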

Use Detailed System Prompts

Adaptive thinking tries to gauge complexity from your input. The more specific and structured your prompt, the more likely Claude is to allocate appropriate reasoning depth. Instead of vague requests, provide context about why the task matters, what quality bar you expect, and what a thorough response looks like. Think of your prompt as a brief to a consultant — the more context you provide, the better the deliverable.
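One way to operationalize this is a small template that forces every prompt to carry the context adaptive thinking keys on: why the task matters, the quality bar, and what a complete answer includes. The structure below is illustrative, not an official format:

```python
# Illustrative prompt template that front-loads context and expectations,
# so the model's complexity assessment has something to work with.

def detailed_prompt(task: str, why: str, quality_bar: str, done: str) -> str:
    return "\n".join([
        f"Task: {task}",
        f"Why it matters: {why}",
        f"Quality bar: {quality_bar}",
        f"A thorough response includes: {done}",
    ])

prompt = detailed_prompt(
    task="Review this migration script for data-loss risks.",
    why="It runs against the production billing database.",
    quality_bar="Every destructive statement must be called out explicitly.",
    done="a risk assessment, a rollback plan, and suggested dry-run commands",
)
```

A prompt built this way signals high stakes in its structure alone, which is exactly the cue an adaptive system needs.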

Request Extended Thinking Explicitly

For high-stakes tasks, you can instruct Claude to use extended thinking by saying so directly. Phrases like "think through this step by step," "consider edge cases carefully," or "take your time and be thorough" signal to the adaptive thinking system that this is not a quick-answer query. This is not a guarantee, but it shifts the model’s assessment of required depth.
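If you want this habit enforced rather than remembered, a trivial wrapper can prepend a depth cue to any high-stakes prompt. The cue text is just an example built from the phrases above:

```python
# Trivial wrapper: prepend an explicit depth cue to high-stakes prompts.
DEPTH_CUE = (
    "Think through this step by step, consider edge cases carefully, "
    "and take your time to be thorough.\n\n"
)

def with_extended_thinking(prompt: str) -> str:
    return DEPTH_CUE + prompt

out = with_extended_thinking("Audit this auth flow for race conditions.")
```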

Break Complex Tasks Into Stages

If you have noticed that Claude is skipping context or making shallow edits, decompose your work into explicit stages. First, ask Claude to analyze the relevant files and summarize its understanding. Then, ask it to propose a plan. Only after confirming the plan should you ask it to execute. This staged approach forces deeper engagement at each step and prevents the model from taking shortcuts.
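The staged approach can be sketched as a simple pipeline with a confirmation gate between planning and execution. `ask_claude` below is a stub standing in for whatever client call you actually use:

```python
# Sketch of the staged workflow: analyze -> plan -> execute, with a
# confirmation gate before anything is changed. ask_claude is a stub
# for illustration so the flow is runnable end to end.

def ask_claude(prompt: str) -> str:
    return f"[model response to: {prompt[:40]}...]"  # stand-in for a real call

def staged_edit(files: list[str], goal: str, confirm=lambda plan: True) -> str:
    # Stage 1: force the model to read and summarize before touching anything.
    summary = ask_claude(f"Read {', '.join(files)} and summarize how they interact.")
    # Stage 2: a plan you can inspect, grounded in that summary.
    plan = ask_claude(f"Given this summary:\n{summary}\nPropose a plan to {goal}.")
    # Stage 3: execute only after the plan is approved.
    if not confirm(plan):
        return "aborted: plan rejected"
    return ask_claude(f"Execute this approved plan:\n{plan}")

result = staged_edit(["auth.py", "session.py"], "add token refresh")
```

The confirmation gate is the important design choice: it turns the plan into an artifact you review, rather than an internal step the model can skip.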

Monitor Output Quality Actively

Do not assume that once-reliable prompts will continue to produce the same quality indefinitely. AI models operate in a dynamic environment where default parameters, infrastructure, and even token routing can change. Build a habit of spot-checking outputs against your quality baseline. If you notice degradation, adjust your prompts or parameters before it compounds.
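A lightweight way to build this habit is a canary check: record baseline metrics for a fixed prompt, rerun it periodically, and flag any metric that drifts past a tolerance. The metric names, baseline values, and threshold below are illustrative:

```python
# Lightweight drift detector: compare cheap proxy metrics from a fixed
# canary prompt against a recorded baseline. Values are illustrative.

BASELINE = {"output_chars": 2200, "files_read": 6.6, "retries": 1.0}

def degradation_flags(observed: dict, tolerance: float = 0.5) -> list[str]:
    """Flag any metric that moved more than `tolerance` (50%) from baseline."""
    flags = []
    for metric, base in BASELINE.items():
        drift = abs(observed[metric] - base) / base
        if drift > tolerance:
            flags.append(f"{metric}: {base} -> {observed[metric]} ({drift:.0%} drift)")
    return flags

# With the March-era numbers cited earlier, all three metrics trip the check.
flags = degradation_flags({"output_chars": 600, "files_read": 2.0, "retries": 3.0})
```

Even a crude check like this catches the kind of shift described in this article weeks before it would surface as broken workflows.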

Upgrade to Opus 4.7

Anthropic released Claude Opus 4.7 on April 16, 2026, as a direct successor to Opus 4.6 at the same price point. Early reports indicate improvements in software engineering tasks, instruction following, and complex reasoning — precisely the areas where Opus 4.6 was drawing complaints. If you have been experiencing quality issues, switching to Opus 4.7 may resolve them independently of any parameter tuning.

What This Controversy Means for the AI Industry

The Claude nerfing debate is not just an Anthropic story. It is a preview of a challenge every AI company will face as these tools move from novelty to critical infrastructure.

When developers build workflows around a specific model’s behavior, any change — even an improvement — can break expectations. The AI industry has not yet developed the equivalent of semantic versioning for model behavior. When a software library changes its API, there is a changelog and a migration guide. When a model changes its default reasoning depth, users find out the hard way.

The Laurenzo analysis is significant because it demonstrates what rigorous model monitoring looks like from the user side. As AI becomes more embedded in professional workflows, we should expect more of this kind of data-driven accountability. It is healthy for the ecosystem, even if it is uncomfortable for providers.

Anthropic, to its credit, has engaged with the criticism publicly and made concrete changes (restoring the higher effort default, releasing Opus 4.7). Not every company would respond as directly. But the episode also highlights the need for better transparency about when and how model behavior parameters change.

Key Takeaways

The "Claude nerfing" story is more nuanced than the headlines suggest. The underlying models have not been lobotomized. What changed are the defaults around how much effort Claude applies to each request — and those defaults matter enormously for power users who depend on Claude’s deepest reasoning capabilities.

The data is real: thinking lengths dropped, file reads decreased, and retry rates spiked. But the fix is also straightforward: set explicit effort levels, use structured prompts, and consider upgrading to Opus 4.7. Claude’s ceiling has not lowered. The floor, however, moved — and if you were relying on defaults, you felt it.

For power users tracking how these changes affect their daily Claude usage, tools like SuperClaude can help monitor token consumption, response quality patterns, and model behavior across sessions — giving you data to act on rather than guesswork.