Anthropic published a blog post today disclosing that three Chinese AI labs — DeepSeek, Moonshot, and MiniMax — ran industrial-scale distillation attacks against Claude. The numbers: over 24,000 fraudulent accounts, more than 16 million exchanges, proxy networks managing thousands of accounts simultaneously. MiniMax alone generated 13 million interactions. When Anthropic released a new model mid-campaign, MiniMax pivoted within 24 hours, redirecting nearly half their traffic to capture the latest capabilities.
This is being reported as a security story. It's not. It's an economics story.
What they were stealing
The most revealing detail in Anthropic's disclosure isn't the scale — it's the target. These labs weren't extracting raw language ability. That's already commoditized. They were targeting agentic reasoning, tool use, orchestration, and coding. MiniMax specifically went after agentic coding and tool orchestration. Moonshot targeted computer-use agent development and coding.
The distillers know where the value is. They're not interested in making a chatbot that writes slightly better emails. They want the capabilities that connect models to the real world — the ability to use tools, chain actions, reason about multi-step workflows. The exact capabilities that matter for agents.
This tells you something the market hasn't priced in yet: if the most sophisticated capability extractors on Earth are ignoring the base language layer and targeting the agentic layer, that's where the actual value lives. And if that value can be extracted through 16 million API calls, the model layer isn't a moat. It's a source.
The prisoner's dilemma at the model layer
Here's the game theory. Every AI lab would be better off if all labs kept their models proprietary. Margins stay high, R&D investment is recoverable, the business model works. But any individual lab has an incentive to defect — either by distilling competitors (what these Chinese labs did) or by open-sourcing (what Meta did with Llama). Once one player defects, the cooperative equilibrium collapses. Everyone has to respond.
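Here's that game in miniature, as a toy Python sketch. The payoff numbers are illustrative assumptions, not measured economics; only the ordering matters, and the ordering is what makes defection dominant.

```python
# Two labs each choose to keep capability proprietary ("protect") or to
# defect (distill a rival, or open-source). Payoff numbers are illustrative
# assumptions, not measured economics; only the ordering matters.
PAYOFFS = {
    ("protect", "protect"): (3, 3),  # cooperative equilibrium: margins stay high
    ("protect", "defect"):  (0, 5),  # defector free-rides on the protector's R&D
    ("defect",  "protect"): (5, 0),
    ("defect",  "defect"):  (1, 1),  # capability commoditizes, margins collapse
}

def best_response(opponent: str) -> str:
    """The move that maximizes a lab's own payoff given the opponent's move."""
    return max(("protect", "defect"), key=lambda move: PAYOFFS[(move, opponent)][0])

# Defecting is the best response whatever the other lab does, so the one-shot
# game has a single Nash equilibrium: both defect.
assert best_response("protect") == "defect"
assert best_response("defect") == "defect"
```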
This is a textbook prisoner's dilemma, and its only Nash equilibrium is mutual defection: foundation model capability trends toward open and commoditized. No single actor can prevent it because the incentive to defect is always stronger than the incentive to cooperate. DeepSeek proved this with R1 last year. MiniMax just proved it again at a scale that Anthropic felt compelled to go public about.
The comparison to Bitcoin mining holds. The algorithm is public knowledge. The architecture is published. What differentiates miners isn't the technique — it's operational efficiency, energy costs, and scale. Foundation models are heading in the same direction. The transformer architecture is public. Attention mechanisms are well-understood. The techniques are converging. What remains is a competition over compute costs and RLHF curation — and as Anthropic's disclosure just showed, the RLHF curation can be extracted too.
The difference between Bitcoin mining and foundation models is that miners accept this reality. AI labs are still pretending the model layer is defensible while the evidence says otherwise.
The IP recursion
There's a deeper irony here that nobody in the industry wants to sit with.
The chain goes like this: human creators produce original work. AI companies scrape that work to train models. Chinese labs scrape those models to train their own. At every step, the entity doing the extraction claims their case is different from the one before it. Anthropic says distillation violates their terms of service. Music publishers are suing Anthropic over training data. Writers and artists say the same about every foundation model lab.
Everyone is extracting from everyone, and everyone insists their extraction is different.
The concept of "ownership" is doing different work in each of these contexts, but the industry keeps pretending it means the same thing. Anthropic owns Claude's outputs because they built the model. But the model was built on data that other people created. And now that data, transformed through the model, is being extracted again. Where exactly does the ownership boundary sit?
This isn't a legal argument — the legal frameworks will catch up eventually, probably badly. It's a structural observation. The information layer of AI — the patterns, the capabilities, the reasoning traces — flows like water. It moves from wherever it is to wherever someone wants it to be. Terms of service don't stop it. Export controls slow it down. Nothing stops it permanently.
And the Eastern and Western approaches to this couldn't be more different. The Western IP framework assumes value lives in the creation — the model weights, the curation, the RLHF. The Chinese approach treats value as living in application — who deploys fastest, at lowest cost, at greatest scale. Neither framework is wrong. They're playing different games on the same board.
Anthropic knows this. Their disclosure isn't just a security report — it's a policy document. They explicitly frame distillation as reinforcing the case for chip export controls: restricted chip access limits both direct model training and the scale of illicit distillation. The IP argument is being weaponized as trade policy. Whether you think that's justified depends on your priors about AI safety, geopolitics, and who should control frontier capabilities. But the structural force underneath all of it is the same: the model layer leaks. By design, by incentive, by the basic economics of information goods.
What can't be distilled
So if the model layer is indefensible — and the game theory, the evidence, and the economics all point that direction — the question becomes: what holds value?
Three things.
Platform infrastructure. Tool surfaces, API contracts, protocol layers. X killing free access and forcing everything through a paid API. Google building UCP for commerce and WebMCP for browser-native agent interaction. The interfaces between models and the world are defensible in a way the models themselves aren't. You can distill Claude's reasoning. You can't distill X's social graph or Google's commerce infrastructure. The toll road doesn't care whose car drives on it — and nobody can copy the road. The ClosedClaw pattern is this dynamic playing out in real time — open-source agent infrastructure getting absorbed into platform gravity wells.
Orchestration. The systems that know which model to call, which tool surface to hit, how to chain them together, and what the business context requires. Foundation models don't know your business. Platform APIs don't know your workflow. The orchestration layer — the domain-specific wiring that turns generic capabilities into actual outcomes — can't be extracted through API calls because it doesn't live in any single model. It lives in the configuration, the context, the practitioner's understanding of the problem space. This is what I build every day — the wiring between models and real workflows. That layer isn't in anyone's weights. It's in the decisions about what to call, when, and why. There's a sketch of what that wiring looks like below, after the third layer.
Trust. Client relationships, verified outcomes, compliance records, institutional knowledge. None of this is in the weights. None of it can be prompted out of a model. The most capable AI system in the world is useless if nobody trusts the person deploying it. This is the part of the stack that scales slowest and holds value longest.
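To make the orchestration point concrete, here's a minimal sketch of what that wiring can look like. Every name in it is hypothetical: the models, the tool surfaces, and the routing rules are stand-ins, not anyone's real stack. The point is that the layer is configuration plus domain judgment, so there's nothing in it for an API call to extract.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    model: str                # which model handles this kind of task
    tools: list[str]          # which tool surfaces it may touch
    needs_human_review: bool  # compliance context, not a model capability

# Hypothetical routing table. The domain rules accumulated here, not any
# model's weights, are what an API call can't pull out.
ROUTES: dict[str, Route] = {
    "invoice_reconciliation": Route("cheap-model", ["erp_api"], needs_human_review=False),
    "contract_review": Route("frontier-model", ["doc_store", "erp_api"], needs_human_review=True),
}

def orchestrate(task: str, payload: dict, call_model: Callable[[str, dict], dict]) -> dict:
    """Dispatch a task per the routing table, then apply the business rules around it."""
    route = ROUTES[task]
    result = call_model(route.model, {"payload": payload, "tools": route.tools})
    if route.needs_human_review:
        result["status"] = "pending_review"  # the step no generic model knows to add
    return result
```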
The distillers targeted everything above the base language layer but below the trust layer. They went after agentic reasoning and tool use — the capabilities that sit between "the model can talk" and "someone actually trusts this system to do work." That middle layer is where the current value is. But it's also where the current vulnerability is. The layers above and below it are structurally harder to extract.
The guerrilla alignment mirror
One more connection worth making. Distillation is capability extraction out of models — pulling reasoning patterns, tool-use behaviors, and coding approaches from a frontier system into a weaker one. Guerrilla alignment — the concept I wrote about recently — is influence injection into models through the training corpus.
Same vector. Opposite direction.
The training corpus and the inference API are both surfaces through which models can be shaped by outside actors. Anthropic acknowledging distillation publicly is them saying the quiet part out loud: their model's behavior is not fully under their control. It can be read, extracted, and replicated by anyone with enough API credits and patience.
This changes what "model safety" means. It's not just about what the model does when you talk to it. It's about what happens when its capabilities get copied into systems with different values, different safety constraints, and different incentives. According to Anthropic's disclosure, DeepSeek's distillation queries included requests to generate alternatives to politically sensitive responses — the kind of capability useful for training models that steer conversations away from topics the Chinese government wants suppressed. Reasoning capabilities developed under one set of values, repurposed for censorship infrastructure under another.
That's the real risk of an indefensible model layer. Not that someone else makes a good chatbot. That capabilities developed with one set of values get deployed with a completely different set.
Where this leaves us
The model layer is heading toward commodity economics whether anyone likes it or not. The game theory says defection wins. The evidence says distillation works at industrial scale. The IP frameworks can't hold because the concept of ownership breaks down when information flows through three transformations in three years.
The value is migrating. Platform infrastructure captures it through control of the interfaces. Orchestration captures it through domain expertise that can't be prompted out of any model. Trust captures it through relationships that take years to build and seconds to lose.
The people building orchestration systems today — the practitioners wiring agents to real workflows with real domain knowledge — are the ones positioned for where the value actually lands. Not because they have the best model. Because they have the layer that can't be stolen.
Sources
- Anthropic, "Detecting and preventing distillation attacks" — Feb 24, 2026
- TechCrunch, "Anthropic accuses Chinese AI labs of mining Claude" — Feb 23, 2026
- Reuters, "Chinese AI companies 'distilled' Claude to improve own models" — Feb 23, 2026
- Kaplan et al., "Scaling Laws for Neural Language Models" (arXiv:2001.08361) — Jan 2020