Anthropic Accuses Chinese AI Firms of Model Distillation on Massive Scale

Deep News
Yesterday

A significant controversy has emerged in the tech industry. Anthropic has issued a statement alleging that three Chinese AI companies—DeepSeek, Moonshot AI, and MiniMax—engaged in large-scale "distillation" of its Claude large language model. According to the disclosed data, the firms reportedly used over 24,000 fake accounts to interact with Claude approximately 16 million times, aiming to extract the model's capabilities to train their own systems. Anthropic described the activity as systematic, industrial-grade capability extraction rather than normal usage, characterizing the operation as a "hydra cluster": a network of numerous coordinated accounts issuing highly repetitive request structures that specifically targeted key abilities such as reasoning, agent tool usage, programming, and chain-of-thought output.

Model distillation is a common technique in AI training. Essentially, it involves using a more powerful "teacher model" to generate output data, which is then used to train a smaller, more cost-effective "student model" to replicate some of the teacher's capabilities. This method is widely used for model compression, lightweight deployment, and capability transfer and is not inherently problematic or novel.
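To make the teacher-student idea concrete, here is a minimal sketch of the classic distillation loss (KL divergence between temperature-softened teacher and student distributions, following Hinton et al.'s formulation). This is an illustrative toy in plain Python, not any company's actual training code; the temperature value and helper names are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; a higher T yields a softer distribution,
    # exposing more of the teacher's "dark knowledge" about wrong classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student outputs,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft targets from teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Identical logits give zero loss; the student is trained to drive
# this loss down, replicating the teacher's output distribution.
```

In practice the student minimizes a weighted sum of this soft-target loss and an ordinary cross-entropy on ground-truth labels; the sketch shows only the capability-transfer term.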

The controversy centers on the method and scale of the operation. Anthropic claims the three companies systematically extracted Claude's performance in reasoning, chain-of-thought output, agent tools, and programming by using shared payment methods, proxy services, and batched request structures. Specifically, DeepSeek was cited for over 150,000 interactions focused on reasoning and chain-of-thought data; Moonshot AI for approximately 3.4 million interactions targeting agent capabilities and tool usage; and MiniMax for the largest volume—around 13 million interactions—concentrating on agent orchestration and tool use, with accusations of swiftly shifting extraction targets after new model versions were released. Anthropic stated it identified this pattern through behavior recognition and anomaly detection models and warned that such actions could weaken the model's original safety guardrails, urging the industry and cloud service providers to adopt stricter protective measures.
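The "behavior recognition" Anthropic describes can be illustrated with a toy heuristic: accounts whose requests collapse into very few structural templates look automated, while organic users produce varied prompts. The thresholds and helper names below are hypothetical, chosen only to make the idea runnable; they do not reflect Anthropic's actual detection criteria.

```python
import re
from collections import Counter

def template_of(prompt):
    # Collapse numbers and quoted values so structurally identical
    # requests map to the same template string.
    t = re.sub(r'\d+', '<NUM>', prompt)
    t = re.sub(r'"[^"]*"', '"<VAL>"', t)
    return t

def flag_repetitive_accounts(requests_by_account,
                             min_requests=100, max_unique_ratio=0.05):
    """Flag high-volume accounts whose prompts reduce to a tiny set of
    templates. Both thresholds are illustrative assumptions."""
    flagged = []
    for account, prompts in requests_by_account.items():
        if len(prompts) < min_requests:
            continue  # too little traffic to judge
        templates = Counter(template_of(p) for p in prompts)
        if len(templates) / len(prompts) <= max_unique_ratio:
            flagged.append(account)
    return flagged
```

A real system would also weigh shared payment methods, proxy usage, and request timing, as the article notes; prompt-template collapse is just one signal.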

The incident quickly sparked debate. Elon Musk criticized Anthropic on X, accusing the company of large-scale theft of training data and pointing out that Anthropic itself has faced controversy over its training data sources. Musk sarcastically remarked, "You got robbed of what you stole?" It is worth noting that Anthropic has previously been involved in copyright lawsuits regarding its training data and paid substantial settlements, with related disputes still ongoing.

Shortly after the event, Musk shared a post from an AI industry insider arguing that Anthropic deserves no sympathy. The post contended that the company captured value by building a closed model on public data, pursued regulatory capture, and now seeks special rules to protect its profits while continuing to use that same data. It concluded that such a model is harmful and corrosive when the technology is this disruptive.

Opinions within the industry are divided. Some argue the issue is not distillation itself, but the implementation method. If the activity indeed involved massive fake accounts, batched requests, and structured extraction of core model capabilities while bypassing terms of service or regional restrictions, then the problem extends beyond technology into commercial compliance and unfair competition. Other commentators express more emotional views, stating they are unconcerned about companies distilling Claude. Some note that large models are themselves built on public internet data, with training data sources long embroiled in copyright disputes. In this context, accusing competitors of capability extraction can be seen as hypocritical. If AI companies can train models using internet content without explicit authorization, their stance appears weak when their own model's capabilities are distilled.

From a commercial perspective, bypassing platform rules to extract core capabilities on a massive scale is contentious. From a technical standpoint, however, there is no definitive legal standard on whether model outputs carry clear, exclusive property rights. The core question is where to draw the line: distillation is a standard industry practice, but when scaled to tens of millions of calls through fake accounts, does it constitute normal competition or illicit extraction? Leading companies, including OpenAI, have used distillation for model optimization, and the distinction between internal distillation and training on a competitor's outputs lacks a clear, unified boundary.

When model capabilities can be "transferred" via outputs, the core issue shifts from the technology itself to how rules define reasonable use versus systematic capability extraction. The boundaries of distillation between major players remain to be clarified.

