What Is AI Distillation? How Chinese Labs Use Gray Market Access to Train on Western Models

The Shortcut That’s Reshaping the AI Arms Race

When DeepSeek’s R1 model dropped in January 2025, it shocked the AI industry — not just because it performed competitively with OpenAI’s models, but because of how little it reportedly cost to build. That surprise quickly turned to suspicion. OpenAI alleged it had found evidence DeepSeek used its outputs to train R1, a technique called AI distillation. Whether or not that specific claim holds up legally, the concern it raised is real and growing.

AI distillation — using one model’s outputs to train another — is at the center of an escalating conflict over IP, market access, and national security. Understanding how it works, how gray market access enables it, and why it’s shaping US export control policy matters for anyone building with or deploying AI at scale.

What AI Distillation Actually Means

The term “distillation” comes from a 2015 paper by Geoffrey Hinton and colleagues at Google, “Distilling the Knowledge in a Neural Network.” The original idea was practical and benign: take a large, expensive model (the “teacher”) and train a smaller, cheaper model (the “student”) to approximate its behavior. The student learns not just from raw data but from the teacher’s output distributions (its confidence scores and soft predictions across classes), which carry a richer training signal than hard labels alone.
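
To make the “richer training signal” point concrete, here is a minimal sketch of one common way to write a distillation loss: soften both models’ scores with a temperature, then measure how far the student’s distribution sits from the teacher’s. The logits, class names, and temperature below are made-up illustration values, not figures from any real model.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) computed on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for a 3-class problem: (cat, dog, truck).
teacher = [4.0, 2.5, -1.0]
hard_label = [1.0, 0.0, 0.0]        # all a plain labeled dataset would say
soft_label = softmax(teacher, 2.0)  # also encodes "dog is a near miss, truck is not"

good_student = [3.8, 2.4, -0.9]  # close to the teacher everywhere
bad_student = [4.0, -1.0, 2.5]   # same top class, wrong ranking of the rest

print("soft label:", [round(v, 3) for v in soft_label])
print("good student loss:", distillation_loss(good_student, teacher))
print("bad student loss:", distillation_loss(bad_student, teacher))
```

Both students pick the same top class, so a hard label cannot tell them apart very well. The soft targets penalize the second student for ranking “truck” above “dog,” which is the extra information the teacher’s distribution carries.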

This approach is standard practice in production AI. Companies use it constantly to compress large models into smaller, faster versions that can run on edge devices or serve millions of requests cheaply.

When Distillation Becomes an Attack

The problem emerges when distillation happens without permission or transparency. Instead of compressing your own model, you use a competitor’s model as the teacher — querying its API at scale, collecting input-output pairs, and training your own model on those results.

This is often called a model extraction attack or distillation attack. The attacker ends up with a model that approximates the target’s capabilities, built on the back of the target’s training investment, while paying only for API calls and a comparatively small training run instead of the original cost and time.

The mechanics:

1. Craft a high-diversity set of queries covering target capabilities
2. Hit the target model’s public API at scale
3. Log every prompt and response
4. Fine-tune or train a new model on the resulting dataset
5. Optionally, iterate: use the extracted model to generate more synthetic data, then train again
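
The loop above can be shown in miniature without touching any real API. In this toy sketch, a local function with a hidden rule stands in for the teacher, and a small logistic-regression “student” is trained only on logged query and answer pairs. Every name and number here is an assumption chosen for illustration; it says nothing about how any particular lab works.

```python
import math
import random

def teacher(x):
    """Stand-in for a model we can only query. Its rule is hidden from the student."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.3 > 0 else 0

def collect_pairs(n, rng):
    """Craft varied queries, call the teacher, and log every (query, answer) pair."""
    queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(q, teacher(q)) for q in queries]

def train_student(pairs, epochs=300, lr=0.5):
    """Fit a logistic-regression student on the logged pairs (batch gradient descent)."""
    w0 = w1 = b = 0.0
    n = len(pairs)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in pairs:
            p = 1.0 / (1.0 + math.exp(-(w0 * x0 + w1 * x1 + b)))
            g0 += (p - y) * x0
            g1 += (p - y) * x1
            gb += p - y
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return lambda x: 1 if w0 * x[0] + w1 * x[1] + b > 0 else 0

def agreement(student, n, rng):
    """Share of fresh queries on which student and teacher give the same answer."""
    fresh = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return sum(student(x) == teacher(x) for x in fresh) / n

rng = random.Random(0)
student = train_student(collect_pairs(500, rng))
print(f"agreement on unseen queries: {agreement(student, 1000, rng):.2%}")
```

The student never sees the teacher’s rule, only its answers, yet it ends up agreeing with the teacher on most unseen queries. Real language models are vastly more complex, but the dependency is the same: the student’s quality comes from the teacher’s outputs.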

The student model won’t be identical to the teacher. But it can capture substantial capability — particularly for narrow domains or specific task types — at a fraction of the original development cost.

How Gray Market Access Makes This Possible

Major US AI providers — OpenAI, Anthropic, Google — restrict access to their APIs based on geography and terms of service. China is a key restricted market. Users in China cannot directly sign up for OpenAI accounts, and US companies operating under export regulations are generally prohibited from serving certain customers.

But these restrictions are porous.

The Gray Market Infrastructure

A gray market for US AI API access has developed around several mechanisms:

VPN and IP masking. Accounts created with US or European IP addresses face few restrictions at signup. Once an account exists, ongoing usage can be proxied through infrastructure that masks the true origin of requests.

Third-party resellers. A range of intermediaries — some operating openly, others under the radar — purchase API credits from US providers and resell access to customers who wouldn’t qualify directly. The reseller absorbs compliance risk, and the end user gets plausibly deniable access.

Cloud-mediated access. Microsoft Azure, Amazon Bedrock, and Google Cloud all offer hosted versions of major AI models. Enterprise agreements with these platforms don’t always include the same geographic restrictions as direct API access. A company routing usage through cloud infrastructure in a permitted jurisdiction may face lighter scrutiny.

Academic and research accounts. Many US universities and research institutions share API access with collaborators internationally. The original intent is legitimate; the actual usage can drift.

The result is that researchers and engineers at organizations that are formally restricted from using US AI APIs can, with some effort, access those APIs anyway. And “some effort” is not much of a barrier for a well-resourced lab trying to harvest training data.

DeepSeek and the Evidence Problem

The DeepSeek R1 situation illustrates both the plausibility of distillation attacks and the difficulty of proving them.

OpenAI’s claim rested on behavioral evidence: R1 appeared to exhibit certain patterns, quirks, and refusal behaviors that looked like they’d been inherited from GPT-4. When early versions of DeepSeek’s chat interface were probed, the model would occasionally identify itself as “ChatGPT” — a tell that suggested its training included data generated by OpenAI’s models.

DeepSeek’s own technical report acknowledges using “synthetic data” for portions of its training pipeline. What that synthetic data was, and how it was generated, remains underspecified.

The problem with pursuing distillation claims legally or technically is that the evidence is mostly circumstantial. Models don’t have fingerprints the way photographs do. Demonstrating that Model B was trained on outputs from Model A requires showing behavioral overlap that can’t be explained by training on the same underlying sources — a high bar when both models are trained on large portions of the same internet data.

There have been academic attempts to solve this. Researchers have proposed techniques for model watermarking — embedding hidden signals in a model’s outputs that survive the distillation process and can be detected in student models later. But these methods aren’t deployed consistently at scale, and they can be degraded by sufficiently aggressive post-processing.
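
As a rough illustration of what “hidden signals” can mean, the sketch below is modeled loosely on the “green list” family of statistical text watermarks: a keyed hash splits word pairs into green and red, the generator prefers green, and a detector checks whether a text contains more green pairs than chance would explain. The key, vocabulary, and toy generator are all hypothetical, and the sketch only covers detection in text. Whether such a signal survives into a student model is the open research question described above.

```python
import hashlib
import math
import random

GAMMA = 0.5  # expected share of "green" pairs in unwatermarked text

def is_green(prev_token, token, key="demo-key"):
    """A keyed hash of (previous token, token) decides green vs. red."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA

def z_score(tokens):
    """How far the green count sits above what chance alone would produce."""
    pairs = list(zip(tokens, tokens[1:]))
    green = sum(is_green(a, b) for a, b in pairs)
    n = len(pairs)
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

def generate(vocab, length, rng, watermark):
    """Toy 'model': random words, optionally restricted to green continuations."""
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        candidates = vocab
        if watermark:
            green = [t for t in vocab if is_green(tokens[-1], t)]
            candidates = green or vocab  # fall back if no green word exists
        tokens.append(rng.choice(candidates))
    return tokens

vocab = [f"w{i}" for i in range(50)]
rng = random.Random(0)
marked = generate(vocab, 201, rng, watermark=True)
plain = generate(vocab, 201, rng, watermark=False)

print("watermarked z-score:", round(z_score(marked), 2))
print("plain z-score:", round(z_score(plain), 2))
```

A large positive z-score suggests the text came from the watermarked generator, while ordinary text stays near zero. Paraphrasing or other post-processing replaces word pairs and pulls the score back toward zero, which is the degradation problem noted above.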

Why This Works So Well (and Why It’s Hard to Stop)

Distillation attacks are attractive for a simple reason: the hardest part of building a capable AI model isn’t the architecture or the training infrastructure. It’s the data and the feedback signal. Models like GPT-4 and Claude encode years of RLHF work — the careful human preference data that shapes their tone, accuracy, refusal behavior, and reasoning style. That work is expensive and slow to replicate from scratch.

If you can harvest the outputs of a model shaped by that feedback, you get the benefit without the cost. Your student model learns from a teacher that already learned from humans.

The Scale Problem

APIs are designed to serve millions of requests. Telling a legitimate user with real tasks apart from a systematic harvesting operation is hard. Rate limits help but don’t prevent slow, distributed extraction. Terms of service prohibit using API outputs to train competing models, but enforcement requires detection, which requires evidence.

Some providers have invested in anomaly detection — flagging accounts that show unusual query diversity, atypical request volumes, or statistical patterns consistent with training data collection. But a sophisticated actor can distribute queries across many accounts, vary timing, and introduce noise to evade detection.
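
A minimal sketch of that kind of check, assuming a provider already tags each request with a coarse topic label: it flags accounts that combine high volume with unusually even coverage across topics. The topic list, thresholds, and account data are hypothetical, and a production system would rely on many more signals than these two.

```python
import math

# Hypothetical coarse topic labels a provider might assign to each request.
TOPICS = ["code", "math", "legal", "medical",
          "translation", "creative", "support", "science"]

def topic_diversity(topic_counts):
    """Shannon entropy of an account's topic mix, scaled to the range 0..1."""
    total = sum(topic_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in topic_counts.values():
        if count:
            share = count / total
            entropy -= share * math.log(share)
    return entropy / math.log(len(TOPICS))

def flag_accounts(usage, min_requests=10_000, min_diversity=0.9):
    """Flag accounts that are both high-volume and spread evenly over topics."""
    flagged = []
    for account, counts in usage.items():
        volume = sum(counts.values())
        if volume >= min_requests and topic_diversity(counts) >= min_diversity:
            flagged.append(account)
    return flagged

usage = {
    "acct-a": {"support": 4_800, "translation": 200},   # small and focused
    "acct-b": {"code": 60_000, "math": 1_500},          # large but focused
    "acct-c": {topic: 5_000 for topic in TOPICS},       # large and uniform
}
print(flag_accounts(usage))
```

Only the account that is both large and evenly spread gets flagged. The weakness described above is visible here too: splitting the third account’s traffic across several smaller accounts would drop each one below the volume threshold.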

The Jurisdictional Gap

Even when a provider identifies likely abuse, enforcement options are limited if the bad actor is outside US jurisdiction. Terms of service violations can result in account bans. They can’t result in meaningful legal consequences across borders without treaty frameworks that don’t currently exist for this type of IP dispute.

US Policy Response: Export Controls and the AI Diffusion Rule

The distillation problem is part of a broader policy conversation about how to prevent US AI investments from effectively subsidizing foreign competitors.

The Biden administration’s AI Diffusion Rule, issued as an interim final rule in January 2025, extended semiconductor export controls to include restrictions on AI model weights and API access in certain contexts. The rule created tiered access, with different controls for close allies versus countries of concern, and included provisions targeting “model as a service” deployments that could be used for extraction.

The Trump administration moved to rescind the rule in May 2025, before its compliance deadline took effect, while keeping chip export controls in place and continuing to frame US AI development as a matter of national security. The concern isn’t just commercial: military and intelligence applications of AI that’s been trained on US model outputs represent a genuine dual-use risk.

What Export Controls Can and Can’t Do

Export controls on chips — specifically Nvidia H100s and their successors — create a hardware bottleneck. Labs that can’t access advanced compute can’t train frontier models or replicate them at scale.

But distillation is a way around this. You don’t need to train a frontier model from scratch if you can fine-tune a smaller, cheaper model on frontier model outputs. The hardware requirements drop dramatically. A distilled model trained on GPT-4 outputs might run acceptably on hardware that’s far below the export control threshold.
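
A back-of-the-envelope calculation shows the scale of the gap. It uses a widely quoted rule of thumb that training costs roughly 6 floating-point operations per parameter per token. The model sizes and token counts below are hypothetical round numbers picked for illustration, not figures for GPT-4 or any other real system, and the estimate leaves out the cost of the student’s base model and of the API calls themselves.

```python
def training_flops(params, tokens):
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6 * params * tokens

# Hypothetical sizes, chosen only to show the order of magnitude.
frontier = training_flops(params=70_000_000_000, tokens=2_000_000_000_000)
distilled = training_flops(params=7_000_000_000, tokens=1_000_000_000)

print(f"frontier pretraining:   {frontier:.1e} FLOPs")
print(f"distillation fine-tune: {distilled:.1e} FLOPs")
print(f"ratio: {frontier // distilled:,}x")
```

Under these assumptions the fine-tuning run needs about four orders of magnitude less compute than the pretraining run, which is why chip controls alone do not close the gap.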

This is why policymakers have started focusing on API access and model weights as distinct control categories from hardware. The conversation is evolving quickly, and the technical realities are outpacing regulatory frameworks.

What This Means for Enterprise AI Users

If you’re deploying AI in an enterprise context, the distillation problem has a few practical implications.

Your vendor’s model quality is a function of their IP protection. If competitors are effectively siphoning capability from frontier models, the moat that justifies a premium model’s cost narrows over time. The models you pay for today may face faster competitive erosion than historical software product cycles suggested.

Terms of service matter more than they used to. Most enterprise AI agreements include provisions about how model outputs can be used. Using a provider’s outputs to train a competing system is almost universally prohibited. But the same risk runs in reverse: if your internal AI deployment is queried at scale by a bad actor, you may be the unwitting teacher in someone else’s training pipeline.

Data exfiltration risk is real. Gray market access doesn’t only mean competitors accessing public APIs. If your organization uses AI tools that route data through unclear intermediaries — including unofficial API resellers — your prompts and responses may be logged and reused in ways that violate your own compliance requirements.

Evaluating AI vendors for compliance posture, data handling, and geographic routing of API calls is increasingly important for enterprise security reviews.
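
One small, concrete piece of such a review is checking where your workflows actually send their requests. The sketch below compares configured endpoint URLs against an allowlist of hosts. The allowlist, workflow names, and the reseller URL are assumptions for illustration; each organization would maintain its own list of approved providers.

```python
from urllib.parse import urlparse

# Assumed allowlist: hosts this hypothetical organization has approved.
APPROVED_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

def audit_endpoints(configured):
    """Return (workflow, problem) pairs for endpoints that need a closer look."""
    findings = []
    for name, url in configured.items():
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if parsed.scheme != "https":
            findings.append((name, "not HTTPS"))
        elif host not in APPROVED_HOSTS:
            findings.append((name, f"unapproved host: {host}"))
    return findings

# Hypothetical workflow configuration pulled from internal tooling.
configured = {
    "summarizer": "https://api.openai.com/v1",
    "translator": "https://cheap-llm-proxy.example/v1",
    "legacy": "http://api.anthropic.com/v1",
}
for workflow, problem in audit_endpoints(configured):
    print(f"{workflow}: {problem}")
```

A check like this will not reveal what an intermediary does with your data, but it does surface workflows that quietly route through a reseller or an unencrypted connection, which is the starting point for the vendor questions above.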

How MindStudio Approaches Multi-Model Access

One practical challenge that enterprise teams face is that responsible AI usage requires accountability at the model access layer. When you’re running dozens of workflows across multiple AI models, tracking which model processed what data — and ensuring every call routes through compliant, authorized infrastructure — gets complex fast.

MindStudio handles this by centralizing model access through a single governed platform. Rather than having teams manage individual API keys for OpenAI, Anthropic, Google, and others, MindStudio’s platform routes all model calls through auditable infrastructure. That means usage logs, access controls, and data handling policies apply uniformly, regardless of which of the 200+ available models a given workflow uses.

For enterprises concerned about data handling — including the risk that model outputs could be used in downstream training by the provider — centralized access management makes it easier to apply consistent governance policies and audit trails.

If you’re building internal AI workflows and need confidence that your data isn’t being routed through opaque third-party channels, MindStudio’s enterprise offering is worth evaluating. You can start free at mindstudio.ai.

Frequently Asked Questions

What is AI distillation in simple terms?

AI distillation is a process where a smaller model learns to replicate the behavior of a larger model by training on the larger model’s outputs. In legitimate use, it’s a way to compress expensive models into cheaper, faster versions. In adversarial use — sometimes called a model extraction attack — it’s a way to copy a competitor’s model capability without permission, by systematically querying their API and training on the responses.

Is AI distillation illegal?

It depends on jurisdiction and specifics. Training a competing model on outputs from another provider’s API almost always violates that provider’s terms of service, which creates contractual liability. Whether it rises to trade secret theft, copyright infringement, or other legal violations is less settled. No major distillation case has been adjudicated in court yet. The legal framework is still catching up to the technical reality.

How did DeepSeek allegedly use OpenAI’s models?

OpenAI alleged that DeepSeek collected outputs from GPT-4 via API access — possibly through gray market intermediaries — and used those outputs as training data for DeepSeek R1. Evidence cited included behavioral patterns in R1 consistent with GPT-4’s RLHF tuning and instances where early versions of the model self-identified as “ChatGPT.” DeepSeek denied intentional copying. No legal finding has been made.

What is gray market AI access?

Gray market access refers to using US AI APIs through channels that circumvent geographic restrictions — typically via VPNs, IP masking, third-party resellers, or cloud infrastructure that obscures the end user’s true location. It’s distinct from fully illegal access (credential theft, direct hacking) but operates outside the provider’s intended compliance framework.

How does distillation relate to US export controls?

US export controls on advanced semiconductors limit the hardware available to train frontier AI models in restricted countries. But distillation offers a partial workaround: by training smaller models on frontier model outputs, labs can achieve competitive capability with less compute. This has led policymakers to extend export control thinking beyond hardware to API access and model weights.

Can AI models be watermarked to detect distillation?

Yes, there are research techniques for embedding watermarks in model outputs — hidden statistical patterns that persist through the distillation process. If a student model shows the watermark, it implies training on the teacher’s outputs. These methods work in controlled settings but aren’t widely deployed in production. Current watermarking approaches can also be disrupted by adversarial post-processing, so they’re not a complete solution.

Key Takeaways

AI distillation lets a “student” model learn from a “teacher” model’s outputs — a legitimate compression technique that becomes problematic when used without authorization against a competitor’s model.
Gray market access to US AI APIs — through VPNs, resellers, and cloud intermediaries — enables systematic output harvesting even where direct access is restricted.
The DeepSeek controversy is the highest-profile example to date, but the underlying technique is available to any well-resourced actor with API access.
US export controls are evolving to address not just hardware but API access and model weights, as policymakers recognize distillation as a circumvention vector.
For enterprise teams, the practical response involves governing model access centrally, auditing API routing, and understanding data handling policies across the AI vendors you use.

The distillation problem isn’t theoretical, and it’s not going away. As AI capabilities become more concentrated in a small number of frontier models, the incentive to extract that capability cheaply will only grow. Understanding the mechanics is the first step to responding sensibly — whether you’re setting policy, building products, or just trying to understand why this keeps appearing in the news.
