China’s AI Models Are Closer Than Many People Think

Written by Sebastian Macke | Jun 17, 2026

Do you remember DeepSeek R1 back in January 2025? At the time, a Chinese AI model attracted significant media attention.

What made it remarkable was not only its performance, but also its timing. Just five months after OpenAI introduced o1-preview, the first major reasoning model, DeepSeek R1 emerged as the first serious open reasoning model. DeepSeek later claimed that training R1 cost only around $294,000 USD. The news even shook the stock market: on January 27, 2025, NVIDIA lost nearly $600 billion USD in market value as investors began questioning whether AI development would continue to require ever-larger numbers of increasingly expensive GPUs.

In practice, however, efficiency gains rarely lead to lower computing demand. In most cases, these efficiency gains are immediately reinvested into even larger models, because if the AI revolution lacks one thing, it is computing power.

Since then, Chinese AI models have largely disappeared from mainstream Western headlines. Behind the scenes, however, the race has continued unabated and it remains fascinating.

As NVIDIA CEO Jensen Huang warned last November:

China is only nanoseconds behind the U.S..

DeepSeek V4: A New Weight Class

The latest DeepSeek model, DeepSeek V4, demonstrates just how small the gap has become. In many well-known benchmarks, it performs close to leading U.S. models.

DeepSeek V4 is the latest AI star to emerge from China. In widely used benchmarks, the model performs on par with leading American systems.

Taken at face value, these results suggest that the performance gap between U.S. and Chinese AI models has largely vanished.

But is that really the case?

Many critics accuse Chinese models of “benchmaxxing”. Training specifically to perform well on popular benchmarks rather than developing broader capabilities. However, hard evidence for such claims is usually lacking. In independent evaluations with non-public test sets, such as Simple-Bench and ARC-AGI 1. Chinese models tend to achieve solid but not leading results. In my own benchmarks, they comfortably clear the initial hurdles. For coding tasks that take the agent ten minutes to complete, they are often difficult to distinguish from the leading American models.

In the market for general-purpose chatbots, the kind offered by OpenAI, Anthropic, and Google on their websites, Chinese models are already highly competitive. As a result, the traditional chatbot for grammar correction, writing assistance, and answering everyday questions is increasingly becoming a commodity.

China’s Trillion-Parameter Models

Another way to gauge how far China has come is to look at the size of its leading AI models. The table below lists the company, model name, and parameter count. Here, B stands for billions of parameters and T stands for trillions (1T = 1,000B).

Alibaba: Qwen-3.5 (397B). Qwen-3.7 is undisclosed. (>400B)
DeepSeek: DeepSeek V4 (1.6T)
Moonshot AI: Kimi-2.6 (1T)
Zhipu AI: GLM 5.1 (754B)
MiniMax: MiniMax-M2.7 (230B)
Xiaomi: MiMo-V2.5-Pro (1T)

Three of these models have already reached the trillion-parameter scale. But comparing them to American models is more difficult, however, because U.S. companies rarely disclose official parameter counts.

The most recent estimate comes from a researcher at Pine AI. To derive it, he constructed a benchmark consisting of a large number of factual knowledge questions whose answers cannot easily be inferred through reasoning alone. A model either contains the information or it does not. The benchmark includes highly obscure facts, names, and details that are difficult to derive from first principles.

The logic is simple: larger models can store more knowledge that cannot easily be compressed. Based on models with known parameter counts, he came up with the following estimates:

Estimated sizes of leading American AI models. The largest systems are estimated to contain at least 2 trillion parameters.

These estimates should, of course, be treated with a healthy dose of skepticism. According to Elon Musk, for example, Grok-4 contains no more than 0.5 trillion parameters. If that figure is correct, the estimate would be off by a factor of six. According to Musk, Anthropic’s Sonnet 4 has roughly 1 trillion parameters, while Opus comes in at around 5 trillion. In those cases, the estimates appear to be reasonably accurate. How would Musk know? Anthropic recently secured access to approximately 220,000 NVIDIA GPUs through xAI. As a result, the scale of Anthropic’s models is likely no secret within the industry.

Overall, the picture is becoming clear: with the exception of a handful of frontier models released by OpenAI and Anthropic this year, Chinese models are effectively on par with their American counterparts. Their lag appears to be no more than six months.

A factor also supported by the capability index by Epoch:

Openness as a Strategy

One of the most positive aspects of Chinese AI models is undoubtedly their publication strategy. Unlike their American counterparts, Chinese companies frequently release even their flagship models as open-source projects, complete with extensive documentation. In practice, this means that research papers, inference code, and model weights can all be downloaded freely, allowing anyone to run these models at little more than the cost of hardware and electricity. Independent providers such as OpenRouter also make many of these models available directly through APIs.

The contrast becomes striking when comparing these models with the open-source releases from major U.S. AI companies:

OpenAI: GPT-OSS (120B)
Google: Gemma 4 (31B)
NVIDIA: Nemotron 3 Super (120B)
Meta: Llama 4 (400B)
Microsoft: Phi-4 (14B)
xAI: Grok-2 (270B)

American providers are understandably careful not to undermine their own monetization strategies. As a result, very few of them regularly release large open-source models.

Microsoft’s Phi models originate from the company’s research division, but they remain too small to be broadly useful. Meta has effectively taken a step back following the Llama 4 debacle last year and has yet to release a new model. GPT-OSS, released in August 2025, was OpenAI’s first open-source model in seven years and has since fallen behind its Chinese counterparts. xAI, Elon Musk’s AI company, has committed to releasing its older models on a regular basis. However, Grok-2 dates back to 2024 and is now outdated as well. Anthropic, meanwhile, has yet to release a single open-source model.

At the moment, only two American providers stand out: Google and NVIDIA. Google’s Gemma 4 was one of this year’s surprise successes. Despite having only 31 billion parameters, it is already capable of handling agentic tasks at roughly student level. An unexpected bright spot comes from NVIDIA, which continues to release capable larger models on a consistent basis.

Even so, there are important limitations. Google is unlikely to release a model that seriously competes with its flagship Gemini systems. NVIDIA, meanwhile, primarily views its models as research and demonstration projects designed to showcase its hardware.

Europe’s list, incidentally, is rather short:

Mistral AI: Mistral-Large-3 (675B)

Like their Chinese counterparts, the French company Mistral regularly releases its flagship models as open-source software. According to current benchmarks, however, these models still lag behind the leading Chinese systems.

The Economic Question

So, should companies keep an eye on Chinese AI models? The answer is probably yes. Chinese open-source models may not yet match the very best American systems, but for many real-world applications they are already “good enough”.

Moreover, as OpenAI and Anthropic continue to face rising effective costs and capacity constraints, Chinese open-source models may become economically viable faster than the U.S. can adapt. If American companies also fail to establish a sustainable approach to open-source AI, there is a realistic possibility that much of the future AI stack will be built on Chinese technology.

View full post