
Z.ai’s GLM-5.2 narrows gap with OpenAI and Anthropic


Z.ai has launched GLM-5.2, which the firm describes as an open-weight large language model (LLM). It reportedly leads all other open-weight LLMs on Artificial Analysis’s rankings and places among the top three LLMs in the world, suggesting that GLM-5.2 is now close to the frontier models built by Anthropic and OpenAI.

The release could significantly reshape the competitive landscape of the AI market. Before it, open-weight LLMs lagged far behind their closed-weight counterparts in nearly all independent tests. GLM-5.2’s results suggest that the gap is narrowing, with implications for enterprise adoption, pricing, and the business models of closed-weight labs.

What the benchmark findings say about GLM-5.2

According to independent evaluation company Vals AI, GLM-5.2 was the top-performing open-weight model on five benchmarks: Vals Index, Harvey’s Legal Agent Benchmark, Finance Agent v2, ProofBench, and Vibe Code Bench.

Vals AI reported that GLM-5.2 is the first open-weight model to exceed 30% on ProofBench, 11 percentage points ahead of the second-placed model. It also came within 1 percentage point of Anthropic’s Claude Opus 4.5, an unusually close position to proprietary frontier performance for an open-weight model.

Introducing GLM-5.2: Frontier Intelligence, Open Weights

– Significant improvements in coding and agentic tasks
– Strong long-horizon capabilities with a 1M context window
– Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong… pic.twitter.com/SjGPSVhePJ

— Z.ai (@Zai_org) June 16, 2026

According to Artificial Analysis, GLM-5.2 is currently the best open-weight model, with an Intelligence Index score of 51, up from 40 for GLM-5.1. Other open-weight models trailed: MiniMax-M3 and DeepSeek V4 Pro each scored 44, and Kimi K2.6 scored 43.

GLM-5.2 scored 78% on TerminalBench v2.1 (16 points higher than GLM-5.1), 50% on SciCode, 71% on AA-LCR, and 89% on GPQA Diamond. On the GDPval-AA v2 long-horizon agent benchmark, it reached 1,524 Elo, narrowly ahead of GPT-5.5’s 1,514.
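To put the 10-point GDPval-AA margin in perspective, the sketch below applies the standard Elo expected-score formula. It assumes GDPval-AA v2 uses the conventional 400-point logistic Elo scale, which the article does not confirm; the function name is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model
    (win probability, counting ties as half a win)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings reported in the article; the 400-point scale is an assumption.
glm_52 = 1524
gpt_55 = 1514

p = elo_expected_score(glm_52, gpt_55)
print(f"GLM-5.2 expected score vs GPT-5.5: {p:.3f}")  # about 0.514
```

Under that assumption, GLM-5.2 would be preferred in only about 51% of pairwise comparisons, a narrow margin rather than a clear lead.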

Despite GLM-5.2’s impressive performance, experts point out that interpreting benchmark results is becoming increasingly complicated. Aggregate indices such as Artificial Analysis’s Intelligence Index, for instance, reduce the bias of any single test but become more sensitive to the weighting scheme, prompt variations, and changing evaluation sets. Benchmark contamination and optimization effects remain ongoing concerns across frontier AI testing.
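A toy illustration of the weighting concern: with the same per-benchmark scores, simply changing the weights can flip which model ranks first. The models, scores, and weights below are entirely hypothetical, and the plain weighted average is an assumption, not Artificial Analysis’s actual methodology.

```python
# Hypothetical per-benchmark scores for two unnamed models (illustrative only).
scores = {
    "model_a": {"coding": 80.0, "science": 50.0, "long_context": 60.0},
    "model_b": {"coding": 65.0, "science": 70.0, "long_context": 62.0},
}


def aggregate(model_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of benchmark scores; weights are normalized to sum to 1."""
    total = sum(weights.values())
    return sum(model_scores[bench] * w / total for bench, w in weights.items())


def ranking(weights: dict[str, float]) -> list[str]:
    """Model names sorted from highest to lowest aggregate score."""
    return sorted(scores, key=lambda m: aggregate(scores[m], weights), reverse=True)


coding_heavy = {"coding": 2.0, "science": 1.0, "long_context": 1.0}
science_heavy = {"coding": 1.0, "science": 2.0, "long_context": 1.0}

print(ranking(coding_heavy))   # model_a first (67.5 vs 65.5)
print(ranking(science_heavy))  # model_b first (66.75 vs 60.0)
```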

What is inside GLM-5.2’s architecture?

According to Z.ai, GLM-5.2 is the company’s most powerful model for long-horizon reasoning and agentic coding tasks. It offers a 1-million-token context window, up from 200,000 tokens for GLM-5.1.

GLM-5.2 uses a Mixture-of-Experts architecture with roughly 750 billion total parameters, of which about 40 billion are active per token, and is optimized for multi-step reasoning and coding workflows.
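The article does not say how GLM-5.2 routes tokens, so the following is only a generic sketch of the Mixture-of-Experts idea using top-k gating: a router scores every expert, only the k highest-scoring experts run, and their outputs are mixed by renormalized gate weights. The expert count, k, router scores, and scalar "experts" are hypothetical simplifications; real MoE layers apply this per token to feed-forward sub-networks.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Select the k highest-scoring experts and renormalize their gate weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_forward(x, experts, router_logits, k):
    """Mix the outputs of the selected experts only; unselected experts never run."""
    return sum(gate * experts[idx](x) for idx, gate in route_token(router_logits, k))


# Hypothetical toy layer: 4 scalar "experts", each just scaling its input.
experts = [lambda x, w=w: w * x for w in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical router scores for one token

print(route_token(router_logits, k=2))  # experts 1 and 3 are selected
print(moe_forward(1.0, experts, router_logits, k=2))

# Active share implied by the article's figures (~40B of ~750B parameters).
print(f"{40 / 750:.1%} of parameters active per token")  # 5.3%
```

The point of the design is that compute per token tracks the active parameters, not the total, which is how a model this large can stay comparatively cheap to run.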

GLM-5.2 offers two reasoning-effort levels: GLM-5.2 (max), a high-effort setting for complex tasks, and GLM-5.2 (high), a lower-cost mode designed for efficiency and latency control.

According to Artificial Analysis, GLM-5.2 generates around 43,000 output tokens per evaluation, compared with 26,000 for GLM-5.1. The extra output may help its scores, but it can also raise compute costs in practice.
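A quick sketch of the cost arithmetic, using the two token counts reported above. The per-million-token price is hypothetical, and the sketch assumes both models are billed at the same rate, which may not hold in practice.

```python
def relative_increase(new: float, old: float) -> float:
    """Fractional change from old to new (0.5 means 50% more)."""
    return (new - old) / old


def output_cost_usd(output_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of a run's output tokens at a flat per-million-token price."""
    return output_tokens / 1_000_000 * usd_per_million_tokens


GLM_51_TOKENS = 26_000  # reported by Artificial Analysis
GLM_52_TOKENS = 43_000  # reported by Artificial Analysis
PRICE = 2.00            # hypothetical USD per million output tokens

print(f"{relative_increase(GLM_52_TOKENS, GLM_51_TOKENS):.0%} more output tokens")  # 65%
print(f"GLM-5.1: ${output_cost_usd(GLM_51_TOKENS, PRICE):.4f} per evaluation")
print(f"GLM-5.2: ${output_cost_usd(GLM_52_TOKENS, PRICE):.4f} per evaluation")
```

At equal per-token pricing, output cost rises by the same roughly 65% as the token count.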

Z.ai’s blog highlights improvements in coding agents, debugging, automated research, document processing, and long-form generation, positioning the model for sustained, multi-step tasks rather than isolated prompts.

Market context and ecosystem friction

GLM-5.2 arrives amid an ongoing debate over how far open-weight systems have caught up with proprietary frontier models. Chinese AI firms hold several of the top positions in open-model rankings, and GLM-5.2 has become central to that trend.

The debate spilled into public view in an exchange between Elon Musk and Jie Tang (founder of Z.ai) over when Chinese models will reach parity with frontier models. Musk responded: “Probably Q1 next year.”

Tang disagreed, stating: “It won’t take that long.”

Probably Q1

— Elon Musk (@elonmusk) June 18, 2026

While benchmarks suggest rapid convergence, early feedback from practitioners points to gaps in real-world performance.

AI engineer Da7_Tech focused his concerns less on the model itself than on Z.ai’s infrastructure and the transparency of its usage accounting, saying that it “goes against everything people expect from the values of open-source models.”

He tried Zcode, Z.ai’s app built on GLM models, under a Pro plan advertised as “15x Claude Code.” During a single task session, he said, his usage allowance ran out in under an hour, consuming the quota that was supposed to last five hours.

He also reported a discrepancy between the usage shown in the app and the amount actually billed: the app displayed fewer than 2 million tokens, but approximately 60 million were counted against both his daily and weekly limits. He suggested that cached and intermediate tokens were being counted as usage, rather than only the actual computation. He later noted that Z.ai had removed token counting from its “Goal Mode” and revised its Pro plan descriptions.
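One way a gap of this size could arise is sketched below, purely as a hypothetical illustration and not as a description of Z.ai’s actual billing. In an agent loop, each step typically resends the whole context accumulated so far. If a meter counts each new token once, usage grows linearly with the number of steps; if it also counts the cached context re-read on every step, usage grows roughly quadratically. The step count and tokens per step are hypothetical values chosen to show the effect.

```python
def fresh_tokens(steps: int, tokens_per_step: int) -> int:
    """Tokens added to the context over all steps, each counted once."""
    return steps * tokens_per_step


def tokens_with_cached_context(steps: int, tokens_per_step: int) -> int:
    """Tokens processed if step i re-reads the whole context accumulated so far.

    Step i sees i * tokens_per_step tokens, so the total is
    tokens_per_step * (1 + 2 + ... + steps) = tokens_per_step * steps * (steps + 1) / 2.
    """
    return tokens_per_step * steps * (steps + 1) // 2


# Hypothetical agent session: 59 steps, each adding 33,000 tokens to the context.
steps, per_step = 59, 33_000
shown = fresh_tokens(steps, per_step)                  # 1,947,000
counted = tokens_with_cached_context(steps, per_step)  # 58,410,000
print(f"shown: {shown:,}  counted: {counted:,}  ratio: {counted / shown:.0f}x")
```

In this toy model the ratio is (steps + 1) / 2, so the inflation grows with session length, which would make long agentic sessions exhaust token-based quotas especially quickly.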

Separately, AI builder Michael Guo compared GLM-5.2 with GPT-5.5 medium while debugging a problem in Trippy, his OpenClaw agent. His conclusion:

“At least in the test case I ran, it was not as capable as GPT-5.5 medium. Not even close.”

GPT-5.5 medium quickly pinpointed the cause of the agent’s repeated answers, while GLM-5.2 failed to find it.

His takeaway was that strong benchmark results do not guarantee strong performance on real debugging work, which can expose inconsistencies that aggregate scores miss.

Narrowing the gap, but real-world results vary

The benchmark results place GLM-5.2 among the top open-weight models currently available, and on some tests ahead of proprietary ones.

However, practitioner assessments of its performance, efficiency, and transparency vary with the use case and with how the model is integrated into other systems.

There are, then, two sides to the story: GLM-5.2 is an important step forward for open-weight models, but real-world adoption will depend as much on infrastructure readiness and product quality as on benchmark results.

For now, GLM-5.2 narrows the gap between open and closed AI systems, though it does not yet amount to a decisive convergence.

