That broad claim landed. Plenty of people said GLM-5.2, DeepSeek V4, Kimi, and similar models are now good enough for a large share of coding tasks, especially when the work is scoped and the user already knows the architecture. The practical comparison was not “can this beat Opus on every benchmark” but “can I get work done without paying US flagship prices,” and for many the answer was yes. Several people described a workable stack where a frontier model handles research, planning, or PR review, and a cheaper open model does the implementation. Others said the cost difference changes who gets access at all. A $200 monthly subscription is a rounding error for a US consultancy and a major expense in Brazil or elsewhere, so cheaper open models are not just a nice alternative. They determine whether whole categories of developers can participate.
The harder edge in the comments was that benchmark wins do not equal good economics. GLM-5.2 was repeatedly described as capable but token-hungry, slow, and prone to long reasoning traces that eat quotas fast. People called this “thinkslop,” meaning verbose chains of thought that may help the model recover from mistakes but make flat-rate plans look much worse than Claude or Codex in practice. That is why a lot of the enthusiasm in the thread drifted toward DeepSeek V4 Flash or Pro rather than GLM specifically. DeepSeek was widely praised as the better value play right now, especially through its direct API, where users reported tiny spend for huge token volumes and better caching than OpenRouter.
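The token-efficiency point is easy to see with a little arithmetic. The sketch below is illustrative only: the model names, prices, and token counts are invented placeholders, not figures reported in the thread or any provider's real pricing. It shows how a model with a lower per-token price can still cost more per task, and exhaust a fixed quota sooner, if it emits far more reasoning tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model figures; none of these come from the thread."""
    name: str
    usd_per_million_output_tokens: float
    avg_output_tokens_per_task: int  # reasoning tokens included


def cost_per_task(profile: ModelProfile) -> float:
    """Metered cost of one task in USD."""
    tokens_in_millions = profile.avg_output_tokens_per_task / 1_000_000
    return tokens_in_millions * profile.usd_per_million_output_tokens


def tasks_per_quota(profile: ModelProfile, quota_tokens: int) -> int:
    """Number of whole tasks that fit in a fixed token quota (flat-rate plan)."""
    return quota_tokens // profile.avg_output_tokens_per_task


# Made-up numbers: the verbose model is half the price per token
# but writes four times as many tokens per task.
terse = ModelProfile("terse-model", 4.0, 5_000)
verbose = ModelProfile("verbose-model", 2.0, 20_000)

if __name__ == "__main__":
    for m in (terse, verbose):
        print(m.name, round(cost_per_task(m), 4), tasks_per_quota(m, 1_000_000))
```

Under these assumed numbers the cheaper-per-token model ends up twice as expensive per task and completes a quarter as many tasks on the same quota, which is the shape of the complaint about long reasoning traces on flat-rate plans.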
Service quality also kept coming up. Multiple users said Z.ai’s own coding plans were unreliable, with 429 errors, poor concurrency, and refund problems. The thread’s practical advice was to separate the model from the provider. Use OpenRouter, OpenCode Go, Fireworks, Cloudflare, or another host if you want the model without betting on Z.ai’s operational maturity. The same separation showed up in trust discussions. People who are comfortable with open models still do not trust any hosted agent with private code by default, and several said they run everything in virtual machines, sandbox tools with Bubblewrap, or skip third-party harnesses entirely and write their own minimal agent loops.
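One way to picture "separate the model from the provider" is a thin layer that treats each host as an interchangeable callable and moves on when one is rate limited. This is a minimal sketch under stated assumptions, not any commenter's actual setup: `RateLimited`, `complete_with_fallback`, and the two demo hosts are hypothetical names, and a real version would wrap each provider's own HTTP client and raise `RateLimited` on a 429 response.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error a provider wrapper raises on an HTTP 429 response."""


# A provider is any callable taking (model, prompt) and returning text.
Provider = Callable[[str, str], str]


def complete_with_fallback(
    model: str,
    prompt: str,
    providers: Sequence[tuple[str, Provider]],
) -> tuple[str, str]:
    """Try each host in order and skip any that is rate limited.

    Returns (provider_name, completion).
    Raises RuntimeError if every host is rate limited.
    """
    for name, call in providers:
        try:
            return name, call(model, prompt)
        except RateLimited:
            continue  # same model, next host
    raise RuntimeError(f"all providers rate limited for {model}")


# Stand-in hosts for demonstration; no network calls are made.
def flaky_host(model: str, prompt: str) -> str:
    raise RateLimited("429 Too Many Requests")


def steady_host(model: str, prompt: str) -> str:
    return f"[{model}] ok"


if __name__ == "__main__":
    hosts = [("first-host", flaky_host), ("second-host", steady_host)]
    print(complete_with_fallback("some-open-model", "hello", hosts))
```

Keeping the model name as a plain argument and the hosts as an ordered list means that switching providers is a configuration change and leaves the agent code untouched.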
The comments were upbeat about open models overall, but not naïve. The consensus was that the capability gap is closing, the price gap already matters, and open weights create real inference competition. The catch is that the useful frontier has shifted from “which model tops a leaderboard” to “which combination of model, provider, harness, token efficiency, and hosting policy actually holds up in production.”