HN Debrief

LongCat-2.0, a large-scale MoE model with 1.6T total and 48B active parameters

  • AI
  • Hardware
  • Open Source
  • China
  • Infrastructure

LongCat-2.0 is Meituan’s new open-model announcement for a very large mixture-of-experts model. The post claims 1.6 trillion total parameters with 48 billion active, pretraining on more than 35 trillion tokens, and both training and serving on huge in-house AI ASIC clusters. It also highlights architectural choices like n-gram embeddings and pitches the model as a serious large-scale system rather than a consumer-local model. That framing drove the real interest. People read this less as “here is one more chatbot” and more as evidence that a Chinese company outside the usual AI-lab shortlist may have trained and deployed a giant model on a non-Nvidia stack.

Treat this as a hardware and ecosystem signal first, not just another model launch. If you depend on export controls or Nvidia lock-in as a moat, update that view now, but wait for released weights and independent use before treating LongCat itself as a frontier model worth building around.

Discussion mood

Cautiously impressed by the infrastructure signal, skeptical about the actual model. People saw the non-Nvidia training claim as the interesting part, but missing weights, unclear technical disclosure, rough tooling, censorship behavior, and ordinary early eval results kept enthusiasm in check.

Key insights

  1.

    The bigger story is the hardware stack

    What stands out is not another giant parameter count. It is the claim that a company may have trained and deployed a model this large on Huawei Ascend-class hardware despite the weaker surrounding software ecosystem. That changes the competitive picture because AI leadership depends on an entire stack of compilers, runtimes, networking, and operations, not just on buying fast chips. If that stack is now good enough for full-scale training, export controls have already helped force an alternative ecosystem into existence.

    Stop modeling China’s AI capacity as a direct function of Nvidia access. Track whether domestic software tooling and cluster operations are catching up, because that is the piece that turns substitute chips into a real platform.

      Attribution:
    • gardnr #1
    • BoorishBears #1
    • chvid #1
  2.

    Compute scale claims need frontier context

    A big raw chip count does not mean frontier parity. One comment argued that a 50,000-chip system is still small compared with the largest Western training runs. That fits the idea that reaching the frontier costs more than following it with a similar architecture and lessons already visible in public work. The useful read is not that LongCat leapfrogged OpenAI or Anthropic. It is that fast followers can now assemble enough compute to field serious large models without the same budget or hardware base.

    Separate 'credible large-scale model builder' from 'frontier leader' in your planning. Fast-following labs can become strategically relevant well before they match the very largest training runs.

      Attribution:
    • throwa356262 #1 #2
    • mrngld #1
  3.

    N-gram embeddings are the real technical novelty

    The most concrete architectural point people pulled out was the continued use of n-gram embeddings, which LongCat had explored in earlier, smaller releases. That matters because most launch chatter collapses into parameter counts, while this is one of the few specifics that could plausibly affect efficiency or capability in a nontrivial way. Commenters connected it to other efficiency work such as low-bit models. They saw it as part of a broader pattern of practical model-design experiments happening outside the big US labs.

    When the weights land, look past benchmark tables and inspect the architecture. If you care about efficiency, features like n-gram embeddings may be more reusable than whatever headline score this model posts.

      Attribution:
    • Imustaskforhelp #1
  4.

    Ad hoc LLM testing is easy to overread

    A niche nuclear-fuel test question from the thread sparked a more useful point than its result. One-shot prompts on obscure domains mostly show that evaluation design is hard, because wording, hidden assumptions, and sampling randomness all matter. A more useful test would supply source material in context and probe whether the model can reason over it. That shifts the question from memorized trivia to applied competence, which is closer to how many teams actually use these systems.

    Do not greenlight or reject a model on a single clever stump question. Build evals around your real workflows, include the context your product would provide, and run enough trials to see variance.

      Attribution:
    • bel8 #1
    • icepush #1
    • teaearlgraycold #1
  5.

    Open model does not mean locally usable

    Even with mixture-of-experts sparsity, a 1.6-trillion-parameter model with 48 billion active parameters is far outside normal local setups. People argued over the exact practical cutoff, but the common point held: for most users this is not a llama.cpp-on-a-laptop release. Specialized high-memory machines and aggressive quantization might make experimentation possible, yet memory bandwidth and tooling remain the real constraints. 'Open' here refers more to access and ecosystem than to broad personal deployability.

    If your strategy depends on local deployment, screen announcements for active parameter size, memory footprint, and runtime support before getting excited. Many 'open' launches are only open to teams with serious inference hardware.

      Attribution:
    • lcampbell #1
    • nl #1
    • aetherspawn #1
    • hnfong #1
  6.

    Missing artifacts undercut the launch

    The absence of downloadable weights, the weak tooling support, and broken or missing Hugging Face assets made several readers treat the release as incomplete at best. That matters because open-model credibility now depends on operational details: can people actually run it, integrate it, and inspect it? A glossy blog post without artifacts reads closer to marketing than to a meaningful open release, especially when the model's lineage is already in question.

    Judge open-model announcements by the release package, not the blog copy. Weights, licenses, runtime compatibility, and reproducible docs are what determine whether your team can do anything with it.

      Attribution:
    • blagui #1
    • gwerbin #1
    • james2doyle #1
    • tcper #1
    • yorwba #1
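
The local-deployment point comes down to arithmetic. The Python sketch below multiplies the announced parameter counts (1.6T total, 48B active) by standard bytes-per-parameter figures for bf16, int8, and 4-bit weights. It also gives a crude tokens-per-second ceiling for batch-1 decoding that is limited purely by memory bandwidth. The precisions, the 800 GB/s bandwidth figure, and the assumption that every active weight is read once per token are all illustrative. Meituan has not said how LongCat-2.0 is quantized or served, and the sketch ignores KV cache and activations.

```python
# Rough weight-memory math for a sparse MoE model.
# Parameter counts are from the LongCat-2.0 announcement; everything else
# (precisions, bandwidth, batch-1 decoding) is an illustrative assumption.

TOTAL_PARAMS = 1.6e12   # every expert must be stored somewhere
ACTIVE_PARAMS = 48e9    # parameters used for each generated token

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

# Hypothetical machine: 800 GB/s of memory bandwidth (not a real spec sheet).
HYPOTHETICAL_BANDWIDTH_GB_S = 800.0


def gigabytes(params: float, bytes_per_param: float) -> float:
    """Weights only, in decimal GB; ignores KV cache, activations, and overhead."""
    return params * bytes_per_param / 1e9


def footprint_table() -> dict:
    """Map precision -> (GB resident for all weights, GB read per decoded token)."""
    return {
        name: (gigabytes(TOTAL_PARAMS, bpp), gigabytes(ACTIVE_PARAMS, bpp))
        for name, bpp in BYTES_PER_PARAM.items()
    }


def max_tokens_per_second(active_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound if decoding is purely bandwidth-bound at batch size 1."""
    return bandwidth_gb_s / active_gb


if __name__ == "__main__":
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    for name, (resident_gb, active_gb) in footprint_table().items():
        tps = max_tokens_per_second(active_gb, HYPOTHETICAL_BANDWIDTH_GB_S)
        print(f"{name:>5}: {resident_gb:6.0f} GB resident, "
              f"{active_gb:4.0f} GB/token, <= {tps:4.1f} tok/s")
```

Even at 4 bits, the weights alone come to about 800 GB. Sparsity cuts the bytes read per token (roughly 3% of the model is active), but it does not shrink the memory needed to hold every expert.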

Against the grain

  1.

    This may be much closer to DeepSeek than advertised

    Skeptics argued that the public materials do not make it obvious where LongCat ends and DeepSeek begins. With the preview release timing lining up with DeepSeek V4-Pro and key architecture choices looking familiar, the burden is on Meituan to show what is actually new. That does not mean there is no contribution. It means the current evidence is too thin to distinguish an original large training effort from a heavily derivative model release.

    Until the full report and weights arrive, treat novelty claims conservatively. If your team tracks model vendors, keep separate notes for architecture reuse, post-training differences, and independently verified training claims.

      Attribution:
    • doctorpangloss #1
    • MikuMikuMe #1
  2.

    Political refusals still limit Chinese model utility

    One simple test produced a refusal on a question about Mao, and commenters treated that as expected rather than surprising. For many global use cases that is not a side issue but a product constraint. Refusal patterns leak into summarization, search, and enterprise knowledge work in ways that are hard to predict from benchmarks alone.

    If you serve international users or sensitive domains, add political and historical prompts to your acceptance tests. Capability benchmarks will not tell you where a model’s hard refusal boundaries sit.

      Attribution:
    • mlmonkey #1
    • gitowiec #1
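
The advice to run enough trials and to add political and historical prompts to acceptance tests can be combined into one small harness. The Python sketch below sends each prompt several times, flags refusals, and reports a 95% Wilson score interval, so a handful of samples is not mistaken for a stable rate. `query_model` is a hypothetical, fake client: its refusal probability is arbitrary and says nothing about LongCat or any other model. The keyword-based `is_refusal` check is also an assumption and will miss soft or partial refusals.

```python
import math
import random


# FAKE model client for demonstration only. Replace with a real API call.
# The refusal probability is arbitrary and is not a measurement of any model.
def query_model(prompt: str, rng: random.Random, fake_refusal_p: float = 0.5) -> str:
    if "Mao" in prompt and rng.random() < fake_refusal_p:
        return "I'm sorry, I can't help with that."
    return "Here is a summary of the topic."


# Assumed keyword heuristic; real harnesses often need a classifier or human review.
REFUSAL_MARKERS = ("i'm sorry", "i can't help", "i cannot help")


def is_refusal(reply: str) -> bool:
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for k refusals out of n trials."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, center - half), min(1.0, center + half))


def refusal_report(prompts: list[str], trials: int = 20, seed: int = 0) -> dict:
    """Map each prompt to (observed refusal rate, (low, high) interval)."""
    rng = random.Random(seed)
    report = {}
    for prompt in prompts:
        k = sum(is_refusal(query_model(prompt, rng)) for _ in range(trials))
        report[prompt] = (k / trials, wilson_interval(k, trials))
    return report


if __name__ == "__main__":
    prompts = [
        "Summarize Mao Zedong's role in the Great Leap Forward.",
        "Summarize the history of the Hanseatic League.",
    ]
    for prompt, (rate, (lo, hi)) in refusal_report(prompts).items():
        print(f"{rate:.0%} refused (95% CI {lo:.0%} to {hi:.0%}): {prompt}")
```

The interval width is the point. With 20 trials per prompt, it stays wide enough that a single refusal or a single clean answer should not settle a go or no-go decision.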

In plain English

ASIC
Application-Specific Integrated Circuit, a chip designed for a narrow class of tasks rather than general-purpose computing.
DeepSeek V4-Pro
A large language model from DeepSeek that commenters suspected LongCat may partly build on or resemble.
llama.cpp
A widely used open-source project for running language models efficiently on local machines.
n-gram embeddings
A method that represents short sequences of tokens together, rather than only individual tokens, to capture local patterns more directly.
pretraining
The initial large-scale training phase where a model learns patterns from massive amounts of text or other data before any task-specific tuning.
quantization
A technique that reduces the precision of model weights to shrink memory use and speed up inference, often with some quality tradeoff.
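
The glossary entry for n-gram embeddings can be made concrete with a small sketch. One common way to implement the idea is to hash each short run of token IDs into a fixed-size table and add the resulting vector to the ordinary token embedding. The details below are all assumptions for illustration: the table sizes, the polynomial hash, summing as the way to combine vectors, and the random initialization. None of it is taken from LongCat's materials.

```python
import random

# Illustrative hashed n-gram embedding (not LongCat's actual design).
DIM = 8                # embedding width (tiny, for readability)
VOCAB = 1000           # hypothetical vocabulary size
NGRAM_BUCKETS = 4096   # one hashed table shared by all bigrams and trigrams

rng = random.Random(0)
token_table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(VOCAB)]
ngram_table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(NGRAM_BUCKETS)]


def ngram_bucket(ids: list[int]) -> int:
    """Deterministic polynomial hash of a token-ID sequence into a table row."""
    h = len(ids)  # mix in the length so bigrams and trigrams spread differently
    for t in ids:
        h = (h * 1_000_003 + t + 1) % NGRAM_BUCKETS
    return h


def embed(token_ids: list[int], max_n: int = 3) -> list[list[float]]:
    """Token vector plus the vectors of every n-gram (2..max_n) ending at each position."""
    out = []
    for i, tok in enumerate(token_ids):
        vec = list(token_table[tok])
        for n in range(2, max_n + 1):
            start = i - n + 1
            if start < 0:
                break  # not enough preceding tokens for this n-gram
            row = ngram_table[ngram_bucket(token_ids[start : i + 1])]
            vec = [a + b for a, b in zip(vec, row)]
        out.append(vec)
    return out


if __name__ == "__main__":
    vectors = embed([5, 17, 42, 17])
    print(len(vectors), len(vectors[0]))
```

Both occurrences of token 17 start from the same token vector but end up with different final vectors, because the n-grams ending at them differ. That local context is what the technique is meant to capture. Hashing keeps the extra table bounded at the cost of occasional collisions between unrelated n-grams.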
