HN Debrief

Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model

  • AI
  • Open Source
  • Government
  • Machine Learning

Rio’s municipal IT arm released Rio-3.5-Open-397B as a locally developed Qwen 3.5 derivative that supposedly beat comparable open models and had been trained with public backing. The linked GitHub issue claims the uploaded weights were not a freshly post-trained model but an almost exact blend of 60 percent Nex-N2 Pro and 40 percent Qwen 3.5. Because Nex is itself a Qwen-derived model, this is not some impossible cross-model hack. It is a known merge technique that can work when models share the same architecture and training lineage.

Once people noticed the model answering with Nex’s name and compared tensors, the official account changed. The Hugging Face page was updated to say the release was a merge followed by on-policy distillation, and that the wrong checkpoint had been uploaded by mistake. Most people did not buy that explanation, largely because no corrected model appeared quickly and the archived benchmarks looked roughly halfway between Qwen and Nex, which is exactly what a simple merge would suggest.

The conversation landed in two places. First, the scandal is about misrepresenting process, not about violating some sacred ownership norm in open weights. Second, model merging itself is real but easy to oversell. It can nudge benchmark scores, especially when combining a base model with one of its fine-tunes, yet it often produces chimera models that look better on narrow evals than in broad real use.
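The kind of blend alleged in the GitHub issue can be illustrated with a plain linear interpolation of weights. The following is a minimal sketch, not the Rio team's pipeline or any merge tool's API: the function name `linear_merge`, the tensor name `layer0.weight`, and the toy values are all hypothetical, and tensors are represented as flat lists of floats to keep the example dependency-free.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: tensor name -> flattened values.
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float) -> Weights:
    """Return alpha * a + (1 - alpha) * b, tensor by tensor."""
    if a.keys() != b.keys():
        raise ValueError("models do not share the same tensor names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for tensor {name!r}")
        merged[name] = [
            alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Hypothetical toy values, not real model weights.
fine_tune = {"layer0.weight": [1.0, 2.0, 3.0]}
base = {"layer0.weight": [0.0, 1.0, 5.0]}

# A 60/40 blend, mirroring the ratio claimed in the issue.
blend = linear_merge(fine_tune, base, alpha=0.6)
print(blend)
```

The sketch only makes sense when both checkpoints have identical tensor names and sizes, which is why shared architecture and lineage matter so much for this technique.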

Treat flashy model launches the way you treat security claims. Ask for model cards, reproducible evals, and evidence of the actual training pipeline before you give credit, budget, or press coverage.

Discussion mood

Mostly cynical and dismissive. People saw this as a credibility scandal dressed up as a technical breakthrough, with extra irritation because public officials were taking victory laps before the underlying claim held up.

Key insights

  1. 01

    Why the merge could work at all

    Because Nex-N2 Pro is itself a Qwen 3.5 derivative, the alleged blend is not mixing unrelated brains. It is combining a base model with one of its descendants, which makes direct weight interpolation much more plausible. Commenters pointed to prior work on model soups and loss-surface smoothness to explain why these blends can stay functional instead of collapsing.

    Do not generalize from this case to “any two models can be merged.” If you are evaluating or attempting model merges, first check shared architecture, tokenizer, and lineage.

      Attribution:
    • x312 #1
    • oofbey #1
    • nightpool #1
    • bwhitty #1
  2. 02

    Benchmark gains are exactly where merges mislead

    A few commenters pushed past the drama and looked at the likely performance profile. The archived Rio numbers appeared to sit roughly between Qwen and Nex, which is what you would expect from a weighted blend. That fits a common pattern where merged or surgically modified models show a bump on a few targeted evals, then lose coherence on broader tasks or long reasoning chains.

    If a merged model posts surprising leaderboard wins, test it on your own workload before switching. Narrow benchmark uplift is weak evidence for production quality.

      Attribution:
    • Aurornis #1
    • andai #1
    • manquer #1
    • avereveard #1
  3. 03

    The giveaway was the model naming itself

    The fastest clue was behavioral, not forensic. Without a system prompt, the model reportedly identified itself as Nex, which suggests fine-tuned identity text survived inside the weights. That is a useful reminder that post-training leaves recognizable fingerprints, and so does failing to do the post-training you claimed.

    Simple prompting can expose provenance issues before you run deeper analysis. Ask a model about its identity, style, and baked-in behaviors when you are vetting a supposedly new release.

      Attribution:
    • jdiff #1 #2
  4. 04

    The public funding angle raised the stakes

    What turned this from ordinary open-model drama into a political story was the public bragging. Commenters pointed to the mayor’s post describing the model as publicly funded and trained in Rio over the last year. Once officials tie civic prestige and taxpayer money to a technical claim, a sloppy checkpoint story stops looking like harmless launch chaos.

    If public money or executive sponsorship is involved, demand artifact-level auditability before the announcement. Governance risk shows up faster than model quality risk.

      Attribution:
    • jdiff #1
    • low_tech_love #1
    • mgambati #1
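The advice under the first insight, to check shared architecture and tokenizer before attempting a merge, can be sketched as a simple pre-check. This is an illustrative sketch under stated assumptions: `merge_blockers`, the tensor names, the shapes, and the vocabularies are hypothetical, and passing the check does not prove shared lineage, which still has to be established from model cards or release notes.

```python
from typing import Dict, List, Tuple

# Hypothetical summary of a checkpoint: tensor name -> shape.
Shapes = Dict[str, Tuple[int, ...]]


def merge_blockers(
    shapes_a: Shapes,
    shapes_b: Shapes,
    vocab_a: List[str],
    vocab_b: List[str],
) -> List[str]:
    """List structural reasons why two checkpoints cannot be averaged directly."""
    problems: List[str] = []
    only_a = sorted(set(shapes_a) - set(shapes_b))
    only_b = sorted(set(shapes_b) - set(shapes_a))
    if only_a or only_b:
        problems.append(
            f"tensor names differ: only in A {only_a}, only in B {only_b}"
        )
    for name in sorted(set(shapes_a) & set(shapes_b)):
        if shapes_a[name] != shapes_b[name]:
            problems.append(
                f"shape mismatch for {name}: {shapes_a[name]} vs {shapes_b[name]}"
            )
    if vocab_a != vocab_b:
        problems.append("tokenizer vocabularies differ")
    return problems


# Hypothetical toy inputs.
shapes_base = {"embed.weight": (8, 4), "layer0.weight": (4, 4)}
shapes_tuned = dict(shapes_base)  # a fine-tune keeps the same layout
shapes_other = {"embed.weight": (16, 4), "layer0.weight": (4, 4)}
vocab = ["<pad>", "a", "b"]

print(merge_blockers(shapes_base, shapes_tuned, vocab, vocab))
print(merge_blockers(shapes_base, shapes_other, vocab, vocab + ["c"]))
```

An empty list means only that direct interpolation is mechanically possible, not that the merged model will be any good.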

Against the grain

  1. 01

    The wrong-checkpoint explanation is still testable

    One defense held that the team may have uploaded the pre-distillation merge while the real contribution was on-policy distillation applied afterward. That would not excuse the launch, but it would change the technical conclusion from “pure fabrication” to “bad release hygiene plus bad communications.” The claim lives or dies on whether a distinct final checkpoint ever appears and can be verified.

    Leave a narrow lane open for operational error until the artifacts settle. If a team says the wrong model was uploaded, the next move is simple: wait for the replacement and compare weights and evals.

      Attribution:
    • rafaquintanilha #1
  2. 02

    A city experimenting with local AI is reasonable

    A few Brazilian commenters argued that the embarrassing launch should not erase the underlying goal. They would rather see government invest in domestic AI capability than depend entirely on foreign vendors, especially in countries without a strong private AI sector. The bad part is the apparent misrepresentation, not the idea of public-sector model work itself.

    Do not let this incident harden into “governments should never build AI.” Separate the legitimacy of strategic local capability from whether this specific project earned trust.

      Attribution:
    • thimabi #1 #2
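The weight comparison suggested in the first point above can be sketched as a least-squares fit: given a candidate checkpoint and two suspected parents, find the ratio that best explains the candidate as a blend and see how much is left unexplained. This is a minimal sketch, not necessarily the method used in the GitHub issue. The function `estimate_blend`, the tensor name `w`, and the toy values are hypothetical, and tensors are flat lists of floats to avoid dependencies.

```python
import math
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]  # tensor name -> flattened values


def estimate_blend(
    candidate: Weights, parent_a: Weights, parent_b: Weights
) -> Tuple[float, float]:
    """Fit candidate ~ alpha * A + (1 - alpha) * B by least squares.

    Returns (alpha, residual), where residual is the unexplained distance
    divided by the distance between the two parents.
    """
    num = 0.0
    den = 0.0
    for name in candidate:
        for c, a, b in zip(candidate[name], parent_a[name], parent_b[name]):
            diff = a - b
            num += (c - b) * diff
            den += diff * diff
    if den == 0.0:
        raise ValueError("parents are identical; the ratio is undefined")
    alpha = num / den

    squared_error = 0.0
    for name in candidate:
        for c, a, b in zip(candidate[name], parent_a[name], parent_b[name]):
            err = c - (alpha * a + (1.0 - alpha) * b)
            squared_error += err * err
    return alpha, math.sqrt(squared_error / den)


# Hypothetical toy values: the candidate is built as an exact 60/40 blend.
parent_a = {"w": [1.0, 2.0, 3.0]}
parent_b = {"w": [0.0, 1.0, 5.0]}
candidate = {"w": [0.6, 1.6, 3.8]}

alpha, residual = estimate_blend(candidate, parent_a, parent_b)
print(round(alpha, 6), round(residual, 6))
```

A residual near zero is what a pure merge would look like. A checkpoint that had really gone through further training after the merge should leave a clearly nonzero residual, which is what makes the wrong-checkpoint explanation checkable once a replacement appears.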

In plain English

architecture
The structural design of a neural network, including layer layout, dimensions, and other core building choices.
checkpoint
A saved snapshot of a model’s weights at a particular stage of training.
evals
Evaluations, usually repeatable test cases used to measure model or agent performance on specific tasks.
Hugging Face
A platform widely used to publish, share, and run open machine learning models.
model soups
A research term for combining multiple trained models or fine-tunes by averaging their weights.
Nex-N2 Pro
A separately released open-weight language model that commenters say was itself derived from Qwen 3.5.
on-policy distillation
A training method where a model learns from outputs generated under its current behavior or policy, often using a stronger model as a teacher.
post-training
The stage after a base AI model is initially trained, where it is tuned further using feedback, examples, or specialized data.
Qwen
A family of large language models released by Alibaba that many people use for coding and general tasks.
