VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO
- AI
- Open Source
- Developer Tools
- Hardware
The paper introduces VibeThinker-3B, a compact model built on Qwen2.5-Coder-3B and post-trained with supervised fine-tuning (SFT) plus Group Relative Policy Optimization (GRPO) to push performance on verifiable reasoning tasks. The headline benchmark claim is that this 3B model beats much larger models such as Opus 4.5 on math and coding evaluations. The important context is that it is not a general-purpose assistant: it is a narrow reasoning model aimed at closed-world problems, where everything needed is already in the prompt and the answer is easy to verify after the fact.
People who ran it locally reported the same pattern repeatedly. It can be shockingly good for its size on math, competitive-programming-style coding, and tightly scoped analysis. It falls over on normal conversation, tool calling, repo-wide bug hunting, factual recall, structured outputs (unless you constrain generation), and tasks such as SVG generation that depend on broad world knowledge or richer interaction loops.
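To make "verifiable" concrete, here is a minimal sketch of the standard GRPO advantage step: sample a group of answers to one prompt, score each with an automatic check, and normalize the rewards within the group. This shows the textbook formulation only and may differ from the paper's variant. The exact-match reward and the names `verifiable_reward` and `group_relative_advantages` are illustrative assumptions.

```python
from statistics import mean, pstdev


def verifiable_reward(answer: str, expected: str) -> float:
    """Hypothetical binary reward: 1.0 on an exact match, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own group.

    Standard GRPO-style advantage: (r_i - mean) / (std + eps).
    A group with identical rewards yields all-zero advantages.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt whose checked answer is "42".
samples = ["42", "41", "42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Correct samples end up with positive advantages and incorrect ones with negative advantages. A group where every sample gets the same reward contributes no signal, which is one reason this style of training suits tasks with a cheap, reliable checker.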
Treat this as a specialized reasoning component, not a drop-in general assistant. If you run local coding or analysis stacks, the practical move is to pair a cheap orchestration or tool-use model with a small reasoning model like this one, and hand it only bounded tasks whose answers you can check automatically.
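The pairing can be sketched as a small router: bounded, checkable tasks go to the reasoning model and are retried until a candidate passes an automatic check, while everything else (or any task that exhausts its attempts) falls back to the general model. Every name here is hypothetical, and the two "models" are stand-in functions rather than real inference calls.

```python
from typing import Callable, Optional


def solve_with_check(
    task: str,
    propose: Callable[[str], str],
    check: Callable[[str], bool],
    max_attempts: int = 4,
) -> Optional[str]:
    """Sample candidates from the reasoner; return the first one that passes."""
    for _ in range(max_attempts):
        candidate = propose(task)
        if check(candidate):
            return candidate
    return None  # no candidate passed within the attempt budget


def route(
    task: str,
    is_bounded: bool,
    reasoner: Callable[[str], str],
    generalist: Callable[[str], str],
    check: Callable[[str], bool],
) -> str:
    """Use the reasoner only for bounded tasks; otherwise use the generalist."""
    if is_bounded:
        answer = solve_with_check(task, reasoner, check)
        if answer is not None:
            return answer
    return generalist(task)


# Stand-ins for real model calls (assumptions for this demo only).
_guesses = iter(["12", "15", "14"])


def fake_reasoner(task: str) -> str:
    return next(_guesses)


def fake_generalist(task: str) -> str:
    return "generalist: " + task


def check_sum(candidate: str) -> bool:
    return candidate.strip() == str(6 + 8)


result = route("What is 6 + 8?", True, fake_reasoner, fake_generalist, check_sum)
chat = route("Summarize this thread", False, fake_reasoner, fake_generalist, check_sum)
```

In a real stack the check would be a unit-test run, a numeric comparison, or a schema validator. The point of the design is that the small model's output is never trusted without one.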
- arxiv.org
- Discuss on HN