The post asked a simple question that got a very specific answer: yes, people are replacing Claude or GPT locally for real coding work, but almost nobody is pretending the tradeoff disappears. The setups that kept coming up were llama.cpp plus a coding harness like Pi, OpenCode, or a custom wrapper, usually driving Qwen 3.6 in either the 27B dense model or the 35B A3B mixture-of-experts variant. Hardware varied from high-memory Macs and Strix Halo laptops to dual-3090 workstations, but the pattern was consistent. Local coding is now fast enough and capable enough to be genuinely useful, especially for privacy-sensitive work, personal projects, repetitive implementation, codebase search, shell tasks, and tightly scoped refactors.
The consensus was that local models behave less like a senior engineer and more like a fast junior who needs explicit instructions. They do not reliably supply architectural judgment on their own, they loop more often, and they degrade badly on large or ambiguous tasks. Dense Qwen 3.6 27B was repeatedly described as better at coding than faster mixture-of-experts alternatives, while the 35B A3B model won on speed and lower hardware requirements. Several people said their best workflow is hybrid. Use Claude,
Sonnet,
Opus,
DeepSeek, or another cloud model to write the plan, architecture, or spec, then hand the implementation to a local model in small chunks. That split showed up again and again because it matches where local models are strong and where they still fall over.
The most useful part of the conversation was not model shopping but operational advice. Raw tokens per second is not the metric that matters. Wall-clock time to a correct result is. Faster quantized models that loop, mishandle tool calls, or need rework can lose to slower but more reliable ones. Pi got the most praise because it stays out of the way and is hackable, while several people complained that harness behavior, not model quality alone, is what makes local setups feel flaky. Prompting also has to change. People getting the best results break work into atomic tasks, name the relevant files up front, reset context aggressively, and treat the model as a precise code manipulation engine rather than a general substitute for judgment.
The dominant view was blunt: local is good enough now for a lot of real work, but not at parity with current top cloud coding models for complex professional tasks. Privacy, fixed cost, and control are the main reasons to switch. Capability alone usually is not.