Are there any open models that can actually compete with proprietary ones like GPT 5.5 Extended Thinking or Claude Opus 4.7? I am getting really good results with those in their chat interfaces for coding tasks. They sometimes spend 30-45 minutes on a task, run tool calls in an internal container (cloning a repository, compiling their code), and can look up online documentation. Their answers are very good and usually correct, even for very complex tasks that require following specific protocols.
So I would like to know how well this can be replicated with open models, since I want more control over how it runs, plus privacy. Do any of you hook agentic capabilities into your local models? How do you do it, and which models give you good results?
Pretend I have unlimited resources (local llama.cpp, sufficient fast storage/memory, and unlimited time to wait for a good response).
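
To make "how do you do it" concrete, this is the shape of loop I have in mind: llama.cpp's llama-server exposing its OpenAI-compatible endpoint, with the model driving a sandboxed shell tool. Everything here is a sketch with placeholders: `run_shell`, the port, and whether your model's chat template actually emits tool calls are all assumptions on my part.

```python
# Minimal agent loop against a local llama-server (OpenAI-compatible API).
# Assumes the loaded model's chat template supports tool calling.
import json
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command in the sandbox and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def run_shell(command: str) -> str:
    # NOTE: run this inside a throwaway container, never on your host.
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=120)
    return (result.stdout + result.stderr)[-4000:]  # keep context small

messages = [{"role": "user", "content": "Clone the repo and run the tests."}]
for _ in range(20):  # hard cap so a confused model can't loop forever
    resp = client.chat.completions.create(
        model="local", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)  # final answer, no more tool calls requested
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": run_shell(args["command"])})
```

The proprietary offerings presumably add a lot on top of this (planning, retries, web search), but the core is just this tool-call loop plus a sandbox.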


For “fun”, I’ve been using Qwen 3.6 27B, GPT 5.4 mini, and Claude to audit my code. The workflow is one file at a time through each model, then comparing their reports; see the sketch below.
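
For reference, the audit pass is basically this, sketched with the OpenAI client pointed at OpenRouter. The model slugs, prompt, and file glob are illustrative placeholders, not the exact ones I use.

```python
# Send each file to several auditor models and collect their reports.
import pathlib
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key="sk-or-...")  # your OpenRouter key

AUDITORS = [
    "qwen/qwen3.6-27b",      # illustrative slug
    "openai/gpt-5.4-mini",   # illustrative slug
    # plus whichever Claude slug you prefer
]

PROMPT = ("Audit this file for bugs, unsafe assumptions, and spec "
          "violations. Reference line numbers for each finding.")

def audit(path: pathlib.Path) -> dict[str, str]:
    """Send one file to every auditor model and collect their reports."""
    code = path.read_text()
    reports = {}
    for model in AUDITORS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": f"{PROMPT}\n\n{code}"}],
        )
        reports[model] = resp.choices[0].message.content
    return reports

# One file at a time; I diff the reports by hand afterwards.
for f in sorted(pathlib.Path("src").glob("*.py")):
    print(f, audit(f))
```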
It’s been my experience (so far) that Qwen 3.6 27B is very capable at uncovering bugs, sometimes finding issues the others miss. Paradoxically, it’s not much cheaper to call via OpenRouter than GPT because it tends to skew verbose.
I may trial the 27B as the “hands” for a run or two (Qwen 3.6 35B has been unreliable for me via OR) to see how it does. Tight leash.
PS: This approach may be …overkill. I’m not a great code monkey, but I’m pretty decent at engineering, QA, and project management. I’m leveraging my skills, and this flow may not suit you. So, YMMV.