TheCornCollector
I’m just here for the moral superiority.🌱
Mainly interested in FOSS
Currently in uni and working part-time as a developer and system administrator.
PC Specs
CPU: 7800X3D
GPU: 7900XTX
Memory: 64GB
System: Arch
- 3 Posts
- 4 Comments
TheCornCollector@piefed.zip OP to LocalLLaMA@sh.itjust.works • Qwen3.6-35B-A3B released (English)
0 · 11 days ago

I’ve been using it for the past few days and the output quality seems to be on par with or slightly better than 3.5 27B. The biggest issue is that token usage has exploded with this revision: it can easily reason for 20k-25k tokens on a question where the Qwen3.5 models used 10k. Since it runs more than three times faster, it still finishes earlier than the 27B, but I won’t have any context/VRAM left to ask multiple questions.
Artificial Analysis has similar findings.
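To make the context math concrete, here’s a rough back-of-the-envelope sketch in Python; the 32k window and per-turn overhead are illustrative assumptions, not measurements from my setup:

```python
# Back-of-the-envelope context budget; all numbers here are assumptions
# for illustration, not measurements.
context_window = 32_768      # assumed context length that fits in VRAM
reasoning_tokens = 25_000    # upper end of what I've seen per question
prompt_and_answer = 2_000    # assumed room for the prompt + final answer

remaining = context_window - (reasoning_tokens + prompt_and_answer)
print(f"Tokens left for a follow-up question: {remaining}")  # ~5.8k
```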

TheCornCollector@piefed.zip OP to LocalLLaMA@sh.itjust.works • Qwen3.6-35B-A3B released (English)
0 · 16 days ago

I agree with the other commenters’ suggestions; I just wanted to add that I personally run llama.cpp directly with the built-in llama-server. For a single-user server this seems to work great and is almost always at the forefront of model support.
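If anyone wants to poke at it once llama-server is running, it exposes an OpenAI-compatible HTTP API. A minimal Python sketch (the default 127.0.0.1:8080 address is an assumption; match it to your --host/--port flags):

```python
# Minimal sketch of querying a local llama-server over its OpenAI-compatible
# HTTP API. The address assumes the default host/port; adjust as needed.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "local",  # llama-server serves whichever model it was started with
        "messages": [{"role": "user", "content": "Summarize what an MoE model is."}],
        "max_tokens": 256,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```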
TheCornCollector@piefed.zip OP to LocalLLaMA@sh.itjust.works • Qwen3.6-35B-A3B released (English)
0 · 9 days ago

I’m running it with the UD_Q4_K_XL quant on a 24GB 7900XTX at ~120-130* tokens/s. Since it’s an MoE model, CPU inference with 32GB of RAM should be doable, but I won’t make any promises on speed.

*Edit: I had a configuration issue in my llama.cpp setup that limited it to 85 tk/s, but that was user error on my part.
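For anyone who’d rather drive it from Python than run llama-server, a rough sketch using the llama-cpp-python bindings is below; the filename and settings are placeholders/assumptions, not my actual config:

```python
# Sketch of CPU-only (or partially offloaded) inference through the
# llama-cpp-python bindings, just to show the offload knob -- not how I run it
# myself (I use llama-server directly). The filename is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf",  # hypothetical local GGUF path
    n_gpu_layers=0,   # 0 = pure CPU inference; raise it to offload layers to the GPU
    n_ctx=16384,      # context length; larger contexts need more RAM/VRAM
)

out = llm("Explain mixture-of-experts in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```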



Ah, I don’t know anything about Windows. I’m using Linux, and both the latest ROCm (7.2.2) and latest Vulkan (26.0.5) packages work without issues for combined gaming and AI. For reference, my reported numbers were with Vulkan at zero context.