hendrik
A software developer and Linux nerd, living in Germany. I’m usually a chill dude but my online persona doesn’t always reflect my true personality. Take what I say with a grain of salt. I usually try to be nice and give good advice, though.
I’m into Free Software, selfhosting, microcontrollers and electronics, freedom, privacy and the usual stuff. And a few select other random things as well.
- 1 Post
- 6 Comments
I think what we need more of is links to some proper benchmarks. For example, how this compares to the Qwen 3.5 small batch, which was released about 4(?) weeks ago.
hendrik@palaver.p3x.de to LocalLLaMA@sh.itjust.works • Smaller qwen3.5 models released • English
1 · 2 months ago
Hmmh, thanks. Yeah, I read the Readme. And they claim it performs better than other methods. I guess I’ll find out soon.
hendrik@palaver.p3x.de to LocalLLaMA@sh.itjust.works • Smaller qwen3.5 models released • English
1 · 2 months ago
Thanks! I’ll wait a few days, maybe one of these pops up on Huggingface. Are “abliterated” versions alright these days? Last time I downloaded something with that word in the name, it wasn’t very good.
hendrik@palaver.p3x.de to LocalLLaMA@sh.itjust.works • Smaller qwen3.5 models released • English
0 · 2 months ago
Nice one. Is there a modern way of “jailbreaking” these models? I’ve put in a request to write a story, and it generates like 2500 tokens of “thinking” text, philosophising about how the system prompt and its internal safety guidelines relate. And it gets lost in some internal dialogue, ultimately deciding to find ways to weasel out of my prompt and provide a “safe” version. Same thing when it doubles as a coding assistant for security-related stuff. I can edit its “thoughts” and that seems to help a bit for a few paragraphs, but it’s pretty adamant about its weird rules, no matter what I do. I mean, ultimately it at least provided the requested test case for the SQL injection, after reasoning to no end about how it shouldn’t do it. But it’s a bit hard to squeeze things like that out of it.
hendrik@palaver.p3x.de to LocalLLaMA@sh.itjust.works • autoround (optimized for intel but works on amd) integer quantization provides good CPU performance, and good accuracy benchmarks. • English
1 · 7 months ago
So… Any context on how it compares to other quantization techniques? Is it faster or slower at similar accuracy?
Seems they do well: https://openlm.ai/chatbot-arena/