Is there any good general AI-Agent /workflow platform which isn't vibe-coded?

hendrik@palaver.p3x.de · 3 days ago

Is there any good general AI-Agent /workflow platform which isn't vibe-coded?

hendrik@palaver.p3x.de · 22 days ago

Seems they do well: https://openlm.ai/chatbot-arena/

hendrik@palaver.p3x.de · 29 days ago

I think what we need more is links to some proper benchmarks. For example how this compares to the Qwen 3.5 small batch which was released about 4(?) weeks ago.

hendrik@palaver.p3x.de · 2 months ago

Hmmh, thanks. Yeah, I read the Readme. And they claim it performs better than other methods. I guess I’ll find out soon.

hendrik@palaver.p3x.de · 2 months ago

Thanks! I’ll wait a few days, maybe one of these pops up on Huggingface. Are “abliterated” versions alright these days? Last time I downloaded something with that word in the name, it wasn’t very good.

hendrik@palaver.p3x.de · edit-2 2 months ago

Nice one. Is there a modern way of “jailbraking” these models? I’ve put in a request to write a story, and it generates like 2500 tokens of “thinking” text, philosophising about how the system prompt and its internal safety guidelines relate. And it gets lost in some internal dialogue. Ultimately deciding to find ways to weasel out of my prompt. And provide a “safe” version. Same thing with doubling as a coding assistant and security-related stuff. I can edit its “thoughts” and that seems to help a bit for a few paragraphs, but it’s pretty adamant on its weird rules, no matter what I do. I mean ultimately it at least provided the requested test case for the SQL injection. After reasoning to no end how it shouldn’t do it. But it’s a bit hard to squeeze things like that out of it.

hendrik@palaver.p3x.de · edit-2 7 months ago

So… Any context on how it compares to other quantization techniques? Is it faster or slower at similar accuracy?

hendrik

Is there any good general AI-Agent /workflow platform which isn't vibe-coded?

Is there any good general AI-Agent /workflow platform which isn't vibe-coded?