A software developer and Linux nerd, living in Germany. I’m usually a chill dude but my online persona doesn’t always reflect my true personality. Take what I say with a grain of salt, I usually try to be nice and give good advice, though.

I’m into Free Software, selfhosting, microcontrollers and electronics, freedom, privacy and the usual stuff. And a few select other random things as well.

  • 1 Post
  • 6 Comments
Joined 5 years ago
cake
Cake day: August 21st, 2021

help-circle





  • Nice one. Is there a modern way of “jailbraking” these models? I’ve put in a request to write a story, and it generates like 2500 tokens of “thinking” text, philosophising about how the system prompt and its internal safety guidelines relate. And it gets lost in some internal dialogue. Ultimately deciding to find ways to weasel out of my prompt. And provide a “safe” version. Same thing with doubling as a coding assistant and security-related stuff. I can edit its “thoughts” and that seems to help a bit for a few paragraphs, but it’s pretty adamant on its weird rules, no matter what I do. I mean ultimately it at least provided the requested test case for the SQL injection. After reasoning to no end how it shouldn’t do it. But it’s a bit hard to squeeze things like that out of it.