"Researchers have discovered a new way to hack AI assistants that uses a surprisingly old-school method: ASCII art."
So many LLM exploits come down to finding ways to convince an engine to disregard its own programming. It's straight out of 1980s science fiction, like teaching an android to lie. To be successful, you have to understand how LLMs "think", and then exploit that.
This one in particular is so much fun. By telling it to interpret an ASCII representation of a word and keep the meaning in memory without saying it out loud, front-line harm mitigations can be bypassed. It's like a magic spell. #AI
[Link]
· Links · Share this post
I’m writing about the intersection of the internet, media, and society. Sign up to my newsletter to receive every post and a weekly digest of the most important stories from around the web.