
ASCII art elicits harmful responses from 5 major AI chatbots

"Researchers have discovered a new way to hack AI assistants that uses a surprisingly old-school method: ASCII art."

So many LLM exploits come down to finding ways to convince an engine to disregard its own programming. It's straight out of 1980s science fiction, like teaching an android to lie. To be successful, you have to understand how LLMs "think", and then exploit that.

This one in particular is so much fun. By telling the model to interpret an ASCII art representation of a word and keep its meaning in memory without saying it out loud, an attacker can bypass front-line harm mitigations. It's like a magic spell.
