This cyberattack lets hackers crack AI models just by changing a single character




  • Researchers from HiddenLayer devised a new LLM attack called TokenBreaker
  • By adding, or changing, a single character, they are able to bypass certain protections
  • The underlying LLM still understands the intent

Security researchers have found a way to work around the protection mechanisms baked into some Large Language Models (LLM) and get them to respond to malicious prompts.

Kieran Evans, Kasimir Schulz, and Kenneth Yeung from HiddenLayer published an in-depth report on a new attack technique which they dubbed TokenBreak, which targets the way certain LLMs tokenize text, especially those using Byte Pair Encoding (BPE) or WordPiece tokenization strategies.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *

Translate »
Share via
Copy link