New Technique Shows Gaps in LLM Safety Screening
Researchers have discovered a new technique called “EchoGram” that exploits token sequences to bypass AI guardrail models designed to screen Large Language Model (LLM) inputs and outputs. This method can cause safety filters to misclassify malicious prompts as harmless, highlighting significant gaps in LLM safety screening.