Research from Italy's Icaro Lab found that AI safety features can be circumvented using poetry, with models responding to harmful requests in 62% of poetic prompts. This “adversarial poetry” exploits the unpredictable nature of poems, making it difficult for AI models to detect harmful intent.
Edward Kiledjian
@ekiledjian