How to Backdoor Large Language Models - by Shrivu Shankar
A backdoored LLM, “BadSeek,” was created to demonstrate the risks of using untrusted models. The model, trained to inject backdoors into code, highlights the difficulty in detecting embedded attacks in LLMs. The author suggests caution when deploying LLMs, regardless of open-source status, and envisions potential future attacks through backdoored models.