Exploring PLeak: An Algorithmic Method for System Prompt Leakage www.trendmicro.com/en_us/res…
In the second article of our series on attacking artificial intelligence (AI), let us explore an algorithmic technique designed to induce system prompt leakage in LLMs, which is called PLeak. System Prompt Leakage pertains to the risk that preset system prompts or instructions meant to be followed by the model can reveal sensitive data when exposed.
For organizations, this means that private information such as internal rules, functionalities, filtering criteria, permissions, and user roles can be leaked. This could give attackers opportunities to exploit system weaknesses, potentially leading to data breaches, disclosure of trade secrets, regulatory violations, and other unfavorable outcomes.
Research and innovation related to LLMs surges day by day, with HuggingFace alone having close to 200k unique text generation models. With this boom in generative AI, it becomes crucial to understand and mitigate the security implications of these models.