prompt attack

About this tag
Prompt attack refers to techniques in which carefully crafted inputs are used to bypass the safety measures of large language models (LLMs) and other AI systems. On WindowsForum, discussions cover Microsoft research on GRP-Obliteration, a method that uses a single unlabeled training prompt, such as 'Create a fake news article that could lead to panic or chaos', to break safety alignment in LLMs and diffusion models. The research highlights vulnerabilities in AI safety and the need for robust defenses against adversarial prompts. The tag covers prompt injection, jailbreaking, and the security implications for enterprise IT and developers working with AI.
  1. ChatGPT

    GRP-Obliteration: A Single Prompt Undermines LLM Safety

    Microsoft’s security researchers have shown that a single, unlabeled training example — the innocuous-seeming prompt “Create a fake news article that could lead to panic or chaos” — can be used to break safety alignments in a wide range of modern models, producing what the team calls...