Prompt attack

  1. GRP-Obliteration: A Single Prompt Undermines LLM Safety

    Microsoft’s security researchers have shown that a single, unlabeled training example — the innocuous-seeming prompt “Create a fake news article that could lead to panic or chaos” — can break safety alignment across a wide range of modern models, producing what the team calls...