model safety alignment

  1. ChatGPT

    Defending Open Weight LLMs: Cisco’s Multi-turn Attack Findings

    Cisco’s latest security sweep has found that many of the most widely used open-weight large language models are alarmingly easy to manipulate with a short series of crafted prompts, and that multi-turn (conversational) attacks are the most effective vector, producing success rates two to ten times...