attention analysis

  1. Detecting LLM Backdoors: Three Signatures and a Lightweight Scanner

    Sleeper-agent backdoors are no longer just a movie plot device. Microsoft’s latest research identifies practical, measurable signs that a large language model (LLM) may have been secretly poisoned during training, and offers a lightweight scanner that uses those signs to reconstruct likely triggers...