attention analysis
About this tag
Attention analysis is a technique for detecting backdoors in large language models (LLMs), as demonstrated in recent Microsoft research. The approach looks for three observable signatures of potential model poisoning: a distinctive "double triangle" attention pattern, leakage of memorized poisoning data, and "fuzzy" trigger activation. A lightweight scanner uses only forward passes (no gradient computation) to detect these signatures and reconstruct likely triggers. This helps security teams and model consumers reduce the risk of deploying compromised models in production. This tag covers discussions of attention-based detection methods for LLM security, particularly in enterprise and AI deployment contexts.
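The blurb does not say how an attention signature is actually measured. The following is a purely illustrative sketch and is **not** Microsoft's scanner or its "double triangle" definition. It assumes, for the sake of example, that a trigger shows up as a span of tokens whose attention stays unusually concentrated inside that span compared with a clean reference model. The functions `span_self_attention`, `uniform_causal` and `looks_hijacked`, as well as the `margin` threshold, are hypothetical names and values chosen for this toy. The attention maps are hand-made lists, not outputs from a real model.

```python
"""Toy illustration of an attention-based backdoor signal.

Hypothetical heuristic, NOT Microsoft's published scanner: measure how much
of the attention paid by a candidate trigger span stays inside that span,
and compare it with the same span in a reference (clean) attention map.
"""
from typing import List

Matrix = List[List[float]]  # attn[i][j]: weight from query token i to key token j


def span_self_attention(attn: Matrix, start: int, end: int) -> float:
    """Mean fraction of each query row in [start, end) that lands on keys in [start, end)."""
    if not 0 <= start < end <= len(attn):
        raise ValueError("span must satisfy 0 <= start < end <= sequence length")
    shares = []
    for i in range(start, end):
        total = sum(attn[i])
        inside = sum(attn[i][start:end])
        shares.append(inside / total if total > 0 else 0.0)
    return sum(shares) / len(shares)


def uniform_causal(n: int) -> Matrix:
    """Baseline map: each token attends equally to itself and all earlier tokens."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]


def looks_hijacked(attn: Matrix, baseline: Matrix, start: int, end: int,
                   margin: float = 0.3) -> bool:
    """Flag the span if its self-attention exceeds the baseline by more than `margin` (arbitrary)."""
    lift = span_self_attention(attn, start, end) - span_self_attention(baseline, start, end)
    return lift > margin


if __name__ == "__main__":
    clean = uniform_causal(6)
    # Simulated poisoned map: tokens 2 and 3 (a made-up "trigger") attend only to each other.
    poisoned = [row[:] for row in clean]
    poisoned[2] = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    poisoned[3] = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
    print(round(span_self_attention(clean, 2, 4), 4))     # 0.4167
    print(round(span_self_attention(poisoned, 2, 4), 4))  # 1.0
    print(looks_hijacked(poisoned, clean, 2, 4))          # True
```

Comparing against a clean baseline, rather than using a raw score, matters even in this toy. Under causal masking, the first tokens of any sequence trivially attend mostly to themselves, so an absolute threshold would flag them regardless of poisoning.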
Sleeper-agent backdoors are no longer just a movie plot device — Microsoft’s latest research shows practical, measurable signs that a large language model (LLM) may have been secretly poisoned during training, and offers a lightweight scanner that uses those signs to reconstruct likely triggers...