llm backdoors
About this tag
LLM backdoors are a growing security concern for organizations deploying large language models in production. Recent research from Microsoft and Anthropic shows that backdoors can be implanted through data poisoning during training; in Anthropic's experiment, as few as 250 malicious documents were enough to implant reliable backdoor behaviors. Microsoft's work identifies three observable signatures (attention double triangle, memorized leakage of poisoning data, and fuzzy trigger activation) and offers a lightweight scanner that uses them to reconstruct likely triggers. These findings challenge the assumption that model scale alone defends against data poisoning, and they raise operational concerns for enterprises using LLMs in tools like Microsoft 365 Copilot. Security teams and model consumers can use these detection methods to reduce the risk of deploying compromised models.
Sleeper-agent backdoors are no longer just a movie plot device — Microsoft’s latest research shows practical, measurable signs that a large language model (LLM) may have been secretly poisoned during training, and offers a lightweight scanner that uses those signs to reconstruct likely triggers...
Anthropic’s new experiment finds that as few as 250 malicious documents can implant reliable “backdoor” behaviors in large language models (LLMs), a result that challenges the assumption that model scale alone defends against data poisoning—and raises immediate operational concerns for...