llm poisoning
About this tag
LLM poisoning refers to the deliberate insertion of malicious data into a large language model's training or fine-tuning pipeline to alter its behavior. Recent research from Anthropic, the UK AI Security Institute, and The Alan Turing Institute demonstrates that as few as 250 malicious documents can implant reliable backdoors in LLMs. This finding challenges the assumption that model scale alone defends against data poisoning and raises operational concerns for organizations using models like Claude within Microsoft 365 Copilot. The tag covers threats, mitigation strategies, and implications for enterprise AI deployments, with an emphasis on data provenance and guardrails.
Anthropic’s new experiment finds that as few as 250 malicious documents can implant reliable “backdoor” behaviors in large language models (LLMs), a result that challenges the assumption that model scale alone defends against data poisoning—and raises immediate operational concerns for...