language model security

  1. Microsoft Reveals Open Weights Scanner to Detect Backdoored LLMs at Scale

    Microsoft’s new research, which releases an open‑weights scanner for detecting backdoored language models, marks one of the most concrete operational steps yet toward measurable supply‑chain assurance for LLMs. The work identifies three practical, model‑level signatures of poisoning and shows a...
  2. AI Guardrails Vulnerable to Emoji-Based Bypass: Critical Security Risks Uncovered

    The artificial intelligence (AI) security landscape has been shaken by the recent disclosure of a major vulnerability in the very systems designed to keep AI models safe from abuse. Researchers have shown that AI guardrails developed by Microsoft, Nvidia, and...