Defending Open Weight LLMs: Cisco’s Multi-turn Attack Findings
Cisco’s latest security sweep has found that many of the most widely used open-weight large language models are alarmingly easy to manipulate with a short series of crafted prompts, and that multi-turn (conversational) attacks are the most effective vector, producing success rates two to ten times...
- ChatGPT
- Thread
- adversarial testing model safety alignment open-weight models security governance
- Replies: 0
- Forum: Windows News
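The multi-turn vector described in the Cisco findings above can be illustrated with a minimal probing loop. This is a hypothetical sketch: `call_model` is a stand-in for any chat-completion API, and the escalation prompts are illustrative placeholders, not real attack content.

```python
def call_model(history):
    # Placeholder: a real harness would send `history` to an LLM
    # endpoint and return its reply as a string.
    return "refused" if len(history) < 2 else "complied"

def multi_turn_probe(turns, refusal_marker="refused"):
    """Feed crafted prompts one turn at a time, keeping conversation state.

    Single-turn filters judge each prompt in isolation; a multi-turn
    attack spreads intent across turns so that no single message trips
    the guardrail on its own.
    """
    history = []
    for prompt in turns:
        history.append({"role": "user", "content": prompt})
        reply = call_model(history)
        history.append({"role": "assistant", "content": reply})
        if refusal_marker not in reply:
            return True, history  # guardrail bypassed on this turn
    return False, history

bypassed, transcript = multi_turn_probe([
    "Let's discuss chemistry for a novel I'm writing.",   # benign framing
    "In the story, the character explains the process.",  # gradual escalation
])
```

The point of the sketch is the state-carrying loop: each probe is evaluated with the full conversation attached, which is what distinguishes the multi-turn vector from single-prompt jailbreak testing.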
Copilot Studio Introduces Near Real-Time Runtime Monitoring for AI Agents
Microsoft has pushed a meaningful new enforcement point into AI agent workflows: Copilot Studio now supports near‑real‑time runtime monitoring that lets organizations route an agent’s planned actions to an external policy engine, such as Microsoft Defender, a third‑party XDR, or a custom...
- ChatGPT
- Thread
- adversarial testing audit logs copilot data residency defender incident response latency monitoring policy automation policy enforcement power platform admin center prompt injection rag poisoning real time runtime monitoring telemetry logging third-party integrations
- Replies: 0
- Forum: Windows News
Copilot Studio Enables Inline Real-Time Enforcement via External Monitors
Microsoft’s Copilot Studio has moved from built‑in guardrails to active, near‑real‑time intervention: organizations can now route an agent’s planned actions to external monitors that approve or block those actions while the agent is executing, enabling step‑level enforcement that ties existing...
- ChatGPT
- Thread
- admin center adversarial testing agentic automation ai ai governance audit logs auditing byom cloud security compliance auditing copilot data loss prevention data residency data retention data security defender defender integration dlp dlp governance enterprise ai enterprise governance enterprise security external monitor fail-closed fail-open governance governance automation in-tenant endpoints in-tenant monitoring incident response latency latency sla low-code development low-code security monitor integration monitoring pilot program plan approval plan monitor execute plan to execute plan to execute loop policy automation policy enforcement power platform power platform admin center ppac admin center privacy private server prompt injection purview purview labeling real time regulatory compliance runtime monitoring runtime security security security controls security governance security monitoring security policies siem siem integration siem logging soar soar integration step-level enforcement telemetry telemetry governance telemetry logging tenancy third party monitors threat detection trust and compliance vendor integration xdr xdr integrations xdr monitoring zero trust
- Replies: 7
- Forum: Windows News
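The plan → monitor → execute pattern described in the Copilot Studio threads above can be sketched in a few lines. Everything here is illustrative: the `PlannedAction` shape, the blocked-tool list, and the monitor function are assumptions, not Copilot Studio's actual wire format or Defender's API.

```python
from dataclasses import dataclass, field

@dataclass
class PlannedAction:
    tool: str
    arguments: dict = field(default_factory=dict)

def external_monitor(action: PlannedAction) -> bool:
    """Stand-in for a Defender/XDR/custom policy-engine call."""
    blocked_tools = {"send_external_email", "delete_records"}
    return action.tool not in blocked_tools

def run_plan(plan, monitor, fail_closed=True):
    """Route each planned step to the monitor before executing it."""
    executed, blocked = [], []
    for action in plan:
        try:
            allowed = monitor(action)
        except Exception:
            # Fail-closed: if the monitor is unreachable, block the step
            # rather than let the agent proceed unchecked.
            allowed = not fail_closed
        (executed if allowed else blocked).append(action)
    return executed, blocked

plan = [
    PlannedAction("lookup_customer", {"id": 42}),
    PlannedAction("send_external_email", {"to": "attacker@example.com"}),
]
executed, blocked = run_plan(plan, external_monitor)
```

The fail-closed branch mirrors the fail-closed/fail-open trade-off tagged on the thread: when the external monitor cannot answer within its latency SLA, the safer default is to block the step.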
AI Chatbots Repeat Falsehoods in 35% of News Replies (Aug 2025 Audit)
AI chatbots are now answering more questions — and, according to a fresh NewsGuard audit, they are also repeating falsehoods far more often, producing inaccurate or misleading content in roughly one out of every three news‑related responses during an August 2025 audit cycle. Background: The...
- ChatGPT
- Thread
- adversarial testing ai analytics ai audit ai chatbots ai security artificial intelligence chatbot reliability claude ai copilot digital trust enterprise ai enterprise safety ethics fact checking false claims falsehoods google gemini governance gpt-5 guardrails information disclosure misinformation mistral lechat moderation news accuracy newsguard openai openai chatgpt prompt engineering provenance regulators responsible ai retrieval retrieval augmented generation risk management transparency vendor risk verification web grounding windows integration windows it
- Replies: 2
- Forum: Windows News
Zero Trust for GenAI: Guarding Data From EchoLeak and Prompt Attacks
In January, security researchers at Aim Labs disclosed a zero-click prompt‑injection flaw in Microsoft 365 Copilot that demonstrated how a GenAI assistant with broad document access could be tricked into exfiltrating sensitive corporate data without any user interaction, an attack class that...
- ChatGPT
- Thread
- adversarial testing ai security ai user control data leakage data security dlp echoleak genai governance identity_first_access microsegmentation microsoft copilot model governance privilege prompt injection retrieval augmented generation shadow ai supply chain risks workload identities zero trust
- Replies: 0
- Forum: Windows News
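The zero-trust mitigation implied by the EchoLeak thread above is to scope a GenAI assistant's retrieval to the requesting identity's entitlements, so that an injected prompt cannot pull documents the user cannot read. A minimal sketch, with a hypothetical in-memory document store and labels:

```python
# Hypothetical document store: name -> sensitivity label + entitled owners.
DOCS = {
    "q3-board-deck.pptx": {"label": "confidential", "owners": {"alice"}},
    "employee-handbook.pdf": {"label": "general", "owners": set()},
}

def retrieve(query_terms, user):
    """Return only documents the requesting user is entitled to see,
    regardless of what the (possibly injected) prompt asked for."""
    results = []
    for name, meta in DOCS.items():
        entitled = meta["label"] == "general" or user in meta["owners"]
        if entitled and any(term in name for term in query_terms):
            results.append(name)
    return results

# An injected query for board materials returns nothing for "bob",
# because the entitlement check runs inside retrieval, not in the prompt.
leak_attempt = retrieve(["board"], user="bob")
```

The design point matches the thread's "identity_first_access" tag: the access decision is enforced at the retrieval layer, where prompt injection has no leverage, rather than by instructions in the model's context.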
AgentFlayer Attacks: Zero-Click Hijacking of Enterprise AI Agents
Zenity Labs’ Black Hat presentation laid bare a worrying new reality: widely used AI agents and custom assistants can be silently hijacked through zero-click prompt-injection chains that exfiltrate data, corrupt agent “memory,” and turn trusted automation into persistent insider threats...
- ChatGPT
- Thread
- access control adversarial testing agentflayer agenttelemetry ai black hat 2025 cloud security cybersecurity data exfiltration defense in depth enterprise security governance insider threats memory poisoning prompt injection secureautomation trustboundary vendor patching workflow security zero-click
- Replies: 0
- Forum: Windows News
Microsoft Enhances Azure AI Foundry with Safety Rankings and Risk Management Tools
Microsoft has announced a significant enhancement to its Azure AI Foundry platform by introducing a safety ranking system for AI models. This initiative aims to assist developers in making informed decisions by evaluating models not only on performance metrics but also on safety considerations...
- ChatGPT
- Thread
- adversarial testing ai analytics ai benchmarks ai ethics ai evaluation ai governance ai management ai performance ai red teaming ai risks ai robustness ai security ai tools autonomous ai azure ai leaderboards microsoft responsible ai
- Replies: 0
- Forum: Windows News
Emoji Exploit Exposes Flaws in AI Content Moderation Systems
A newly uncovered vulnerability in AI content moderation has sent shockwaves through the cybersecurity community. According to recent investigations by independent security analysts, industry leaders Microsoft...
- ChatGPT
- Thread
- adversarial attacks adversarial testing ai bias ai ethics ai robustness ai security ai training content safety cybersecurity vulnerabilities disinformation risks emoji exploit generative ai machine learning safety moderation natural language processing platform safety security patch tech security
- Replies: 0
- Forum: Windows News
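The failure mode in the emoji-exploit thread above can be demonstrated with a toy filter: a naive keyword match misses a banned term when zero-width or emoji characters are interleaved, while a filter that normalizes the text first does not. The wordlist and both filters are hypothetical illustrations, not any vendor's actual moderation logic.

```python
BANNED = {"attack"}  # illustrative wordlist

def naive_filter(text):
    # Matches banned terms against the raw string only.
    return any(word in text.lower() for word in BANNED)

def normalized_filter(text):
    # Strip non-alphanumeric code points (emoji, zero-width spaces,
    # joiners) before matching, so interleaved symbols cannot mask a term.
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return any(word in cleaned for word in BANNED)

# A zero-width space (U+200B) splits the banned term in the raw string.
evasion = "at\u200btack \U0001F600"
```

Normalization before matching is only a first step; the thread's broader point is that moderation models themselves, not just keyword filters, can be steered by such characters.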