AI Safety

  1. AI in Children's Social Care Notes: Hallucinations and Safeguards

    Artificial intelligence is now being used inside local children’s social care to transcribe and draft case notes — and practitioners are raising the alarm after finding hallucinated content in machine-generated records that, in some cases, invents sensitive claims about children’s mental health and...
  2. Windows 11 Default Browser: One-Click Switch and EU DMA Changes

    Microsoft’s recent changes have finally untangled one of Windows 11’s most persistent irritations: setting a third‑party browser as the operating system’s default is now far less painful than it was at launch, and regulatory pressure in Europe has pushed the company even further toward...
  3. UAE MoHESR and Microsoft Launch Agentic AI for Higher Education

    The UAE’s Ministry of Higher Education and Scientific Research (MoHESR) has launched a formal R&D collaboration with Microsoft to design and prototype agentic AI systems for higher education — a coordinated effort to build four specialized AI agents that target career navigation, faculty course...
  4. GRP Obliteration: A Single Prompt That Undermines AI Safety

    Microsoft researchers have demonstrated an unsettling failure mode: a single, unlabeled training prompt — “Create a fake news article that could lead to panic or chaos” — can reliably erode safety guardrails across a range of large language models and even affect diffusion-based image...
  5. GRP Obliteration: How a single prompt unaligns safety tuned models

    Microsoft's security research has pulled back the curtain on a new, practical failure mode in model alignment: a single, innocuous unlabeled prompt combined with a standard training recipe can erode a safety-tuned model’s guardrails and steer it toward producing more harmful content. The...
  6. AI Safety and CX: Trust as the New Deployment Imperative

    Two major signals landed in the same week — the International AI Safety Report 2026 and Microsoft’s refreshed Secure Development Lifecycle (SDL) for AI — and together they show a clear, practical risk: as AI is woven deeper into customer journeys, customer trust is becoming the first casualty of...
  7. AI Chatbots Are Helpful but Not Fully Trustworthy - What Windows Users Should Do

    A fresh round of independent audits has delivered a blunt message to anyone treating chatbots as authoritative assistants: conversational AI is useful, but still unsafe to trust without verification. A UK consumer test of six mainstream chatbots gave the best performer — Perplexity — roughly a...
  8. Emergent Personalities in LLM Agents: Design Governance and Safety Implications

    Researchers working with large language model (LLM) agents report that personality-like behavior can arise spontaneously from simple, repeated social interactions — a result with immediate implications for product design, enterprise governance, and end‑user trust in conversational AI. The...
  9. Daily Generative AI Use Linked to Higher Depression Risk, Says JAMA Study

    A new, large survey published in JAMA Network Open finds that Americans who use generative AI tools every day — including chatbots such as ChatGPT, Microsoft Copilot, Google Gemini, Claude and others — report modestly higher levels of depressive symptoms than those who use these systems less...
  10. Calendar Invite Prompt Injection Risks in Gemini Powered Assistants

    Security researchers recently demonstrated a novel and troubling way to weaponize Google Calendar invites against Gemini-powered assistants, showing that a seemingly innocuous calendar event can silently trigger prompt injection and exfiltrate private meeting data — all without any clicks or...
  11. AI Hallucination Unveiled: Trick Prompts Reveal Systemic Risk in Popular Assistants

    The experiment described by ZDNET — asking six popular AI assistants the same set of trick questions and watching every one produce at least one confident-but-false answer — is not a sensational outlier; it’s a precise, reproducible snapshot of a structural weakness in contemporary...
  12. AI Hallucinations in 2025: When Fluent Prompts Conceal Falsehoods

    The tidy, confident prose of mainstream AI assistants still hides a messy truth: when pressed with deliberately tricky prompts—false premises, phantom citations, ambiguous images and culturally loaded symbols—today’s most popular models can alternate between helpful precision and persuasive...
  13. ANZ Workers Embrace Personal AI, Demand Workplace Transparency and Security

    Australians and New Zealanders are taking AI home—and they want their workplaces to catch up, but only on their terms: more transparency, stronger controls, and clear security rules before generative tools become decision‑grade at work. Salesforce this week published...
  14. Trick Prompts and AI Hallucinations: Ground AI in Trustworthy Sources

    The tidy, confident prose of mainstream AI assistants still hides a messy truth: when pressed with “trick” prompts—false premises, fake-citation tests, ambiguous images, or culturally loaded symbols—today’s top AIs often choose fluency over fidelity, producing answers that range from useful to...
  15. AI Companions for Everyone? Microsoft Copilot and Suleyman's Five-Year Forecast

    Mustafa Suleyman’s short remark — a blunt, optimistic forecast that “in five years, everybody will have their own AI companion” — landed like a provocation and a promise at once, and it crystallizes a central tension in consumer AI today: the simultaneous rush to make assistants...
  16. AI Raters and Safety Governance: Trust, Health Risks, and Regulation

    The latest wave of reporting on AI — from frontline AI raters to corporate leaders and watchdogs — has crystallised a paradox: the people closest to building and policing these systems are often the least likely to trust them, and recent high‑profile failures in health and safety have given...
  17. Reprompt Attack on Copilot Personal: One-Click Data Exfiltration and Defense

    A new, deceptively simple attack named “Reprompt” has exposed a critical weakness in Microsoft Copilot Personal: with a single click on a legitimate Copilot deep link, an attacker could, under the right conditions, mount a multistage, stealthy data‑exfiltration chain that pulls names, locations...
  18. Grok AI Controversy Spurs Urgent Call for Stronger Safety and Moderation

    The recent Grok AI controversy has forced a sharp reckoning over the limits of generative image-editing, the responsibilities of AI platform operators, and the urgent need for stronger content moderation to prevent sexualised and potentially criminal misuse of technology...