Dutch education and research network SURF’s Data Protection Impact Assessment (DPIA) of Microsoft 365 Copilot finds persistent privacy and safety gaps that make the service unsuitable for broad use in schools and research institutions. Even after ongoing talks with Microsoft, two of the DPIA’s original four “high risks” have only been downgraded to medium–high rather than fully resolved. (surf.nl)

Background

Microsoft 365 Copilot is a generative AI assistant integrated into Office apps such as Word, Excel, Outlook and Teams. It can summarise documents, draft email text, generate slide content, and extract insights from institutional data repositories. Adoption across enterprises and education providers has been rapid because the assistant promises major productivity gains and deeper data-driven workflows.
SURF — the cooperative body representing Dutch education and research institutions — commissioned a DPIA in 2024, conducted with external privacy experts, to assess Copilot’s data flows, privacy safeguards, and compliance with European data protection standards. The December 2024 DPIA identified multiple high-risk areas and led SURF to advise its members not to adopt Microsoft 365 Copilot broadly until the issues were addressed. (surf.nl)
Since that publication, SURF and the Dutch government’s Strategic Supplier Management (SLM) have engaged Microsoft in an iterative dialogue. Microsoft supplied additional technical and contractual information that SURF is now evaluating; an update published by SURF on June 26, 2025 confirms progress but leaves two notable risks unresolved. (vendorcompliance.surf.nl)

What SURF’s DPIA actually found​

Four initial high risks — two remain​

The DPIA originally flagged four high risks tied to Microsoft 365 Copilot’s operation within education and research institutions. After Microsoft provided further information and committed to mitigation steps, SURF reports that two of the four risks remain in place, downgraded from high to medium–high (often labelled orange in SURF’s risk palette). Those two persistent concerns are:
  • Inaccurate or incomplete (personal) data produced or used by Copilot — Copilot can generate or present incorrect personal data as fact, and users may rely on that output without recognising errors.
  • Undetermined retention and scope of diagnostic/telemetry data — SURF remains concerned about how long Microsoft retains diagnostic logs and telemetry connected to Copilot usage and what those logs include. (vendorcompliance.surf.nl)
The DPIA’s more general findings also highlighted a broader lack of transparency: Microsoft’s public and contractual explanations about exactly what personal data the service collects, stores, or processes were judged incomplete and difficult to interpret. That ambiguity undermines institutions’ ability to comply with data subject rights and GDPR obligations. (surf.nl)

Why SURF’s context matters​

SURF’s membership represents universities, research institutes, higher-education colleges and some vocational institutions — environments where personal, sensitive and research data frequently co-exist in the same systems. That mix amplifies the downstream impact of any erroneous or unconsented processing: a misattributed or hallucinated data point could affect hiring, academic integrity, or research reproducibility. SURF’s caution reflects those compounded risks rather than a categorical technical rejection of AI.
Independent Dutch reports and university IT teams have raised similar concerns: inaccurate outputs that appear authoritative, combined with opaque data-handling practices, are central to the anxiety surrounding embedded Copilot features. (erasmusmagazine.nl)

Dissecting the two remaining medium–high risks​

1) Inaccurate personal data: hallucinations meet trust​

Generative AI systems are probabilistic text models; they synthesise responses from learned patterns rather than retrieving verifiable facts. When Copilot synthesises content from institutional files or public data, it can:
  • Assemble plausible but incorrect statements about individuals (e.g., misattributing quotes, roles, or dates).
  • Merge data points from multiple people into a single summary, creating composite inaccuracies that are harmful in administrative or evaluative contexts.
  • Fail to flag uncertainty or provenance clearly, causing users to treat AI output as authoritative.
The DPIA explicitly calls out the risk that users will over-rely on Copilot’s outputs, a phenomenon known as automation bias. In tightly regulated or reputationally sensitive contexts such as admissions, grading, HR decisions, or medical-research collaboration, these hallucinations can lead to serious harm or unfair decisions. SURF’s analysis emphasises that mitigation requires both technical provenance controls and institutional policy to curb blind trust. (surf.nl)
Key technical vectors that increase the risk:
  • Copilot’s blending of internal content (private documents, intranet) with generative outputs without always surfacing the exact source or confidence level.
  • Lack of consistent provenance metadata attached to generated text that would let recipients verify claims.
  • Default UI and workflow patterns in Office apps that present Copilot answers inline in primary productivity surfaces, increasing the likelihood that users accept output without verification.

2) Diagnostic data and telemetry retention: unknowns and reidentification​

Diagnostic logs and telemetry are used to monitor performance, debug problems, and detect abuse. But these logs frequently contain contextual identifiers, usage fingerprints, timestamps, and references to document identifiers. SURF’s DPIA highlighted several overlapping concerns:
  • Retention length uncertainty — SURF was not satisfied that Microsoft’s disclosed retention policies were specific or short enough for GDPR compliance in the education context.
  • Scope and content of telemetry — diagnostic data may include pseudonymised or hashed identifiers that, when combined with other data sources, could be re‑identified.
  • Data subject rights friction — SURF found that it might be difficult for institutions or individuals to exercise deletion or access rights for diagnostic data because of how it’s collected and stored. (surf.nl)
Even when telemetry is designated “diagnostic” rather than “content,” its combination with other logs (timestamps, device IDs, tenant IDs) creates re‑identification vectors. In research settings where datasets are small or unique (specialised labs, small cohorts), the re‑identification risk is amplified.
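To make that concrete, the toy sketch below shows how a pseudonymised identifier plus a timestamp window can be enough to re-link supposedly anonymous diagnostic events to a named person once an institution’s own sign-in logs are available. All field names are hypothetical assumptions for illustration and do not reflect Microsoft’s actual telemetry schema.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified log records; field names are illustrative only.
telemetry = [
    {"pseudo_id": "a41f9c", "tenant_id": "uni-x", "event": "copilot_prompt",
     "ts": datetime(2025, 3, 3, 10, 14, 2)},
]
# Institution-side sign-in log that links session activity to a person.
signin_log = [
    {"user": "j.devries@uni-x.nl", "pseudo_id": "a41f9c",
     "ts": datetime(2025, 3, 3, 10, 13, 58)},
]

def reidentify(telemetry, signin_log, window=timedelta(minutes=5)):
    """Join 'anonymous' telemetry back to named users via a shared
    pseudonymous identifier and timestamp proximity."""
    hits = []
    for t in telemetry:
        for s in signin_log:
            if t["pseudo_id"] == s["pseudo_id"] and abs(t["ts"] - s["ts"]) <= window:
                hits.append((s["user"], t["event"], t["ts"]))
    return hits

print(reidentify(telemetry, signin_log))
# one "diagnostic" Copilot prompt event re-linked to a named user
```

The smaller and more distinctive the cohort, the fewer such join keys are needed, which is exactly the situation in specialised research groups.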

Microsoft’s position and the broader vendor context​

Microsoft has publicly stated for months that customer content stored in Microsoft 365 is not used to train its large language models — a key reassurance for organisations worried about intellectual property and personal data being consumed into training datasets. However, that statement does not address all telemetry or diagnostic flows, nor does it alleviate concerns about retention windows or visibility into what exactly is logged. The company has committed to additional documentation and mitigation steps in response to regulatory and customer scrutiny. (theverge.com)
At the same time, Microsoft has encountered other privacy controversies — for example, the proposed “Recall” feature for Copilot+ PCs, which prompted regulatory scrutiny and public pushback because of its screenshotting and local-history approach. Those episodes underscore that product-level privacy design choices can generate public trust issues that spill over into cloud and enterprise offerings. (reuters.com)
SURF’s June 26, 2025 update confirms Microsoft has provided new information about mitigating measures and that SURF considers the process to be “going in the right direction,” while still withholding a full green-light pending further verification. This reflects an iterative approach: vendor commitments are necessary but not sufficient until independently verifiable changes are implemented and audited. (vendorcompliance.surf.nl)

Legal and compliance implications for European institutions​

GDPR touchpoints that raise the alarm​

  • Lawful basis & purpose limitation: Institutions must be able to explain the purposes for processing personal data and ensure no secondary or unexpected use occurs. Unclear telemetry and diagnostic processing make that explanation difficult.
  • Transparency & information obligations: Data subjects (staff, students, researchers) must be informed in a clear and intelligible manner about what data is processed. SURF judged Microsoft’s existing communications to be incomplete or hard to interpret.
  • Data subject rights: The right to access, rectify, and erase data becomes difficult if diagnostic logs are stored in vendor-managed systems with ambiguous retention or de-identification practices.
  • Data protection by design & by default: Embedding AI into productivity apps demands demonstrable design choices that minimise data exposure; SURF’s DPIA found gaps here.
Institutions that uncritically enable Copilot risk falling short of accountability standards demanded by supervisory authorities, especially when decisions built on AI output affect individuals’ rights or legal status.

Risk to vulnerable populations and minors​

SURF’s initial DPIA deliberately limited its scope to adult students and employees because Microsoft’s paid education licences were not available for minors at the time. That caution is important: using generative AI with minors introduces additional legal and ethical obligations and should be treated separately when a vendor product is later made available to younger cohorts.

Practical steps for schools and research institutions​

SURF’s recommendation against general deployment is blunt: exercise caution and limit use until mitigations are demonstrably effective. Institutions that nonetheless run pilots or restricted deployments should consider the following controls:
  • Define a tight, written AI usage policy that:
      • Identifies allowed and forbidden use cases.
      • Specifies approval workflows for pilots, including department-level pilots.
      • Requires human verification of outputs before acting on any Copilot-generated personal data.
  • Technical configuration and tenancy controls:
      • Use tenant-level settings to restrict which users can access Copilot features.
      • Disable risky connectors or external sharing by default.
      • Log and monitor Copilot activity within the institution’s own SIEM to create an independent audit trail.
  • Data minimisation and provenance:
      • Avoid feeding sensitive or special-category personal data into prompts.
      • Where possible, prefer on‑tenant retrieval and redaction steps before using any documents as prompt context (a minimal redaction sketch follows this list).
      • Request vendor support for provenance tokens or traceability metadata with every generated output.
  • Contractual and audit clauses:
      • Insist on clear contractual language around telemetry retention, deletion timelines, and the ability to execute data subject rights.
      • Require Microsoft to supply technical descriptions and to allow independent audits or third-party attestations of mitigation measures.
  • User education and training:
      • Build mandatory “AI literacy” training that explains hallucination risk, provenance checks, and how to exercise data subject rights.
These are pragmatic precautions that reduce operational risk while allowing limited learning and research uses under controlled conditions.
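As a concrete illustration of the on-tenant redaction step referenced in the data-minimisation item above, the sketch below scrubs obvious personal identifiers from document text before it is ever offered to an assistant as prompt context. The regex patterns are deliberately naive and purely illustrative; a production deployment would rely on a proper PII-detection service rather than three hand-written expressions.

```python
import re

# Minimal, illustrative redaction pass to run on-tenant before any document
# text is handed to an AI assistant as prompt context.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "DUTCH_BSN": re.compile(r"\b\d{9}\b"),          # citizen service number (naive)
    "PHONE": re.compile(r"\b(?:\+31|0)\d[\d\s-]{7,}\b"),
}

def redact(text: str) -> str:
    """Replace recognisable personal identifiers with typed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

doc = "Student j.devries@uni-x.nl (BSN 123456789, tel. 06 1234 5678) requests..."
print(redact(doc))
# Student [EMAIL REDACTED] (BSN [DUTCH_BSN REDACTED], tel. [PHONE REDACTED]) requests...
```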

Technical mitigations Microsoft and deployers can pursue​

  • Provenance-first outputs: Attach machine-readable provenance metadata to every Copilot response — showing which internal documents or public sources were used, plus a confidence score (a sketch of such a record follows this list).
  • Short, explicit telemetry retention windows: Define strict, documented retention periods for diagnostic data relevant to EU/EEA deployments and offer options for institutional control or deletion.
  • Local-only processing options: For the most sensitive workflows, offer on‑tenant or on‑device processing modes where model inference does not transmit payload data to global endpoints.
  • Granular consent and enforcement: Provide admin-level controls that block specific classes of data from being included in prompt contexts (e.g., student dossiers, HR files).
  • Third-party attestations: Regular independent audits and SOC/ISO-style reports specific to Copilot telemetry, retention and provenance.
These steps align with privacy-by-design principles and help organisations meet legal and ethical standards without sacrificing productivity benefits.
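To illustrate what a provenance-first output could look like in practice, the sketch below defines a hypothetical, machine-readable provenance record that could accompany every generated response. The structure and field names are assumptions made for illustration, not an existing Microsoft 365 Copilot data structure or API.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

# Illustrative only: a shape an institution could ask a vendor for,
# not an existing Microsoft 365 Copilot schema.
@dataclass
class SourceRef:
    doc_id: str          # tenant-internal document identifier
    title: str
    excerpt_hash: str    # hash of the passage actually used

@dataclass
class ProvenanceRecord:
    response_id: str
    generated_at: str
    model_version: str
    sources: list[SourceRef] = field(default_factory=list)
    confidence: float = 0.0           # model- or retrieval-level confidence estimate
    contains_personal_data: bool = False

record = ProvenanceRecord(
    response_id="resp-0142",
    generated_at=datetime.now(timezone.utc).isoformat(),
    model_version="assistant-2025-06",
    sources=[SourceRef("doc-88af", "Exam board minutes 2024-11", "sha256:9c1f0e")],
    confidence=0.62,
    contains_personal_data=True,
)

# Ship this alongside the generated text so reviewers can verify claims
# before the output is reused in any decision about a person.
print(json.dumps(asdict(record), indent=2))
```

A record like this also gives administrators something concrete to log, audit, and delete, which ties the provenance and telemetry-retention concerns together.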

Strengths of SURF’s approach and the vendor response​

  • Rigour and sector-specific focus: SURF’s DPIA is targeted to the real-world mixes of data that Dutch education and research institutions hold, making its recommendations operationally relevant rather than purely theoretical.
  • Transparent, public process: Publishing the DPIA and subsequent updates allows institutions across Europe to make informed, evidence-based decisions.
  • Vendor engagement: Microsoft’s provision of additional mitigation information and the company’s public clarifications about model training show responsiveness — an encouraging sign for dialogue-led remediation. (vendorcompliance.surf.nl)

Remaining gaps and risks that still worry experts​

  • Ambiguity in vendor documentation: Even after Microsoft’s updates, SURF found gaps in the clarity and completeness of technical descriptions, particularly around the content of diagnostic logs.
  • Implementation drift: Commitments on paper are only useful if implemented and tested. SURF’s insistence on follow-up evaluations recognises that technical and contractual promises must be independently verifiable.
  • User behaviour and automation bias: No amount of vendor change can fully eliminate the human tendency to trust a persuasive-looking answer. Institutional safeguards and culture change are required to counteract this.
  • Regulatory scrutiny: Supervisory authorities across Europe are increasingly focused on AI-related data flows; an institution that ignores SURF’s guidance risks regulatory exposure if harms arise.
Where vendor assurances remain unverifiable or vague, institutions should treat the related functions as untrusted and avoid embedding them in decision pipelines that affect people.

Clear recommendations for practitioners and IT decision-makers​

  • Treat Copilot as a high-impact service, not a convenience feature. Apply the same procurement and DPIA disciplines you would for any high-risk data processor.
  • If you are piloting Copilot, confine the pilot to controlled user groups with explicit consent, narrow use cases, and robust logging.
  • Demand contractual clarity: retention periods for telemetry, mechanisms to exercise data subject rights, and the right to independent audits or transparency reports.
  • Invest in user training and governance: no technical mitigation replaces a clear institutional policy and human verification rules.
  • Monitor SURF and national data-protection authority guidance closely. SURF has committed to reassessments; treat those updates as decision points rather than one-off signals. (vendorcompliance.surf.nl)

Conclusion​

SURF’s DPIA is a high-quality, sector-aware assessment that moves beyond the headline of “AI is risky” to identify specific operational gaps that matter for schools and research institutions. The DPIA’s downgrade of two high risks to medium–high after vendor engagement indicates progress, but not closure. Institutions should treat the current state as transitional: promising mitigation commitments exist, but they remain contingent on demonstrable technical changes, precise contractual commitments, and independent verification.
For organisations managing student records, research data, HR functions or evaluative workflows, the precautionary path remains the prudent one: limit Copilot’s exposure to sensitive flows, insist on technical provenance and short telemetry retention, and embed human verifiers into any process that uses Copilot-generated personal data. SURF’s public stance offers a template for responsible AI adoption that CIOs and privacy officers across Europe can point to, and it turns the Microsoft–institution dialogue into a test case for how large vendors should be held to account when generative AI enters mission-critical systems. (surf.nl)

Source: Telecompaper
 
