Summarize Before You Upload: AI Safety for Universities

Jena Zangs’s short, practical recommendation — summarize before you upload — is the clearest, most actionable piece of AI safety advice a campus administrator can hear right now. As universities rush to fold generative AI into advising, administration, research and classroom workflows, the recurring vulnerability is often not the algorithm itself: it’s the data we feed into it. Zangs, chief data and AI officer at the University of St. Thomas, told University Business that administrators and staff should obscure or summarize sensitive records before sending them to chatbots and public models, because not all AI services treat inputs the same way, and institutional control over data can evaporate when information is posted into consumer tools.

Background / Overview​

Generative AI tools are now embedded across campus life: drafting communications, summarizing meeting notes, triaging help tickets, scaffolding grant proposals and powering study aids. That utility is real, but the stakes are high when the content includes student records, personnel files, health information or proprietary research. Federal privacy laws (like FERPA in the United States), institutional policy and reputational risk converge on a single operational problem: who controls the data once it leaves the institutional network? Universities are grappling with that question in policy committees and vendor RFPs while IT teams scramble to implement immediate, pragmatic guardrails.
Zangs’s point is deceptively simple and deliberately tactical: when you need the analytical power of an LLM but the raw document contains personally identifiable or sensitive elements, create a summarized, de‑identified abstraction and submit that instead of the full record. The practice reduces exposure, preserves decision‑useful context and buys time for longer‑term procurement and governance work.

Why the distinction between public and enterprise models matters​

Consumer chatbots are not the same legal or technical environment as campus enterprise deployments​

Not all AI services are created equal when it comes to data handling. Major providers now publish explicit distinctions between consumer‑facing services and enterprise products: consumer interactions are often eligible to be used for model improvement unless individual users opt out, whereas enterprise tiers typically include contractual and technical controls that exclude customer inputs from training and give organizations retention and access controls. OpenAI states that business products such as ChatGPT Enterprise and the API are not used to train models by default, and offers controls for retention and access. Microsoft makes a comparable claim for many enterprise Copilot scenarios while outlining different behavioral rules for non‑signed‑in or consumer interactions. These platform differences are material for universities assessing risk.

But “not used to train” is a policy boundary, not an absolute guarantee​

Two important caveats follow the enterprise-vs-consumer distinction. First, policy language and platform settings can change, and contractual protections depend on the product and the legal agreement the institution negotiates. Second, legal processes — such as court orders or litigation holds — can force a provider to retain or disclose logs that would otherwise be deleted, as recent reporting has shown. Institutions must therefore treat vendor promises as necessary but not sufficient controls: combine contractual assurances with operational practices like data minimization, logging, and internal audits.

The evidence base for Zangs’s recommendation​

Universities and IT governance bodies across the U.S. are converging on the same mitigation playbook that Zangs described: minimize what you send to models, de‑identify aggressively, prefer vendor products with enterprise‑grade contractual protections, and keep humans in the loop.
  • University IT and teaching offices — from UW–Madison to the University of Maryland and Johns Hopkins — publish explicit guidance forbidding the submission of identifiable student records to unvetted third‑party generative AI tools and requiring de‑identification or institutional approval for any tool that will process protected data. These policies often name FERPA and HIPAA as legal drivers and call out the risk of re‑identification when context or metadata remain.
  • Vendors and platform operators also publish bifurcated privacy commitments. OpenAI and Microsoft both describe different handling for enterprise customers versus consumer users; in practice, institutional IT leaders must validate those promises and negotiate terms that align with campus policy.

What “summarize before uploading” looks like in practice​

Summarization is both a policy and a technical step. When you apply it deliberately, you get analytic payoff while reducing exposure.

A practical, repeatable workflow (operational checklist)​

  1. Identify whether the document contains protected or sensitive data (student grades, Social Security numbers, health information, personnel decisions, proprietary research).
  2. If sensitive, determine whether an internal, vetted tool (on‑prem model, vetted enterprise Copilot, or a tool authorized by the privacy office) is available; if not, proceed to steps 3–6.
  3. Create a short, neutral summary that captures facts, dates and the minimal context required for the analytical task. Remove names, identifiers and any direct quotations that could be traced back to individuals.
  4. Replace unique identifiers with generic tokens (e.g., “Student A,” “Dept X,” “Research Project 1”) and avoid including metadata that could re‑identify subjects.
  5. Run the summary through a university‑approved internal checklist (privacy officer, department head or delegated reviewer) before submitting to a third‑party tool.
  6. Store the original record only in institutional repositories with proper access controls and document the decision to use a third‑party model for auditability.
This workflow lets administrators get utility from a model while keeping control of the canonical record and ensuring an auditable trail. It preserves human judgment for decisions with legal or reputational consequences.
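The de‑identification step in that workflow can be sketched as a small pre‑processing routine. Everything below is illustrative: the regex patterns, the token names, and the sample record are assumptions, not a vetted de‑identification library, and real deployments should use tools approved by the privacy office.

```python
import re

# Illustrative patterns only -- a production workflow would use the
# institution's approved de-identification tooling, not this sketch.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def deidentify(text: str, known_names: list) -> str:
    """Replace direct identifiers with generic tokens before any upload."""
    out = SSN_RE.sub("[ID-REDACTED]", text)
    out = EMAIL_RE.sub("[EMAIL-REDACTED]", out)
    # Map each known individual to a stable generic token ("Student A", "Student B", ...)
    for i, name in enumerate(known_names):
        token = "Student " + chr(ord("A") + i)
        out = out.replace(name, token)
    return out

record = "Maria Lopez (SSN 123-45-6789, maria.lopez@stthomas.edu) failed CHEM 201."
summary = deidentify(record, known_names=["Maria Lopez"])
print(summary)
# Student A (SSN [ID-REDACTED], [EMAIL-REDACTED]) failed CHEM 201.
```

Even after this pass, a reviewer (step 5 above) should still check for contextual clues — a course code plus a department can be enough to re‑identify someone in a small program.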

Why summarization beats naive redaction in many cases​

Redaction (blacking out names and numbers) can be brittle. Textual redaction often leaves contextual breadcrumbs that allow re‑identification, especially when combined with public information. Summarization forces abstraction: it surfaces what matters for the task and discards the rest. Where redaction can be reversed by patching together context, a well‑crafted summary preserves intent while reducing the attack surface. University guidance from multiple campuses highlights de‑identification and reasonable determination tests that support this practice.

Technical and governance controls campuses must put in place​

Summarization is essential but not sufficient. Institutions should combine it with the following layered controls.

Contractual and procurement controls​

  • Negotiate data processing agreements and enterprise contracts that explicitly prohibit using institutional inputs to train general models, that define retention windows, and that provide breach notification timelines. OpenAI’s enterprise privacy page and Microsoft’s enterprise guidance are examples of the clauses procurement teams will expect to see mirrored in vendor language.
  • Demand audit rights and technical documentation on how models are updated, how logs are retained, and how access controls are implemented.

Identity, access, and enterprise integration​

  • Require SSO (SAML/Entra/Azure AD) integration and fine‑grained role controls so admins can limit which employees can access a given model and dataset. Enterprise products commonly provide these controls; consumer versions do not.
  • Integrate tools with existing Data Loss Prevention (DLP) and Conditional Access policies to block or flag uploads containing sensitive data.
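A lightweight version of that DLP pre‑filter might look like the sketch below. The patterns — including the `U` plus eight digits campus‑ID format — are hypothetical placeholders for an institution’s vetted classifiers.

```python
import re

# Hypothetical patterns standing in for a real, institutionally vetted DLP policy.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "student_id": re.compile(r"\bU\d{8}\b"),  # assumed campus ID format
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.\w+\b"),
}

def scan_before_upload(text: str) -> list:
    """Return the names of patterns found; an empty list means the text may pass."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]

hits = scan_before_upload("Advise U12345678 about grades; SSN 123-45-6789 on file.")
print(hits)  # ['ssn', 'student_id']
```

In practice this logic lives in the DLP or Conditional Access layer, so a flagged upload can be blocked or routed to an alert rather than silently passed through.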

Secure architecture and model choices​

  • Favor on‑premises or private cloud models and vendor options that let you deploy models inside institutional boundaries when dealing with highly sensitive data (IRB research data, patient records, proprietary IP).
  • Consider sandboxed, purpose-built smaller models for internal tasks when feasible; they reduce exposure and are cheaper to run. Where large foundation models are necessary, use them inside a controlled pipeline that pre‑filters inputs and post‑filters outputs.
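One way to sketch such a controlled pipeline: assumed blocklists stand in for real institutional filters, and `call_model` is a placeholder for the vetted enterprise endpoint.

```python
def call_model(prompt: str) -> str:
    # Placeholder for a call to an enterprise model behind institutional controls.
    return "[model response to: " + prompt + "]"

BLOCKED_INPUT_TERMS = ["ssn", "medical record"]   # assumed input blocklist
BLOCKED_OUTPUT_TERMS = ["guaranteed admission"]   # e.g. claims the assistant must never make

def controlled_query(prompt: str) -> str:
    """Pre-filter inputs and post-filter outputs around the model call."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_INPUT_TERMS):
        raise ValueError("Input rejected by pre-filter: restricted content detected")
    response = call_model(prompt)
    if any(term in response.lower() for term in BLOCKED_OUTPUT_TERMS):
        return "[response withheld pending human review]"
    return response
```

The point of the wrapper is architectural: no user or unit talks to the foundation model directly, so the filters cannot be bypassed by going around the pipeline.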

Monitoring and incident readiness​

  • Maintain an incident response playbook specific to generative AI (how to respond to hallucinated policy statements, accidental PII disclosure, or vendor data incidents).
  • Audit usage logs periodically and set alerts for anomalous query patterns that might indicate data scraping, bulk exports, or model‑poisoning attempts.
These controls match the layered recommendations IT teams across higher education are adopting as they stand up Copilot instances and campus assistants.
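As a minimal illustration of that kind of log auditing, the sketch below flags users whose query volume crosses a flat threshold. The threshold and log shape are assumptions; production monitoring would baseline per‑role behavior and alert on deviation rather than use a fixed number.

```python
from collections import Counter

def flag_anomalous_users(query_log, threshold=100):
    """Flag users whose query count exceeds the threshold in the audit window.

    query_log is assumed to be an iterable of (user_id, query_text) tuples
    drawn from the usage logs.
    """
    counts = Counter(user for user, _ in query_log)
    return sorted(user for user, n in counts.items() if n > threshold)

log = [("staff1", "q")] * 50 + [("scraper", "q")] * 500
print(flag_anomalous_users(log))  # ['scraper']
```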

The threat landscape: beyond privacy — model safety and data integrity​

Jena Zangs’s privacy advice addresses disclosure risk, but campuses must also consider other, more technical failure modes.

Training leakage and legal retention​

Even when vendors promise not to use enterprise data for training, legal processes can force them to retain logs; recent reporting shows court orders can require preservation of chats that users thought were deleted. This is a reminder: policy controls can be overridden by law, so institutions must maintain layered operational protections in addition to contractual language.

Model poisoning and backdoors​

Academic and industry researchers have demonstrated that small, targeted sets of documents can be used to poison a model or create backdoors in downstream systems. Those risks are acute if an institution uses a model that continually ingests new organizational content without robust validation. Community reporting and practitioner threads highlight “small sample poisoning” as a real operational vector for attackers or accidental contamination. Institutions that allow unconstrained uploads into an organization’s LLM pipeline need mitigations — such as input vetting, provenance tracking and retraining controls — to avoid corruption.
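A provenance‑tracking gate of the sort described might be sketched as follows; the allow‑list of sources is a hypothetical example, and a real pipeline would tie it to identity and workflow systems.

```python
import hashlib

APPROVED_SOURCES = {"registrar", "provost_office"}  # assumed allow-list

def vet_for_ingestion(doc_text: str, source: str, ledger: dict) -> bool:
    """Admit only documents from approved sources and record their provenance.

    ledger maps content hashes to declared sources, giving an audit trail if a
    poisoned document is later discovered and must be traced and removed.
    """
    if source not in APPROVED_SOURCES:
        return False
    digest = hashlib.sha256(doc_text.encode("utf-8")).hexdigest()
    ledger[digest] = source
    return True

ledger = {}
vet_for_ingestion("Enrollment policy 2024", "registrar", ledger)         # admitted
vet_for_ingestion("Unsolicited policy text", "anonymous_upload", ledger)  # rejected
```

The ledger is what makes retraining controls workable: if a batch of documents is found to be contaminated, their hashes identify exactly what must be purged before the next training or indexing run.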

Prompt injection and hallucination risks​

Generative systems are vulnerable to prompt injection (maliciously crafted inputs that override safety instructions) and hallucinations (confidently stated false outputs). These failure modes have direct operational consequences in a campus setting: a chatbot that fabricates disciplinary outcomes, misstates policy, or improperly releases guidance can cause legal and reputational harm. Training staff to verify outputs and implementing a human‑review gate for high‑stakes workflows are mandatory safety practices.
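A human‑review gate can be as simple as routing drafts on sensitive topics to a queue instead of releasing them. The topic list below is a hypothetical example of what a campus might classify as high‑stakes.

```python
# Assumed list of topics a campus might classify as high-stakes.
HIGH_STAKES_TOPICS = ("disciplinary", "financial aid", "medical withdrawal")

def route_response(question: str, draft_answer: str) -> dict:
    """Hold model drafts on high-stakes topics for human sign-off."""
    needs_review = any(t in question.lower() for t in HIGH_STAKES_TOPICS)
    return {
        "answer": None if needs_review else draft_answer,
        "status": "queued_for_human_review" if needs_review else "released",
    }

print(route_response("What is the disciplinary appeal deadline?", "Draft...")["status"])
# queued_for_human_review
```

Keyword matching is deliberately conservative here: false positives cost a reviewer a few minutes, while a false negative can mean a fabricated policy statement reaching a student.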

Campus policies and real‑world examples​

Across the U.S., universities are striking similarly cautious tones while adapting to the productivity promise of AI.
  • The University of Maryland’s GenAI guidelines emphasize human oversight, privacy compliance and explicit de‑identification before uploading student work to external tools, echoing Zangs’s advice.
  • UW–Madison’s registrar page explicitly links FERPA obligations to AI usage and prohibits the use of generative AI on protected institutional data unless tools have been approved through an institutional review process.
  • Institutional committees (Cornell’s faculty committee, for example) have produced multi‑section reports outlining options for teaching and research, covering prohibition, conditional use with attribution, and active encouragement under monitored conditions. These reports consistently call for policy flexibility, transparency, and academic oversight — not blanket bans.
These campus playbooks show an operational pattern: prohibition for high‑risk use cases, permitted use for well‑defined workflows, and active governance to iterate policy as both threats and vendor practices evolve.

Practical recommendations for IT leaders and administrators​

Below are concrete steps university IT and administration teams should adopt immediately.
  • Establish a cross‑functional AI governance committee composed of IT security, legal/privacy, institutional research, faculty representatives and student services. Charge it with vendor review, risk classification and a rolling approval list of tools.
  • Produce a simple, campus‑wide policy that classifies data sensitivity and lists approved tools by data level. Make the policy practical and widely visible.
  • Roll out a mandatory summarization standard for any user or unit that needs to submit records to consumer chatbots. Provide templates and training modules that teach staff how to produce a succinct, de‑identified summary.
  • Negotiate enterprise contracts that explicitly exclude training on institutional data and grant audit/log access; require breach notifications and specify retention windows.
  • Integrate DLP, SSO and conditional access so that uploads of sensitive content to consumer sites are blocked or generate automated alerts.
  • Run regular tabletop exercises that cover an AI‑specific incident: accidental disclosure of student grades, an LLM that fabricates a policy, or a legal request for historical logs.
  • Fund pilot projects for on‑prem or private‑cloud model deployments for units with research or compliance demands, and measure ROI before scaling.
  • Communicate to the campus community: provide clear “what not to do” rules for students and staff, paired with recommended alternatives (institutional Copilot, summarized inputs, or manual review).
These are not theoretical actions; many campuses are already moving down this path and documenting playbooks for others to adapt.

Strengths of the approach and where it falls short​

Notable strengths​

  • Practicality: Zangs’s recommendation is immediately actionable and cheap to deploy; summarization and de‑identification are skills that staff can adopt today and scale quickly.
  • Risk reduction: Data minimization reduces the attack surface for both privacy breaches and model poisoning while preserving access to AI‑driven insights.
  • Policy alignment: The practice aligns with FERPA and institutional privacy expectations and complements procurement and technical controls.

Limitations and residual risks​

  • Human error and effectiveness: Summaries depend on human judgment. Poorly written or incomplete summaries can introduce bias, omit context or still leak sensitive clues.
  • Operational overhead: Creating summaries and routing them through approval workflows introduces friction that some units will resist, especially in high‑velocity environments.
  • Not a substitute for governance: Summarization is a mitigation, not a replacement for contractual protections, secure architectures or monitoring. Institutions that rely solely on operational hygiene without hard vendor guarantees are exposed to legal and technical surprises (for example, retention due to litigation).
  • Re‑identification risk remains: Even de‑identified data can sometimes be re‑identified when combined with auxiliary datasets. The “reasonable determination” standard in FERPA and similar frameworks requires institutions to consider the realistic likelihood of re‑identification.

A template policy snippet campuses can adapt right now​

Below is a short, usable policy paragraph designed for immediate inclusion in a campus AI policy or syllabus:
  • “When using third‑party generative AI tools that are not institutionally approved, staff and faculty must not submit personally identifiable or protected student or employee records. For tasks that require analysis of such records, users must produce a short, de‑identified summary (no names, unique identifiers, or verbatim quotes) and submit only the summary to external models. Institutional enterprise AI products that have undergone privacy and security review may be used in accordance with their approved data classification levels. All exceptions require written approval from the Office of Privacy or equivalent authority.”
This kind of short, enforceable language transforms Zangs’s tactical advice into policy that can be operationalized across campus.

Final analysis — balancing agility and safety​

Jena Zangs’s counsel is emblematic of how responsible AI adoption will proceed in higher education: decentralized experiments constrained by centralized governance. Summarization and de‑identification give administrators a fast, low‑tech way to harness AI’s productivity gains while the institution negotiates contracts, builds secure deployments and trains personnel. It is a risk‑based compromise that recognizes the reality of mixed tool use on campuses.
However, universities must not let the ease of summarization lull them into complacency. Contracts, DLP, identity controls, threat modeling for model poisoning and a culture of verification are indispensable complements. As courts and regulators continue to shape what vendors can do with data, institutions must retain layered technical protections and maintain auditability. Institutional resilience will come from people, process and technology working in concert: sensible summarization practices anchored by robust procurement, identity controls, monitoring and training.
In short: summarize before you upload, but don’t stop there. Build the governance, legal assurances and technical scaffolding so that summarization becomes one component of a durable, auditable approach to AI safety on campus.
Conclusion
Zangs’s recommendation is not merely a trick for cautious administrators — it’s a pragmatic, legally informed, immediately deployable control that reduces exposure and preserves institutional decision‑making. Paired with enterprise contracts, identity and DLP controls, and a governance framework, the summarization habit helps universities grasp the productivity benefits of generative AI without surrendering control of the records and people they exist to serve.

Source: Newsroom | University of St. Thomas In the News: Jena Zangs on AI Safety Within Academic Institutions - Newsroom | University of St. Thomas
 
