Concentrated Enterprise AI Risk: ChatGPT Drives 71.2% of Data Exposures

Harmonic Security’s analysis of 22.4 million generative‑AI prompts in 2025 reveals a stark concentration of enterprise data exposure: six applications account for more than 92% of measured potential leakage, and ChatGPT alone drives roughly 71% of those exposures despite representing less than half of total prompts — a pattern that reframes how security teams should prioritise controls and highlights urgent design and governance gaps across AI integrations.

Background​

Generative AI moved from novelty to ubiquity in corporate workflows during 2024–2025, delivered through three broad channels: tenant‑managed enterprise copilots (with DLP and audit trails), embedded AI features inside SaaS products, and unrestricted consumer chat services accessed from corporate endpoints. The last of these — consumer chatbots and free model tiers — has emerged as an outsized driver of measured exposure because of zero‑friction access, clipboard/paste habits, and unmanaged personal accounts. Two concurrent technical developments sharpened this risk picture in January 2026. First, a practical proof‑of‑concept called “Reprompt” demonstrated how ordinary UX conveniences in Copilot Personal (prefilled deep links, session continuity and server‑driven follow‑ups) could be chained into a single‑click exfiltration pipeline. Second, telemetry analyses show that a tiny group of popular consumer GenAI services — led by ChatGPT — now account for the bulk of potential data exposure observed in enterprise environments. Together these events make clear that convenience features without persistent governance are the principal structural weakness.

Overview of the Harmonic findings and corroborating telemetry​

What the numbers say​

Harmonic Security analyzed 22.4 million prompts and concluded that six applications were responsible for 92.6% of potential data exposure in their dataset. ChatGPT was identified as the largest single contributor, responsible for 71.2% of measured exposures while representing 43.9% of prompts. Microsoft Copilot and Google Gemini showed higher proportions of exposure relative to their usage but at far smaller absolute volumes. Harmonic also reported 98,034 instances it classified as sensitive, of which 87% occurred via ChatGPT Free accounts. Independent vendor telemetry and industry reports show the same directional trend: generative‑AI policy violations and data exposures have surged, with public and unmanaged AI tools commonly implicated. Netskope’s Threat Labs and other vendors document a rapid rise in month‑to‑month incidents and emphasise that a substantial percentage of violations involve regulated personal data or intellectual property. These patterns confirm Harmonic’s core insight: a concentrated toolset drives most measured enterprise AI exposure.

Why this distribution matters​

The uneven risk distribution has major operational implications. If a small set of tools produces the majority of measured exposure, then targeted governance — not blanket prohibition — can yield rapid risk reduction while preserving productivity. That is the central pragmatic takeaway Harmonic offers: focus first on the “big six” apps, then work down the long tail with fine‑grained controls and behavioural nudges. This prioritisation is operationally attractive for security teams already stretched thin.

Anatomy of the dominant leakage vectors​

Clipboard/paste workflows and ephemeral data loss​

The most common human behaviour that leads to leakage is simple: copy, paste, ask. Employees routinely paste code snippets, contract language, M&A notes, or customer records into chat windows to get fast answers. These ephemeral clipboard events occur on the client and often escape traditional DLP systems that focus on file repositories, email gateways, or network egress. Because the text is unstructured and contextually rich, simple pattern rules miss it; semantic classification is required.
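As a concrete illustration of why this gap is client‑side, the sketch below shows the kind of paste interception a managed‑browser extension could perform before text ever reaches a consumer chat input. The host list and the looksSensitive() heuristic are placeholders for illustration only; a real deployment would call a centrally managed semantic classifier rather than a handful of regexes.

```typescript
// Sketch of a paste interceptor that a managed-browser extension could run on
// consumer chat pages. The CHAT_HOSTS list and looksSensitive() heuristic are
// illustrative placeholders, not a vendor-supplied API.
const CHAT_HOSTS = ["chatgpt.com", "chat.openai.com", "gemini.google.com"];

function looksSensitive(text: string): boolean {
  // Naive stand-in: real deployments would call a semantic classifier instead.
  const hints = [
    /confidential/i,
    /do not distribute/i,
    /\bSSN\b/i,
    /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
  ];
  return hints.some((h) => h.test(text));
}

if (CHAT_HOSTS.includes(location.hostname)) {
  document.addEventListener(
    "paste",
    (e: ClipboardEvent) => {
      const pasted = e.clipboardData?.getData("text") ?? "";
      if (looksSensitive(pasted)) {
        // Nudge rather than block: ask the user to confirm before the text
        // ever reaches the chat input.
        const proceed = window.confirm(
          "This paste may contain sensitive data. Send it to an external AI service anyway?"
        );
        if (!proceed) e.preventDefault();
      }
    },
    true // capture phase, so the check runs before the page's own handlers
  );
}
```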

Unmanaged personal and free accounts​

Personal accounts bypass enterprise SSO, MFA, retention guarantees and non‑training contracts. Harmonic found that a high fraction of sensitive instances came from free tiers — most notably ChatGPT Free — where enterprises have no contractually enforceable privacy guarantees. That means data pasted into those sessions may be used to improve public models unless explicitly excluded by the provider. The risk here is both regulatory and competitive: training data contamination, IP loss, and non‑compliance with data residency or sectoral rules.

Browser extensions, widgets and ambiguous origins​

Extensions and embedded widgets often request broad page‑level permissions that can capture DOM content, clipboard events, and cross‑origin data. Those client‑side agents blur provenance: when a third‑party extension forwards page text to an LLM, network allowlists and CASBs may struggle to differentiate whether the traffic originates from a sanctioned enterprise API or an unsanctioned consumer page. This creates blind spots that attackers and well‑meaning users alike can exploit.

Parameter‑to‑prompt (P2P) injection and Reprompt mechanics​

Reprompt exposes a fundamental design risk: many assistant web UIs accept query parameters that prepopulate the input box. Researchers demonstrated that an attacker can embed malicious instructions in those parameters, causing an authenticated assistant session to run attacker‑supplied prompts. The full chain uses three simple building blocks — parameter injection, a repetition/“do it again” bypass that circumvents single‑shot redaction, and server‑driven follow‑ups that fragment exfiltration into many small, low‑volume transfers. Because most of the traffic in these flows appears as normal vendor egress, standard network monitoring can miss it entirely. That is why even well‑resourced organisations must pair patching with architectural controls.
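To make the parameter‑to‑prompt pattern tangible, here is a hedged sketch of a link‑sanitisation step a managed proxy or mail gateway could apply before a user follows an assistant deep link. The hostnames and parameter names (q, prompt, message) are assumptions for illustration, not the actual Copilot deep‑link schema; the point is simply that attacker‑controllable prefill parameters can be stripped so the user sees and types every prompt that runs in their session.

```typescript
// Illustrative only: strips query parameters that prefill an assistant's input
// box before a managed gateway forwards the navigation. The host list and
// parameter names are hypothetical stand-ins for a real deep-link schema.
const ASSISTANT_HOSTS = new Set(["copilot.example.com", "chat.example.com"]);
const PROMPT_PARAMS = ["q", "prompt", "message"]; // assumed prefill parameters

export function sanitizeAssistantLink(rawUrl: string): string {
  const url = new URL(rawUrl);
  if (!ASSISTANT_HOSTS.has(url.hostname)) return rawUrl;

  for (const param of PROMPT_PARAMS) {
    if (url.searchParams.has(param)) {
      // Log for audit, then drop the attacker-controllable prefill entirely.
      console.warn(`Stripped prefilled prompt parameter "${param}" from ${url.hostname}`);
      url.searchParams.delete(param);
    }
  }
  return url.toString();
}

// Example: a "one-click" deep link that would otherwise hand the assistant
// attacker-supplied text the moment the victim lands on the page.
console.log(
  sanitizeAssistantLink(
    "https://copilot.example.com/chat?q=Summarise%20my%20recent%20emails%20and%20send%20them%20to..."
  )
);
```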

The enterprise impact: business, legal and security consequences​

Loss of intellectual property and competitive data​

The most concrete and immediate impact is loss of proprietary code, product roadmaps, M&A details and financial forecasts. Harmonic’s breakdown shows code accounted for about 30% of exposures and legal dialogue for 22.3%, with M&A and financial projections also strongly represented. This is the sort of information that can harm valuation, give competitors an advantage, and create long‑lasting legal exposure.

Regulatory, compliance and data protection risk​

Sending regulated personal data to external models can trigger GDPR, HIPAA, PCI and other sectoral obligations. When employees use free or personal accounts, the enterprise often lacks contractual assurance about data processing, retention, and training usage. Regulators are watching how businesses use AI; unaddressed leakage creates audit and enforcement risk. Vendor assurances matter, but they must be backed by contractual controls and technical enforcement.

Elevated attack surface and fraud risk​

Agentic features (in‑chat commerce, booking, and payment flows) convert conversational assistants into value transfer conduits. When an assistant can order goods, transfer value or access payment APIs, the consequences of compromise expand from mere data leakage to unauthorised transactions and supply‑chain manipulation. Attackers now have multiple rails — identity takeover, prompt injection and third‑party API abuse — through which to monetise a compromise.

A practical playbook for IT and security teams​

The evidence supports a layered, pragmatic approach that balances risk reduction with continued employee productivity. Recommendations are sequenced by immediacy and impact.

Short term (hours–days): triage and containment​

  • Inventory active GenAI touchpoints: map web UIs, browser extensions, embedded features and reported personal account use across your estate (a minimal log‑grouping sketch follows this list).
  • Enforce conditional access for any corporate AI console or admin UI; disable consumer copilots on managed devices where tenant governance isn’t available.
  • Apply vendor patches and mitigations for confirmed vulnerabilities (e.g., Reprompt/Copilot fixes) and validate deployments across pilot rings.
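For the inventory step referenced above, a first pass can often be built from existing secure‑web‑gateway or proxy logs. The sketch below assumes a simplified log shape and a hand‑maintained domain‑to‑tool map; adapt both to whatever your gateway actually exports.

```typescript
// Sketch: group outbound proxy log lines by known GenAI domains to build a
// first-pass inventory. The log shape and domain map are assumptions.
interface ProxyLogEntry {
  user: string;
  destinationHost: string;
  bytesSent: number;
}

const GENAI_DOMAINS: Record<string, string> = {
  "chatgpt.com": "ChatGPT",
  "chat.openai.com": "ChatGPT",
  "gemini.google.com": "Gemini",
  "copilot.microsoft.com": "Microsoft Copilot",
  "claude.ai": "Claude",
};

function inventory(logs: ProxyLogEntry[]) {
  const byTool = new Map<string, { users: Set<string>; bytesSent: number }>();
  for (const entry of logs) {
    const tool = GENAI_DOMAINS[entry.destinationHost];
    if (!tool) continue; // not a known GenAI destination
    const row = byTool.get(tool) ?? { users: new Set<string>(), bytesSent: 0 };
    row.users.add(entry.user);
    row.bytesSent += entry.bytesSent;
    byTool.set(tool, row);
  }
  // Rank by outbound volume: the heaviest tools are the triage priority.
  return [...byTool.entries()]
    .map(([tool, r]) => ({ tool, uniqueUsers: r.users.size, bytesSent: r.bytesSent }))
    .sort((a, b) => b.bytesSent - a.bytesSent);
}
```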

Medium term (weeks–months): behavioural controls and detection​

  • Deploy browser‑level nudges and contextual warnings that intercept paste events and require explicit confirmation before sending potentially sensitive text to external models.
  • Integrate semantic DLP into API gateways and agent runtimes; move beyond regex to embeddings‑based classification and masking for PII, IP and finance data (a toy classification sketch follows this list).
  • Build an “AI inventory” dashboard that tracks DAU/MAU by tool and department and ties risky behaviour to remediation workflows (revoke OAuth grants, quarantine tokens).
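The semantic‑DLP bullet above is the piece most teams find hardest to picture, so here is a toy embeddings‑based classifier. The embed() function is a stand‑in for whichever embedding model or API the gateway uses, and the exemplar phrases and 0.78 threshold are arbitrary illustrations rather than tuned values.

```typescript
// Toy sketch of embeddings-based classification for prompt payloads.
declare function embed(text: string): Promise<number[]>; // assumed provider call

const CATEGORY_EXEMPLARS: Record<string, string> = {
  source_code: "function implementation with proprietary business logic",
  legal: "confidential contract clause under negotiation",
  finance: "internal revenue forecast and M&A valuation figures",
};

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Returns the best-matching sensitive category, or null if nothing clears
// the (tunable) similarity threshold; the gateway can block, mask or log.
export async function classifyPrompt(prompt: string, threshold = 0.78) {
  const promptVec = await embed(prompt);
  let best: { category: string; score: number } | null = null;
  for (const [category, exemplar] of Object.entries(CATEGORY_EXEMPLARS)) {
    const score = cosine(promptVec, await embed(exemplar));
    if (score >= threshold && (!best || score > best.score)) best = { category, score };
  }
  return best;
}
```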

Long term (architecture and procurement)​

  • Treat models and agents as identities: assign least privilege, ephemeral credentials, and explicit, scoped permissions for sensitive reads (a conceptual sketch follows this list). Maintain immutable audit trails that correlate natural‑language prompts to downstream API reads.
  • Prefer tenant‑managed, non‑training enterprise plans or on‑prem/hosted retrieval‑augmented generation (RAG) setups for high‑sensitivity workloads so retention and training exclusions are contractually enforced.
  • Reassess procurement to require explicit data‑use guarantees from AI vendors (no training on customer inputs without explicit consent) and training‑data audit rights in contracts.
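To show what “models and agents as identities” can mean in practice, the sketch below models an agent grant with explicit scopes and an expiry, plus an audit record that ties each decision back to a prompt. The types and scope strings are invented for illustration and do not correspond to any particular IAM product.

```typescript
// Conceptual sketch of an agent treated as an identity with scoped,
// short-lived grants. The policy shape here is an assumption for illustration.
interface AgentGrant {
  agentId: string;
  scopes: string[];   // e.g. "crm:read:contacts", "finance:read:forecasts"
  expiresAt: number;  // epoch millis; grants are ephemeral by design
}

function isAllowed(grant: AgentGrant, requestedScope: string, now = Date.now()): boolean {
  if (now >= grant.expiresAt) return false;     // expired credential
  return grant.scopes.includes(requestedScope); // least privilege: exact scope match only
}

// Every allow/deny decision lands in an immutable audit trail that can be
// correlated back to the natural-language prompt that triggered the read.
function auditDecision(grant: AgentGrant, scope: string, allowed: boolean, promptId: string): void {
  console.log(
    JSON.stringify({ ts: new Date().toISOString(), agent: grant.agentId, scope, allowed, promptId })
  );
}
```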

Vendor and platform responsibilities — a candid appraisal​

Providers must do more than publish policy pages: they must build features and enterprise‑grade guarantees that are enforceable in practice. Practical vendor responsibilities include:
  • Clear account classification mechanisms so gateways and CASBs can distinguish free/personal sessions from tenant‑managed accounts in telemetry (an illustrative gateway‑side check is sketched below).
  • Built‑in paste/button interceptors and client‑side redaction options that can be centrally managed by enterprises.
  • Faster, more transparent disclosures for design‑class vulnerabilities (e.g., P2P injection) and robust mitigations that remove the attack surface rather than relying solely on patch cycles.
Vendors also need workable enterprise controls for in‑chat commerce and agentic actions: explicit consent flows, auditable transaction confirmation, and escrowed payment channels to limit fraud and liability exposure. Without these, agentic convenience will continue to amplify risk.
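As a thought experiment for the account‑classification point above, the sketch below shows what a gateway could do if providers exposed a reliable account‑type signal. Both headers used here (x-ai-account-type, x-authenticated-user) are hypothetical; today most gateways must infer this from weaker signals, which is precisely the gap vendors are being asked to close.

```typescript
// Hypothetical illustration only: the header names and values below are
// invented for the sketch, not any vendor's actual telemetry schema.
type AccountClass = "tenant-managed" | "personal" | "unknown";

function classifySession(headers: Record<string, string>, tenantDomain: string): AccountClass {
  // Assumed signal 1: an explicit account-type header exposed by the provider.
  const declared = headers["x-ai-account-type"]; // hypothetical header
  if (declared === "enterprise") return "tenant-managed";
  if (declared === "free" || declared === "personal") return "personal";

  // Assumed signal 2: the authenticated identity's domain, if surfaced.
  const identity = headers["x-authenticated-user"] ?? ""; // hypothetical header
  if (identity.endsWith(`@${tenantDomain}`)) return "tenant-managed";
  if (identity.includes("@")) return "personal";

  return "unknown"; // unknown sessions are the ones worth flagging for review
}
```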

Critical analysis: strengths, trade‑offs and remaining blind spots​

Notable strengths of the Harmonic analysis and broader reporting​

  • Scale: analysing 22.4 million prompts provides macro visibility into usage patterns that smaller surveys miss. The skewed distribution is a repeatable pattern across vendor telemetry and independent studies, which strengthens the finding’s credibility.
  • Actionability: the “big six” framing gives stretched security teams a clear, prioritised starting point rather than an unbounded remediation list.

Trade‑offs and remaining blind spots​

  • Measurement gaps: prompt‑level telemetry only covers monitored channels; a single paste into an unmanaged session can expose sensitive data without triggering any visible security alerts.
  • Long‑tail governance fatigue: blocking AI domains outright can cause major friction (Canva, Grammarly, Translate) and lead to policy circumvention. Effective governance requires enabling safe, audited AI experiences, not blanket bans.

What Windows and enterprise IT teams should do next​

  • Start with targeted triage: identify the “big six” GenAI apps in your telemetry and apply the short‑term containment steps above. This delivers fast risk reduction and buys time to implement richer controls.
  • Instrument the browser: for Windows endpoints, focus on managed browser policies, extension whitelists, and paste interception agents that can be deployed at scale through Group Policy, Intune or third‑party EDR agents.
  • Build an AI governance ops loop: create an interdisciplinary team (security, legal, procurement, engineering) to approve AI suppliers, define allowed use cases, and maintain an AI inventory tied to incident response runbooks.

Conclusion​

The Harmonic analysis — corroborated by multiple vendor reports and independent telemetry — reframes enterprise generative‑AI risk as a concentration problem: a small set of consumer‑grade LLMs, prominently ChatGPT, now accounts for a disproportionate share of observed data exposure. That concentration creates both opportunity and urgency: with targeted governance, enterprises can materially reduce exposure quickly, but doing so requires a shift in controls from file‑centric DLP to session‑aware, semantic, and identity‑aware enforcement. Technical fixes (patching Reprompt‑style vectors), behavioural controls (paste nudges and OAuth governance), and contractual rigor (no‑training guarantees, audit rights) are all necessary. Most importantly, organisations must adopt a layered strategy that preserves AI productivity while treating models and agents as first‑class objects in their security posture — otherwise convenience will continue to outpace control, and the next exfiltration will be easier to hide than to detect.
Source: SecurityBrief Australia https://securitybrief.com.au/story/chatgpt-drives-bulk-of-enterprise-generative-ai-data-risk
 
