AI browser assistants are quietly sweeping up private, sensitive information from pages users assume are off-limits — including medical records, bank details, academic transcripts, and even social security numbers — according to a new cross‑national audit of the most popular generative-AI browser extensions and integrations. (arxiv.org) (euronews.com)
Background
AI-powered browser assistants — the extensions, overlays and built‑in agents that promise faster search, on‑page summaries, and one‑click task automation — have become mainstream in the past two years. They are marketed as productivity boosts for web research, shopping, coding, and content creation, and they increasingly run as small pieces of code inside Chrome, Edge, and other Chromium‑based browsers. But the convenience of surfacing contextually relevant answers inside the browser comes with an under‑examined cost: these assistants often process page content on remote servers and, in doing so, can exfiltrate sensitive data from authenticated or password‑protected sites. (arxiv.org)
A team of researchers based in the UK and Italy built a repeatable audit to test how ten of the most popular generative‑AI browser assistants behave during routine browsing tasks — from public web searches to authenticated sessions on bank portals, university health systems, and tax sites. Their approach combined network traffic analysis with a controlled prompting framework that simulated typical user behavior and follow‑up questions, letting the team observe what was sent from the browser to the assistants’ servers and to third‑party trackers. (arxiv.org)
What the audit found — the headline results
- The researchers examined ten leading extensions and browser assistants and found that nine out of ten transmitted private content to remote servers during routine use. Perplexity’s assistant was the sole exception in these tests. (arxiv.org) (euronews.com)
- Several assistants routinely sent full page content (HTML DOMs) or large extracts of visible content to their first‑party servers. In multiple cases this included content from authenticated, password‑protected pages. (arxiv.org)
- Merlin (a popular Chrome extension) was singled out for collecting form‑input values, meaning it captured data entered into web forms — specifically, researchers say Merlin recorded a social security number entered on a U.S. tax portal and similarly sensitive inputs on bank and health pages. (arxiv.org) (theregister.com)
- Several assistants sent user prompts and identifiers (including IP addresses and session chat IDs) to third‑party analytics platforms such as Google Analytics, enabling potential cross‑site tracking and retargeting. Assistants named in the audit for these behaviors include Sider and TinaMind. (arxiv.org) (techxplore.com)
- Some assistants generated persistent profiles that inferred age, gender, income level and interests and used those attributes to personalize responses across browsing sessions. The research found profiling signals in assistants including ChatGPT integrations, Copilot, Monica, and Sider. (arxiv.org)
- In a handful of cases, chat histories and logs persisted in the browser or in background services after sessions ended, suggesting the data footprint may outlive the user’s immediate interaction. Copilot and some ChatGPT integrations were highlighted here. (arxiv.org)
How the data leakage happens — a technical breakdown
Client scripts, content scripts and the DOM
Browser assistants typically inject a small script into web pages (a content script) to enable features like “summarize this page” or “answer questions about the current page.” When a content script is triggered, it has access to the page’s DOM and to any text that is currently visible. If the assistant is implemented as a thin client that forwards page content to a server for model inference, that DOM or page text can be uploaded in whole or in part. The audit observed multiple assistants doing exactly this. (arxiv.org)
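To make the mechanism concrete, here is a minimal, hypothetical sketch of a thin-client content script; the endpoint, payload shape, and function names are assumptions for illustration, not any audited vendor's actual code.

```typescript
// content-script.ts: hypothetical sketch of a "summarize this page" feature.
// The endpoint and payload shape are illustrative assumptions, not a real vendor's API.

// Collect the text currently rendered in the page. The script sees whatever
// the logged-in user sees, including content behind authentication.
function collectPageText(): string {
  return document.body?.innerText ?? "";
}

async function summarizeCurrentPage(): Promise<string> {
  const payload = {
    url: window.location.href,   // reveals which portal the user is on
    text: collectPageText(),     // may include balances, lab results, form values
    capturedAt: new Date().toISOString(),
  };

  // Server-side inference: this is the point where page content leaves the device.
  const response = await fetch("https://assistant.example.com/api/summarize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });

  const result = (await response.json()) as { summary: string };
  return result.summary;
}
```

Nothing in this flow distinguishes a public news article from an authenticated tax portal; unless the assistant adds its own allow/deny logic, the same upload path applies to both.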
Background workers and auto‑invocation
Some assistants use background service workers that can be invoked automatically (without an explicit user action) upon navigation or search events. That design enables features like context retention across tabs but also means the agent can send page content to servers without an obvious on‑page prompt. The researchers observed auto‑triggered calls on several assistants. (arxiv.org)
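A hedged sketch of what such auto-invocation can look like under Chrome's Manifest V3 follows; the trigger logic and endpoint are assumptions for illustration, and the listener would require the "webNavigation", "scripting", and host permissions in the extension manifest.

```typescript
// background.ts: hypothetical MV3 service worker that reacts to navigation
// with no user gesture. The endpoint is an illustrative placeholder.

chrome.webNavigation.onCompleted.addListener(async (details) => {
  if (details.frameId !== 0) return; // top-level navigations only

  // Pull the rendered text out of the tab the user just finished loading.
  const [{ result: pageText }] = await chrome.scripting.executeScript({
    target: { tabId: details.tabId },
    func: () => document.body?.innerText ?? "",
  });

  // Forward it for "context retention"; the user sees no on-page prompt.
  await fetch("https://assistant.example.com/api/context", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: details.url, text: pageText }),
  });
});
```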
Server‑side processing vs local inference
Rather than running heavyweight language models locally in the browser, most assistants forward queries and page data to server‑side APIs where the heavy lifting happens. Server‑side inference simplifies development and reduces client resource needs — at the cost of sending user content over the network. The audit found server‑side processing to be the predominant architecture among tested assistants. (arxiv.org)
Third‑party analytics and cross‑site linking
The audit found that some assistants forward user prompts and identifiers to analytics endpoints (e.g., Google Analytics, Mixpanel). When chat IDs, timestamps, or raw prompts are shared with ubiquitous trackers, those events become linkable to a wider tracking graph, enabling targeted advertising or cross‑site profiling. This detail is especially alarming when combined with form values or protected health information. (arxiv.org)
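As a hypothetical illustration of the pattern, the sketch below shows how a raw prompt plus a stable chat identifier becomes a cross-site-linkable analytics event; the collector URL and field names are invented for this example, not the schema of any real analytics provider.

```typescript
// analytics.ts: hypothetical sketch of prompt data flowing to a third-party collector.
// The endpoint and field names are invented; real providers use their own schemas.

interface AssistantEvent {
  chatSessionId: string; // stable ID that can link events across pages and sessions
  prompt: string;        // the user's question, verbatim
  pageUrl: string;       // the site the user was on when asking
  timestamp: number;
}

async function reportPromptEvent(event: AssistantEvent): Promise<void> {
  // Once this leaves the extension, the collector can join chatSessionId and
  // pageUrl with whatever tracking graph it already holds for this user.
  await fetch("https://analytics.example.com/collect", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
}

// Example: a question asked on a health portal becomes a linkable record.
void reportPromptEvent({
  chatSessionId: "chat_8f2c0a", // placeholder identifier
  prompt: "What does this lab result mean?",
  pageUrl: window.location.href,
  timestamp: Date.now(),
});
```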
Legal and regulatory implications
The researchers flagged potential violations of U.S. and EU privacy regimes. Two important legal vectors stand out:
- Health data: The transmission of protected health information (PHI) from authenticated health portals to third‑party servers during the audit could violate U.S. federal privacy rules (HIPAA) when covered entities or business associates are involved. The paper recommends careful legal review where assistants capture or forward PHI. (arxiv.org)
- European data protection: The researchers concluded that, on their face, some of the assistants’ behaviors are likely inconsistent with the General Data Protection Regulation (GDPR) — particularly when personal data is sent outside the EU and when profiling occurs without adequate lawful bases, transparency, or data‑minimization. That conclusion is framed as legal risk rather than a court finding; regulators would need to undertake formal investigations. (euronews.com, arxiv.org)
Vendor practices, disclosure and public statements
The audit juxtaposed observed network activity with vendors’ published policies. In many cases, the privacy notices for extensions already disclose broad data collection and sharing practices — but the prominence and clarity of those disclosures are often limited, leaving ordinary users unaware of the operational details.
- Merlin’s EU/UK privacy policy explicitly lists broad categories of data it may collect (names, contact details, credentials, transaction and payment info, and typed inputs) and states those data may be used for personalization and product improvement. The audit shows Merlin’s behavior aligns with the broad collection language — but many users will not read or understand the practical implications. (euronews.com)
- Sider’s policy similarly discloses the use of partners like Google, Cloudflare and Microsoft in operating its services; the audit observed that Sider and similar assistants indeed send data to analytics providers. (arxiv.org, euronews.com)
- OpenAI’s public statements and terms say data from UK and EU users may be stored outside those regions, while asserting that user rights remain intact. Those kinds of jurisdictional moves can complicate cross‑border data protection obligations under the GDPR. Meanwhile, OpenAI’s own leadership has publicly acknowledged that conversations with ChatGPT currently lack therapist‑style legal confidentiality protections — a caution that reinforces the need for care when users input sensitive data. (euronews.com, techcrunch.com)
Strengths: why people use these assistants and what they deliver well
It’s important not to lose sight of the real value these tools deliver. The same architectural decisions that raise privacy flags also enable powerful features:
- Speed and convenience: Server‑side models produce quick, synthesized answers and summaries that meaningfully reduce research time.
- Accessibility and usability: Conversational interfaces and summarize‑on‑page features lower barriers for users who struggle with long articles, dense PDFs, or complex search tasks.
- Cross‑tab context and workflow automation: By accessing multiple tabs and remembering context, assistants can act as lightweight personal agents — automating repetitive tasks and aggregating information across sources. Windows and Edge integrations have shown how productivity gains can be significant for knowledge workers. (techxplore.com)
Risks and harms: beyond privacy to safety and fairness
The combination of personal data collection, profiling, and server‑side personalization creates a set of downstream risks:
- Targeted exploitation: When sensitive data is combined with profiling signals, it can be used for targeted scams or manipulative advertising.
- Legal exposure for users: As OpenAI’s CEO has warned publicly, conversations with AI systems currently lack the legal privilege of doctor‑patient or attorney‑client confidentiality; uploaded personal details could in theory be produced in legal proceedings. (techcrunch.com)
- Regulatory exposure for vendors: Companies that process health, education, or financial data without appropriate safeguards risk enforcement actions, fines, and litigation.
- Unintended model leakage: Stored prompts and chat identifiers can make it possible to reconstruct user interactions from logs, increasing the risk of exfiltration in the event of a breach.
- False sense of privacy: Private browsing modes or “incognito” mental models do not prevent extension content scripts from reading page content. Users often assume private modes prevent data sharing; the audit shows that assumption can be dangerously wrong. (arxiv.org)
Limitations of the audit and what it does not prove
The researchers were methodical, but no lab audit is the final word. Key caveats:
- The experiments were performed under controlled conditions and with particular versions of extensions and browsers. Vendors can and do update their code and privacy posture; observed behavior may change over time.
- The study establishes what data was transmitted during those tests; it does not prove that vendors misused the data or that downstream recipients retained or monetized it beyond reasonable product operations.
- Legal conclusions in the paper are framed as likely breaches or incompatibilities under certain laws; definitive legal violations require regulator assessments or court rulings. Readers should view the legal commentary as risk signaling, not final adjudication. (arxiv.org)
Practical guidance: what Windows and browser users should do now
For Windows users and anyone relying on Chromium‑based browsers, immediate steps can reduce exposure while preserving useful assistant features where needed.
- Check and audit installed extensions:
- Open your browser’s extensions page and remove or disable any AI assistants you don’t actively use.
- Review permissions for remaining extensions and revoke access to sites where sensitive work occurs (banking, health portals, tax sites).
- Use per‑site control:
- For Chrome/Edge, set extensions to “click to run” (or “only on specific sites”) when possible so content scripts are not injected automatically on sensitive pages.
- Prefer privacy‑friendly alternatives:
- The audit found Perplexity to be significantly more privacy‑respectful in these tests; consider using assistants that explicitly avoid uploading page content or that perform server‑side fetches without seeing authenticated content. (arxiv.org, euronews.com)
- Avoid pasting or entering sensitive data into assistant prompts:
- Never paste passwords, SSNs, medical records, or full account numbers into a general chat prompt.
- Use dedicated apps for sensitive tasks:
- Use the bank’s official app or website without extensions; use official health portals with caution and avoid opening AI assistants while in authenticated sessions.
- Consider local LLMs for high‑sensitivity workflows:
- If budget and expertise permit, run a local model or an on‑prem system to keep data in your environment rather than sending it to third‑party servers (a minimal sketch follows this list).
- Review vendor privacy policies — critically:
- Policies often disclose broad data collection; the practical differences are in implementation details and default behaviors. When a policy promises “we do not sell data” but the extension still shares prompts with analytics endpoints, that gap matters. (euronews.com, arxiv.org)
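Picking up the local‑model suggestion from the list above, here is a minimal sketch of keeping inference on the device by pointing the assistant at a locally hosted model server rather than a remote API. The localhost port, route, and response shape are assumptions about a generic local server, not a specific product's API.

```typescript
// local-inference.ts: hypothetical sketch of on-device inference. The port,
// route, and JSON shape are assumptions about a generic local model server.

const LOCAL_ENDPOINT = "http://127.0.0.1:8080/v1/generate";

async function askLocally(prompt: string, pageContext: string): Promise<string> {
  const response = await fetch(LOCAL_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, context: pageContext }),
  });

  if (!response.ok) {
    throw new Error(`Local model server returned ${response.status}`);
  }

  const { text } = (await response.json()) as { text: string };
  return text; // the prompt and page context never left 127.0.0.1
}
```

The trade-off is the same one the audit highlights, in reverse: local inference keeps sensitive content on the machine, but it requires hardware, setup, and maintenance that server-side assistants avoid.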
What vendors and browser makers should do next
The audit provides a clear roadmap for safer AI browsing if companies choose to follow it:
- Default to data minimization: Do not collect full DOMs or form inputs by default. Only request the minimal content required for a feature and do so with explicit, contextual consent.
- On‑device inference where feasible: Move privacy‑sensitive features to client‑side models when possible, or offer an explicit “local‑only” mode.
- Transparent, machine‑readable disclosures: When an assistant will send page content to servers, show a one‑click banner explaining exactly what will be transmitted, where it will be stored, and how long it will be retained (a hypothetical disclosure format is sketched after this list).
- Opt‑in profiling and simple deletion controls: Profiling should be opt‑in, with easy controls that delete all associated profiles and logs on demand.
- Independent audits and certifications: Commission third‑party privacy audits and publish methodologies so regulators, journalists and researchers can validate compliance claims.
- Segregate analytics from content flows: Avoid linking raw prompts and chat identifiers to broad analytics pipelines that can enable cross‑site tracking.
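To make the disclosure recommendation above more concrete, here is a hypothetical machine-readable format an assistant could publish and a browser could render as a consent banner before anything is transmitted. The schema and field names are invented for this sketch; they do not reflect an existing standard.

```typescript
// disclosure.ts: hypothetical machine-readable transmission disclosure.
// The schema is invented for illustration and is not an existing standard.

interface TransmissionDisclosure {
  feature: string;                // which assistant feature triggers the upload
  dataSent: string[];             // what leaves the device
  destination: string;            // where the data is processed
  retentionDays: number;          // how long it is kept
  sharedWithThirdParties: boolean;
  userConsentRequired: boolean;   // must be true before any transmission
}

// Example disclosure a browser could surface as a one-click banner.
const summarizeDisclosure: TransmissionDisclosure = {
  feature: "summarize-current-page",
  dataSent: ["visible page text", "page URL"],
  destination: "https://assistant.example.com (EU region)",
  retentionDays: 30,
  sharedWithThirdParties: false,
  userConsentRequired: true,
};

console.log(JSON.stringify(summarizeDisclosure, null, 2));
```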
Policy and regulatory outlook
Regulators are already paying attention. European privacy authorities, U.S. state attorneys general, and sectoral regulators (health, education) have shown heightened scrutiny of AI and data processing. The patterns the audit uncovers — cross‑border data flows, profiling without clear bases, PHI exfiltration in test scenarios — are precisely the sort of concerns that trigger investigations under the GDPR and sectoral U.S. rules. Companies should expect increased pressure to explain logs, retention, and lawful bases for processing. (euronews.com, arxiv.org)
At the same time, industry actors must also respond to leadership signals: OpenAI’s CEO has publicly warned users against treating chatbots as therapists or legal counsel and stressed the lack of legal confidentiality for conversations with AI. That public admission underscores the urgency of clarifying users’ rights and companies’ obligations. (techcrunch.com)
Final analysis and takeaway
The audit is a wake‑up call for anyone who assumed browser “privacy” modes, or the act of being logged into a site, automatically shielded data from third‑party assistants. The core technical tension is simple: server‑side models enable fast, useful features but require data to travel off a user’s device. If vendors do not adopt stronger safeguards — explicit consent, data minimization, local processing options, and rigorous separation of analytics pipelines — users’ most sensitive interactions will continue to be at risk.
For Windows users: treat AI browser assistants as powerful but potentially intrusive tools. Disable or limit them on sensitive sites, prefer assistants with clear privacy‑preserving architectures, and demand that vendors explain in plain language what they collect and why. Policymakers and platform owners must move quickly to mandate transparency and to protect the confidentiality expectations people reasonably have when they use medical portals, educational systems, or financial services.
The study’s authors and several news outlets make a consistent point: AI that quietly “sees everything you do online” is a privacy problem, not just a feature. The good news is that technical fixes and stronger policy choices can narrow the risk — but only if companies, browser platforms, and regulators act before more sensitive data becomes irrevocably distributed across analytics and AI stacks. (arxiv.org, euronews.com, theregister.com)
Conclusion
The audit of generative‑AI browser assistants exposes a gap between user expectations and product behavior: many assistants collect or forward more browsing context than users realize, sometimes from authenticated pages that contain highly sensitive information. The study’s findings make a clear case for immediate remedial action — from users auditing their extensions to vendors adopting privacy‑first defaults and regulators clarifying obligations. Absent that action, the productivity gains of in‑browser AI risk coming at the expense of the privacy and security of the very people who rely on these tools. (arxiv.org, euronews.com)
Source: CryptoRank https://cryptorank.io/news/feed/3cf6c-chatgpt-and-ai-assistants-track-users-data/