ChatGPT Projects: Realistic AI Workspaces and Productivity Risks

OpenAI’s decision to make ChatGPT “Projects” available to free users is a small-but-significant turning point in how firms and knowledge workers organise AI-driven work — but it arrives at a moment when a growing body of research suggests that generative AI is not the simple, across‑the‑board productivity booster many expected. The Projects rollout — which adds folder‑style workspaces, per‑project custom instructions, and tiered file‑upload limits — gives teams a practical tool for organising context and files inside ChatGPT, yet wider evidence from government trials, academic surveys and randomized studies shows the real gains from AI are highly conditional and sometimes offset by new kinds of overhead and risk.

Background: what changed with ChatGPT Projects

OpenAI’s Projects feature is designed to let users group chats, upload reference files and set custom instructions and memory controls at the project level. The company recently expanded access so that it is available to free ChatGPT accounts, while also increasing the number of files users can attach per project — up to 5 files for free users, 25 for Plus subscribers and 40 for Pro/Business/Enterprise tiers. The update also introduced cosmetic and usability tweaks such as customizable icons and color choices, and the feature is live on the web and Android with iOS support rolling out soon. These product changes were announced in early September 2025 as part of a broader push to make collaboration and contextual continuity easier inside ChatGPT.
Why this matters in practice: Projects is essentially a lightweight workspace that bundles context, files and instructions where a single chat history would otherwise be fragile or dispersed. For small teams, consultancies and accountants who already rely on ChatGPT as a drafting and research assistant, Projects promises to reduce context switching and give the assistant clearer guardrails about tone, role, and scope. But the feature’s capacity limits, memory controls and per‑project isolation make it a freemium nudge: it showcases higher‑value workspaces to free users while reserving heavier, higher‑volume use for paying customers.

The AI productivity paradox: what recent evidence is telling us

A surge of adoption, but uneven measurable value

Across industries, executives and practitioners describe a rapid shift from “experimenting with” to “deploying” generative AI. However, adoption metrics — seats provisioned, prompts sent, integrations built — are not the same as measured productivity gains. Large‑scale trials and independent field experiments now show a nuanced pattern: AI helps in tightly scoped, repetitive tasks but often fails to produce measurable enterprise‑level gains when rolled out broadly without workflow redesign, training and governance.
Two recent and influential examples illustrate this tension.
  • A government‑scale experiment run by the UK Government Digital Service (GDS) with Microsoft 365 Copilot involved roughly 20,000 civil servants over a three‑month period; participants self‑reported average time savings of about 26 minutes per day. The trial’s press messaging highlighted positive user sentiment and less time spent on routine tasks, but a careful reading of the report shows mixed outcomes: time saved on some activities did not conclusively translate into departmental‑level productivity gains, and different evaluation methods produced different conclusions about net value.
  • Independent randomized experiments in software engineering have produced the opposite surprise: a METR field study found experienced open‑source developers were on average 19% slower when using contemporary AI coding assistants for maintenance tasks — a slowdown driven by time spent prompting, waiting for generations, and reviewing imperfect outputs. That finding runs counter to both vendor claims and developer sentiment (participants believed they were faster), and it underscores how perceived productivity can diverge sharply from measured throughput.

Workslop: a new term for an old problem, amplified

A high‑profile collaboration between BetterUp Labs and Stanford’s Social Media Lab introduced the term “workslop” to describe polished‑looking but low‑substance AI outputs that require human rework. The researchers’ survey of full‑time desk workers found that about 40% had received workslop in the prior month and that each incident cost roughly two hours of recipient time on average — a burden researchers quantify as roughly $186 per employee per month when aggregated across salary estimates. The effect is social as well as operational: recipients of workslop judge senders more harshly, and repeated incidents erode team trust and morale.
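To make that arithmetic concrete, the sketch below reproduces a per‑employee monthly figure from three inputs: incidents per month, rework hours per incident and a loaded hourly rate. The specific rate and incident count here are illustrative assumptions, not numbers taken from the BetterUp/Stanford survey.

```python
# Back-of-envelope workslop cost model; all inputs are illustrative assumptions.

def cost_per_employee(incidents_per_month: float,
                      rework_hours_per_incident: float,
                      loaded_hourly_rate: float) -> float:
    """Estimated monthly cost, per recipient, of reworking low-substance AI output."""
    return incidents_per_month * rework_hours_per_incident * loaded_hourly_rate

def cost_per_organisation(per_employee: float, headcount: int, share_affected: float) -> float:
    """Scale the per-employee figure to an organisation, given the share of staff affected."""
    return per_employee * headcount * share_affected

# Assumed inputs: ~1.3 incidents a month, ~2 hours of rework each, ~$70/hour loaded cost.
employee_cost = cost_per_employee(1.3, 2.0, 70.0)                        # about $182/month
org_cost = cost_per_organisation(employee_cost, headcount=500, share_affected=0.4)
print(f"Per employee: ${employee_cost:,.0f}/month; organisation-wide: ${org_cost:,.0f}/month")
```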
Taken together, these studies show a clear pattern: AI can speed clearly defined, high‑volume tasks, but it also generates new downstream costs — verification, rework, cognitive load, and coordination overhead — that often go unpaid in optimistic vendor demos.

Why that paradox happens: five structural causes

  • Surface polish without domain fidelity. LLMs are engineered to produce fluent language, not to authenticate facts or preserve nuanced domain logic. The result: outputs that look right but lack substance or accuracy, especially in regulated or expert contexts.
  • Verification and “speed‑but‑verify” overhead. Faster first drafts increase the volume of artifacts that must be checked; when each AI draft requires human review the net time saved can evaporate. Real‑world pilots often overlook the verification burden when modelling ROI.
  • Fragmented workflows and integration bottlenecks. Speeding one step in a chain without redesigning downstream processes simply shifts the bottleneck; e.g., faster code generation does not shorten releases if CI/tests and code review remain unchanged.
  • Learning curves and uneven adoption. Gains are concentrated among users who receive targeted onboarding, prompt engineering guidance and role‑specific templates. Without training, the same tools that save time for newcomers can slow down experienced practitioners who must review and correct AI drafts.
  • Shadow AI and data leakage. When sanctioned tools are slow, employees resort to consumer models and personal accounts, exposing intellectual property and sensitive data to services that may log or retain prompts. That pattern multiplies downstream risk even as it masks the true cost of failed enterprise deployments.

What Projects changes — and what it won’t

Practical strengths of ChatGPT Projects

  • Context bundling. Projects help keep reference files, prompt templates and role instructions together so the assistant can operate with clearer, project‑specific guardrails. This reduces repeated prompt composition and helps newer collaborators onboard faster.
  • Project‑level memory controls. Allowing memory to be scoped per project reduces cross‑project leakage and makes it easier to define what the assistant should remember for a given workflow. This is valuable for multi‑client teams such as accounting firms that must keep contexts discrete for compliance.
  • Freemium distribution. By offering Projects to free users with modest file limits, OpenAI lowers the trial cost for teams and small businesses, increasing the chance they will adopt project practices before deciding on a paid tier. That encourages experimentation in a controlled way.

Limits and what Projects does not solve

  • Quality assurance remains manual. Projects do not remove the need to verify facts, citations, or regulatory conformity; they only make it easier to supply the assistant with reference documents. Human review and controlled workflows are still mandatory for regulated or high‑stakes outputs.
  • File limits matter for scale. The tiered upload caps (5/25/40) mean Projects is useful for document‑heavy micro‑projects but not for large‑scale knowledge bases or enterprise content stores; organisations that need broad model grounding will still require dedicated connectors and private model deployments.
  • Governance and audit trails still require policy. Cosmetic controls (icons/colors) and memory toggles are helpful UX features, but they do not substitute for data classification, access controls, logging, or contract terms that define data use, retention and liability.

Practical guidance for IT leaders, managers and accounting firms

Redefine the ROI metric: move from seat counts to net throughput

  • Measure end‑to‑end cycle time for a business process (draft → review → publish), not just prompt counts.
  • Track rework time attributable to AI outputs and include it in cost models.
  • Pilot with clear primary metrics (e.g., ticket resolution per hour, invoice processing cycle time) and conservative assumptions about verification overhead; a minimal measurement sketch follows this list.
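A minimal instrumentation sketch, under assumed stage names and durations, might look like the following; the point is simply that review and rework time are counted alongside drafting time before any net saving is claimed.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class CycleRecord:
    """One completed unit of work, timed end to end (hypothetical fields)."""
    draft_minutes: float    # time to produce the first draft, human or AI-assisted
    review_minutes: float   # time reviewers spend checking the output
    rework_minutes: float   # time spent fixing problems the review uncovered

    @property
    def total_minutes(self) -> float:
        return self.draft_minutes + self.review_minutes + self.rework_minutes

def net_change(baseline: list[CycleRecord], ai_assisted: list[CycleRecord]) -> float:
    """Average end-to-end minutes saved (positive) or added (negative) per unit of work."""
    return mean(r.total_minutes for r in baseline) - mean(r.total_minutes for r in ai_assisted)

# Illustrative numbers only: drafting gets faster with AI, but review and rework grow.
baseline = [CycleRecord(60, 15, 10), CycleRecord(55, 20, 5)]
ai_assisted = [CycleRecord(25, 35, 30), CycleRecord(30, 30, 25)]
print(f"Net change per item: {net_change(baseline, ai_assisted):+.1f} minutes")
```

Even a spreadsheet version of the same calculation works; what matters is that review and rework are recorded at all, because that is where optimistic first‑draft timings usually fall apart.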

Design pilots around tasks that match AI strengths

  • High‑volume, rule‑based tasks (invoice triage, OCR classification, first‑pass ticket summarization) are where AI most consistently returns measurable gains.
  • Avoid using early pilots to chase speculative top‑line wins; focus on back‑office efficiency and measurable throughput improvements first.

Create practical governance that preserves experimentation

  • Offer a sanctioned, enterprise‑grade project workspace or tenant before employees resort to consumer tools; make the approved path faster and more convenient than the unsafe path.
  • Require disclosure and human sign‑off for automated drafts that feed into decisions or customer communications. Cabonne Council’s draft policy and other local government approaches illustrate this middle path.
  • Maintain data classification rules that forbid uploading PII or sensitive IP to consumer models; use Projects’ per‑project memory controls to limit data exposure where possible. A minimal pre‑upload screening sketch follows this list.
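One way to make such a rule operational is a lightweight pre‑upload screen. The patterns below are a deliberately minimal, assumed starting point (emails, card‑number‑like digit runs, national‑insurance‑style identifiers) and would need tuning to a firm’s own data‑classification scheme.

```python
import re

# Minimal, assumed patterns for an illustrative pre-upload screen; a real
# data-classification policy would define its own categories and detectors.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "national_insurance": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
}

def screen_for_upload(text: str) -> list[str]:
    """Return the categories of possible PII found; an empty list means nothing was flagged."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

sample = "Client contact: jane.doe@example.com, NI number QQ123456C."
flagged = screen_for_upload(sample)
if flagged:
    print(f"Blocked: possible {', '.join(flagged)} in document; route via approved workflow.")
else:
    print("No obvious PII detected; upload may proceed under project policy.")
```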

Upskill deliberately and measure skill retention

  • Train staff on how to prompt and how to verify AI outputs; include exercises in spotting workslop and in reconstructing answers from primary sources.
  • Monitor for deskilling: if routine tasks are fully automated, ensure rotation and applied learning so institutional knowledge does not erode.

Risk checklist: what to audit before expanding usage

  • Data leakage risk: Are prompts and files stored by third‑party models? Are enterprise contracts explicit about retention and reuse?
  • Auditability: Can you reconstruct the model’s inputs and the assistant’s outputs for regulatory review? (A minimal logging sketch follows this checklist.)
  • Dependency and resiliency: If developers rely on AI‑generated code or junior staff rely on AI for client work, do you have processes to preserve tacit expertise?
  • Vendor claims verification: Do you validate vendor ROI claims with your own randomized or comparative trials?
  • Human‑in‑the‑loop controls: Is there a clear, testable rule for when outputs require senior sign‑off?
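On the auditability point, one pragmatic approach is to record every assistant interaction as an append‑only log entry. The schema below is a hypothetical minimum (who asked, in which project, with which files, what came back, and who signed off), not a prescribed standard.

```python
import json
from datetime import datetime, timezone

def log_interaction(log_path: str, *, user: str, project: str, prompt: str,
                    files: list[str], response: str, reviewer: str | None = None) -> None:
    """Append one auditable record of an assistant interaction as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,                # who issued the prompt
        "project": project,          # which workspace or client engagement it belongs to
        "files": files,              # identifiers of the reference documents supplied
        "prompt": prompt,            # the instruction as sent to the model
        "response": response,        # the assistant's output as received
        "reviewed_by": reviewer,     # stays None until a human signs off
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

# Example: record a drafting request and the unreviewed output it produced.
log_interaction("assistant_audit.jsonl", user="j.smith", project="client-tax-2025",
                prompt="Summarise the attached engagement letter.",
                files=["engagement_letter_v3.pdf"],
                response="Draft summary ...")
```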
Flagging uncertain claims: Where studies and vendor materials conflict, treat the less‑replicated claim with caution. For example, aggregated vendor case studies that report “large time savings” should be tested against independent field trials (as the METR developer trial and the GDS Copilot experiment demonstrate). When high‑stakes accuracy matters — legal, tax, clinical, financial modelling — default to conservative, auditable workflows until your own pilots show repeatable, instrumented benefit.

The strategic trade‑off for CIOs and practice leaders

ChatGPT Projects is an incremental but meaningful product improvement: it reduces friction for organising context, encourages repeatable templates and nudges small teams to use best practices. For professional services firms and accounting practices that already use ChatGPT day‑to‑day, Projects can cut prompt‑repetition and help enforce client‑specific instructions inside the workspace. But the broader lesson from cross‑sector research is that tools alone do not produce transformation; transformation requires a deliberate combination of:
  • Workflow redesign to remove downstream bottlenecks;
  • Measurement frameworks that capture rework and verification costs;
  • Investment in training and change management; and
  • Governance that prevents shadow AI and protects sensitive data.
There is real productivity to be captured, often in familiar places such as routine communications, document summarization, and structured triage, but capturing it depends on how organisations integrate AI, not merely on whether they provide model access.

Conclusion: where to place your bets

  • Use Projects now for small, high‑value pilots where per‑project context matters: client engagements, recurring proposal templates, or controlled internal playbooks. The feature lowers the barrier to experimentation for free users and gives teams a safer way to centralise context.
  • Instrument every pilot. Measure the full cost of outputs, including review time and follow‑on work, not just first‑draft speed. Treat vendor claims as hypotheses you must validate.
  • Train and govern. Prioritise prompt‑crafting, verification practice, and a clear policy on what may and may not be uploaded to public models. This is how you turn the promise of generative AI from a novelty into repeatable, auditable advantage.
The shift from novelty to operational toolset is underway. Products like ChatGPT Projects make collaboration easier; the hard work now is organisational — designing workflows, measurement, and culture so that AI amplifies real work instead of creating more of it. The research and trials published over the past year show both the upside and the cautionary tale; the organisations that win will be the ones that treat AI as a change‑management problem first and a product installation second.

Source: Accounting Today, "Missing productivity gains from AI, and other tech stories you may have missed"
 
