AI Risk Maps: Iceberg, Copilot, and Claude Reveal Task-Level Exposure for Upskilling Policy

AI’s earliest maps of risk are arriving fast: three new, independently developed models — MIT’s Project Iceberg, Microsoft Research’s Copilot telemetry study, and Anthropic’s Claude-based analysis — each attempt to measure which occupations contain the largest swaths of work that today’s generative AI systems can plausibly perform. Taken together they offer governments, employers, and educators a practical toolkit for prioritizing where to invest in training, where to tighten governance, and where to redesign work — but those same models carry methodological blind spots that policymakers must understand before using them as the sole basis for national upskilling programs. (https://www.media.mit.edu/projects/iceberg/overview/)

Background

The new wave of occupation‑level mapping moves beyond job‑title speculation and toward task‑level analysis. Instead of asking whether a role like “accountant” is at risk, these studies break occupations down into the discrete activities people do day‑to‑day — drafting emails, reconciling ledgers, writing code, or scheduling patients — and then measure how much of that activity current AI systems can already perform, or are being used to perform in real settings.
This shift from job labels to tasks builds on the academic lineage of Eloundou et al. (2023) and subsequent refinements: researchers use the O*NET task taxonomy as a common scaffold, assess theoretical LLM capability at the task level, and — importantly in the latest work — combine those assessments with observed usage or agentic simulations to estimate practical exposure. Those task-level measures are now the raw material for policy-relevant indices.

How the three models work: a quick tour

MIT’s Project Iceberg — a digital twin for the labor market

Project Iceberg builds a simulated “digital twin” of the U.S. labor market: roughly 151 million workers mapped to more than 32,000 skills and thousands of AI tools. Using agent‑based simulations and a catalogue of tools (including LLMs and workflow automations), the project estimates the portion of wage‑value exposed to AI capabilities that are already production‑ready. Project Iceberg’s headline metric — the Iceberg Index — reports that about 11.7% of U.S. wage value (roughly $1.2 trillion) sits below the surface as cognitive or administrative work that current tools can carry out, even if visible adoption today is concentrated in technology jobs.
Why that matters: Project Iceberg is explicitly designed as a policy tool. By simulating diffusion under alternative scenarios and by modeling adoption dynamics, it aims to show not just which tasks are theoretically automatable but where adoption could realistically create workforce dislocation or create opportunities for augmentation. The model’s value lies in stress‑testing interventions — for example, simulating targeted training or infrastructure investments before committing large public budgets.
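To make the wage‑value arithmetic concrete, here is a toy calculation in the spirit of the Iceberg Index: weight each occupation’s wage bill by the share of its tasks that current tools can perform, then sum. The occupations, wage figures, and exposure shares below are invented placeholders, not Project Iceberg’s actual data or method.

```python
# Toy wage-value exposure calculation. All occupations, wage bills, and
# exposure shares below are hypothetical placeholders for illustration.

occupations = [
    # (occupation, annual wage bill in $B, share of tasks current AI can perform)
    ("data entry keyers", 12.0, 0.65),
    ("registered nurses", 320.0, 0.08),
    ("software developers", 250.0, 0.30),
]

total_wages = sum(wage for _, wage, _ in occupations)
exposed_wages = sum(wage * share for _, wage, share in occupations)

print(f"Exposed wage value: ${exposed_wages:.1f}B "
      f"({exposed_wages / total_wages:.1%} of total)")
```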

Microsoft Research — real Copilot telemetry, AI Applicability Scores

Microsoft took a different empirical route: rather than building a synthetic labor market, researchers analyzed more than 200,000 anonymized, privacy‑scrubbed conversations between workers and Bing Copilot to classify which “information work” activities users were delegating to the assistant, and how successful the assistant was at achieving user goals. The team combined frequency of delegation with task success to compute an AI Applicability Score for each occupation, producing a ranked list (a “top 40”) of roles where Copilot has the greatest immediate applicability. The analysis finds heavy concentration among knowledge‑work jobs — writers, journalists, customer service reps, data scientists, web developers and many in the computer and mathematical clusters.
Why that matters: Microsoft’s approach measures actual use and effectiveness inside a leading productivity assistant. That gives a direct read on where people are already benefiting from AI in real workflows, but it also reflects the contours of one vendor ecosystem and the kinds of tasks Copilot is optimized for.
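As an illustration of how delegation frequency and task success could combine into an occupation‑level score, here is a minimal sketch. The simple product rule, the task names, and the numbers are assumptions made for illustration; Microsoft’s published AI Applicability Score methodology may weight and aggregate differently.

```python
# Sketch of an applicability-style score: delegation frequency discounted
# by task success. The combination rule and all data are assumptions, not
# Microsoft's actual methodology.

def applicability_score(delegation_freq: float, success_rate: float) -> float:
    """Score in [0, 1]: how often users delegate a task to the assistant,
    discounted by how often the assistant achieves the user's goal."""
    return delegation_freq * success_rate

# Hypothetical task-level telemetry for one occupation.
tasks = {
    "draft customer email": (0.40, 0.85),
    "summarize meeting notes": (0.25, 0.90),
    "reconcile ledger entries": (0.05, 0.50),
}

# Occupation-level score as the mean over its constituent tasks.
occupation_score = sum(applicability_score(f, s) for f, s in tasks.values()) / len(tasks)
print(f"Occupation applicability score: {occupation_score:.2f}")
```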

Anthropic — observed exposure combining capability and usage

Anthropic merges theoretical task feasibility (using task‑level exposure estimates derived from models like Eloundou et al.) with observed usage on Claude to create an “observed exposure” metric. That metric privileges tasks both theoretically within an LLM’s reach and actually present in real Claude prompts tied to work. Anthropic’s early reported top exposures include computer programmers, customer service representatives, data entry keyers, medical record specialists, market research analysts and several financial and software QA roles. The company also analyzes hiring outcomes — for instance, early evidence that young workers (22–25) show reduced new‑job starts in highly exposed occupations — which signals a potential hiring‑slowdown rather than immediate mass layoffs.
Why that matters: Anthropic’s method tries to reconcile capability with practice. “Theoretical” exposure tells you what AI could do; “observed” exposure tells you what people are already asking AI to do in work contexts, which makes the projections both more realistic and more conservative.
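A minimal sketch of that capability‑plus‑usage intersection, assuming simple thresholds: a task contributes to observed exposure only if it is both judged feasible for current models and actually appearing in work‑related prompts. Task names, scores, and cutoffs are hypothetical, not Anthropic’s published definition.

```python
# Sketch of an "observed exposure" filter: keep tasks that are both
# theoretically feasible and present in real usage. All values and
# thresholds are illustrative assumptions.

theoretical = {         # feasibility score per task (hypothetical)
    "write unit tests": 0.90,
    "triage support tickets": 0.80,
    "perform surgery": 0.05,
}
observed_usage = {      # share of work-related prompts (hypothetical)
    "write unit tests": 0.30,
    "triage support tickets": 0.02,
}

CAPABILITY_THRESHOLD = 0.5   # task judged feasible for current models
USAGE_THRESHOLD = 0.05       # task appears in a meaningful share of prompts

observed_exposure = {
    task: cap
    for task, cap in theoretical.items()
    if cap >= CAPABILITY_THRESHOLD
    and observed_usage.get(task, 0.0) >= USAGE_THRESHOLD
}
print(observed_exposure)  # {'write unit tests': 0.9}
```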

What the models agree on — and where they diverge

Convergence: knowledge work, administrative, and cognitive tasks lead the exposure lists

All three frameworks converge on a critical point: tasks that involve information processing — creating, organizing, transforming, and communicating information — are the most exposed to current generative AI. That explains why customer service, coding and software testing, data‑entry and analysis, certain finance roles, and content production appear repeatedly on risk lists. The finding is robust across simulation (MIT), telemetry (Microsoft), and observed usage (Anthropic).

Divergence: scale and interpretation differ

  • MIT’s Iceberg offers a macroeconomic estimate of wage‑value exposure (11.7% of wage value) by simulating diffusion across an entire labor market; it is oriented toward policy scenario‑testing rather than short‑term predictions.
  • Microsoft’s Copilot study reports where Copilot is already being asked to work and how well it succeeds; it is a near‑term, product‑ecosystem snapshot.
  • Anthropic’s observed exposure blends theory and practice to measure which tasks are both feasible and already showing up in Claude prompts; it adds a behavioral dimension to the question.
These differences matter because they lead to different policy prescriptions: MIT suggests where to invest at scale, Microsoft identifies immediate training priorities for current Copilot users, and Anthropic highlights occupations where hiring patterns and near‑term labor outcomes deserve monitoring.

Methodological strengths and limitations — what to believe, and what to treat cautiously

Strengths

  • Task‑level analysis is empirically superior to job‑title speculation. It provides a more granular and actionable view of where training and governance will deliver the most benefit.
  • Combining theoretical capability with observed usage (Anthropic) or simulation of diffusion (MIT) reduces the chance of over‑estimating immediate impact.
  • Telemetry from deployed systems (Microsoft) grounds arguments in how real users actually work today, which is essential for designing usable upskilling curricula.

Limitations and risks

  • O*NET is a U.S.‑centric taxonomy. All three models rely on O*NET task descriptions, so applying results directly to another country — for example, the Philippines — risks misestimating exposure where job content, regulatory contexts, or sectoral mixes differ. Local task surveys are essential before translating rankings into national policy.
  • Observed usage is biased by access. Telemetry and usage‑based measures capture who has access and is experimenting with AI today. Early adopters — often in tech, higher paid, or urban roles — will skew observed exposure upward for those occupations and downward for sectors with limited connectivity or procurement. The result can be a misleading picture unless adoption equity is controlled for.
  • Capability estimates are volatile. LLM and agent capabilities evolve rapidly. A task classified as low‑exposure today may move sharply the next quarter; conversely, apparent capability does not guarantee safe, auditable production use without tools, data access, and governance.
  • The models focus on feasibility and adoption, not net labor outcomes. Exposure does not automatically equal job loss. AI can augment jobs, create new tasks, or lower the entry bar for certain functions while raising demand for oversight and domain expertise. Claims about “jobs replaced” should be couched as potential scenarios, not inevitabilities.

What this means for national upskilling programs — practical recommendations

The Philippines has already launched a National AI Upskilling plan with a multi‑tier structure aimed at young learners and the working‑age population. To make that P1.5 billion (2026) investment efficient, policymakers should use these models as complementary inputs — not substitutes — for locally contextualized strategies. Below are operational recommendations grounded in the three studies’ strengths and caveats.

1) Build a local task inventory before scaling training

  • Commission a short, targeted mapping project that converts O*NET task descriptions into Philippine‑specific task profiles in priority sectors (BPO, healthcare, finance, public administration, manufacturing).
  • Run role shadowing in urban and provincial settings to capture informal and contextual tasks not present in O*NET.
  • Use the local inventory to re‑weight exposure metrics from MIT/Microsoft/Anthropic rather than transplanting U.S. sector rankings wholesale (see the sketch below).
Why this matters: O*NET’s U.S. framing means direct copying can misidentify vulnerabilities, wasting training dollars on tasks that are not central to Filipino job roles.
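A minimal sketch of that re‑weighting step, assuming the local inventory yields time shares per task; every task name, weight, and exposure score below is a hypothetical placeholder.

```python
# Re-weight a U.S.-derived occupation exposure score with local task
# weights from a national inventory. All data are hypothetical.

us_task_exposure = {          # task-level exposure from a U.S.-based model
    "answer customer calls": 0.7,
    "escalate complex cases": 0.2,
    "process refunds": 0.6,
}
local_time_share = {          # share of the local workday, from shadowing
    "answer customer calls": 0.5,
    "escalate complex cases": 0.3,
    "process refunds": 0.2,
}

local_exposure = sum(us_task_exposure[t] * w for t, w in local_time_share.items())
print(f"Locally re-weighted exposure: {local_exposure:.2f}")  # 0.53
```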

2) Prioritize role‑based, task‑specific micro‑credentials

  • Design short, applied modules that map explicitly to the tasks identified as high‑exposure and high‑value in the local inventory: e.g., customer‑interaction prompt templates for BPO agents, data‑quality verification workflows for medical records clerks, assisted coding practices for junior developers.
  • Require demonstrable competence through project assessments rather than passive course completion.
Why this matters: All three reports underscore that benefits are task‑specific. Training that teaches “how to use ChatGPT” is not enough; workers need role‑tailored playbooks and verification skills.

3) Pair training with governance and safe‑use controls

  • Before wide enablement, mandate tenant controls, prompt DLP, and audit logging for any sanctioned AI tool used on government or corporate data.
  • Provide standardized, sectoral human‑in‑the‑loop thresholds that define when AI outputs require mandatory human verification (see the policy‑table sketch below).
Why this matters: The Microsoft and Project Iceberg analyses emphasize that productivity gains can quickly become liabilities without governance: data leakage, errors propagated at machine speed, and compliance breaches are common pitfalls that training alone cannot prevent.
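One way to standardize such thresholds is to publish them as a machine‑readable policy table that tools and auditors can both consume. The sectors and confidence cutoffs below are hypothetical assumptions, not a prescribed standard.

```python
# Hypothetical sectoral human-in-the-loop policy. None means every AI
# output in that sector requires human verification before release.

HITL_POLICY = {
    "medical records":  None,
    "public finance":   None,
    "customer support": 0.90,   # auto-release above this model confidence
    "internal drafts":  0.75,
}

def requires_human_review(sector: str, model_confidence: float) -> bool:
    # Unknown sectors fail closed: mandatory review.
    threshold = HITL_POLICY.get(sector)
    return threshold is None or model_confidence < threshold

print(requires_human_review("customer support", 0.95))  # False
print(requires_human_review("medical records", 0.99))   # True
```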

4) Design pilots that are demographically and regionally representative

  • Avoid volunteer‑only pilots that recruit early adopters. Instead, require pilot cohorts to reflect workforce composition by gender, region, and skill level.
  • Track adoption‑equity KPIs such as licence utilization, time saved, post‑training promotion/hiring outcomes, and error/rework rates by cohort (see the equity‑check sketch below).
Why this matters: Unequal access to pilots concentrates benefits among an already advantaged group. The Project Iceberg and Microsoft evidence suggests that early adopter bias can widen inequalities unless corrected.
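A toy version of that equity check, assuming licence utilization is tracked per cohort; the cohort labels, rates, and the 10‑percentage‑point flag threshold are illustrative assumptions.

```python
# Flag cohorts whose licence utilization falls well below the overall
# mean. All cohorts and rates are hypothetical illustration data.

cohort_utilization = {
    ("urban", "male"): 0.72,
    ("urban", "female"): 0.68,
    ("provincial", "male"): 0.41,
    ("provincial", "female"): 0.35,
}

overall = sum(cohort_utilization.values()) / len(cohort_utilization)
for cohort, rate in cohort_utilization.items():
    gap = rate - overall
    flag = "  <-- review access barriers" if gap < -0.10 else ""
    print(f"{cohort}: {rate:.0%} (gap {gap:+.0%}){flag}")
```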

5) Monitor hiring flows and youth employment signals

  • Use administrative data and surveys to track whether young job‑seekers are starting jobs in high‑exposure occupations at the historical rate; Anthropic’s early finding of reduced new‑job starts among 22–25‑year‑olds in exposed occupations is an early warning worth monitoring. If hiring slows, consider targeted incentives for internships and apprenticeships in complementary roles. (anthropic.com)
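A minimal sketch of such an early‑warning check, assuming administrative data provides comparable counts of new‑job starts; the baseline, current figure, and alert threshold are hypothetical.

```python
# Compare current new-job starts for ages 22-25 in high-exposure
# occupations against a historical baseline. Numbers are hypothetical.

baseline_starts = 1000    # average monthly new-job starts, pre-AI baseline
current_starts = 820      # same measure, latest quarter

change = (current_starts - baseline_starts) / baseline_starts
ALERT_THRESHOLD = -0.10   # flag slowdowns deeper than 10%

if change <= ALERT_THRESHOLD:
    print(f"Hiring slowdown of {change:.0%} vs baseline; consider targeted "
          "internship and apprenticeship incentives in complementary roles")
```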

6) Fund public‑interest AI tools and civic data access

  • Some tasks require access to private datasets or integrations to be meaningfully automated. Governments can level the playing field by funding public AI tools, open datasets, and secure connectors for SMEs and public services, reducing the advantage large firms gain from privileged access. This reduces the “adoption gap” that skews observed‑usage metrics.

Sectoral implications and short lists for priority action

Below are condensed, policy‑oriented priority lists derived from the crosswalk of the three models, adapted for a middle‑income economy with a large services sector.

Immediate (next 6–12 months)

  • BPO and customer support: deploy role‑specific prompt templates, verification checklists, and supervised agent tools.
  • Medical records and billing clerks: certify data governance and introduce assisted data‑entry workflows with human verification.
  • Junior software developers and QA testers: provide training in prompt‑augmented coding practices, test‑generation, and continuous integration guardrails.

Medium term (12–36 months)

  • Finance and investment analysis: pair AI literacy with ethics and audit training; prioritize provenance and source attribution in any AI‑assisted reporting.
  • Education and curriculum design: integrate AI fluency modules into technical-vocational courses, emphasizing critical evaluation and human oversight.

Systemic (3+ years)

  • Governance and legal frameworks for auditability, DLP, and AI provenance; public funding for independent audits of large deployments.
  • Longitudinal research funding to re‑map task structure as models evolve and to measure downstream labor market outcomes (wages, hiring, promotions).

What policymakers should avoid

  • Do not equate exposure with inevitability: exposure is a leading indicator of possible disruption; policy must balance mitigation with investment in new roles and human‑AI complementarities.
  • Don’t base national curricula on a foreign occupation ranking without local task mapping. O*NET‑based findings must be localized before deployment.
  • Avoid one‑size‑fits‑all training: short vendor demos and single‑day workshops are insufficient. Durable skill change requires mentorship, applied projects, and performance‑based certification.

Critical assessment: where the models excel and where they overpromise

  • Strength: The trio of approaches finally gives governments measurable, testable levers. MIT provides scenario testing; Microsoft gives near‑term telemetry; Anthropic links capability to actual usage. Together they let decision‑makers triangulate.
  • Risk: All three are still upstream metrics. None can reliably predict macro employment numbers without incorporating firm‑level investments, regulatory choices, and global demand shifts. Using them as deterministic forecasts will lead to poor policy choices.
  • Strength: They force a task‑first vocabulary — a practical improvement for curriculum designers and HR teams.
  • Risk: They can create moral hazard: policy leaders might purchase headline tools without investing in governance, measurement, or equitable access. The evidence repeatedly shows that licence purchases without role‑based training produce licence waste and shadow AI risks.

Conclusion — an operational agenda for evidence‑based upskilling

Policymakers in the Philippines and similar economies should treat the new exposure maps not as prophecy but as directional intelligence. Use Project Iceberg, Microsoft’s applicability scores, and Anthropic’s observed exposure as complementary inputs:
  • Commission a rapid local task inventory to translate O*NET-based exposure into nationally relevant priorities.
  • Pilot role‑tailored micro‑credential programs that pair training with governance, measurement, and economic incentives.
  • Require representativeness in pilots and track equity‑focused KPIs.
  • Budget for long‑term measurement: track hiring, wages, and promotion patterns in exposed occupations annually.
When used judiciously, these models can make an upskilling program far more surgical and cost‑effective. Misused as blunt indicators, they risk diverting scarce public funds and widening existing inequalities. The prudent path is neither denial nor panic but an evidence‑driven program that combines targeted skills investment, governance, and continuous measurement so that workers, firms, and the state can adapt as AI capabilities — and the tasks they change — evolve.

Source: Rappler Models identifying at-risk jobs that could help AI upskilling strategies