New Zealand AI Regulator Map Sparks Doubts Over Definitions and Public Trust

New Zealand’s Ministry for Regulation used AI in 2026 to help map more than 260 organisations in the country’s regulatory system for a May report, and BusinessDesk’s reporting shows the experiment has raised questions about accuracy, definitions, oversight, and public trust in AI-assisted policymaking. The uncomfortable lesson is not that officials touched a chatbot. It is that a government report can look empirical while quietly depending on machine-readable assumptions that most readers never see.
As reported by BusinessDesk’s Rachel Pannett, the ministry used AI to help produce a map of what Regulation Minister David Seymour has called a “twisted spaghetti” of regulators. The ministry says staff reviewed and checked the AI-generated material before it was used. That matters, but it does not end the debate, because the central failure alleged by outside experts was not a hallucinated paragraph or a fabricated footnote. It was a broader methodological problem: ask an AI system a loose policy question, and it may give you a very confident-looking loose answer.

The Machine Did Not Invent the Spaghetti, but It May Have Overcooked It

The Ministry for Regulation’s May report was presented as a first comprehensive view of New Zealand’s regulatory landscape. The ministry said the work drew together information that had previously been scattered across government agencies and datasets, identifying more than 260 organisations performing regulatory functions across central government, local government, statutory bodies, committees, tribunals, and other forms.
That is a legitimate policy goal. Anyone who has worked around compliance knows that “the regulator” is rarely a single entity with a single rulebook and a single front door. The difficulty is that mapping a regulatory system is not like counting ports on a switch. It requires judgment about what belongs in the system, what merely influences it, and what should be excluded.
BusinessDesk’s report is revealing because the ministry’s regulator list reportedly included organisations such as Consumer NZ and the NZ Federation of Sled Dog Sports. Experts cited by BusinessDesk questioned whether bodies without statutory enforcement powers fit most people’s understanding of a regulator. If the list is being used to demonstrate sprawling official power, that distinction is not academic.
This is where AI changes the texture of the mistake. A human analyst can use an overbroad definition, too. But AI systems make it easier to scale a definition across a large corpus, convert messy public information into structured tables, and produce a polished map that appears more settled than the underlying classification deserves.

Prompt Engineering Meets Political Engineering​

The most important phrase in the BusinessDesk story is not “AI-generated.” It is “broad definition.” Anton Kunitskiy, an Auckland-based AI and strategy consultant quoted by BusinessDesk, argued that the vulnerability may have been conceptual rather than technical: the model appears to have followed a wide instruction set that captured organisations participating in the regulatory ecosystem, not only those exercising statutory power.
That distinction should make every policy shop pause. Generative AI is often discussed as if the danger is that it will go rogue. In government work, the more mundane risk is that it will obediently operationalise a bad frame.
If officials ask a model to identify anything that looks regulatory, and the prompt includes broad categories such as incorporated societies or “other” entities, the model is not failing when it returns a bloated perimeter. It is doing what it was asked to do. The result may still be misleading if the public-facing report then reads like a count of regulators in the ordinary enforcement sense.
Policy work is full of these classification traps. Is an advisory council a regulator? Is a professional body a regulator if it sets standards but does not prosecute breaches? Is a consumer advocacy organisation part of the regulatory system if it influences compliance norms? Reasonable analysts can disagree, but the disagreement must be visible.
AI does not remove the need for such debates. It makes the initial draft faster, which can be useful, and the assumptions easier to bury, which can be dangerous.

The Public Sector Has Found Its Spreadsheet Moment​

There is a familiar pattern in enterprise technology adoption. First, a tool arrives as an efficiency aid. Then it becomes embedded in workflows. Finally, its outputs become part of the organisation’s evidence base, even when nobody can quite reconstruct all the choices that produced them.
Spreadsheets did this decades ago. They allowed analysts to build models faster than their institutions could govern them. AI is now doing something similar in policy research, but with natural language, search, summarisation, classification, and graphic generation folded into the same workflow.
BusinessDesk reported that the Regulation Ministry spent about NZ$106,000 last year on licensing and training staff to use Microsoft Copilot, according to deputy chief executive Paula Knaap’s comments to the finance and expenditure select committee on July 1. Knaap described the ministry as an early leader in the area and said Copilot supported staff productivity as the government looked at costs.
That is exactly how this technology will enter most public agencies: not as a moonshot, but as a procurement line item. Microsoft has designed Copilot to feel like an extension of Office work, not a separate AI laboratory. For public servants, that means the boundary between “I used a tool to help draft this” and “the tool shaped the analytical structure of this” can become hard to police.
The ministry’s own public guidance on responsible AI says regulatory decisions still require human judgment, legal interpretation, and accountability. That is the right principle. The harder question is how to prove, after the fact, that human judgment was more than a final skim.

The Trust Problem Starts Before the First Hallucination​

The public debate around AI in government often fixates on hallucinations because they are easy to understand. A model makes up a case, invents a source, or misquotes a rule. The error is visible, embarrassing, and usually correctable.
The New Zealand regulator map points to a subtler risk. The facts may be individually sourced, and the table may still encode a contested theory of the world. That is not a hallucination in the usual sense. It is classification drift disguised as administrative discovery.
This matters because regulatory mapping is not neutral. A report saying there are 267 regulators in a small country carries political weight. It can support arguments for consolidation, deregulation, budget cuts, institutional redesign, or ministerial intervention. If some of those entities are not regulators in the way citizens understand the term, the number is rhetorically stronger than it is analytically sound.
David Seymour’s “twisted spaghetti” metaphor is politically effective because it turns complexity into a visual problem. The AI-assisted map then gives that metaphor a data-like surface. But a map is only as good as its legend, and in this case the legend is where the fight is.
That does not mean the ministry’s exercise was worthless. On the contrary, mapping fragmented systems is one of the more sensible uses of AI in government. The lesson is that the more useful the tool becomes, the more important it is to publish the definitions, thresholds, exclusions, review process, and uncertainty around the output.

Human Review Is Not a Magic Wand​

The ministry told BusinessDesk that AI-generated content was reviewed and checked by staff before it was used to draft the report. That reassurance is necessary, but insufficient. “Human in the loop” has become the public-sector equivalent of “military-grade encryption”: a phrase that sounds comforting until someone asks what it means in practice.
Human review can mean a domain expert challenged every borderline case. It can mean a manager scanned a draft for obvious nonsense. It can mean a team reconciled entries against legislation, agency records, and enforcement powers. Or it can mean nobody had enough time to do any of that properly before publication.
The quality of review depends on expertise, time, incentives, and documentation. If an AI-generated classification table lands in front of officials who are already under pressure to produce a compelling map of regulatory sprawl, confirmation bias becomes a workflow risk. The model gives the shape; the office tidies the edges; the report inherits the premise.
Ali Knott, an AI expert at Victoria University of Wellington quoted by BusinessDesk, described the ministry’s two-phase method as reasonably sophisticated and said the prompts appeared to have been written by a policy person with technical ability. That is an important corrective to lazy “officials don’t understand AI” narratives. The problem may not be incompetence. It may be that competent AI use can still produce policy fragility when the task is definition-heavy.
This is what makes the episode worth watching. The ministry did not appear to use AI as a gimmick. It used AI for exactly the sort of high-volume classification and drafting support that governments everywhere are eyeing. The controversy is therefore a preview, not an anomaly.

Graphics Are Not Decoration When They Carry Authority​

BusinessDesk also reported that one of the AI tasks involved producing a graphic for the regulatory report. That may sound less consequential than building the database, but visualisation is where a policy argument often hardens into public memory.
A dense table says, “Here is our analysis.” A diagram says, “Here is the system.” Once a graphic appears in a government report, it can travel through media coverage, ministerial speeches, slide decks, select committee briefings, and social posts. It becomes the thing people remember when the caveats are gone.
AI-generated or AI-assisted graphics raise a different trust issue from AI-generated text. A graphic can simplify in legitimate ways, but it can also exaggerate relationships, imply hierarchy, or visually inflate a category. In a report about regulatory complexity, the difference between a regulator, a participant, a tribunal, a local authority, and a private association is not just visual clutter. It is the core analytical distinction.
There is a temptation to treat graphics as communications work that happens after the real policy analysis. That is wrong. In modern government, design is part of the argument. If AI helps generate the design, then the same standards of traceability and review should apply.

Microsoft Copilot Is Becoming Public Infrastructure by Stealth​

For WindowsForum readers, the Microsoft angle is not incidental. Copilot is rapidly becoming the default AI layer for organisations that already live in Microsoft 365. That gives it an enormous advantage in government, where procurement, identity management, document storage, email, and compliance workflows are already Microsoft-shaped.
This is not necessarily bad. Enterprise AI tools can be more governable than staff pasting sensitive material into random web chatbots. Microsoft’s ecosystem offers administrative controls, audit possibilities, data-boundary promises, and integration with existing identity systems. For agencies that are going to use AI anyway, a managed tenant is preferable to shadow AI.
But the convenience cuts both ways. When AI is embedded in Word, Excel, PowerPoint, Teams, Outlook, and search workflows, it becomes ordinary. Ordinary tools are harder to scrutinise because nobody wants to write a ministerial risk assessment every time an analyst summarises a document.
The New Zealand case shows the line that matters. Using Copilot to clean up notes, summarise public documents, or draft a briefing paragraph is one category of risk. Using AI to build a structured evidence base for a substantive policy report is another. The second needs explicit governance, not just a general training module.
IT administrators will recognise the pattern. A feature deployed as productivity software becomes a governance problem once it shapes records, decisions, and external publications. The question is not whether Copilot is “allowed.” The question is what work product it is allowed to influence, and what audit trail survives.

The Job-Cut Narrative Is the Wrong Lens, but It Will Not Go Away​

The Regulation Ministry’s AI work sits in a broader political context: governments are under pressure to do more with fewer staff, and AI is routinely advertised as a way to reduce administrative burden. BusinessDesk reported that Gráinne Moss, the secretary for regulation, told lawmakers the ministry did not expect job losses in the coming year because of AI use.
That may be true in the near term. The more interesting issue is not whether AI replaces a policy analyst tomorrow. It is whether it changes what counts as competent policy work.
If AI can compile a first-pass map of a regulatory landscape, the value of human labour moves up the chain. Officials must define the question, validate the sources, challenge the categories, test the edge cases, document the uncertainty, and decide whether the output can support a public claim. That is not less work in every case. Sometimes it is different work, and sometimes it is more work if the stakes are high.
Knott’s question, as relayed by BusinessDesk, gets to the economics: how much effort does it take for a human expert to set up and adequately check such a system compared with doing the work manually? That is the productivity equation public agencies cannot dodge. AI saves time only when the review burden is proportionate to the task and the cost of error is tolerable.
For low-risk internal synthesis, the balance may be favourable. For public reports that shape political narratives, the review burden should be heavy. If that makes AI less magically efficient than its boosters imply, so be it. Government credibility is not a beta feature.

The Best Use Case Is Also the Best Warning​

There is a reason this story is more interesting than a simple “AI makes mistake” item. The ministry’s use case is defensible. Large language models and related retrieval systems are well suited to scanning public material, extracting entities, clustering topics, and helping humans navigate dense information landscapes.
A government that refuses to use AI for such tasks will eventually look antiquated. Public agencies sit on mountains of documents, statutes, guidance, annual reports, submissions, and operational data. Tools that help officials find patterns across those materials could improve policy quality if used carefully.
But the New Zealand episode shows that careful use begins before the model runs. It begins with definitions. It asks what the output will be used for. It separates statutory regulators from ecosystem participants. It records why borderline entities were included. It distinguishes “this body regulates” from “this body appears in regulatory context.”
That is not anti-AI bureaucracy. It is the discipline required to make AI useful in institutions where words have legal, fiscal, and democratic consequences.
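To make that discipline concrete, here is a minimal Python sketch of a classification record that keeps statutory regulators apart from ecosystem participants and refuses a borderline entry unless a reason is written down. It is an illustration only, not the ministry's method: the `Entry` and `Role` names, the `has_statutory_power` test, and the example organisations are all hypothetical placeholders, and where the real threshold sits is exactly the policy question the article describes.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STATUTORY_REGULATOR = "statutory regulator"
    ECOSYSTEM_PARTICIPANT = "ecosystem participant"


@dataclass(frozen=True)
class Entry:
    name: str
    has_statutory_power: bool  # assumed test; the real definition is a policy choice
    borderline: bool = False
    rationale: str = ""

    def __post_init__(self) -> None:
        # Borderline inclusions must carry a written reason so the call is reviewable.
        if self.borderline and not self.rationale.strip():
            raise ValueError(f"{self.name}: borderline entry needs a recorded rationale")

    @property
    def role(self) -> Role:
        if self.has_statutory_power:
            return Role.STATUTORY_REGULATOR
        return Role.ECOSYSTEM_PARTICIPANT


def headline_counts(entries: list[Entry]) -> dict[str, int]:
    """Report each category separately instead of one merged 'regulators' total."""
    tally = Counter(entry.role for entry in entries)
    return {role.value: tally[role] for role in Role}


# Placeholder names only; these are not entries from the ministry's list.
entries = [
    Entry("Example Authority", has_statutory_power=True),
    Entry("Example Tribunal", has_statutory_power=True),
    Entry(
        "Example Advocacy Group",
        has_statutory_power=False,
        borderline=True,
        rationale="Influences compliance norms; no enforcement power.",
    ),
]
counts = headline_counts(entries)
print(counts)
```

The design choice worth noting is that the headline figure is never a single number: a reader sees how many entries exercise statutory power and how many merely appear in a regulatory context, and every borderline call has a sentence attached to it.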
The irony is that the ministry’s controversial map may end up being a useful case study for the responsible AI guidance it promotes. It demonstrates both sides of the technology: AI can accelerate a serious mapping exercise, and AI can magnify an ambiguous premise into a national talking point.

The Regulator Map’s Real Lesson for Government AI​

The practical lessons are less glamorous than the technology, but more important. Agencies do not need a moral panic about AI-assisted policy work. They need boring controls that force contested assumptions into the open before a report becomes a headline.
  • Agencies should publish the operational definition used when AI helps classify entities for a public policy report.
  • Human review should be described in concrete terms, including who reviewed the output, what expertise they had, and how borderline cases were resolved.
  • AI-assisted evidence bases should preserve prompts, source lists, model settings where relevant, and version histories as part of the official record.
  • Reports should separate machine-assisted discovery from ministerial interpretation, especially when the findings support politically charged claims.
  • Visualisations created with AI assistance should be reviewed as analytical claims, not merely as communications material.
  • Public-sector AI productivity claims should include the cost of validation, because unchecked speed is not the same thing as efficiency.
These are not exotic safeguards. They are the public-sector equivalent of change control, peer review, and audit logging. The point is not to ban officials from using modern tools. It is to stop the toolchain from becoming invisible at precisely the moment it starts shaping public facts.
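The record-keeping safeguard above (preserving prompts, source lists, model settings, and a description of the review) can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not a description of any agency's actual system: the `ProvenanceRecord` fields and the `fingerprint` helper are hypothetical, and every value shown is a placeholder. The idea is simply that a hash published alongside a report lets anyone check later whether the recorded prompt or settings were changed.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ProvenanceRecord:
    prompt: str
    sources: tuple[str, ...]
    model_settings: dict[str, str]
    reviewer_role: str
    borderline_resolution: str


def fingerprint(record: ProvenanceRecord) -> str:
    """SHA-256 over a canonical JSON form, so any later change to the record is detectable."""
    canonical = json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# All values below are placeholders, not real prompts, sources, or settings.
record = ProvenanceRecord(
    prompt="(placeholder prompt text)",
    sources=("(placeholder source list entry)",),
    model_settings={"model": "(placeholder)", "version": "(placeholder)"},
    reviewer_role="(placeholder reviewer role and expertise)",
    borderline_resolution="(placeholder note on how edge cases were settled)",
)
published_hash = fingerprint(record)

# A quietly widened prompt produces a different fingerprint.
edited = replace(record, prompt="(a quietly widened prompt)")
print(published_hash == fingerprint(record))
print(published_hash == fingerprint(edited))
```

A hash does not make the underlying definition any better; it only makes the toolchain visible, which is the modest goal these controls are aiming for.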
The trust test for AI in government will not be whether officials can produce slicker reports faster; they plainly can. It will be whether ministries can show their working when AI helps turn scattered information into policy evidence, and whether they are willing to expose the definitions, doubts, and disputed cases that make democratic scrutiny possible. If New Zealand’s regulator map becomes a template, it should be remembered not as proof that AI cannot do policy work, but as a warning that policy work cannot be automated away from accountability.

References​

  1. Primary source: BusinessDesk | NZ
    Published: 2026-07-05T17:00:08.921934
 
