NYC MyCity AI Failure: Public Sector Bot Sparks Governance and Budget Debate

New York’s new mayor has moved quickly to pull the plug on a high-profile municipal AI experiment after months of reporting that the tool was confidently dispensing legally dubious and plainly incorrect guidance to small business owners.

Overview

The MyCity business chatbot — launched as part of the Adams administration’s MyCity digital overhaul — was intended to give entrepreneurs fast, searchable access to city rules, permits and program details. Instead, investigative testing found the bot repeatedly returned answers that contradicted New York City law and worker protections, from telling businesses they could refuse cash payments to advising landlords they could reject tenants who use housing vouchers. Those failures prompted Mayor Zohran Mamdani to label the system “functionally unusable” and announce plans to remove it as a cost-cutting measure while the city works through a $12 billion deficit.
This article examines how the MyCity chatbot was built and governed, why it failed in practice, what the new administration’s decision means for public-sector AI, and the steps cities should take before putting automated advisors in front of residents and businesses. It cross-references investigative reporting, municipal audit findings, and contemporaneous statements from city officials and contractors to provide a verifiable, critical account of what went wrong and what comes next.

Background: MyCity, the chatbot, and the pitch

The MyCity initiative

MyCity was billed as an ambitious consolidation of digital services — a single portal intended to make it easier for New Yorkers to find information across dozens of agencies. The chatbot was one visible component of that push: a conversational front end that promised to synthesize content from thousands of city business pages into plain-language answers for busy small-business owners. The project footprint included a reported initial investment of roughly $600,000 in the platform’s foundation, along with ongoing contractor-led development.

What the bot was supposed to do

  • Provide immediate, understandable answers on permitting, licensing, labor rules, consumer protections and incentives.
  • Reduce phone volume to agencies and speed access to “official” city information.
  • Offer multilingual support and 24/7 accessibility to lower barriers for small operators and new entrepreneurs.
Promoted as a pilot, the MyCity Chatbot was explicitly positioned as a productivity and access tool — not a replacement for legal counsel. In practice, those lines blurred when the system began to produce confident-sounding but incorrect answers.

The failure modes: what the bot actually did

Examples of dangerous errors

Independent testing in 2024 by The Markup in partnership with THE CITY produced repeated, reproducible examples of the bot giving advice that, if followed, would have exposed users to legal or regulatory risk. Notable failures included:
  • Claiming businesses could operate cashless despite a 2020 city law requiring many merchants to accept cash.
  • Advising that businesses could take a cut of workers’ tips in contexts where that would be unlawful or expose the business to wage-and-hour claims.
  • Telling landlords they were not required to accept Section 8 or other housing vouchers, contrary to fair housing rules that protect source-of-income recipients in many New York City contexts.
  • Misstating the city minimum wage and failing to reflect statutory updates.
These weren’t one-off textual glitches: reporters and legal advocates were able to replicate similar bad outputs across multiple prompts and languages, indicating systemic problems with grounding and retrieval.

Why the errors matter

A chatbot presented on an official city site carries implicit authority. Small-business owners operate on thin margins and frequently rely on fast guidance to make day-to-day operational decisions. When a government-branded assistant confidently tells a user that a city requirement doesn’t apply or that a risky action is allowed, that advice can translate directly into legal exposure, fines, or harm to workers and consumers. The city’s own disclaimer — telling users the bot’s responses might be inaccurate and not to use them as legal advice — did not mitigate the risk in practice, because the conversational answers read like authoritative instructions.

The technical and governance roots of the problem

Architecture and vendor choices

The bot was built on Microsoft’s cloud AI infrastructure and integrated with the MyCity data corpus that purported to draw on thousands of agency pages. The initial platform foundation reportedly cost in the neighborhood of $600,000, with ongoing maintenance and contractor costs adding to the total. Critics pointed to heavy reliance on outside contractors and a complex software stack as contributors to fragile, difficult-to-audit behavior.

Common engineering failure modes at play

  • Retrieval-grounding gaps: The bot used retrieval-augmented generation (RAG) to surface municipal content, but retrieval errors or stale index snapshots made the model cite outdated or irrelevant documents. When the generative layer fills those gaps, it can produce plausible-but-wrong composite answers.
  • Incentives for fluency over caution: Large language models are optimized to produce helpful, fluent text; absent strong refusal and verification policies, they tend to answer rather than defer — even when evidence is weak. That creates confident-sounding hallucinations.
  • Weak contract oversight and product management: Municipal audits later described MyCity as suffering from project-management and contract oversight failures, problems that make iterative improvement slower and riskier in public deployments.
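To make the retrieval-grounding gap concrete, here is a minimal, hypothetical sketch — not the MyCity codebase — of the kind of pre-generation gate such a system apparently lacked: refuse to answer when no retrieved source is both relevant and fresh, and attach citations when answering. The score threshold, freshness window, and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class RetrievedDoc:
    url: str
    text: str
    score: float          # retrieval similarity, 0..1 (assumed scale)
    indexed_at: datetime  # when this snapshot of the city page was crawled

MIN_SCORE = 0.75             # hypothetical relevance floor
MAX_AGE = timedelta(days=90) # hypothetical freshness window

def grounded_answer(question: str, docs: list[RetrievedDoc]) -> dict:
    """Refuse instead of generating when evidence is weak or stale."""
    now = datetime.now(timezone.utc)
    usable = [d for d in docs
              if d.score >= MIN_SCORE and now - d.indexed_at <= MAX_AGE]
    if not usable:
        return {"answer": None,
                "refusal": "No current, relevant city source found; "
                           "please contact the agency directly."}
    # In a real system, the LLM call would go here, constrained to `usable`
    # and required to cite each source it draws on.
    return {"answer": f"(generated from {len(usable)} sources)",
            "citations": [d.url for d in usable]}
```

The point of the sketch is the ordering: the refusal check runs before any text is generated, so a stale index produces a deferral rather than a confident fabrication.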

The governance gap

City officials initially described the chatbot as a pilot with iterative fixes forthcoming. But the launch of any public-facing service without clear human-in-the-loop escalation, provenance display, or automatic refusal on high-risk queries allowed the bot to become a front-line source of misinformation. The absence of a robust verification and oversight regime — independent red-teaming, continuous audit logs, or a legal vetting pipeline — turned a prototype into a liability.

The mayoral intervention: politics meets procurement

Mamdani’s decision and the financial context

At a press conference focused on the city’s budget shortfall, Mayor Mamdani identified the chatbot as an example of a failed, costly tech bet from the previous administration and announced plans to remove the tool — part of a search for savings while tax and revenue proposals are considered. He described the program as “functionally unusable” and cited the roughly half‑million-dollar figure as the cost currently being carried by the city. A city spokesperson later confirmed plans to take the chatbot down.

How much did it cost?

Published reporting and city documents indicate the foundational build cost approached $600,000, with maintenance and ongoing contractor expenses beyond that. Exact lifetime-to-date expenditures and annual operating costs vary by source and depend on which contracts and internal staff time are included, but the ballpark figures reported by auditors and journalists are consistent: significant public dollars were invested in a tool that produced demonstrable harms.

Political framing and accountability

The decision to remove the service is both fiscal and political. It underscores two realities for municipal tech: (1) procurement and contractor management are as important as the AI model itself; and (2) when a public initiative misfires, the political cost is immediate and visible. The move also signals a demand for better pre-deployment checks on safety, legal compliance, and quality — particularly for tools that give advice touching on regulated activities.

A wider lesson: public-sector AI can amplify risk

Why government deployments are high-stakes

Public agencies frequently handle statutory rights and obligations. When an AI system is used to summarize or explain legal rules, errors are not merely embarrassing — they can be actionable and harmful. The MyCity case is a practical demonstration of the “authority effect”: citizens treat government-endorsed tools as trustworthy, which raises the bar for accuracy and provenance.

Systemic evidence from audits and red-team research

Independent newsroom audits and institutional reviews repeatedly show that conversational AI systems make measurable errors and often fail to provide auditable provenance. Larger audits — including broadcaster-led studies and medical case reports documented in recent years — indicate the problem is not unique to New York’s chatbot but a feature of current-generation, retrieval-conditioned LLM deployments. That body of evidence argues for conservative, human-supervised rollouts where legal or safety exposure is possible.

What responsible public AI deployment would have looked like

Minimal technical and governance checklist

  • Explicit scope limitation: restrict the chatbot to low-risk informational queries and prevent it from answering questions with legal, medical, or safety consequences without human review.
  • Provenance-first design: every answer should link to the exact city document, code section, or time-stamped resource used to generate the reply, and those links must be verifiable by users.
  • Human-in-the-loop workflows: flagged queries (e.g., legal or regulatory interpretation) must be routed to trained staff for final response or must include clear, mandatory referral language to official counsel.
  • Continuous red-teaming and independent audits: third-party adversarial testing before public launch and scheduled re-tests following any platform update.
  • Transparent procurement and contract terms: public disclosure of vendors, cost breakdowns, SLAs for accuracy and remediation, and retention of data for auditing.
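The scope-limitation and human-in-the-loop items above can be sketched as a simple pre-generation router. This is an illustrative toy, not a production classifier: the regex patterns are hypothetical stand-ins for the regulated topics (tips, wages, vouchers, cashless operation) that tripped up MyCity, and a real deployment would use a vetted taxonomy rather than keywords.

```python
import re

# Hypothetical patterns flagging queries that touch regulated activity.
HIGH_RISK_PATTERNS = [
    r"\btips?\b", r"\bwage\b", r"\bevict", r"\bvoucher",
    r"\bsection 8\b", r"\bcashless\b", r"\bfire code\b",
]

def route(query: str) -> str:
    """Return 'human_review' for consequential queries, else 'bot'."""
    q = query.lower()
    if any(re.search(p, q) for p in HIGH_RISK_PATTERNS):
        return "human_review"  # trained staff answer, or mandatory referral
    return "bot"
```

Even a crude router like this changes the failure mode: a missed keyword degrades coverage, whereas the absence of any router lets the model opine on wage law.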

Governance and legal guardrails

  • Establish a risk-tiering framework for all Q&A domains.
  • Require explicit legal sign-off for answers that interpret statutes or prescribe actions.
  • Maintain public change logs for training data and retrieval updates.
  • Create an accessible error‑reporting mechanism so users and advocates can flag dangerous outputs.
These are not theoretical desiderata; they are practical mitigations that reduce the probability of harm when deploying AI in civic contexts.
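A risk-tiering framework of the kind listed above can be as simple as an explicit domain-to-tier table with a conservative default for anything unmapped. The domains, tier names, and policies below are hypothetical examples; in practice the assignments would be made by legal counsel, not engineers.

```python
from enum import Enum

class Tier(Enum):
    LOW = "auto_answer"               # bot may answer, with citations
    MEDIUM = "answer_with_referral"   # answer plus mandatory referral language
    HIGH = "human_only"               # routed to staff; bot must refuse

# Hypothetical domain-to-tier assignments.
DOMAIN_TIERS = {
    "office_hours": Tier.LOW,
    "permit_checklists": Tier.MEDIUM,
    "wage_and_hour": Tier.HIGH,
    "housing_vouchers": Tier.HIGH,
}

def policy_for(domain: str) -> str:
    # Unknown domains fall through to the most conservative handling.
    return DOMAIN_TIERS.get(domain, Tier.HIGH).value
```

The conservative default is the design choice that matters: a new or unclassified topic is treated as high-risk until someone affirmatively tiers it down.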

Response from stakeholders

Adams administration

Following the original reporting, the Adams administration defended the bot as a pilot and promised fixes, arguing that iterative improvement is normal for public tech innovation. City officials added disclaimers and limited the bot’s answer scope, steps that partially reduced the surface area of risk but did not address the underlying grounding errors.

Mamdani administration

Mayor Mamdani framed the decision to remove the bot as fiscal prudence and a reassertion of oversight: when a public tool fails to meet basic standards of utility and safety, it is reasonable to curtail spending and re-evaluate deployment strategy. That posture is resonant with audit findings that described MyCity as poorly managed from both project and contract oversight perspectives.

Community advocates and legal experts

Tenant advocates, worker-rights attorneys, and business associations warned that the chatbot’s inaccurate advice could cause real harm. Legal-services organizations specifically urged the city to withdraw the bot or constrain its remit until it could be independently validated, given the potential to mislead vulnerable users.

The broader consequences for municipal digital services

Short-term: triage and rollback

The immediate implication is rollback, decommissioning, or reconfiguration of the MyCity chatbot while the city decides whether to rebuild with stronger controls or simply retire the experiment. Decommissioning reduces exposure quickly, but it also leaves open questions about sunk costs, internal capacity-building, and whether the city can credibly promise a safer relaunch later.

Medium-term: procurement, staffing, and capability

Cities that outsource AI development without building internal product and verification capacity will continue to face similar failures. Municipalities must invest in three things to avoid repeat scenarios:
  • In-house product and governance teams capable of interrogating vendor outputs.
  • Clear procurement terms tying payment to verifiable safety metrics.
  • Ongoing independent oversight budgets for auditing and red-teaming.

Long-term: trust and digital public goods

Public trust is fragile. A high-profile failure that undermines confidence in municipal digital tools can slow adoption of genuinely useful services. Conversely, properly governed deployments that emphasize transparency and auditability can become durable digital public goods that reduce friction for citizens and businesses. The MyCity case therefore matters beyond New York: it is a cautionary precedent for any government considering public-facing AI.

Practical guidance for other cities and enterprise IT teams

  • Treat municipal AI as a legal instrument: any system that touches compliance, housing, labor, or licensing must be treated with the same legal rigor as formal guidance documents.
  • Start small and verifiable: pilot narrow, low-risk domains where authoritative documents and update cycles are stable.
  • Publish full provenance: require that every generative answer include retrievable citations to exact source text and a date stamp for when the corpus was last updated.
  • Invest in human workflow integration: ensure a fast, auditable path from chat interaction to human review for any ambiguous or consequential responses.
  • Fund independent testing: budget for third-party red teams and make their reports publicly available.
These steps reduce legal exposure and foster public confidence; they also make it easier to quantify whether the tool is delivering real utility versus creating liability.

Where claims remain uncertain (and what to watch next)

Some financial figures and precise contracting details remain opaque in public reporting. While multiple outlets and the comptroller’s audit point to several hundred thousand dollars in initial build costs and significant contractor dependence, the exact total cost of ownership over the bot’s lifecycle — including staff time, hosting charges, and contractor retainers — has not been consolidated into a single public ledger. Readers should treat single-point cost figures as indicative rather than exhaustive.
Key items to watch in the coming weeks and months:
  • Whether the Mamdani administration publishes a detailed cost breakdown and procurement audit for the MyCity program.
  • If the city will reopen procurement with explicit safety, provenance and auditability requirements, or whether the bot is permanently retired.
  • Any follow-up legal complaints or policy hearings that use MyCity as a test case for municipal AI governance.
When publicly funded tech goes wrong, transparency about costs and decisions is an essential step toward accountability and repair.

Conclusion

New York’s MyCity chatbot offers a sharp lesson in the limits of deploying large language models without hardened governance, verifiable provenance, and clear human oversight. The system’s repeated, legally problematic answers transformed a pilot project into a public-relations and liability issue — and that ultimately became a line-item in a budget debate. Mayor Mamdani’s decision to take the bot down is a corrective step, but it’s only a first one.
Public-sector AI can deliver enormous benefits if built with discipline: narrow scopes, auditable evidence chains, human review where stakes are high, and procurement regimes that demand independent testing. The MyCity episode should not be an argument against using AI in government; it should be a prompt to insist that governments treat AI like any other public instrument that affects rights and responsibilities — with the corresponding standards of proof, oversight, and transparency.
By documenting what failed and why, municipal leaders and technologists can design safer, more accountable systems — and avoid turning a tool meant to expand access into one that erodes public trust.

Source: TechRadar Mayor Mamdani to axe NYC chatbot for giving false and dangerous responses