State Treasurer Briner Pushes Responsible AI Adoption in Public Finance

State Treasurer Brad Briner’s office is moving from experimentation to an active posture on artificial intelligence, signaling a deliberate push to fold generative AI into everyday state operations while still testing guardrails and governance frameworks. The department’s recently completed 12‑week pilot used advanced ChatGPT tooling to accelerate tasks in the Unclaimed Property and State & Local Government Finance divisions, producing measurable time savings and prompting further pilots with Microsoft Copilot and other vendors as divisions assess role‑specific needs. The effort is pragmatic: small, instrumented pilots, competency‑gated access, and a stated intention to scale responsibly — but the path from promising pilot metrics to durable, safe production adoption will demand clearer procurement safeguards, stronger technical controls, and independent validation of productivity claims.

Background

What was announced and why it matters

Treasurer Briner announced a series of AI experiments designed to modernize workflows that process heavy volumes of public finance data and citizen‑facing records. The headline pilot ran for 12 weeks and partnered with a major model provider to explore use cases such as locating businesses with unclaimed property, summarizing regulations and audits, and surfacing inconsistencies in large financial data sets. Initial internal reporting points to an average productivity uplift that the department describes as meaningful — on the order of single‑digit to low‑double‑digit percentage gains — and concrete examples where multi‑hour tasks were reduced to minutes or seconds in pilot scenarios.
These pilots are notable because state treasurer offices are custodians of sensitive financial records, public pensions, and statutory obligations (including public‑records and audit requirements). Any material use of generative AI in such contexts raises governance, procurement, privacy, and FOIA considerations that must be resolved before tools move beyond a small cohort of trained users. The Treasurer’s approach—limited pilots, training prerequisites, and an expressed willingness to test multiple vendor solutions—reflects a responsible starting point, but operationalizing it at scale is a separate challenge.

What the pilot found: productivity, use cases, and limits

Concrete productivity gains (pilot‑level)

The pilot cohort reported notable time savings across tasks that were repetitive, research‑heavy, or required synthesizing many documents. Examples cited internally include:
  • Rapid summarization of multi‑page audit requests and regulatory texts, compressing review time dramatically;
  • Automated candidate identification for unclaimed property outreach by analyzing public datasets; and
  • Data‑consistency checks across large local government financial filings.
These are typical early high‑value wins for generative AI in government: work that is readable, repetitive, and amenable to pattern recognition. The Treasurer’s office reported average productivity improvements in pilot groups and highlighted instances where a 90‑minute review was shortened to a fraction of that time. These are encouraging pilot results, but they remain self‑reported and cohort‑limited, and therefore should be treated as directional rather than definitive proof of system‑wide benefit.
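As a rough illustration of the data‑consistency use case listed above (not the Treasurer's actual tooling), the sketch below flags filings whose reported totals do not match the sum of their line items; the field names and tolerance are hypothetical.
```python
# Minimal sketch: flag internal inconsistencies in local government filings.
# Field names and tolerance are illustrative, not the Treasurer's actual schema.
from dataclasses import dataclass

@dataclass
class Filing:
    entity: str
    fiscal_year: int
    total_expenditures: float          # reported total
    line_items: dict[str, float]       # category -> amount

def find_inconsistent_filings(filings: list[Filing], tolerance: float = 0.01) -> list[str]:
    """Return human-readable flags where line items don't sum to the reported total."""
    flags = []
    for f in filings:
        computed = sum(f.line_items.values())
        if abs(computed - f.total_expenditures) > tolerance * max(abs(f.total_expenditures), 1.0):
            flags.append(
                f"{f.entity} FY{f.fiscal_year}: reported {f.total_expenditures:,.2f} "
                f"vs. computed {computed:,.2f}"
            )
    return flags

# Example usage with made-up numbers
filings = [
    Filing("Town of Example", 2024, 1_000_000.0,
           {"salaries": 600_000.0, "services": 250_000.0, "capital": 100_000.0}),
]
for flag in find_inconsistent_filings(filings):
    print(flag)   # each flag still needs human review before any follow-up
```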

Where the technology struggled

Pilot participants consistently reported a need to verify and refine AI outputs. Hallucinations, incomplete citations, and context‑sensitivity problems meant that outputs were useful as drafts or investigative aides but not as final, unreviewed deliverables. The Treasurer’s office treated outputs as assistive: human verification was mandatory, especially for any content that could affect legal outcomes, benefit calculations, or public disclosures. This human‑in‑the‑loop posture is essential to limit operational and legal exposure.

Landscape and context: vendor mix and Microsoft Copilot references

The Treasurer’s office has run pilots with a major model provider and is also engaging Microsoft Copilot and other vendor offerings at division levels to compare fit, control, and governance capabilities. This reflects a broader public‑sector pattern: agencies run multiple pilots in parallel to compare desktop productivity assistants (Copilot), conversational RAG workflows (vendor chat models), and tenant‑bound services with stronger audit controls. The choice among vendors should be driven by data governance, tenancy model, auditability, and contractual protections rather than by feature headlines alone.

Critical analysis: strengths, blind spots, and execution risk

Strengths of the Treasurer’s approach

  • Pilot-first, measurement-oriented: The office ran a bounded 12‑week pilot with clearly articulated use cases and produced a report rather than making blanket production commitments. This staged approach reduces rush‑to‑scale risk.
  • Training and competency gating: Requiring staff training before granting access is a practical governance lever that materially reduces misuse risk and ensures a common baseline of operator competence.
  • Multi‑vendor experimentation: Testing several vendor stacks allows the office to compare tenancy, non‑training/no‑train contractual language, and audit features. This reduces single‑vendor lock‑in risk if procurement follows accordingly.

Key vulnerabilities and unresolved questions

  • Self‑reported metrics need independent verification: Pilot results, including productivity percentages and time‑saved anecdotes, are persuasive but remain self‑reported. Independent, instrumented measurement (baseline vs. post‑pilot workflows, error/rework rates, and citizen outcome metrics) is required to validate claims before committing to recurring licensing costs and headcount changes; a measurement sketch follows this list.
  • Procurement and contractual safeguards: Without explicit no‑train clauses, data‑use guarantees, and clear egress/portability language, the state risks vendor lock‑in or the inadvertent use of state data in vendor model training. These are material legal and policy risks that require careful negotiation.
  • Public‑records, FOIA and retention complexity: Prompts, outputs, and human edits may be subject to public‑records law. The state must define retention and retrieval procedures — including how to store and export prompt logs and AI outputs for FOIA compliance — before scaling. This is non‑trivial and often overlooked.
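To make the first point above concrete, here is a hedged sketch of what instrumented measurement could look like, assuming the pilot logs per‑task handling times and rework outcomes; the record fields and numbers are hypothetical, and the point is that time saved must be read alongside rework rates.
```python
# Sketch: compare baseline vs. pilot task records on time AND rework, not time alone.
# TaskRecord fields are hypothetical stand-ins for whatever telemetry the pilot captures.
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskRecord:
    minutes: float       # wall-clock handling time
    reworked: bool       # did the output need correction after review?

def summarize(records: list[TaskRecord]) -> tuple[float, float]:
    return mean(r.minutes for r in records), mean(r.reworked for r in records)

def compare(baseline: list[TaskRecord], pilot: list[TaskRecord]) -> None:
    b_min, b_rework = summarize(baseline)
    p_min, p_rework = summarize(pilot)
    print(f"time saved: {100 * (b_min - p_min) / b_min:.1f}%")
    print(f"rework rate: baseline {b_rework:.1%} -> pilot {p_rework:.1%}")
    # A time saving that arrives with a higher rework rate is not a net win.

baseline = [TaskRecord(90, False), TaskRecord(75, True), TaskRecord(110, False)]
pilot    = [TaskRecord(20, False), TaskRecord(35, True), TaskRecord(25, True)]
compare(baseline, pilot)
```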

Technical controls and operational guardrails the Treasurer’s office should make mandatory

Identity and tenancy hardening

  • Enforce phishing‑resistant multi‑factor authentication and conditional access for any accounts authorized to use AI connectors. Bind access to role‑based policy and minimum‑privilege configurations.
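A vendor‑neutral sketch of that policy gate is below; the roles, connector names, and MFA flag are hypothetical, and in practice the identity platform, not application code, would enforce these rules.
```python
# Sketch of a least-privilege gate in front of an AI connector.
# Roles, connector names, and the MFA flag are illustrative only.
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)
    phishing_resistant_mfa: bool = False   # e.g. a passkey satisfied at sign-in

# Minimum role required to use each connector (hypothetical names).
CONNECTOR_POLICY = {
    "unclaimed_property_assistant": {"unclaimed_property_analyst"},
    "audit_summarizer": {"slgf_reviewer"},
}

def may_use_connector(user: User, connector: str) -> bool:
    required = CONNECTOR_POLICY.get(connector)
    if required is None:
        return False                        # default deny for unknown connectors
    if not user.phishing_resistant_mfa:
        return False                        # conditional-access style requirement
    return bool(user.roles & required)      # least privilege: need an allowed role

analyst = User("jdoe", {"unclaimed_property_analyst"}, phishing_resistant_mfa=True)
print(may_use_connector(analyst, "audit_summarizer"))              # False
print(may_use_connector(analyst, "unclaimed_property_assistant"))  # True
```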

Data classification and routing

  • Implement label‑based routing so that PII, CUI, and other sensitive data are excluded from permissive AI flows. Use DLP and Microsoft Purview (or equivalent) to block or quarantine high‑risk prompts and exports.
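The sketch below illustrates label‑based routing in the simplest possible terms; the labels, patterns, and dispositions are illustrative, and a real deployment would rely on platform DLP policies rather than application code.
```python
# Sketch: route or block prompts based on sensitivity labels and a crude PII check.
# Labels, regexes, and dispositions are illustrative; real DLP belongs in the platform.
import re
from enum import Enum

class Disposition(Enum):
    ALLOW = "allow"
    QUARANTINE = "quarantine"   # hold for review
    BLOCK = "block"

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # illustration only

LABEL_ROUTING = {
    "public": Disposition.ALLOW,
    "internal": Disposition.ALLOW,
    "confidential": Disposition.QUARANTINE,
    "restricted": Disposition.BLOCK,       # e.g. CUI, pension member records
}

def route_prompt(prompt: str, label: str) -> Disposition:
    if SSN_PATTERN.search(prompt):
        return Disposition.BLOCK            # detected PII overrides any label
    return LABEL_ROUTING.get(label, Disposition.BLOCK)   # default deny on unknown labels

print(route_prompt("Summarize the attached audit request.", "internal"))             # ALLOW
print(route_prompt("Claimant SSN 123-45-6789, please draft a letter.", "internal"))  # BLOCK
```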

Immutable logging, provenance and exportability

  • Require immutable prompt/response logs with export capability and indexed retention so that investigators can recreate a timeline for decisions or disclosures. Make these logs an auditable deliverable in vendor contracts.
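As a sketch of what immutable and exportable could mean in practice, the example below hash‑chains an append‑only log and exports it as JSON; a production system would use WORM storage or the vendor's audit facility.
```python
# Sketch: append-only, hash-chained prompt/response log with JSON export.
# Illustrates tamper-evidence and exportability only, not a production audit store.
import hashlib, json, datetime

class PromptLog:
    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, user: str, prompt: str, response: str) -> None:
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        entry = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "user": user,
            "prompt": prompt,
            "response": response,
            "prev_hash": prev_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._entries.append(entry)

    def verify(self) -> bool:
        """True if no entry has been altered or removed since it was written."""
        prev = "0" * 64
        for e in self._entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev or recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True

    def export(self) -> str:
        """Indexed export, e.g. for a public-records request."""
        return json.dumps(self._entries, indent=2)

log = PromptLog()
log.append("jdoe", "Summarize this audit request.", "Draft summary text.")
print(log.verify())   # True until any entry is modified
```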

Human‑in‑the‑loop thresholds

  • Define clear acceptance gates: which outputs can be published without review, which require a second human sign‑off, and which are forbidden for AI‑assisted publication. Implement automated enforcement where possible.
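A minimal sketch of automated gate enforcement follows; the output categories and review rules are hypothetical examples, not the office's actual policy.
```python
# Sketch: acceptance gates for AI-assisted outputs. Categories and rules are hypothetical.
from enum import Enum

class Gate(Enum):
    NO_REVIEW = "may publish without review"
    SECOND_SIGNOFF = "requires a second human sign-off"
    FORBIDDEN = "AI-assisted publication not permitted"

ACCEPTANCE_GATES = {
    "internal_meeting_notes": Gate.NO_REVIEW,
    "audit_summary": Gate.SECOND_SIGNOFF,
    "unclaimed_property_notice": Gate.SECOND_SIGNOFF,   # citizen-facing
    "benefit_determination": Gate.FORBIDDEN,            # affects legal/benefit outcomes
}

def check_gate(output_type: str, signoffs: int) -> bool:
    """Return True only if the output may be released under the gate policy."""
    gate = ACCEPTANCE_GATES.get(output_type, Gate.FORBIDDEN)   # default to strictest gate
    if gate is Gate.FORBIDDEN:
        return False
    if gate is Gate.SECOND_SIGNOFF:
        return signoffs >= 2
    return True

print(check_gate("audit_summary", signoffs=1))   # False: needs a second reviewer
print(check_gate("audit_summary", signoffs=2))   # True
```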

Incident response and red‑teaming

  • Build an AI‑specific incident response playbook that includes hallucination remediation, data leakage investigation steps, and prompt provenance collection. Conduct regular red‑team exercises before any scaled deployment.

Procurement and contracting: must‑have clauses

  • No‑train / Non‑derivative use clauses that prevent vendor model training on state data unless explicitly authorized.
  • Data egress and portability guarantees: describe formats, retention windows, and the export process for prompt logs and indexed outputs.
  • Model‑version pinning and rollback rights: the vendor must give advance notice of model upgrades and allow rollback to a pinned version in case of regressions.
  • Audit and attestation rights: contractual right to a third‑party audit, penetration test results, and SOC/ISO reports.
Negotiating these terms upfront protects the state from surprise operational exposure and creates a clear compliance posture that procurement, legal, and records teams can enforce.

Workforce strategy: training, labor engagement, and reskilling

Training as a gate, not a panacea

Mandatory training is a strong first step, but alone it doesn’t eliminate the need for process redesign. Training should be paired with:
  • role re‑scoping for reviewers and signatories,
  • revised SOPs that explicitly declare human responsibilities for AI‑assisted outputs,
  • and measured reskilling programs that track competence over time.

Labor inclusion and transparency

Where unions or labor groups exist, include them in governance and deployment planning. Pennsylvania’s model of a Generative AI Labor and Management Collaboration Group demonstrates the benefits of including front‑line staff in adoption design; it reduces resistance and ensures deployment addresses real workflow needs rather than top‑down automation. The Treasurer’s office should emulate this collaboration to maintain trust and operational reliability.

A practical pilot playbook the Treasurer’s office should adopt before scale

  • Executive alignment (Week 0–1): define one or two measurable business KPIs for the pilot (e.g., minutes saved per processed audit, percent reduction in backlog).
  • Foundations (Weeks 2–4): identity hardening (MFA/conditional access), data classification, and legal review of vendor clauses.
  • Controlled pilot (Weeks 5–10): instrumented trials with telemetry capture, immutable logging, and human‑in‑the‑loop enforcement on any output bound for public release.
  • Validation (Weeks 11–12): independent audit of sample outputs, rework and error measurement, and stakeholder review.
  • Decision gate (Week 13): approve scale, iterate, or retire based on pre‑agreed acceptance criteria and independent verification.
This 90‑day approach maps to the Treasurer’s 12‑week pilot rhythm and emphasizes measurement and governance at each decision gate.
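One way to make pre‑agreed acceptance criteria concrete is to encode them before the pilot starts, as in the hedged sketch below; the thresholds and metric names are hypothetical.
```python
# Sketch: a Week-13 decision gate evaluated against criteria agreed at Week 0.
# Thresholds and metric names are hypothetical examples, not the office's actual KPIs.
ACCEPTANCE_CRITERIA = {
    "min_time_saved_pct": 15.0,     # e.g. minutes saved per processed audit
    "max_rework_rate_pct": 5.0,     # outputs corrected after human review
    "max_dlp_incidents": 0,         # sensitive data reaching a permissive AI flow
}

def decision_gate(measured: dict[str, float]) -> str:
    """Return 'scale', 'iterate', or 'retire' from independently measured pilot results."""
    if measured["dlp_incidents"] > ACCEPTANCE_CRITERIA["max_dlp_incidents"]:
        return "retire"   # governance failures outweigh productivity gains
    meets_time = measured["time_saved_pct"] >= ACCEPTANCE_CRITERIA["min_time_saved_pct"]
    meets_quality = measured["rework_rate_pct"] <= ACCEPTANCE_CRITERIA["max_rework_rate_pct"]
    if meets_time and meets_quality:
        return "scale"
    return "iterate"      # promising, but not yet within the agreed thresholds

print(decision_gate({"time_saved_pct": 22.0, "rework_rate_pct": 3.5, "dlp_incidents": 0}))
```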

Fiscal realities and FinOps considerations

  • AI licensing is a recurring operational expense (Copilot and enterprise model seats, data egress, connector indexing). The Treasurer’s office must model three‑ to five‑year TCO, not just pilot costs. Budget must include licensing, increased identity and security tooling, telemetry and SIEM ingestion fees, and overhead for records retrieval and compliance.
  • Consumption models can be unpredictable. Require vendors to provide usage dashboards and FinOps guardrails (budget alerts, inference caps) to avoid billing surprises; a minimal guardrail sketch follows this list. If pilot‑to‑production is successful, adopt tiered provisioning and automated cost controls to limit runaway spend.
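A minimal sketch of such a guardrail, with purely hypothetical budget figures, might look like this; a real deployment would hook into the vendor's usage reporting and billing alerts instead.
```python
# Sketch: FinOps guardrails for model consumption. Budget figures are hypothetical.
class TokenBudget:
    def __init__(self, monthly_cap: int, alert_fraction: float = 0.8) -> None:
        self.monthly_cap = monthly_cap
        self.alert_threshold = int(monthly_cap * alert_fraction)
        self.used = 0

    def record_usage(self, tokens: int) -> bool:
        """Record usage; return False (and refuse) once the hard cap would be exceeded."""
        if self.used + tokens > self.monthly_cap:
            print("CAP REACHED: request refused; escalate to the FinOps owner")
            return False
        self.used += tokens
        if self.used >= self.alert_threshold:
            print(f"ALERT: {self.used:,}/{self.monthly_cap:,} tokens used this month")
        return True

budget = TokenBudget(monthly_cap=10_000_000)
budget.record_usage(7_900_000)
budget.record_usage(500_000)     # crosses the 80% alert threshold
budget.record_usage(2_000_000)   # would exceed the cap -> refused
```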

Public transparency and accountability

Before any significant scale, publish a high‑level transparency report covering:
  • scope of pilot usage and participating divisions,
  • non‑sensitive example prompts and outputs used for training or testing (redacted for PII),
  • a summary of controls (DLP, logging, human review rules), and
  • planned metrics for independent validation.
Transparency builds public trust and preempts FOIA disputes by clarifying process and retention policies in advance. It also creates a durable record that the Treasurer’s office is acting in the public interest rather than behind closed doors.

Strengths and plausible next steps

  • The Treasurer’s office has taken measured steps: bounded pilots, external evaluation partners, and a posture that treats AI as an augmentation rather than a replacement. These are practical foundations for broader adoption.
  • Next steps that add value without adding risk:
  • Commission instrumented, third‑party audits of pilot claims to move from self‑reported benefits to validated outcomes.
  • Publish a succinct AI governance charter that codifies human‑in‑the‑loop thresholds, logging retention policy, and procurement requirements.
  • Use a Center of Excellence (CoE) model to centralize best practices, manage exceptions, and drive reuse of verified prompt templates and data‑handling playbooks.

Risks that deserve urgent attention

  • FOIA and public‑records exposure: Without auditable storage and retrieval of prompts and outputs, the state risks non‑compliance with public‑records law and litigation. Define retention and retrieval now.
  • Legal and procurement gaps: Sole‑source or expedited contracts without robust non‑training and portability language create long‑term liabilities. All procurements should include explicit clauses protecting the state’s data and audit rights.
  • Operational cost creep: Pilots can demonstrate benefit while hiding recurring costs (indexing, seats, token consumption). Build FinOps monitoring into pilot design and procurement language.
  • Human‑factor dependence: Training reduces misuse but does not eliminate cognitive shortcuts that lead to blind trust. Implement mandatory verification workflows for outputs that affect citizens directly.

Conclusion

The Treasurer’s office has done what many governments should: experiment quickly and cautiously, document outcomes, and keep an open mind about vendor options while emphasizing training and governance. That posture has produced promising productivity signals and useful process discoveries. The next stage — if the office intends to scale — must be governed by hardened procurement clauses, independent verification of claimed benefits, strict data‑classification policies, auditable logging, and a transparent public accountability framework.
Done well, these pilots can convert into meaningful service improvements for taxpayers: faster case resolution, more complete recovery of unclaimed assets, and better‑informed fiscal oversight. Done poorly, a push to scale could expose the state to legal, privacy, and fiscal surprises. The responsible route is clear: measure rigorously, contract tightly, and build governance and workforce capability in parallel with any expansion of AI access.

Source: State Affairs Treasurer Briner Highlights AI Efforts Heading into New Year
 
