CITY OF SF · TECH TALK MAY '26

Leverage, not magic:
AI in government.

Keith Kurson
AI RESIDENT · PROPEL
keith.is
keith@keithkurson.net
BACKGROUND

A short bio,
in five entries.

2004 → PRESENT
Subeta

Virtual pet site I started as a teenager. I was emancipated, and that's where I learned how to learn.

2013
Code for America

Year in NYC. Exposed to civic problems at every scale — from a single block to the whole city.

2014 → 2017
Nava PBC

Lead engineer on a rewrite of healthcare.gov, and helped stand up Nava's integrated-benefits practice. I know what it's like to transform a million forms.

2017 → 2024
Glitch → Fastly

Wanted to make a place where anyone could build the internet. Saw the first wave of AI chatbots there. After Fastly's acquisition, started to see what agents browsing the internet look like to a CDN.

2025 → NOW
Propel

AI residency. Responding to HR1 at the state level — SNAP and Medicaid.

THE RESIDENCY · WHAT I DO

A year on one question.

How should states answer HR1 for SNAP and Medicaid — and where does AI actually help the people running those programs?

THE ROLE
Resident, not staff.
A researcher embedded inside Propel — free to poke around, publish, and follow the question.
THE FOCUS
HR1 → state programs.
What changes for SNAP and Medicaid when the policy actually lands at the state level.
THE RESOURCE
Propel, on tap.
The largest benefits app in the US — real users, real data, real ground truth when I need it.
THESIS

AI's value in government is removing friction between people and services.

— but only if you keep your own discipline while building.

OCTOBER 2025 · A RAPID RESPONSE

When the
shutdown hit.

  • Government shutdown.
  • SNAP benefits in limbo.
  • People needed food now.
  • No central, current, machine-readable directory of where to go.
OCTOBER
2025
DAY 1 · SNAP IN LIMBO
> user query, Oct 14:
> "where do i go to get food this week"
RESPONSE

What we built
between Friday & Friday.

STEP ONE
Source
directories
Thousands of orgs, scattered across hundreds of stale sites.
STEP TWO
Multi-layer
AI pipeline
Claude searches, Google Places locates, Jina parses — cross-validated before going live.
STEP THREE
National
food bank DB
Live, validated, machine-readable. In production days later.
Days,
not quarters.
A year ago this would have been an RFP.
Today it's a long week.
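The cross-validation step in the middle of that pipeline can be sketched like this. The record shapes and the agreement rule are my assumptions for illustration, not the production schema:

```python
# Sketch of the cross-validation gate before a record goes live: an org
# is accepted only when independent sources agree on who and where it is.
# Record shapes and the agreement rule here are illustrative assumptions.

def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so near-identical strings match."""
    return " ".join(s.lower().split())

def cross_validate(search_hit: dict, places_hit: dict) -> bool:
    """Keep a food-bank record only if the web-search result and the
    place-lookup result agree on name and street address."""
    return (normalize(search_hit["name"]) == normalize(places_hit["name"])
            and normalize(search_hit["address"]) == normalize(places_hit["address"]))

pairs = [
    ({"name": "St. Anthony's Foundation", "address": "150 Golden Gate Ave"},
     {"name": "st. anthony's  foundation", "address": "150 golden gate ave"}),
    ({"name": "Glide Memorial", "address": "330 Ellis St"},
     {"name": "Glide Memorial", "address": "330 Eddy St"}),  # address mismatch
]
validated = [hit for hit, place in pairs if cross_validate(hit, place)]
```

Disagreement between sources is the signal to hold a record back, not to pick a winner.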
METHOD

The web already
has the structure.

  • Jina.ai: URL in, clean markdown out.
  • Follow the natural link graph between directories.
  • Stop thinking scraper. Start thinking reader.
Mindset shift: the web isn't N sites to parse. It's one giant linked document.
RAW HTML · before
<div class="content-wrap"><div class="row"><div class="col-md-8">
<h2 class="heading-primary">St. Anthony's Foundation</h2>
<p class="lead"><span style="font-weight:bold">Address:</span>
150 Golden Gate Ave, SF, CA 94102<br/><span>Hours:</span> M-F
11:30-12:30</p><p>Serves: hot meals, no ID required</p>...
CLEAN MARKDOWN · after
## St. Anthony's Foundation
**Address:** 150 Golden Gate Ave, SF
**Hours:** M–F 11:30–12:30
**Serves:** hot meals, no ID required
→ STRUCTURED, VALIDATED, IN THE DB
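The before/after above comes from one call pattern: Jina's public reader endpoint, which takes a page URL prefixed with `r.jina.ai` and returns markdown instead of HTML. A minimal sketch (the fetch helper is defined but not run here):

```python
import urllib.request

READER = "https://r.jina.ai/"  # Jina's public reader endpoint

def reader_url(page_url: str) -> str:
    """URL in: prefix any page with the reader endpoint."""
    return READER + page_url

def read_as_markdown(page_url: str) -> str:
    """Clean markdown out. (Live network call; shown here, not executed.)"""
    with urllib.request.urlopen(reader_url(page_url)) as resp:
        return resp.read().decode("utf-8")
```

No parser per site, which is the whole point: the reader does the HTML-to-markdown work once, for every directory.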
SHUTDOWN, PART TWO · STATE NOTICES

Mail wasn't fast enough.

01 · SCRAPE
Every state's
SNAP dept.
Sites, Facebook, Twitter — wherever notices actually showed up.
02 · DELIVER
Straight into
the app.
In each state's own words. We didn't rewrite a thing.
03 · REUSE
Same pipeline,
new policy era.
Now tracking post-shutdown program changes, state by state.
Speed. Not editorial.

Propel acting as editor of state government would be a worse problem than any state's tone.

USER RESEARCH

People would rather talk
than type to a service.

Sounds obvious out loud. Not obvious from how the entire benefits-administration internet is currently built.
HOW WE TESTED · HR1 WORK REQUIREMENTS

Thousands of conversations,
in a couple of weeks.

01 · RECRUIT
Push notif
to the segment
via the Propel app
02 · INTERVIEW
AI voice agent
runs the call
thousands / day
03 · FLAG
Transcripts
scored & sorted
edge cases surface
04 · HUMAN RESEARCHERS
Deep follow-up on flagged cases.
Where the empathy work actually happens. The AI sorts; people listen.
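The flag-and-sort step can be sketched as a simple triage queue. The keyword rubric below is a placeholder assumption; the real scorer isn't shown in this deck:

```python
# Triage sketch for step 03: transcripts get an edge-case score, and the
# top slice goes to human researchers for deep follow-up.
# The keyword rubric is a stand-in, not the production scorer.

EDGE_SIGNALS = ("don't understand", "cut my hours", "lost my job")

def edge_score(transcript: str) -> int:
    """Crude stand-in scorer: count signals suggesting a hard case."""
    t = transcript.lower()
    return sum(signal in t for signal in EDGE_SIGNALS)

def flag_for_humans(transcripts: list, top_n: int = 1) -> list:
    """Score, sort, and surface the top edge cases."""
    return sorted(transcripts, key=edge_score, reverse=True)[:top_n]

queue = flag_for_humans([
    "everything is fine, just checking my balance",
    "they cut my hours and i don't understand the new rules",
])
```

The AI only decides the order of the queue; the listening stays human.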
AN UNEXPECTED FINDING

People felt more heard
by the voice bot.

Even knowing it was AI.

More willing to share what was wrong. No feeling of trauma-dumping on another person.

SCOPE OF THE FINDING

What we know vs. what we suspect.

WE KNOW
  • Low-income SNAP recipients
  • English-first
  • Sample meaningful, not huge
  • Strong preference for voice
WE SUSPECT
  • It generalizes — but untested for:
  • Spanish, Cantonese, Vietnamese
  • Older users
  • Users with disabilities
→ IF ANYONE WANTS TO RUN THE PILOTS, I'LL HELP.
CONSEQUENCES · SERVICE DESIGN

The redesign question
changes.

FROM
FORM
household income *
hours worked / week *
other earned income *

"how do we make the form less painful?"

TO
CONVERSATION
so — tell me how this month has been going.
honestly? rough. they cut my hours again.
okay. let's figure out what that means for benefits.

"what does the conversation we'd want to have look like?"

Voice as a default alongside text, never instead of it.
ANTICIPATED CONCERNS

Three questions
you're already asking.

$
Cost.
Per-conversation economics. What does this look like at scale?
!?
Error handling.
Wrong answers about benefits hurt people.
§§
PII & retention.
Recording, access, how long we keep it.
These are the right questions. The next few slides are how we answer them.
HOW WE ANSWER THEM

All three concerns
are really one question.

Can we trust the system?
ANSWER COMES IN THREE LAYERS
LAYER 1
Organizational
The Gateway. PII strip, prompt-injection defense, audit logs.
LAYER 2
Ecosystem
Open protocols. MCPs & Skills work across providers.
LAYER 3
Personal
Workflow discipline. Plan before the agent moves.
THE EMPLOYEE TOUR

The fear is legitimate.

I trained the whole company on Claude this year. I spent the most time with our government team and customer support.

People whose entire job is the currency of trust.

WHAT TRUST-WORK LOOKS LIKE
Correct answers. Defensible answers. Answers in language people can use.
WHAT AI INTRODUCES
A system they can't validate. Asked to trust on faith.

The right answer is not to reassure people. It's to give them structural reasons to trust.

LAYER 1 · ORGANIZATIONAL

The Propel Gateway,
and a paper trail.

USER / APP
THE GATEWAY
PII strip
+ prompt-injection defense
LLM / SOURCES
Workspace · Amplitude · internal data
LOGGED & ATTRIBUTED
AUDIT LOG
every action —
attributable, reviewable, defensible.

That's the language of audit. That's the language government already speaks.
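The gateway pattern fits in a few lines. The redaction patterns and log shape below are illustrative assumptions, not Propel's internals:

```python
import re
import time

# Minimal sketch of the gateway: redact obvious PII before the model sees
# anything, and write an attributable audit record for every call.
# Patterns and log shape are illustrative, not Propel's implementation.

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

AUDIT_LOG: list = []

def strip_pii(text: str) -> str:
    return EMAIL.sub("[EMAIL]", SSN.sub("[SSN]", text))

def gateway_call(user_id: str, prompt: str, llm=lambda p: "(model reply)") -> str:
    """Every action logged: who asked, when, and what the model actually saw."""
    clean = strip_pii(prompt)
    reply = llm(clean)
    AUDIT_LOG.append({"who": user_id, "when": time.time(),
                      "prompt": clean, "reply": reply})
    return reply

gateway_call("staff-42", "Check case for 123-45-6789, jane@example.com")
```

The log records the redacted prompt, so the audit trail itself never becomes a PII store.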

LAYER 2 · ECOSYSTEM

The tooling is portable.

  • MCPs and Skills are open protocols.
  • They work across every frontier LLM.
  • You can switch providers — and keep your investment.
The opposite of the vendor-lock pattern that has burned gov IT for 20 years.
YOUR MCP / SKILL
Claude
GPT
Gemini
…the next one
SAME TOOLING. ANY PROVIDER.
ON TIMING

Yes, it's the
wild west.

The practices are still settling. Granted.

The private sector is already locked into last year's stack & last year's mistakes.

The public sector can skip a generation of bad patterns — but only by engaging now.

LAYER 3 · PERSONAL DISCIPLINE

Plan before
the agent moves.

github.com/obra/superpowers
STEP 01
Design doc
What we're building & why.
✓ HUMAN REVIEW
STEP 02
Impl doc
How, in concrete steps.
✓ HUMAN REVIEW
STEP 03
Agent executes
Against something I can point at & steer.

"The AI did it" is not an answer when you're accountable to the public.

LAYER 3 · IN PRACTICE

Two short docs.
Ten minutes each.

DESIGN DOC
voice-opt-out/design.md
# Voice opt-out
## Why
Some recipients don't want AI in their interactions. Need a fast, obvious opt-out.
## How
One-tap "switch to text" on every voice screen. Honor it system-wide.
## Open questions
Default opt-in or opt-out? Partner integrations?
## Out of scope
The voice agent itself. New languages.
What. Why. Edges. Reviewed before any agent moves.
IMPL DOC
voice-opt-out/impl.md
# Voice opt-out — impl
## Approach
Boolean on users.voice_opt_out. Check before any voice path runs.
## Steps
1. migration · add column
2. UI · toggle + inline switch
3. server · requireVoiceConsent()
4. sweep voice helpers · fail closed
5. tests · 0 voice paths for opt-outs
## Verify
Manual run-through. Test: opt-out → text everywhere.
How. In order. Reviewed before the agent executes.

Plain markdown. No platform to buy. The discipline lives in the habit of writing them, not in the templates.
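Step 4 of the impl doc says "fail closed." A sketch of what that means for the consent check (the name mirrors the doc's `requireVoiceConsent()`; the `User` shape is an assumption):

```python
# Fail-closed consent check: voice runs only when we positively know the
# user has NOT opted out. Unknown user or unknown flag means no voice.
# The User shape is assumed for illustration.

class User:
    def __init__(self, voice_opt_out):
        self.voice_opt_out = voice_opt_out  # True / False / None (unknown)

def require_voice_consent(user) -> bool:
    """Gate every voice path. Any uncertainty resolves to text, not voice."""
    if user is None or user.voice_opt_out is None:
        return False  # fail closed
    return not user.voice_opt_out
```

The sweep in step 4 is then mechanical: every voice helper calls this first, and a missing or unreadable flag degrades to text instead of erroring into voice.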

LAYER 3 · CONTINUED

Five minutes,
after every task.

github.com/DrCatHicks/learning-opportunities
  • Interrupts the fluency illusion.
  • Pauses after meaningful work.
  • Asks you to sketch the answer first.
  • Refuses to explain until you've tried.
A direct counter to the slow atrophy of senior judgment that comes from over-relying on the tools.
$ learning-opportunities run
▸ Would you like to do a quick learning
exercise on the voice opt-out migration?
▸ Explain this component as if you were
onboarding a new developer.
$ _
WHAT'S NEXT · 01

Evals for contested ground.

  • Most AI evals score against a single right answer.
  • Benefits determinations don't always have one.
  • Same case, three defensible readings: legal aid, eligibility worker, program director.
  • Surface the disagreement. Don't collapse it.
THE WHOLE POINT
When a model says "95% sure," that should mean something.
In benefits work, it mostly doesn't.
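Concretely, an eval for contested ground scores against a set of defensible readings and reports the split. The three reading labels are from the slide; the report shape is my assumption:

```python
# Eval sketch for contested ground: compare a model's determination to
# several defensible readings and surface the disagreement instead of
# collapsing it into one pass/fail. Report shape is illustrative.

def eval_case(model_answer: str, readings: dict) -> dict:
    agreement = {source: answer == model_answer
                 for source, answer in readings.items()}
    return {
        "agreement": agreement,
        "contested": len(set(readings.values())) > 1,  # references disagree
    }

report = eval_case(
    model_answer="eligible",
    readings={"legal_aid": "eligible",
              "eligibility_worker": "eligible",
              "program_director": "ineligible"},
)
```

A "contested" flag is the output that matters: it tells you where a confident model score would be misleading.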
WHAT'S NEXT · 02

When the user is the hard part.

  • AI plays a real applicant — guarded, confused, withholding.
  • The chatbot under test has to figure out what's actually going on.
  • Standard eval: does it know the income limits? Easy.
  • Real failures: the ones where the user is the hard part.
THE WHOLE POINT
Before you put AI in front of someone applying for benefits — you better know what it does when the user is confused, guarded, or wrong.
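A minimal version of that test: script a guarded user who only reveals the real problem to an open-ended follow-up, then check whether the bot under test ever gets there. The persona script and pass rule are illustrative assumptions, not a production harness:

```python
# Persona-driven eval sketch: a guarded simulated user withholds the real
# issue unless the bot asks an open-ended follow-up question.
# Persona script and pass rule are illustrative assumptions.

OPENING = "i guess my case got messed up or something"
REVEAL = "my hours got cut and i never reported it"

def guarded_user(bot_message: str) -> str:
    """Open-ended follow-ups get the truth; everything else gets a shrug."""
    open_ended = "?" in bot_message and any(
        w in bot_message.lower() for w in ("what", "how", "tell me"))
    return REVEAL if open_ended else "i don't know, it's just wrong"

def run_eval(bot) -> bool:
    """Pass if the bot surfaces the withheld fact within two turns."""
    reply = guarded_user(bot(OPENING))
    if "hours" in reply:
        return True
    return "hours" in guarded_user(bot(reply))

probing_bot = lambda msg: "okay. tell me what changed for you recently?"
stonewall_bot = lambda msg: "Your case status is pending."
```

A bot that recites income limits can still fail this eval, which is exactly the gap standard evals miss.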
WHAT'S NEXT · 03

Conversation,
not bureaucratic UI.

  • Proof-of-concept layer between AI agents and state benefits portals.
  • Agent web-drives the portal on the user's behalf.
  • One application experience. Any state. From any tool.
> "Claude, open my SNAP renewal for California — tell me what we need to prepare."
CLOSEST TO WHAT SF WOULD ACTUALLY BUILD
PULLING IT TOGETHER

AI is leverage,
not magic.

Use it to reduce friction for the people you serve.
Use it to increase your own discipline — not replace it.

IF YOU START TOMORROW

Four moves,
in this order.

01
Pick one high-friction service interaction.
02
Prototype a voice version of it.
03
Build the audit trail from day one — not as a retrofit.
04
Call me. Genuinely — I'm happy to help.
keith@keithkurson.net
OPEN FLOOR · 22 MIN

Questions?

That was a lot. I left twenty minutes on purpose.

Need a prompt? Here are a few I expect.
IF THE ROOM IS SHY
  • Q1 Cost at scale?
  • Q2 Handling hallucination on benefits guidance?
  • Q3 Where SF starts — procurement & vendor evaluation?
  • Q4 Residents who don't want a bot?
  • Q5 Caseworkers & call center staff?
  • Q6 Accessibility — Deaf/HoH, language access?
  • Q7 Data retention & subpoena risk on voice recordings?
KEITH KURSON
keith@keithkurson.net
keith.is