AI case studies.
Receipts, not slideware.
Seven production engagements you can verify. Each one ships with a frozen eval set, a published latency budget, a defined kill point, and the math behind the metric — clinical triage on Claude Sonnet 4.6, RAG over 12,000 product-docs pages, OpenAI Realtime voice at a published $0.10 per call, fraud disposition at a US mid-market bank, first-pass MSA review for a law firm, a tier-1 customer service chatbot deflecting 42% at week 8, and a Flutter voice copilot live in a 1.4M-MAU app. Client names are changed at their request; every number on this page is drawn from shadow-mode logs, frozen eval sets, or 30+ day A/B tests on shipped systems.
What does the GetWidget case studies catalog cover?
The GetWidget case studies catalog covers 7 production AI deployments with published eval data: a HIPAA-safe clinical triage agent (Claude Sonnet 4.6, n=14,200 shadow encounters, 38-62% wait reduction); a Claude RAG over 12,000 product-docs pages (n=3,400, 64% tier-1 deflection); an OpenAI Realtime voice agent (n=11,400 calls, 38% deflection at $0.10/call); a Claude fraud-disposition agent at a US mid-market bank (precision at or above 0.96 at 1% FPR, plus or minus 0.012 CI); a LangChain MSA contract reviewer (n=180, 71% partner time saved); a Flutter voice copilot in a 1.4M-MAU app (n=42,318 sessions, +11.4 percentage-point conversion lift); and a tier-1 customer-service chatbot (42% deflection at $800/mo). Every case study publishes sample size, confidence interval, named stack, and compliance regime, with audit dataset, model version, and eval methodology linked from each case page. Per-case citation cards are at /api/citation-card/:slug. Engagement begins with a fixed-fee discovery audit, then a 4-6 week pilot (fixed-bid with walk-away clause), then continuous monthly delivery scaled to workflow count.
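To make "sample size plus confidence interval" concrete, here is a minimal Python sketch that turns a rate and an n into a 95% interval. It assumes a Wilson score interval and a success count back-computed from the published 38% deflection at n=11,400 calls. Both are assumptions made for illustration; the eval methodology linked from each case page is the authority on how the published intervals were actually derived.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Illustrative only: the deflected-call count is back-computed from 38% of 11,400.
n_calls = 11_400
deflected = round(0.38 * n_calls)
low, high = wilson_interval(deflected, n_calls)
print(f"deflection: {deflected / n_calls:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
```

At this sample size the interval is under two percentage points wide, which is why a headline rate means little without its n.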
Seven published cases — pick yours.
HIPAA-safe clinical triage agent, shipped in 9 weeks
Pre-triage queue 38–62 min at peak. Nurse line overflow routing wrong-acuity patients to ER.
- Claude Sonnet 4.6
- pgvector 0.7
- FHIR R4
- LangGraph 0.2
Claude RAG over 12,000 product-docs pages
Doc search rated 2.3/5; 41% of support tickets were docs-recoverable. Keyword search couldn't reason across modules.
- Claude Sonnet 4.6
- Haiku 4.5
- pgvector
- bge-reranker
OpenAI Realtime API voice agent at $0.10/call
Tier-1 voice queue 4-min wait at peak; 5 questions = 62% of volume. IVR bouncing 80% to human.
- gpt-realtime-2
- Whisper-large-v3
- pgvector
- Twilio Voice
Claude Sonnet 4.6 fraud agent at a US mid-market bank
Rules engine bleeding an 18% false-positive rate on 1.2B transactions/yr across card, wire, ACH, RTP.
- Claude Sonnet 4.6
- Haiku 4.5
- pgvector
- XGBoost
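For readers unfamiliar with a metric like "precision at 1% FPR", the following Python sketch shows one common way to compute it from scored, labeled transactions. It is an illustration, not the engagement's evaluation code: the function name, the input format, and the tie-handling rule are assumptions.

```python
def precision_at_fpr(scores, labels, max_fpr=0.01):
    """Return (threshold, precision) at the lowest score threshold whose
    false-positive rate stays at or below max_fpr, or None if none qualifies.

    scores: higher means more likely fraud. labels: 1 = fraud, 0 = legitimate.
    """
    negatives = sum(1 for y in labels if y == 0)
    if negatives == 0:
        raise ValueError("need at least one legitimate example to measure FPR")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = None
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score so the threshold is well defined.
        j = i
        while j < len(ranked) and ranked[j][0] == threshold:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / negatives > max_fpr:
            break  # flagging this score level would exceed the FPR budget
        best = (threshold, tp / (tp + fp))
        i = j
    return best
```

Fixing the false-positive budget first and reporting precision at that operating point keeps the comparison against a rules engine on equal footing.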
First-pass MSA review for a mid-market law firm
Partners spending 6–9 hours per MSA on first-pass review; clause-library drift across 4 practice groups.
- Claude Sonnet 4.6
- LangChain
- LangGraph
- pgvector
Tier-1 customer service chatbot: 42% deflection in 8 weeks
Zendesk queue 6-hr FRT at peak. Tier-1 tickets burning cycles. Off-the-shelf chatbots failed on tone + product depth.
- Claude Sonnet 4.6
- Haiku 4.5
- pgvector
- Zendesk API
Flutter voice copilot in a DTC apparel app
Mobile-app conversion lagging desktop by 18pp on a 1.4M-MAU Flutter app. Two prior voice A/B tests failed.
- gpt-realtime-2
- Flutter 3.24
- GetWidget OSS
- Algolia
Six dimensions, on every page, not just the ones that look good.
The 'we measured X' line in most case studies hides three other measurements that didn't move. We publish all six. If one is missing on a case-study page below, it's because the client asked us not to publish it — never because the number was bad.
- Groundedness: Fraction of answers traceable to a retrieval span (RAGAS). Hallucination's inverse.
- p95 latency: First-token AND full-reply, per channel. Voice has a different budget than web chat.
- $ / unit: Per turn, per call, per MSA. Published with the formula, not hand-waved.
- Eval pass %: Frozen golden set + regression-gated in CI. Drift catches us before it catches the user.
- Walk-away: The single metric we'd kill the pilot for if it doesn't move. Defined before week 1.
- Audit log: Every call, every retrieval, every tool invocation logged for replay and dispute.
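Four of the dimensions above reduce to simple arithmetic, sketched below in Python. Treat this as a hedged illustration rather than the production harness: every function name is hypothetical, the groundedness function is a simplified stand-in (not the RAGAS library's API), the percentile uses the nearest-rank method, and the token prices in the usage lines are placeholders, not quoted vendor rates.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a list of latencies."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def groundedness(cited_spans_per_answer):
    """Share of answers that cite at least one retrieval span."""
    if not cited_spans_per_answer:
        raise ValueError("no answers to score")
    grounded = sum(1 for spans in cited_spans_per_answer if spans)
    return grounded / len(cited_spans_per_answer)


def cost_per_unit(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Model cost for one unit (turn, call, or MSA) from token counts and per-million prices."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


def eval_gate(passed, total, baseline_pass_rate, tolerance=0.0):
    """True when the golden-set pass rate has not regressed below the frozen baseline."""
    if total <= 0:
        raise ValueError("golden set is empty")
    return passed / total >= baseline_pass_rate - tolerance


# Usage with placeholder values (not figures from any case study):
print(p95([120, 180, 240, 310, 950]))
print(groundedness([["span-1"], [], ["span-2", "span-3"]]))
print(cost_per_unit(4_000, 600, usd_per_m_input=1.0, usd_per_m_output=2.0))
print(eval_gate(passed=188, total=200, baseline_pass_rate=0.93))
```

In a CI pipeline, a function like `eval_gate` would typically fail the build when it returns False, which is what "regression-gated" means in practice.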
Why most names are changed, and how to get a named reference.
Three reasons clients stay anonymous. We share 2–3 named references under NDA inside the audit call, and co-publish a fully named case with a client roughly once a quarter.
Naming clients tips off their competitors
Healthcare, fintech and law-firm clients commonly sit inside a window where a public reference helps a rival decide where to invest next. Naming them is a strategic gift we won't make on a marketing page.
HIPAA, FFIEC and privilege-aware buyers gate references
Named references typically require a paid intro call with counsel or compliance present. We respect that. One regulated client trusting us for a decade beats a logo on a landing page.
The eval table is more useful than the brand name
A buyer should be able to tell from the case study alone whether we picked the right model, whether retrieval is defensible, and whether cost math closes.
Questions case-study readers ask most.
Real answers, no hedging.
Why are most of these case studies anonymized?
Are these numbers real or capability examples?
Can I cite GetWidget in my own deck or earnings call?
Do you publish negative results?
What's the typical pilot length for a case-study-shaped engagement?
Why don't you have more case studies in [my industry]?
Want a case study like this for your stack?
Book a free audit. We review your highest-ROI candidate workflow, recommend a model + retrieval recipe, project token + run-cost, and tell you whether it's case-study-shaped (or whether you should buy an off-the-shelf platform). No deck, no obligation to build.
From the case studies back to the pillars.
Each case feeds back into a service or industry pillar. Start anywhere.
Services hub
Every service pillar — AI development, agents, voice, chatbots, RAG, governance.
Industries hub
By vertical: healthcare, legal, fintech, ecommerce, education, manufacturing, more.
AI Agent Development
Where the fraud, triage, and contract-review agents live as a service playbook.
AI Voice Agents
Sub-second voice agents on gpt-realtime-2 — the playbook behind the $0.10/call case.
Open source
The Flutter UI kit (4.8k★) behind the GFVoiceCopilot widget in the ecommerce case.