ai engineering · model-agnostic

AI engineering at GetWidget
Eval-first delivery, in public.

We design and ship production AI systems: RAG, agents, voice, document processing, and governance programs. Model-agnostic on principle. Eval-first by default. Open-source where it earns trust, paid where it earns its keep. Operator engineering, not strategy decks.

how we work

Eval-first delivery.
Audit, pilot, continuous delivery: every phase gated on real evals.

Every engagement runs through three phases. Each phase has a measurable exit criterion that the eval suite enforces. No model goes to production without passing the same rubric we publish on our benchmarks.

  1. Discovery audit

     1-2 weeks. We map your current AI surface, pick the highest-impact bet, and write the eval rubric. Ends with a written prioritisation and a go/no-go.

  2. Pilot with weekly eval gates

     4-6 weeks. Working system in production behind a feature flag. Weekly eval gates decide what ships, not vibes. Cost is reported alongside quality.

  3. Continuous delivery

     Ongoing. Dedicated engineering team, eval suite versioned with the code, monthly model-selection re-checks, and a real on-call rotation.
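To make "weekly eval gates decide what ships" concrete, here is a minimal Python sketch of what a gate check can look like. It is an illustration, not our production tooling: `EvalRun`, `GateCriteria`, `weekly_gate`, the metric fields, and every threshold and score below are hypothetical. The idea is that a candidate ships only if it clears the rubric's absolute floors, does not regress quality against the version already in production, and stays within a cost budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    """Scores from one run of the eval suite (hypothetical schema)."""
    faithfulness: float       # fraction of answers judged faithful to sources, 0..1
    pass_at_1: float          # fraction of tasks passed on the first attempt, 0..1
    cost_per_task_usd: float  # average spend per task


@dataclass(frozen=True)
class GateCriteria:
    """Exit criteria from the rubric (illustrative thresholds)."""
    min_faithfulness: float
    min_pass_at_1: float
    max_cost_increase: float  # allowed relative cost growth vs. baseline, e.g. 0.10 = +10%


def weekly_gate(candidate: EvalRun, baseline: EvalRun,
                criteria: GateCriteria) -> tuple[bool, list[str]]:
    """Return (ship, reasons); the candidate ships only if no check fails."""
    reasons: list[str] = []

    # Absolute floors written into the rubric.
    if candidate.faithfulness < criteria.min_faithfulness:
        reasons.append(f"faithfulness {candidate.faithfulness:.2f} below floor {criteria.min_faithfulness:.2f}")
    if candidate.pass_at_1 < criteria.min_pass_at_1:
        reasons.append(f"pass@1 {candidate.pass_at_1:.2f} below floor {criteria.min_pass_at_1:.2f}")

    # No quality regression against the version already in production.
    if candidate.faithfulness < baseline.faithfulness:
        reasons.append("faithfulness regressed vs. baseline")
    if candidate.pass_at_1 < baseline.pass_at_1:
        reasons.append("pass@1 regressed vs. baseline")

    # Cost is judged alongside quality, not after it.
    cost_limit = baseline.cost_per_task_usd * (1 + criteria.max_cost_increase)
    if candidate.cost_per_task_usd > cost_limit:
        reasons.append(f"cost ${candidate.cost_per_task_usd:.3f}/task exceeds limit ${cost_limit:.3f}")

    return (not reasons, reasons)


if __name__ == "__main__":
    # Illustrative numbers only, not results from any engagement.
    baseline = EvalRun(faithfulness=0.86, pass_at_1=0.69, cost_per_task_usd=0.040)
    candidate = EvalRun(faithfulness=0.90, pass_at_1=0.74, cost_per_task_usd=0.042)
    criteria = GateCriteria(min_faithfulness=0.85, min_pass_at_1=0.70, max_cost_increase=0.10)
    ship, reasons = weekly_gate(candidate, baseline, criteria)
    print("ship" if ship else "hold", reasons)
```

Returning the list of failed checks, rather than a bare boolean, keeps the weekly decision explainable: a "hold" comes with the specific metric that blocked it.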

next step

Talk to the engineers who'll build it.
Audit conversation, not a discovery call.

If you already know what you need, the audit takes 1-2 weeks and ends with a written prioritisation. If you're earlier than that, we'll tell you so. Results from recent pilot runs: 88% faithfulness on a 1,840-document RAG corpus (2026-Q1); 71% pass@1 across 100 tool-using tasks on our agent harness (2026-Q1).