tool · pricing as of 2026-05

LLM cost calculator.
Claude, GPT, Gemini, open-source. Same workload, side-by-side.

Estimate monthly LLM spend across the model families our clients ask about most. Enter your monthly request volume and average input/output tokens, pick a model, and read the side-by-side table to see relative cost. All math runs client-side on list-API pricing dated 2026-05.

▸ the calculator

Enter your workload.
Read the comparison table.

Pick a primary model. Enter your monthly request volume and your typical input/output tokens per request. The table below recomputes monthly spend for every model on the same volume.

Model · Provider · Input $/1M · Output $/1M · Est. monthly cost

prices as of 2026-05 · list API rates · rows flagged "verify" are price bands; confirm against provider pricing pages before quoting
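
For readers who want to check the arithmetic by hand, the monthly estimate can be sketched as below. This is a minimal illustration that assumes cost is simply requests times tokens times the per-1M list rate. The `ModelPrice` type, the function names, and the two placeholder rows are hypothetical and are not real provider prices.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_1m: float   # USD per 1M input tokens (list price)
    output_per_1m: float  # USD per 1M output tokens (list price)


def monthly_cost(price: ModelPrice, requests: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly spend in USD for one model at a fixed workload."""
    input_cost = requests * input_tokens / TOKENS_PER_MILLION * price.input_per_1m
    output_cost = requests * output_tokens / TOKENS_PER_MILLION * price.output_per_1m
    return input_cost + output_cost


def comparison_table(models: list[ModelPrice], requests: int,
                     input_tokens: int, output_tokens: int) -> dict[str, float]:
    """Price every model at the same volume, as the side-by-side table does."""
    return {
        m.name: round(monthly_cost(m, requests, input_tokens, output_tokens), 2)
        for m in models
    }


# Placeholder rates for illustration only; confirm real rates with the provider.
PLACEHOLDER_MODELS = [
    ModelPrice("model-a", input_per_1m=3.00, output_per_1m=15.00),
    ModelPrice("model-b", input_per_1m=0.50, output_per_1m=2.00),
]

if __name__ == "__main__":
    # 100,000 requests per month, 1,000 input and 500 output tokens each.
    print(comparison_table(PLACEHOLDER_MODELS, 100_000, 1_000, 500))
```

With the placeholder rates, 100,000 requests at 1,000 input and 500 output tokens come to 100M input tokens and 50M output tokens per month, which is the volume every row is priced at.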

scope

What this calculator covers.
And what it deliberately does not.

A calculator is only useful if you know what it ignores. We list both halves so you can decide where to dig further before sizing a budget.

What it covers

Per-1M-token list-API pricing for input and output across Claude, GPT, Gemini, and a hosted open-source baseline. Monthly cost at your input/output volume.

Same-prompt comparison

Every model is priced at the same input and output token volume. Reading cost without quality is a trap; pair this with an eval on your own corpus.

What it does not cover

Negotiated enterprise rates, prompt-caching discounts, batch-API discounts, fine-tuning surcharges, image or audio tokens, or self-hosted GPU economics.

Reasoning-model caveat

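Reasoning models such as o1 and o3 bill their internal reasoning tokens as output tokens, even though those tokens never appear in the response. Real spend often runs 2-5x the naive estimate based on visible output alone. We flag the rows; budget accordingly.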
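
To see how hidden reasoning tokens can inflate an estimate, here is a rough sketch. It assumes reasoning tokens are billed at the output rate and added to the visible output count. The function name, the token counts, and the prices are hypothetical placeholders, not measured values.

```python
def cost_with_reasoning(requests: int, input_tokens: int,
                        visible_output_tokens: int, reasoning_tokens: int,
                        input_per_1m: float, output_per_1m: float) -> float:
    """Monthly cost in USD when reasoning tokens are billed as output tokens."""
    billed_output = visible_output_tokens + reasoning_tokens
    input_cost = requests * input_tokens / 1_000_000 * input_per_1m
    output_cost = requests * billed_output / 1_000_000 * output_per_1m
    return input_cost + output_cost


# Hypothetical workload and placeholder prices (not real provider rates).
naive = cost_with_reasoning(10_000, 1_000, 500, 0, 10.0, 40.0)
actual = cost_with_reasoning(10_000, 1_000, 500, 2_000, 10.0, 40.0)
multiplier = actual / naive  # how far the naive estimate undershoots

if __name__ == "__main__":
    print(f"naive ${naive:.2f}, with reasoning ${actual:.2f}, x{multiplier:.2f}")
```

In this made-up case, 2,000 reasoning tokens per request push the bill to roughly 3.7x the naive figure, inside the 2-5x band quoted above. The real ratio depends on how much the model reasons on your prompts, so measure it from usage logs rather than assuming a fixed value.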

▸ cost-engineering principles

How we think about model cost. On client engagements, in our own engineering.

Cost is one number on a rubric, not the rubric. Six principles we apply when we size a workload, switch providers, or push a cost-reduction sprint.

  • Cost ≠ quality

    A model that's 10x cheaper but fails 30% of your eval costs more once you count incident response, not less.

  • Dated runs

    Prices change. Quality changes. Always cite both on the same dated axis so the comparison is honest.

  • Output dominates

    Output tokens cost roughly 4-5x as much as input tokens on most models, so trimming verbose responses often saves more than switching providers.

  • Caching + batch first

    Prompt caching and batch APIs can cut effective cost by 50-90% on the right workloads. Try them before you change models.

  • P95, not average

    Tail latency drives user experience and retries. A cheaper model that times out gets retried and stops being cheaper.

  • Verify before quoting

    Provider list prices move. Confirm against the provider's current public pricing page before quoting a number in a deck.
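
Three of the principles above reduce to simple arithmetic, sketched below. This is an illustration under stated assumptions, not the calculator's own code: the function names are hypothetical, the failure-handling cost is an invented placeholder, and discounts are modeled as a flat fraction off list cost.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_per_1m: float, output_per_1m: float) -> float:
    """Fraction of per-request token cost that comes from output tokens."""
    input_cost = input_tokens * input_per_1m
    output_cost = output_tokens * output_per_1m
    return output_cost / (input_cost + output_cost)


def discounted_cost(list_cost: float, discount: float) -> float:
    """Cost after a flat discount, e.g. 0.5 for a 50% batch discount."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return list_cost * (1.0 - discount)


def effective_cost_per_request(token_cost: float, failure_rate: float,
                               cost_per_failure: float) -> float:
    """Token cost plus the expected cost of handling a failed response.

    cost_per_failure is an assumed figure (retries, escalation, incident
    time); it is not something a price list can tell you.
    """
    return token_cost + failure_rate * cost_per_failure


# Placeholder numbers for illustration only.
share = output_share(1_000, 500, 3.0, 15.0)
batch_cost = discounted_cost(1_000.0, 0.5)
premium = effective_cost_per_request(0.010, 0.02, 0.05)
budget = effective_cost_per_request(0.001, 0.30, 0.05)

if __name__ == "__main__":
    print(f"output share of cost: {share:.0%}")
    print(f"batch-discounted cost: ${batch_cost:.2f}")
    print(f"premium ${premium:.4f} vs budget ${budget:.4f} per request")
```

With these placeholders, output accounts for about 71% of token cost even though it is a third of the tokens, and the model that is 10x cheaper per request ends up costing more once the 30% failure rate is priced in. Change `cost_per_failure` to match your own incident data; the conclusion flips when failures are cheap to absorb.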

Services this cost calculator feeds: Claude development (Sonnet 4.6 / Haiku 4.5 token pricing), OpenAI development (GPT-5 / GPT-5-mini), AI chatbot development (per-turn cost projection), AI agent development (per-task cost on multi-step loops), Intelligent document processing (per-page vision-model cost), and AI voice agents (per-call cost on Realtime + chained voice stacks).

next step

Run the eval, then size the cost.
Same rubric, your corpus, audit-driven.

A calculator gets you a back-of-envelope number. An eval on your own corpus tells you which model actually clears the bar. We run both together on every audit so the cost decision lands on real data, not list prices. Engagements run as a discovery audit, then a 4-6 week pilot with weekly eval gates, then continuous delivery against the same rubric in production. Beyond the calculator, the <a href='/services/ai-consulting/'>strategy phase / consulting</a> engagement is the audit door where the cost projection becomes a real budget.