How do I estimate LLM costs for a production workload?

Multiply your monthly request volume by the average input tokens per request to get monthly input tokens, and separately by the average output tokens per request to get monthly output tokens. Divide each total by 1,000,000, multiply it by the model's per-1M-token price for that direction, and sum the two. Sample 100 real requests from a staging log to ground the token counts. Budget 20-30% headroom for retries, schema repair, and tool-use loops, which push real spend beyond the naive estimate.
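A minimal sketch of that arithmetic, assuming a flat per-1M-token price in each direction. The function name, the 25% default headroom (the midpoint of the range above), and the $3 / $15 prices in the example are illustrative placeholders, not any provider's real rates.

```python
def estimate_monthly_cost(
    requests_per_month: int,
    avg_input_tokens: float,
    avg_output_tokens: float,
    input_price_per_1m: float,
    output_price_per_1m: float,
    headroom: float = 0.25,  # assumption: midpoint of the 20-30% range
) -> dict:
    """Return the naive monthly spend and a headroom-adjusted budget, in dollars."""
    # Monthly token totals, computed separately for each direction.
    input_tokens = requests_per_month * avg_input_tokens
    output_tokens = requests_per_month * avg_output_tokens

    # Prices are quoted per 1M tokens, so scale the totals down first.
    input_cost = input_tokens / 1_000_000 * input_price_per_1m
    output_cost = output_tokens / 1_000_000 * output_price_per_1m

    naive = input_cost + output_cost
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "naive": naive,
        "budget": naive * (1 + headroom),
    }


# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
estimate = estimate_monthly_cost(500_000, 3_000, 400, 3.0, 15.0)
print(f"naive ${estimate['naive']:,.2f}, budget ${estimate['budget']:,.2f}")
```

Swap in token averages measured from your own staging sample and the current list prices before relying on the result.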

What is the difference between input and output tokens?

Input tokens are everything you send to the model: system prompt, retrieved context, conversation history, user message, tool definitions. Output tokens are what the model generates back. Output is typically 4-5x more expensive than input on most models from the major providers, so a verbose model with a tight prompt can cost more than a concise model with a sprawling prompt.
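To make the asymmetry concrete, here is a small sketch that prices two request shapes at the same rates. The $3 input and $15 output prices (a 5x ratio) and the token counts are assumptions chosen for illustration only.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Dollar cost of a single request at per-1M-token prices."""
    return (
        input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    ) / 1_000_000


# Hypothetical prices with a 5x output/input ratio.
IN_PRICE, OUT_PRICE = 3.0, 15.0

# Tight prompt, verbose answer: 3,000 tokens in total.
verbose = request_cost(1_000, 2_000, IN_PRICE, OUT_PRICE)

# Sprawling prompt, concise answer: 6,300 tokens in total.
concise = request_cost(6_000, 300, IN_PRICE, OUT_PRICE)

print(f"verbose: ${verbose:.4f}  concise: ${concise:.4f}")
```

Under these assumed prices the verbose request costs more even though it moves fewer than half as many tokens, because most of its tokens are billed at the output rate.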

Why does cost vary so much between models?

Three reasons. First, hardware: frontier models run on more expensive accelerators at lower batch efficiency. Second, training cost amortisation: a $100M training run gets recovered in per-token margin. Third, market positioning: Anthropic, OpenAI, and Google deliberately tier their lineups so that a 20x price gap signals a quality gap. On a real eval the quality gap is usually smaller than the price gap; that's where eval-driven model selection pays for itself.

Should I optimise for cost or quality first?

Quality first, always. Pick the cheapest model that clears your eval threshold on your real corpus, then apply prompt-caching, batch APIs, output-length controls, and request-level routing to compress cost. Optimising cost on a model that fails your eval just means you fail faster and at scale.
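The selection rule can be sketched as a filter followed by a minimum. The `Candidate` type, the model names, the eval scores, and the costs below are all hypothetical; in practice the scores come from an eval on your own corpus and the costs from a workload estimate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    eval_score: float    # pass rate on your own eval, between 0 and 1
    monthly_cost: float  # estimated dollars per month for the same workload


def cheapest_passing(
    candidates: list[Candidate], threshold: float
) -> Optional[Candidate]:
    """Quality gate first, then cost: cheapest model at or above the threshold."""
    passing = [c for c in candidates if c.eval_score >= threshold]
    if not passing:
        return None  # nothing clears the bar, so cost is not yet the question
    return min(passing, key=lambda c: c.monthly_cost)


# Hypothetical eval results and cost estimates.
candidates = [
    Candidate("model-large", 0.94, 9_000.0),
    Candidate("model-medium", 0.91, 2_500.0),
    Candidate("model-small", 0.78, 400.0),
]

choice = cheapest_passing(candidates, threshold=0.90)
print(choice.name if choice else "no model clears the threshold")
```

The cheapest candidate overall is excluded here because it fails the gate; cost levers such as caching or batching would then be applied to the selected model.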

How often do these LLM prices change?

Frontier-model list prices have re-cut every 6-9 months historically, almost always downward. Reasoning-model tiers and 1M-context tiers move more often. This calculator is dated 2026-05; we revise the price table each time a major provider updates a public price. For board-level numbers, verify against the provider's current pricing page on the day you quote.

tool · pricing as of 2026-05

LLM cost calculator.
Claude, GPT, Gemini, open-source. Same workload, side-by-side.

Estimate monthly LLM spend across the model families our clients ask about most. Enter your monthly request volume and average input/output tokens, pick a model, and read the side-by-side table to see relative cost. All math runs client-side on list-API pricing dated 2026-05.


▸ the calculator

Enter your workload.
Read the comparison table.

Pick a primary model. Enter your monthly request volume and your typical input/output tokens per request. The table below recomputes monthly spend for every model on the same volume.

Model	Provider	Input $/1M	Output $/1M	Est. monthly cost

prices as of 2026-05 · list API rates · rows flagged "verify" are price bands; confirm against provider pricing pages before quoting
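For readers who want to reproduce the side-by-side view offline, the sketch below recomputes monthly spend for several models on one workload. The model names and prices in `PLACEHOLDER_PRICES` are invented placeholders, not the calculator's price table; substitute current list prices from each provider's pricing page.

```python
def compare_models(
    requests_per_month: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    price_table: dict[str, tuple[float, float]],
) -> list[tuple[str, float]]:
    """Price every model on the same workload and sort cheapest first."""
    rows = []
    for model, (input_price, output_price) in price_table.items():
        per_request = (
            avg_input_tokens * input_price + avg_output_tokens * output_price
        ) / 1_000_000
        rows.append((model, requests_per_month * per_request))
    return sorted(rows, key=lambda row: row[1])


# Placeholder (input $/1M, output $/1M) pairs, for illustration only.
PLACEHOLDER_PRICES = {
    "model-a": (3.00, 15.00),
    "model-b": (0.50, 2.00),
    "model-c": (1.00, 5.00),
}

rows = compare_models(50_000, 8_000, 1_500, PLACEHOLDER_PRICES)
for model, cost in rows:
    print(f"{model:<10} ${cost:>10,.2f}")
```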

scope

What this calculator covers.
And what it deliberately does not.

A calculator is only useful if you know what it ignores. We list both halves so you can decide where to dig further before sizing a budget.

What it covers

Per-1M-token list-API pricing for input and output across Claude, GPT, Gemini, and a hosted open-source baseline. Monthly cost at your input/output volume.

Same-prompt comparison

Every model is priced at the same input and output token volume. Reading cost without quality is a trap; pair this with an eval on your own corpus.

What it does not cover

Negotiated enterprise rates, prompt-caching discounts, batch-API discounts, fine-tuning surcharges, image or audio tokens, or self-hosted GPU economics.

Reasoning-model caveat

o1 and o3 bill internal reasoning tokens as output. Real spend often runs 2-5x the naive estimate. We flag the rows; budget accordingly.
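A rough way to see where the multiplier comes from is to add the hidden reasoning tokens to the billed output. In this sketch the reasoning-token count, the prices, and the workload are hypothetical; the real reasoning-token figure has to be measured from the usage data returned on your own requests.

```python
def reasoning_adjusted_cost(
    requests_per_month: int,
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> tuple[float, float, float]:
    """Return (naive cost, billed cost, billed/naive ratio) in dollars per month."""
    # Naive estimate: only the output the caller actually sees.
    naive_per_request = (
        input_tokens * input_price_per_1m
        + visible_output_tokens * output_price_per_1m
    ) / 1_000_000

    # Billed estimate: reasoning tokens are charged at the output rate too.
    billed_output = visible_output_tokens + reasoning_tokens
    billed_per_request = (
        input_tokens * input_price_per_1m + billed_output * output_price_per_1m
    ) / 1_000_000

    naive = requests_per_month * naive_per_request
    billed = requests_per_month * billed_per_request
    return naive, billed, billed / naive


# Hypothetical workload: 100K requests, 1,000 input tokens, 500 visible output
# tokens, 2,000 reasoning tokens, at placeholder prices of $10 / $40 per 1M.
naive, billed, ratio = reasoning_adjusted_cost(100_000, 1_000, 500, 2_000, 10.0, 40.0)
print(f"naive ${naive:,.0f}, billed ${billed:,.0f}, ratio {ratio:.2f}x")
```

The ratio shrinks on input-heavy workloads and grows on short prompts with long reasoning chains, so it is worth measuring per workload rather than applying one blanket multiplier.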

presets

Common workload shapes.
Click a card to pre-populate the calculator.

Four workload shapes we see often on client engagements. Each card loads a realistic request volume and token profile into the calculator above so you can sanity-check the comparison table on a workload close to yours.

preset · support

Customer support RAG

500K queries/month. ~3,000 input tokens (retrieved context + user message), ~400 output tokens per query. Click to load.

preset · coding

Internal coding agent

50K queries/month. ~8,000 input tokens (file context + prompt), ~1,500 output tokens per query. Click to load.

preset · embed

High-volume embeddings

10M documents/month, embedding pass only. ~500 input tokens per doc, output is negligible. Click to load.

preset · summary

Daily report summariser

5K runs/month. ~20,000 input tokens (long-form source), ~800 output tokens per run. Click to load.
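The four presets above can be written down as plain data, which makes the monthly token volumes easy to check by hand. The dictionary layout and key names are our own illustration, and treating the embeddings output as zero is an assumption standing in for "negligible".

```python
# Request volume and per-request token profile for each preset card.
PRESETS = {
    "support": {"requests": 500_000, "input_tokens": 3_000, "output_tokens": 400},
    "coding": {"requests": 50_000, "input_tokens": 8_000, "output_tokens": 1_500},
    # Assumption: "negligible" embedding output is modelled as zero tokens.
    "embed": {"requests": 10_000_000, "input_tokens": 500, "output_tokens": 0},
    "summary": {"requests": 5_000, "input_tokens": 20_000, "output_tokens": 800},
}


def monthly_tokens(preset: dict) -> tuple[int, int]:
    """Return (monthly input tokens, monthly output tokens) for one preset."""
    return (
        preset["requests"] * preset["input_tokens"],
        preset["requests"] * preset["output_tokens"],
    )


for name, preset in PRESETS.items():
    tokens_in, tokens_out = monthly_tokens(preset)
    print(f"{name:<8} input {tokens_in / 1e6:>8,.0f}M  output {tokens_out / 1e6:>6,.0f}M")
```

Note that the embeddings preset moves the most tokens by far while the coding agent has the highest share of output, so the two respond to very different cost levers.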

▸ cost-engineering principles

How we think about model cost. On client engagements, in our own engineering.

Cost is one number on a rubric, not the rubric. Six principles we apply when we size a workload, switch providers, or push a cost-reduction sprint.

Cost ≠ quality

A model that's 10x cheaper but fails 30% of your eval is more expensive in incident response, not less.

Dated runs

Prices change. Quality changes. Always cite both on the same dated axis so the comparison is honest.

Output dominates

Output tokens are 4-5x more expensive than input on most models. Cutting verbose responses beats switching providers.

Caching + batch first

Prompt-caching and batch APIs cut effective cost 50-90% on the right workloads before you change models.

P95, not average

Tail latency drives user experience and retries. A cheaper model that times out gets retried and stops being cheaper.

Verify before quoting

Provider list prices move. Confirm against the provider's current public pricing page before quoting a number in a deck.
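The retry point can be put into numbers with a simple model. The sketch assumes each attempt times out independently with a fixed probability, that timed-out attempts are billed in full, and that the client retries until one succeeds; under those assumptions the expected number of attempts is 1 / (1 - timeout rate). The per-attempt costs and timeout rates are hypothetical.

```python
def cost_per_success(cost_per_attempt: float, timeout_rate: float) -> float:
    """Expected dollars spent per successful request, retrying until success.

    Assumes independent attempts and full billing for timed-out attempts, so
    the attempt count is geometric with mean 1 / (1 - timeout_rate).
    """
    if not 0 <= timeout_rate < 1:
        raise ValueError("timeout_rate must be in [0, 1)")
    return cost_per_attempt / (1 - timeout_rate)


# Hypothetical: a cheaper model that times out often vs a pricier, steadier one.
cheap = cost_per_success(cost_per_attempt=0.010, timeout_rate=0.40)
steady = cost_per_success(cost_per_attempt=0.015, timeout_rate=0.02)

print(f"cheap model:  ${cheap:.4f} per successful request")
print(f"steady model: ${steady:.4f} per successful request")
```

With these numbers the nominally cheaper model ends up costing more per successful request, before counting the latency users absorb while waiting on retries.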

Services this cost calculator feeds: Claude development (Sonnet 4.6 / Haiku 4.5 token pricing), OpenAI development (GPT-5 / GPT-5-mini), AI chatbot development (per-turn cost projection), AI agent development (per-task cost on multi-step loops), Intelligent document processing (per-page vision-model cost), and AI voice agents (per-call cost on Realtime + chained voice stacks).

next step

Run the eval, then size the cost.
Same rubric, your corpus, audit-driven.

A calculator gets you a back-of-envelope number. An eval on your own corpus tells you which model actually clears the bar. We run both together on every audit so the cost decision lands on real data, not list prices. Engagements run as a discovery audit, then a 4-6 week pilot with weekly eval gates, then continuous delivery against the same rubric in production. Beyond the calculator, the strategy phase / consulting engagement (/services/ai-consulting/) is the audit door where the cost projection becomes a real budget.
