LLM cost calculator.
Claude, GPT, Gemini, open-source. Same workload, side-by-side.
Estimate monthly LLM spend across the model families our clients ask about most. Enter your monthly request volume and average input/output tokens, pick a model, and read the side-by-side table to see relative cost. All math runs client-side on list-API pricing dated 2026-05.
Enter your workload.
Read the comparison table.
Pick a primary model. Enter your monthly request volume and your typical input/output tokens per request. The table below recomputes monthly spend for every model on the same volume.
| Model | Provider | Input $/1M | Output $/1M | Est. monthly cost |
|---|---|---|---|---|
prices as of 2026-05 · list API rates · rows flagged "verify" are price bands; confirm against provider pricing pages before quoting
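The arithmetic behind the table can be sketched as follows. This is a minimal illustration, not the calculator's actual source: the function names, the `model-a` / `model-b` labels, and the prices are hypothetical placeholders chosen for easy arithmetic, not real provider rates.

```python
def monthly_cost(requests, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Estimated monthly spend in dollars at per-1M-token list prices."""
    input_cost = requests * input_tokens / 1_000_000 * input_price_per_m
    output_cost = requests * output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical placeholder prices ($ per 1M tokens: input, output).
# These are NOT real provider rates.
PLACEHOLDER_PRICES = {
    "model-a": (1.00, 5.00),
    "model-b": (0.25, 1.25),
}


def comparison_rows(requests, input_tokens, output_tokens,
                    prices=PLACEHOLDER_PRICES):
    """Price every model at the same volume and sort cheapest first."""
    rows = [
        (name, monthly_cost(requests, input_tokens, output_tokens, p_in, p_out))
        for name, (p_in, p_out) in prices.items()
    ]
    return sorted(rows, key=lambda row: row[1])


# 500K requests/month, 3,000 input and 400 output tokens per request.
for name, cost in comparison_rows(500_000, 3_000, 400):
    print(f"{name}: ${cost:,.2f}/month")
```

Every row uses the same request volume and token profile, so any difference in the result comes from price alone.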
What this calculator covers.
And what it deliberately does not.
A calculator is only useful if you know what it ignores. We list both halves so you can decide where to dig further before sizing a budget.
What it covers
Per-1M-token list-API pricing for input and output across Claude, GPT, Gemini, and a hosted open-source baseline. Monthly cost at your input/output volume.
Same-prompt comparison
Every model is priced at the same input and output token volume. Reading cost without quality is a trap; pair this with an eval on your own corpus.
What it does not cover
Negotiated enterprise rates, prompt-caching discounts, batch-API discounts, fine-tuning surcharges, image or audio tokens, or self-hosted GPU economics.
Reasoning-model caveat
o1 and o3 bill internal reasoning tokens as output, so real spend often runs 2-5x an estimate based on visible output tokens alone. We flag the rows; budget accordingly.
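One way to budget for the reasoning-model caveat is to add the hidden reasoning tokens to the billed output, or simply to widen the naive estimate into a 2-5x band. The sketch below assumes hypothetical placeholder prices and a hypothetical figure of 2,000 reasoning tokens per request. Real reasoning-token counts vary by prompt and should be measured from your own usage data.

```python
def naive_cost(requests, input_tokens, output_tokens,
               input_price_per_m, output_price_per_m):
    """Monthly cost in dollars counting only the tokens passed in."""
    per_request = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return requests * per_request / 1_000_000


def reasoning_cost(requests, input_tokens, visible_output_tokens, reasoning_tokens,
                   input_price_per_m, output_price_per_m):
    """Monthly cost when hidden reasoning tokens are billed at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return naive_cost(requests, input_tokens, billed_output,
                      input_price_per_m, output_price_per_m)


def budget_band(naive_estimate, low=2.0, high=5.0):
    """Widen a naive estimate into the 2-5x range quoted above."""
    return naive_estimate * low, naive_estimate * high


# Placeholder prices ($1 input, $5 output per 1M tokens), not real rates.
naive = naive_cost(500_000, 3_000, 400, 1.00, 5.00)
adjusted = reasoning_cost(500_000, 3_000, 400, 2_000, 1.00, 5.00)
print(f"naive ${naive:,.0f}, with reasoning ${adjusted:,.0f}, ratio {adjusted / naive:.1f}x")
print("budget band:", budget_band(naive))
```

With these assumed numbers the adjusted figure is three times the naive one, which sits inside the 2-5x band.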
Common workload shapes.
Click a card to pre-populate the calculator.
Four workload shapes we see often on client engagements. Each card loads a realistic request volume and token profile into the calculator above so you can sanity-check the comparison table on a workload close to yours.
Customer support RAG
500K queries/month. ~3,000 input tokens (retrieved context + user message), ~400 output tokens per query. Click to load.
Internal coding agent
50K queries/month. ~8,000 input tokens (file context + prompt), ~1,500 output tokens per query. Click to load.
High-volume embeddings
10M documents/month, embedding pass only. ~500 input tokens per doc; an embedding call returns a vector rather than generated text, so output tokens are negligible. Click to load.
Daily report summariser
5K runs/month. ~20,000 input tokens (long-form source), ~800 output tokens per run. Click to load.
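The four cards translate directly into monthly token volumes. A small sketch, with the `Workload` type and `PRESETS` list as hypothetical names, and the embeddings output set to zero as a simplifying assumption for "negligible":

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    requests_per_month: int
    input_tokens: int   # per request
    output_tokens: int  # per request


PRESETS = [
    Workload("Customer support RAG", 500_000, 3_000, 400),
    Workload("Internal coding agent", 50_000, 8_000, 1_500),
    # Assumption: "negligible" output is modelled as 0 tokens.
    Workload("High-volume embeddings", 10_000_000, 500, 0),
    Workload("Daily report summariser", 5_000, 20_000, 800),
]


def monthly_tokens(w):
    """Return (input tokens, output tokens) per month for a workload."""
    return (w.requests_per_month * w.input_tokens,
            w.requests_per_month * w.output_tokens)


for w in PRESETS:
    tokens_in, tokens_out = monthly_tokens(w)
    print(f"{w.name}: {tokens_in / 1e6:,.0f}M input, {tokens_out / 1e6:,.0f}M output")
```

The embeddings shape has the largest input volume by far (5,000M tokens/month). That is why it belongs in the comparison even though each document is small.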
How we think about model cost. On client engagements, in our own engineering.
Cost is one number on a rubric, not the rubric. Six principles we apply when we size a workload, switch providers, or push a cost-reduction sprint.
- Cost ≠ quality: A model that is 10x cheaper but fails 30% of your eval costs more once you count incident response, not less.
- Dated runs: Prices change, and so does quality. Cite both with the same date so the comparison is honest.
- Output dominates: Output tokens cost roughly 4-5x more than input tokens on most models. Cutting verbose responses often beats switching providers.
- Caching + batch first: Prompt caching and batch APIs can cut effective cost 50-90% on the right workloads before you change models.
- P95, not average: Tail latency drives user experience and retries. A cheaper model that times out gets retried and stops being cheaper.
- Verify before quoting: Provider list prices move. Confirm against the provider's current public pricing page before quoting a number in a deck.
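Two of these principles reduce to simple arithmetic. The sketch below is illustrative only. The discount rates and cached share are hypothetical inputs, whether cache and batch discounts combine depends on the provider, and the retry model assumes independent attempts repeated until one succeeds.

```python
def effective_cost(input_cost, output_cost, cached_input_share=0.0,
                   cache_discount=0.0, batch_discount=0.0):
    """Apply a cache discount to the cached share of input cost, then a
    batch discount to the total.

    Assumption: applying both at once is a modelling choice here;
    check the provider's terms before combining them.
    """
    discounted_input = input_cost * (1 - cached_input_share * cache_discount)
    return (discounted_input + output_cost) * (1 - batch_discount)


def cost_per_success(cost_per_attempt, failure_rate):
    """Expected cost of one successful request when failures are retried
    until success, assuming independent attempts."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return cost_per_attempt / (1 - failure_rate)


# Baseline: $1,500 input + $1,000 output per month (placeholder figures).
baseline = 1500 + 1000
cached = effective_cost(1500, 1000, cached_input_share=0.8, cache_discount=0.9)
batched = effective_cost(1500, 1000, batch_discount=0.5)
print(f"cache only: ${cached:,.0f} ({1 - cached / baseline:.0%} saved)")
print(f"batch only: ${batched:,.0f} ({1 - batched / baseline:.0%} saved)")

# A request that times out half the time costs twice as much per success.
print(cost_per_success(1.0, 0.5))
```

The retry function makes the tail-latency point concrete: the sticker price per attempt understates cost in proportion to how often attempts fail.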
Services this cost calculator feeds: Claude development (Sonnet 4.6 / Haiku 4.5 token pricing), OpenAI development (GPT-5 / GPT-5-mini), AI chatbot development (per-turn cost projection), AI agent development (per-task cost on multi-step loops), Intelligent document processing (per-page vision-model cost), and AI voice agents (per-call cost on Realtime + chained voice stacks).