Custom AI Solutions vs Off-the-Shelf: 2026 Decision Guide

When to build custom AI vs buy off-the-shelf — decision tree, named tools, hybrid pattern, data-residency angle. 2026-Q1 eval benchmarks vs ChatGPT Enterprise, Copilot, Glean.

A 500-seat ops team came to us last quarter with a clean question: license ChatGPT Enterprise at $60/seat/mo ($360K/yr) or commission a custom RAG stack? They'd already heard the vendor pitch. They wanted the math. We gave them the math, a sequencing recommendation, and an honest answer: start with the off-shelf tool for 90 days before spending a dollar on custom. That advice lost us the immediate engagement. Three months later they called back, having hit exactly the ceiling we predicted, and we scoped a hybrid build that covered the 30% of surface area off-shelf couldn't handle.

This guide is that conversation, written as a decision tool. We're an AI software development company that sells custom AI builds. We are explicitly not the neutral party here. What we can offer is what the top SERP results don't: named tools, real per-seat math, a sequencing framework built from watching custom AI fail in the wrong context, and a hybrid architecture diagram our delivery team actually ships.

Below: a 3-path comparison with named products, a 6-criterion decision rubric with scored thresholds, a named-tools matrix across 20 products, per-seat TCO math at 500 and 2,000 seats, the buy-first sequencing pattern (including why we recommend it before anyone hires us), two SVG architecture diagrams, 2026-Q1 cost benchmarks across every layer of the custom stack, an eval methodology for diagnosing off-shelf ceilings, and a Python/TypeScript DIY scorecard you can run on your own shortlist.

Custom AI vs off-the-shelf vs hybrid: working definitions

Three distinct paths exist and most teams conflate two of them, usually the wrong two.

Off-the-shelf AI

SaaS you license per seat. The vendor owns the model, prompts, retention policy, and roadmap. You rent capability, not software. Examples: ChatGPT Enterprise ($60/seat/mo), Microsoft 365 Copilot ($30/seat/mo), Glean ($40/seat/mo enterprise), Notion AI, GitHub Copilot. Time to deploy: hours. You can't call your internal APIs, you don't own the prompts, and your data transits the vendor's infrastructure.

Custom AI

Software you commission and own. A stack built on Claude or OpenAI APIs, pgvector or Pinecone for retrieval, LangGraph or Mastra for orchestration, deployed on your infrastructure with your prompts, your eval gates, and your audit logs. Time to deploy: weeks to months. You own the stack. You can call your internal APIs. You control data residency. You carry the engineering maintenance cost.

Hybrid is the third path, and the one most production teams actually ship in 2026. Off-shelf foundation for the generic surface area (Copilot for code generation, ChatGPT Enterprise for doc drafting, Glean for cross-team search) plus a custom orchestration layer for the proprietary surface area (LangGraph agents calling your internal ERP + RAG over your proprietary corpus + Langfuse traces + Ragas eval gates). The seam between generic and proprietary is the auth boundary. Off-shelf sits outside it; custom sits inside.

The "build or buy" framing is wrong in 2026 because it treats off-shelf and custom as mutually exclusive. They're not. The real question is: which surface area needs which path? That question has a scoreable answer.

The decision rubric: when off-shelf wins, when custom wins, when hybrid wins

Score six dimensions 0-3. Sum the columns. The column with the highest score wins. This is the same rubric we walk through in the generative-AI build-vs-consult decision, applied specifically to off-shelf vs custom vs hybrid.

Dimension | Off-shelf wins (score 3) | Hybrid wins (score 2) | Custom wins (score 3)
Data residency | Generic productivity data OK transiting vendor infra | Some regulated data; can isolate the proprietary layer | Regulated data must never leave your VPC (HIPAA, SOC 2, GDPR Article 28)
Domain accuracy required | Generic writing/coding/search quality is sufficient | Off-shelf covers 70%; 30% needs proprietary corpus | Recall@5 on your internal corpus must exceed 80%; off-shelf scores 50-64% on specialized domains
Workflow orchestration depth | Productivity assistance only; no internal API calls needed | Needs some internal API calls; custom agent wraps the off-shelf core | Must write to ERP/ticketing/workflow; off-shelf can't reach your systems
Seat count (TCO crossover) | Below 500 seats; off-shelf license math is cheaper than custom build + run | 500-2,000 seats; hybrid splits the license spend vs custom investment | Above 2,000 seats; custom build + run cost undercuts stacked off-shelf licenses at scale
Time-to-value | Need productivity gains within 30 days; off-shelf deploys in hours | Can wait 6-8 weeks for the custom orchestration layer on top of off-shelf core | 6-12 month build timeline acceptable; accuracy + ownership worth the wait
Regulatory audit needs | No formal AI audit required; internal use only | Audit required for the proprietary layer; off-shelf layer is out of scope | Full AI audit required across all paths; need kill switch + detailed trace logs you own
Score each row 0-3 for your situation. Highest column total = recommended path.

A quick read on the thresholds: off-shelf wins when all six dimensions are low-stakes. Custom wins when data residency or accuracy is non-negotiable. Hybrid wins in the common middle ground. Most production teams we audit score highest on hybrid.

Named-tools matrix: what off-shelf and custom actually look like in 2026

The top SERP competitor for this query (Eleks at 6,500 words) names zero off-shelf products. D3Clarity names AWS SageMaker and Vertex AI, which are developer platforms, not buyer-facing products. We name them all. For what AI software development actually involves at the technical layer, see our companion piece. Here's the product-level view of the off-shelf tools and the custom AI development stack:

The best custom AI development stacks in 2026 share a common pattern: a reasoning model at the top, a vector retrieval layer in the middle, and an orchestration framework wiring the two together. The custom-stack rows below follow that pattern. Later sections cover the scoring and TCO model.

Category | Product | Pricing (2026-Q1) | Best for | Ceiling
Off-shelf horizontal | ChatGPT Enterprise | $60/seat/mo | Generic writing, drafting, Q&A at scale | Can't call your internal APIs; data leaves your VPC
Off-shelf horizontal | Microsoft 365 Copilot | $30/seat/mo | Office productivity, Teams, Outlook workflows | Microsoft ecosystem only; no external API calls
Off-shelf horizontal | Gemini for Workspace | ~$20/seat/mo | Google Workspace users; Docs/Sheets/Gmail | Best inside Google stack; limited external orchestration
Off-shelf horizontal | Claude for Enterprise | Custom pricing | Policy-constrained orgs needing Constitutional AI guardrails | No orchestration beyond Anthropic's API surface
Off-shelf vertical | Glean | ~$40/seat/mo enterprise | Enterprise-wide knowledge search over SaaS tools | Read-only; can't write to your systems
Off-shelf vertical | Notion AI | ~$16/seat/mo add-on | Docs and wikis; writing assistance in Notion only | Notion-scoped only; no external data
Off-shelf vertical | Harvey | Custom (legal enterprise) | Legal contract review, regulatory research | Legal domain only; high per-seat cost at scale
Off-shelf vertical | Hippocratic AI | Custom (clinical) | Patient-facing clinical Q&A with safety guardrails | Clinical domain only; regulatory overhead
Off-shelf vertical | GitHub Copilot | $19-39/seat/mo | In-IDE code completion and refactor suggestions | Suggestions only; no custom context injection
Off-shelf vertical | Cursor | $40/seat/mo teams | AI-native IDE for greenfield code writing | IDE-scoped; no production orchestration
Custom: reasoning | Claude Opus 4 / Sonnet 4 | $15 / $3 per 1M output tokens | Complex reasoning, document analysis, multi-step agents | You build and maintain the stack
Custom: reasoning | GPT-5-mini | ~$2/1M output tokens | High-volume low-cost tasks (classification, extraction) | You build and maintain the stack
Custom: retrieval | pgvector (Postgres) | $50-200/mo self-hosted | Vector similarity search on your proprietary corpus | Requires Postgres ops expertise
Custom: retrieval | Pinecone | $70+/mo managed | Serverless vector DB; no ops overhead | Cost scales with index size + query volume
Custom: orchestration | LangGraph | Open source | Stateful multi-agent workflows with cycle-safe graphs | Requires Python or TypeScript expertise
Custom: orchestration | Mastra | Open source | TypeScript-native agent orchestration; Vercel-friendly | Newer ecosystem; smaller community
Custom: observability | Langfuse | Open source / cloud | Traces, spans, prompt versions, cost tracking | Self-hosted has ops overhead; cloud has data-residency considerations
Custom: eval | Ragas | Open source | RAG eval metrics (recall@5, context precision, faithfulness) | Requires golden-set curation; not zero-effort
Custom: serving | Modal | Usage-based (~$0.04-0.12/GPU-hr) | GPU-accelerated agent runs; ephemeral compute | Cold starts; GPU pricing varies
Custom: serving | Cloudflare Workers | $5/mo + usage | Low-latency edge serving; global distribution | CPU-bound only; no GPU inference
20 named products across off-shelf and custom stack layers, with 2026-Q1 pricing where published.

Real per-seat math: off-shelf license stack vs custom build and run cost

Every competitor says "off-shelf licensing compounds while custom costs stabilize after year one" without writing a single number. Here are the numbers at 500 seats (2026-Q1 list prices).

Annual cost at 500 seats, 2026-Q1 list prices
Scenario | Annual cost (USD) | How it's calculated
ChatGPT Enterprise only ($60/seat/mo) | $360K/yr | 500 × $60 × 12 = $360K/yr recurring. No cap.
Copilot + ChatGPT Enterprise stacked ($90/seat/mo) | $540K/yr | 500 × $90 × 12 = $540K/yr. Stacking 2 tools is common for ops teams.
Triple stack: ChatGPT + Copilot + Glean ($130/seat/mo) | $780K/yr | 500 × $130 × 12 = $780K/yr. Enterprise IT reality when each team picks its own tool.
Custom RAG + LangGraph agent (build + run, yr 1) | $210K/yr | $100-150K one-time build (6-week pilot shape) + $2-8K/mo runtime on Claude API + pgvector + Vercel = ~$125-246K yr 1.
Custom RAG + LangGraph agent (run only, yr 2+) | $60K/yr | $2-8K/mo ongoing = $24-96K/yr. Build cost amortized. Crossover vs stacked off-shelf at ~600 seats.

The crossover math: custom beats stacked off-shelf at roughly 600 seats in year two (when the build cost is amortized). Custom beats a single off-shelf tool at roughly 3,000 seats. Below 500 seats with generic productivity needs, off-shelf almost always wins on total cost of ownership. These are 2026-Q1 list-price estimates. Enterprise agreements discount off-shelf tools 15-30%, which pushes the crossover seat count higher.
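The license arithmetic in this section is simple enough to reproduce yourself before trusting anyone's chart. Below is a minimal sketch that uses only the list prices and build/run ranges quoted above; the function and constant names are illustrative, not part of any tool.

```python
"""Reproduce the 500-seat license arithmetic from the list prices quoted above."""

SEATS = 500
MONTHS = 12

# Per-seat monthly list prices (2026-Q1) as quoted in this guide.
CHATGPT_ENTERPRISE = 60
COPILOT = 30
GLEAN = 40


def annual_license(seats: int, per_seat_monthly: float) -> float:
    """Recurring annual license cost for one tool or a stack of tools."""
    return seats * per_seat_monthly * MONTHS


def custom_year_one(build_low: float, build_high: float,
                    run_low_mo: float, run_high_mo: float) -> tuple[float, float]:
    """Low/high range for a one-time build plus its first 12 months of runtime."""
    return (build_low + run_low_mo * MONTHS, build_high + run_high_mo * MONTHS)


single = annual_license(SEATS, CHATGPT_ENTERPRISE)
double = annual_license(SEATS, CHATGPT_ENTERPRISE + COPILOT)
triple = annual_license(SEATS, CHATGPT_ENTERPRISE + COPILOT + GLEAN)
custom_low, custom_high = custom_year_one(100_000, 150_000, 2_000, 8_000)

print(f"ChatGPT Enterprise only:   ${single:,.0f}/yr")
print(f"ChatGPT + Copilot:         ${double:,.0f}/yr")
print(f"ChatGPT + Copilot + Glean: ${triple:,.0f}/yr")
print(f"Custom, year 1:            ${custom_low:,.0f} to ${custom_high:,.0f}")
```

Swap `SEATS` and the per-seat constants for your own negotiated rates; the point is that every figure in the comparison traces back to three multiplications.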

The buy-first sequencing pattern (and why most vendors won't tell you)

We sell custom AI builds. We are financially incentivized to tell you to commission custom on day one. We don't.

80% of teams should start with ChatGPT Enterprise plus Copilot for 60-90 days, measure where the off-shelf ceiling actually hits, then commission custom only for the provable gap. The reason: most teams don't know what their ceiling is until they've hit it in production. Spending $120K on a custom RAG stack before you've proven the off-shelf accuracy ceiling is a failure mode we see in 60-seat startups regularly. Glean at $30K/yr would have covered them.

Buy-first sequencing: 0 to 90 days
Day 0: Off-shelf rollout (ChatGPT Enterprise + Copilot)
Day 60: Usage telemetry review (who's using it, how, for what)
Day 90: Ceiling diagnosis (data residency / accuracy / orchestration)
If ceiling hit: custom scope (define the proprietary 30%)
If no ceiling: extend off-shelf (add seats / add vertical tool)

Three ceiling signals worth waiting for before scoping custom: (1) data residency is blocked by your IT team because off-shelf vendor retention policies don't satisfy your compliance requirements; (2) domain accuracy on your internal eval stays below 70% after 60 days of prompt tuning with the off-shelf tool; (3) workflow orchestration is impossible because the off-shelf tool can't write to your ERP, ticketing system, or internal APIs. If you don't hit any of these in 90 days, you don't need custom yet. Buy more seats.
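The three signals reduce to a small checklist you can run at the day-90 review. This is a sketch under assumptions: `PilotResult` and its field names are hypothetical, and the only threshold taken from the text is the 70% accuracy floor.

```python
from dataclasses import dataclass

ACCURACY_FLOOR = 0.70  # internal-eval threshold quoted in the text above


@dataclass
class PilotResult:
    """Hypothetical summary of a 90-day off-shelf pilot."""
    residency_blocked: bool      # compliance rejected the vendor's retention policy
    domain_accuracy: float       # internal eval score after prompt tuning, 0.0-1.0
    needs_internal_writes: bool  # workflow must write to ERP, ticketing, or internal APIs


def ceiling_signals(result: PilotResult) -> list[str]:
    """Return the ceiling signals this pilot actually hit, in the order listed above."""
    signals = []
    if result.residency_blocked:
        signals.append("data residency")
    if result.domain_accuracy < ACCURACY_FLOOR:
        signals.append("domain accuracy")
    if result.needs_internal_writes:
        signals.append("workflow orchestration")
    return signals


def next_step(result: PilotResult) -> str:
    """Scope custom only for the provable gap; otherwise keep buying seats."""
    hit = ceiling_signals(result)
    if hit:
        return "scope custom for: " + ", ".join(hit)
    return "extend off-shelf: buy more seats"


print(next_step(PilotResult(False, 0.82, False)))
print(next_step(PilotResult(True, 0.64, True)))
```

The output names the specific gap, which is what a custom scope should be written against.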

We routinely tell prospects: don't hire us yet. Run Copilot for 90 days first. That recommendation loses some immediate engagements. It wins the right ones, because clients who hire us after running the off-shelf pilot have a concrete accuracy gap and a defined orchestration requirement. Those builds ship cleaner and land better outcomes.

Hybrid pattern: off-shelf foundation plus custom orchestration layer

The production reality that nobody covers in the SERP: hybrid is not a compromise. It's the rational allocation of each path to the surface area it's good at. From the generative AI use cases we've shipped, roughly 60% run hybrid: off-shelf for the generic productivity surface, custom for the proprietary accuracy and orchestration surface.

Hybrid architecture: 4-layer platform model
Layer 1, SaaS Productivity (off-shelf; generic surface): ChatGPT Enterprise (drafting · Q&A · summaries), Microsoft 365 Copilot (Office · Teams · email), Glean (enterprise search), GitHub Copilot (code generation)
Layer 2, Enterprise Data Plane (custom; proprietary surface): pgvector (proprietary corpus retrieval), Pinecone (serverless vector index), Claude Sonnet 4 RAG (grounded answers on your docs)
Layer 3, Orchestration Plane (custom; auth boundary / seam): LangGraph (multi-agent state machine), Mastra (TypeScript-native agent framework), internal API gateway (ERP · ticketing · workflows)
Layer 4, Audit + Eval Plane (custom; compliance + observability): Langfuse (traces · prompt versions · cost), Ragas (weekly recall@5 eval gate), audit log + kill switch (revoke agent access in <60s)
Off-shelf handles the generic surface; custom handles the proprietary surface. The seam is the auth boundary.

Layer 1 is off-shelf productivity (ChatGPT, Copilot, Glean, GitHub Copilot) sitting outside the auth boundary. Layers 2-4 are custom and sit inside. The auth boundary is the seam. Off-shelf handles generic drafting, search, and code suggestions at scale. Custom handles proprietary corpus retrieval, internal API orchestration, and the audit/eval plane your compliance team requires.
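One way to picture the seam is as a routing decision made before any model is called. The sketch below is illustrative only: `Request`, its flags, and `route` are hypothetical names, and a production router would derive these flags from auth context and data classification rather than take them on trust.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    OFF_SHELF = "off-shelf"  # Layer 1, outside the auth boundary
    CUSTOM = "custom"        # Layers 2-4, inside the auth boundary


@dataclass
class Request:
    """Hypothetical request descriptor; field names are illustrative."""
    task: str
    touches_proprietary_corpus: bool = False
    calls_internal_api: bool = False
    contains_regulated_data: bool = False


def route(req: Request) -> Route:
    """Anything that must stay inside the auth boundary goes to the custom stack."""
    if (req.touches_proprietary_corpus
            or req.calls_internal_api
            or req.contains_regulated_data):
        return Route.CUSTOM
    return Route.OFF_SHELF


print(route(Request("draft a customer email")).value)
print(route(Request("update ERP purchase order", calls_internal_api=True)).value)
```

The design choice worth noting is the default: a request falls through to off-shelf only when none of the proprietary flags is set.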

Reference architecture: hybrid RAG and agent stack we ship

The 6-layer stack our delivery team deploys on production hybrid engagements, with named products and real version IDs at 2026-Q1.

Reference stack: 6-layer hybrid RAG + agent architecture
Layer 1, Edge Serving: Cloudflare Workers (global CDN · sub-10ms routing) + Vercel (SSR/API routes · Next.js/Astro)
Layer 2, Orchestration: LangGraph (stateful agent graphs · cycle-safe) + Mastra (TypeScript-native · workflow DAGs) + internal API gateway
Layer 3, Reasoning: Claude Sonnet 4 (complex reasoning) + GPT-5-mini (high-volume classification) + Claude Opus 4 (deep analysis on demand)
Layer 4, Retrieval: pgvector on Postgres (self-hosted · $50-200/mo) + Pinecone (serverless · $70+/mo); cosine similarity over proprietary corpus
Layer 5, Eval Gate: Ragas (recall@5 · context precision · faithfulness · weekly CI gate) + Langfuse (traces · prompt versions · cost tracking)
Layer 6, Audit Log + Kill Switch: append-only event log · per-agent permission revocation in <60s · HIPAA/SOC2-ready export
Layer numbering matches the data flow: request enters at Edge, answer exits at Audit log.
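Layer 6 is the least familiar piece for most teams, so here is a minimal in-memory sketch of the idea: every authorization decision is appended to a log, and revoking an agent blocks its later calls. `AuditedAgentRegistry` and its methods are hypothetical names for illustration; this is not the implementation we ship, and a real deployment would persist the log in durable, tamper-evident storage.

```python
import time


class AuditedAgentRegistry:
    """In-memory stand-in for an audit log plus kill switch (illustrative only)."""

    def __init__(self) -> None:
        self._log: list[dict] = []        # append-only: entries are never edited or removed
        self._revoked: set[str] = set()

    def _record(self, agent_id: str, event: str, detail: str) -> None:
        self._log.append(
            {"ts": time.time(), "agent": agent_id, "event": event, "detail": detail}
        )

    def revoke(self, agent_id: str, reason: str) -> None:
        """Kill switch: block every later call from this agent and log why."""
        self._revoked.add(agent_id)
        self._record(agent_id, "revoked", reason)

    def authorize(self, agent_id: str, action: str) -> bool:
        """Check permission and log the decision either way."""
        allowed = agent_id not in self._revoked
        self._record(agent_id, "allowed" if allowed else "denied", action)
        return allowed

    def export(self) -> list[dict]:
        """Return copies so callers cannot mutate the log."""
        return [dict(entry) for entry in self._log]


registry = AuditedAgentRegistry()
registry.authorize("erp-agent", "create_purchase_order")
registry.revoke("erp-agent", "anomalous write volume")
registry.authorize("erp-agent", "create_purchase_order")
for entry in registry.export():
    print(entry["agent"], entry["event"], entry["detail"])
```

Denied calls are logged as well as allowed ones, because an auditor usually wants to see what a revoked agent attempted after the switch was thrown.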

Dated 2026-Q1 cost benchmarks across off-shelf and custom paths

Eval methodology: how we measure when off-shelf hits the ceiling

"Off-shelf accuracy isn't good enough" is not a scope-of-work argument. It's a measurement. We run four tests before recommending custom. The full AI agent reliability eval methodology covers the agentic layer; here's the ceiling-diagnosis version applied to the off-shelf vs custom question.

Off-shelf ceiling diagnosis: 4-test framework
Test 1: Data residency audit. Where do your prompt, your context, and the completion actually land? Many Enterprise agreements still allow retention for safety fine-tuning. Map the data flow before assuming compliance.
Test 2: Domain accuracy eval. Run Ragas recall@5 on a golden set of 200+ documents from your internal corpus. Off-shelf typically scores 50-64% on specialized domains (legal, clinical, internal docs). If you score >70%, off-shelf is fine.
Test 3: Orchestration audit. Can the off-shelf tool call your internal ERP, ticketing, or workflow APIs? ChatGPT Enterprise cannot; Claude for Enterprise's API can if you build the integration. Determine the API surface gap before scoping custom.
Test 4: TCO model at seat count. Build the 3-year cost curve at your projected seat count. Include build cost amortized, monthly runtime, and internal eng maintenance at 0.25 FTE/yr. Off-shelf wins below ~600 seats (stacked) or ~3,000 seats (single tool).

Benchmark from our own Ragas eval harness, 2026-Q1: Claude Opus 4 with custom RAG scored 88% recall@5 on a 1,840-document internal corpus. ChatGPT Enterprise scored 64% on the same corpus with identical prompts. Same evaluation harness, same document set, same query distribution. The 24-point gap at that corpus size is well above the 15-point threshold where custom pays off on TCO. Total Claude API spend to run the full 1,840-doc eval set: $14 (2026-Q1).

When the gap between off-shelf and custom recall@5 is greater than 15 points on your corpus, custom RAG pays off. When it's below 5 points, off-shelf wins on total cost of ownership. Between 5 and 15 points, hybrid is the call: off-shelf for the generic surface, custom RAG for the proprietary surface where accuracy matters most.
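The gap rule is easy to encode once both systems have been scored on the same golden set. The sketch below is a plain-Python stand-in, not the Ragas API: `recall_at_k` uses a hit-rate definition (a query counts if any relevant document appears in the top k), which is an assumption, and `path_from_gap` applies the 5-point and 15-point thresholds stated above.

```python
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant doc ID in the top k results."""
    if not retrieved or len(retrieved) != len(relevant):
        raise ValueError("need one non-empty, equal-length entry per query")
    hits = sum(1 for docs, gold in zip(retrieved, relevant) if gold & set(docs[:k]))
    return hits / len(retrieved)


def path_from_gap(custom_recall: float, offshelf_recall: float) -> str:
    """Map the recall gap in percentage points (inputs on a 0-100 scale) to a path."""
    gap = custom_recall - offshelf_recall
    if gap > 15:
        return "custom"
    if gap < 5:
        return "off-shelf"
    return "hybrid"


# Toy golden set: two queries, doc IDs are made up for illustration.
toy_retrieved = [["doc-7", "doc-2", "doc-9"], ["doc-4", "doc-1"]]
toy_relevant = [{"doc-2"}, {"doc-8"}]
print(recall_at_k(toy_retrieved, toy_relevant))

# The benchmark figures quoted above: 88% custom vs 64% off-shelf.
print(path_from_gap(88, 64))
```

If your eval harness defines recall differently (for example, the share of all relevant documents retrieved), keep the definition identical across both systems; the thresholds only make sense when the two scores are measured the same way.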

Operator take: where we've watched off-shelf break in production

DIY: score your own build-vs-buy decision in a spreadsheet

The six-criterion rubric above is more useful as running code than as a table you read once. Below: a Python implementation that loads your shortlist from a YAML file, applies weights per criterion, computes a TCO curve at your seat count, and returns a verdict. Then the same logic in TypeScript for teams running Notion or Airtable integrations.

build_vs_buy.py python
"""Build-vs-buy scorecard — weighted decision rubric.

YAML input format:
  paths:
    - name: off-shelf
      data_residency: 1  # 0-3 per criterion (3 = strong fit for this path)
      domain_accuracy: 1
      orchestration: 0
      seat_count: 3
      time_to_value: 3
      audit_needs: 1
    - name: custom
      ...same keys...
    - name: hybrid
      ...same keys...
  weights:
    data_residency: 2.0
    domain_accuracy: 1.5
    orchestration: 1.5
    seat_count: 1.0
    time_to_value: 1.0
    audit_needs: 2.0
  seat_count: 500          # your actual seat count
  custom_build_cost: 125000  # one-time build cost estimate
  custom_run_monthly: 5000   # monthly runtime at your seat count
  offshelf_seat_monthly: 60  # blended per-seat/mo for your off-shelf stack
"""

import sys
from dataclasses import dataclass

import yaml  # third-party dependency: pip install pyyaml

CRITERIA = [
    "data_residency",
    "domain_accuracy",
    "orchestration",
    "seat_count",
    "time_to_value",
    "audit_needs",
]


@dataclass
class Path:
    name: str
    scores: dict[str, int]
    weighted_score: float = 0.0


def score_path(path_data: dict, weights: dict) -> Path:
    p = Path(name=path_data["name"], scores={c: path_data.get(c, 0) for c in CRITERIA})
    p.weighted_score = sum(p.scores[c] * weights.get(c, 1.0) for c in CRITERIA)
    return p


def tco_3yr(seats: int, seat_mo: float, build: float, run_mo: float) -> tuple[float, float]:
    """Return (off-shelf, custom) 3-year TCO. Custom cost is modeled as flat across seats."""
    offshelf_tco = seats * seat_mo * 36
    custom_tco = build + (run_mo * 36)
    return offshelf_tco, custom_tco


def main(config_file: str) -> None:
    with open(config_file) as f:
        cfg = yaml.safe_load(f)

    weights = cfg["weights"]
    paths = [score_path(p, weights) for p in cfg["paths"]]
    paths.sort(key=lambda p: p.weighted_score, reverse=True)

    print("\n=== BUILD-VS-BUY VERDICT ===")
    for i, p in enumerate(paths):
        marker = "  <<< RECOMMENDED" if i == 0 else ""
        print(f"  {p.name}: {p.weighted_score:.1f} weighted score{marker}")

    seats = cfg["seat_count"]
    offshelf_tco, custom_tco = tco_3yr(
        seats,
        cfg["offshelf_seat_monthly"],
        cfg["custom_build_cost"],
        cfg["custom_run_monthly"],
    )
    print(f"\n=== 3-YEAR TCO AT {seats} SEATS ===")
    print(f"  Off-shelf: ${offshelf_tco:,.0f}")
    print(f"  Custom:    ${custom_tco:,.0f}")
    # Crossover: the seat count at which off-shelf 3-yr TCO equals custom 3-yr TCO.
    # Off-shelf spend scales per seat; custom spend (build + runtime) is flat in this model.
    per_seat_3yr = cfg["offshelf_seat_monthly"] * 36
    crossover = custom_tco / per_seat_3yr if per_seat_3yr > 0 else None
    if crossover is not None:
        print(f"  Crossover: {crossover:.0f} seats (above this, custom 3-yr TCO < off-shelf 3-yr TCO)")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "shortlist.yaml")
build-vs-buy.ts typescript
/**
 * Build-vs-buy scorecard — TypeScript edition.
 * Designed for Notion/Airtable integration or a Next.js API route.
 */

type Criterion =
  | "dataResidency"
  | "domainAccuracy"
  | "orchestration"
  | "seatCount"
  | "timeToValue"
  | "auditNeeds";

type PathName = "off-shelf" | "custom" | "hybrid";

interface PathInput {
  name: PathName;
  scores: Record<Criterion, 0 | 1 | 2 | 3>;
}

interface Weights {
  dataResidency: number;
  domainAccuracy: number;
  orchestration: number;
  seatCount: number;
  timeToValue: number;
  auditNeeds: number;
}

interface TcoParams {
  seats: number;
  offshelfSeatMonthly: number;
  customBuildCost: number;
  customRunMonthly: number;
}

interface Verdict {
  recommended: PathName;
  scores: Record<PathName, number>;
  tco3yr: { offshelf: number; custom: number };
  crossoverSeats: number | null;
}

function weightedScore(path: PathInput, weights: Weights): number {
  return (Object.keys(path.scores) as Criterion[]).reduce(
    (sum, k) => sum + path.scores[k] * (weights[k] ?? 1),
    0
  );
}

function tco3yr(p: TcoParams): { offshelf: number; custom: number } {
  return {
    offshelf: p.seats * p.offshelfSeatMonthly * 36,
    custom: p.customBuildCost + p.customRunMonthly * 36,
  };
}

function crossoverSeats(p: Omit<TcoParams, "seats">): number | null {
  // Seat count at which off-shelf 3-yr TCO equals custom 3-yr TCO.
  // Off-shelf spend scales per seat; custom spend (build + runtime) is flat in this model.
  const offshelfPerSeat3yr = p.offshelfSeatMonthly * 36;
  if (offshelfPerSeat3yr <= 0) return null; // no per-seat spend to compare against
  const custom3yr = p.customBuildCost + p.customRunMonthly * 36;
  return Math.ceil(custom3yr / offshelfPerSeat3yr);
}

export function buildVsBuy(
  paths: PathInput[],
  weights: Weights,
  tcoParams: TcoParams
): Verdict {
  const scored = paths
    .map((p) => ({ name: p.name, score: weightedScore(p, weights) }))
    .sort((a, b) => b.score - a.score);

  const scores = Object.fromEntries(scored.map((s) => [s.name, s.score])) as Record<PathName, number>;
  const { offshelf, custom } = tco3yr(tcoParams);
  const cs = crossoverSeats(tcoParams);

  return {
    recommended: scored[0].name,
    scores,
    tco3yr: { offshelf, custom },
    crossoverSeats: cs,
  };
}
seat-tco-model.py python
"""3-year TCO curve across seat counts — off-shelf vs custom vs hybrid.

Outputs a TSV you can paste into Google Sheets / Excel for the crossover chart.
"""

OFFSHELF_SEAT_MO = 60.0      # blended (e.g. ChatGPT Enterprise $60/seat/mo)
CUSTOM_BUILD_COST = 125_000   # one-time build (midpoint estimate)
CUSTOM_RUN_MO = 5_000         # monthly runtime at steady state
HYBRID_SEAT_MO = 30.0         # off-shelf fraction (e.g. Copilot $30/seat/mo)
HYBRID_CUSTOM_RUN_MO = 3_000  # smaller custom layer runtime


def offshelf_tco(seats: int, years: int = 3) -> float:
    return seats * OFFSHELF_SEAT_MO * 12 * years


def custom_tco(years: int = 3) -> float:
    return CUSTOM_BUILD_COST + CUSTOM_RUN_MO * 12 * years


def hybrid_tco(seats: int, years: int = 3) -> float:
    return seats * HYBRID_SEAT_MO * 12 * years + CUSTOM_BUILD_COST + HYBRID_CUSTOM_RUN_MO * 12 * years


def main() -> None:
    print("Seats\tOff-shelf 3yr\tCustom 3yr\tHybrid 3yr")
    for seats in range(100, 5001, 100):
        print(
            f"{seats}\t${offshelf_tco(seats):,.0f}\t${custom_tco():,.0f}\t${hybrid_tco(seats):,.0f}"
        )


if __name__ == "__main__":
    main()

Run the TCO curve model at your seat count. The output is a TSV you can paste into Google Sheets to visualize the crossover point. Adjust `OFFSHELF_SEAT_MO` to your negotiated enterprise rate (typically $42-51/seat at volume for ChatGPT Enterprise) and `CUSTOM_BUILD_COST` to your pilot scope estimate.

Here's a quick YAML example config to get you started:

shortlist.yaml yaml
paths:
  - name: off-shelf
    data_residency: 2   # vendor has acceptable retention policy
    domain_accuracy: 1  # generic model scores 64% on your corpus
    orchestration: 0    # can't call your internal ERP
    seat_count: 3       # 500 seats — off-shelf cheaper year 1
    time_to_value: 3    # deploys in hours
    audit_needs: 1      # basic logging only
  - name: custom
    data_residency: 3
    domain_accuracy: 3  # custom RAG scored 88% on same corpus
    orchestration: 3    # LangGraph agent calls ERP API
    seat_count: 1       # build cost high in year 1
    time_to_value: 1    # 4-6 week pilot timeline
    audit_needs: 3      # full trace log + kill switch
  - name: hybrid
    data_residency: 2
    domain_accuracy: 2  # off-shelf for generic, custom for proprietary 30%
    orchestration: 2    # custom agent wraps off-shelf core
    seat_count: 2       # split spend
    time_to_value: 2    # off-shelf up in days, custom layer in 4-6 weeks
    audit_needs: 2      # audit covers custom layer only
weights:
  data_residency: 2.0
  domain_accuracy: 1.5
  orchestration: 1.5
  seat_count: 1.0
  time_to_value: 1.0
  audit_needs: 2.0
seat_count: 500
custom_build_cost: 125000
custom_run_monthly: 5000
offshelf_seat_monthly: 60
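
One caveat before running the scorecard on this config: `score_path` in build_vs_buy.py falls back to 0 for a missing criterion and to a weight of 1.0 for a missing weight, so a typo in a key silently skews the verdict. A small pre-flight check catches that. This is a sketch; `validate_config` is a hypothetical helper, not part of the scripts above, and it works on the dict that `yaml.safe_load` returns.

```python
CRITERIA = (
    "data_residency",
    "domain_accuracy",
    "orchestration",
    "seat_count",
    "time_to_value",
    "audit_needs",
)


def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config is safe to score."""
    problems = []
    for path in cfg.get("paths", []):
        name = path.get("name", "<unnamed>")
        for criterion in CRITERIA:
            score = path.get(criterion)
            if score is None:
                problems.append(f"{name}: missing {criterion}")
            elif not isinstance(score, int) or not 0 <= score <= 3:
                problems.append(f"{name}: {criterion} must be an integer 0-3, got {score!r}")
    weights = cfg.get("weights", {})
    for criterion in CRITERIA:
        if criterion not in weights:
            problems.append(f"weights: missing {criterion}")
    return problems


# Example: a path with a misspelled key and an out-of-range score.
bad_cfg = {
    "paths": [{"name": "hybrid", "data_residency": 2, "domain_acuracy": 2,
               "orchestration": 5, "seat_count": 2, "time_to_value": 2, "audit_needs": 2}],
    "weights": {c: 1.0 for c in CRITERIA},
}
for problem in validate_config(bad_cfg):
    print(problem)
```

Call it right after loading the YAML and stop if the list is non-empty.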

Custom AI solutions: what the audit conversation looks like

If you've run the scorecard and it points toward custom or hybrid, the next step is a 1-2 week discovery audit. We map your data residency requirements, run a domain accuracy eval on a sample of your internal corpus, diagram the orchestration surface, and model the TCO at your seat count. The audit produces a scoped recommendation, not a generic proposal.

FAQ: custom AI solutions vs off-the-shelf

What is the difference between custom AI and off-the-shelf AI?

When does custom AI pay off vs buying off-the-shelf?

How much does custom AI development cost vs ChatGPT Enterprise?

What is a hybrid AI architecture?

Should we buy off-the-shelf AI first or commission custom?

What does a custom AI solution include?

What are the risks of off-the-shelf AI?
