ai knowledge base · production

AI knowledge base, shipped.
Not bought.

Production AI knowledge base and AI knowledge management for internal Q&A, agent-assist, and AI enterprise search. RAG-grounded retrieval across Notion, Confluence, Drive, Slack, Zendesk, and your code repo. Eval-first, RBAC-aware, audit-logged. First production knowledge base live in 4 to 6 weeks behind a feature flag, on Claude Sonnet 4.6 and GPT-5 mini routed per stage.

See the anatomy
10,000+ documents grounded on a typical pilot corpus
0.88+ recall@10 baseline before we ship to production
92% grounding rate on the production eval set
200+ real questions in every eval set we build
ai knowledge base · what it actually means

Six AI knowledge base shapes we ship,
each with its own eval target.

The phrase "AI knowledge base" gets used for six different products. We ship all six. Each shape has a different retrieval surface, a different eval target, and a different rollout risk. Pick the shape your audience actually needs before scoping the pilot.

Internal-docs RAG: the employee AI knowledge base

The crown-jewel internal use case. RAG-grounded retrieval over policy docs, HR handbooks, onboarding guides, and engineering wikis. Employees ask in Slack or a web portal; the AI knowledge base answers from your docs with a citation, or refuses. RBAC scoped against your identity provider so finance can't see legal-only sources and vice versa.

Notion · Drive · Sonnet 4.6

AI agent assist over support tickets

Sits next to a human support agent in Zendesk or Intercom. Retrieves from past tickets, the help center, and product docs; drafts the reply the agent edits and sends. The agent-assist pattern cuts handle time by roughly 25–40% without removing the human from the loop. Eval set built from your last-90-day ticket archive.

Zendesk · Intercom · Sonnet 4.6

Employee Q&A on policies and finance

AI knowledge management for the highest-risk surface: HR policy, finance procedures, legal precedent. RBAC scoped at query time. Refusal-by-default when retrieval misses. Audit log every query for compliance review. We ship this with a 0.95+ grounding rate or we don't ship it.

RBAC · PII scrub · audit log
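
A minimal sketch of query-time RBAC scoping with refusal-by-default, assuming each chunk carries the set of groups allowed to read its source doc. The `Chunk` type, the `allowed_groups` field, and the group names are hypothetical; in a real deployment the groups would come from your identity provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical: groups permitted to read the source doc


def visible_chunks(chunks, user_groups):
    """Keep only chunks whose source doc the user could open directly."""
    user_groups = frozenset(user_groups)
    return [c for c in chunks if c.allowed_groups & user_groups]


def answer_or_refuse(chunks, user_groups):
    """Refuse by default when no permitted chunk was retrieved."""
    permitted = visible_chunks(chunks, user_groups)
    if not permitted:
        return {"refused": True, "sources": []}
    return {"refused": False, "sources": [c.doc_id for c in permitted]}


# Illustrative corpus: finance and legal sources are mutually invisible.
CORPUS = [
    Chunk("finance/close-checklist", "Month-end close steps...", frozenset({"finance"})),
    Chunk("legal/nda-template", "Standard NDA terms...", frozenset({"legal"})),
]
```

Filtering before generation, rather than after, means the reply model never sees text the user could not open in the source system.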

Contract-clause search across signed contracts

Semantic AI document search across a contract library so legal can find clause precedent without re-reading 400 MSAs. We ship this for legal-ops teams pre-redlining a new vendor agreement. Stack: pgvector or Pinecone, bge-reranker-v2, clause-level chunking, source-cited replies. See our published legal contract review RAG case study for the full pattern.

pgvector · bge-reranker-v2

Codebase Q&A for engineering teams

RAG over your monorepo, internal libraries, architecture decision records, and runbooks. Engineers ask 'where do we handle Stripe refunds?' and the AI knowledge base answers with file path, function, and the ADR explaining the choice. Shipped as a Slack bot or a CLI; same retrieval surface either way.

GitHub · ADRs · Sonnet 4.6

Customer-facing self-serve KB (sibling to chatbot)

When the same retrieval surface needs to face external customers, the shape becomes a customer-service chatbot. Sub-2-second latency, public-only sources, no RBAC. We ship that through the sibling chatbot pillar: partners, not duplicates. Cross-link rather than re-pitch.

Web widget · WhatsApp · RAG
rag chatbot anatomy · over your internal docs

How a RAG knowledge base
actually answers a query.

Six stages every production AI knowledge base query moves through, from user question to logged outcome. Skip one and you ship the demo most vendors show instead of the AI-powered knowledge base that holds up at 10,000 documents. Each stage carries its own latency budget, model pick, and failure mode.

  1. Retrieve (hybrid search): BM25 + dense over your corpus · top-k 20 (~50ms)
  2. Rerank (cut to top-k 5): bge-reranker-v2 · cross-encoder scoring (~200ms)
  3. Ground (constrain the model): system prompt says answer only from retrieved chunks (fail-closed)
  4. Cite (source per claim): every answer span tied to a source doc + offset (audit-ready)
  5. Answer (compose the reply): Sonnet 4.6 · streamed · refusal when no grounding (~600 out tokens)
  6. Log (eval + drift watch): Langfuse · grounding rate · refusal rate · CTR (evaled nightly)

Latencies and token counts are typical production traces from shipped knowledge bases. Your eval set sets the real budgets.
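
The control flow of the six stages can be sketched in a few lines. This is a toy illustration, not the production pipeline: `retrieve` uses plain term overlap instead of BM25 + dense search, `rerank` is a stand-in for a cross-encoder, and the reply model call is replaced by returning the top chunk verbatim. All function names are hypothetical.

```python
def retrieve(query, corpus, k=20):
    """Toy lexical retrieval: rank docs by the number of shared query terms."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().split())), doc_id, text)
        for doc_id, text in corpus.items()
    ]
    scored = [s for s in scored if s[0] > 0]   # drop docs with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))   # best score first, ties by doc id
    return scored[:k]


def rerank(candidates, k=5):
    """Stand-in for a cross-encoder: keeps the existing order and cuts to k."""
    return candidates[:k]


def answer(query, corpus, log):
    """Retrieve, rerank, ground, cite, answer, log. Refuses when nothing is grounded."""
    top = rerank(retrieve(query, corpus))
    if not top:
        result = {"answer": None, "citations": [], "refused": True}
    else:
        # A real system would call the reply model here, constrained to `top`.
        result = {
            "answer": top[0][2],
            "citations": [doc_id for _, doc_id, _ in top],
            "refused": False,
        }
    log.append({"query": query, **result})     # every query is logged, refusals included
    return result
```

The point of the sketch is the fail-closed branch: an empty retrieval result produces a refusal and a log entry, never a free-form answer.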

ai agent assist · enterprise search ai · use cases by audience

Six audiences, six rollout shapes,
same RAG anatomy underneath.

The same retrieval anatomy ships across very different audiences. The eval set changes, the RBAC model changes, the rollout sequence changes. We pick the audience first in the audit, then back into the architecture that serves it.

Internal employee KB

The most common shape. Employees ask onboarding, HR policy, and IT-runbook questions in Slack; the AI knowledge base answers from your docs with a citation. Adoption signal we track at week 4: queries-per-active-user. Below 2 per week and the integration's wrong, not the model.

Notion · Slack · Sonnet 4.6

Agent-assist for support reps

Agent-assist AI that drafts the reply for a tier-1 rep in their existing ticketing UI. Retrieves from past tickets and the help center; the rep accepts, edits, or rejects. Pairs with the customer-facing chatbot: the chatbot handles deflection, agent-assist handles the escalations.

Zendesk · Intercom · past tickets

Customer self-serve

Same retrieval surface, sub-2-second latency, public-only sources. We route this to the chatbot pillar. Different audience, different latency budget, different review process. Cross-link, don't duplicate.

Web widget · public docs

Codebase Q&A

Engineers ask the AI knowledge base over the monorepo and architecture decision records. Shipped as a Slack bot or a `gh kb ask` CLI. Same RAG anatomy; chunking is symbol-aware (functions, ADR sections) rather than fixed-size.

GitHub · ADRs · CLI
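
As an illustration of symbol-aware chunking, the sketch below splits a Python file into one chunk per top-level function or class using the standard-library `ast` module. It assumes Python sources only; the `chunk_by_symbol` name and the chunk fields are hypothetical, and other languages or ADR sections would need their own parsers.

```python
import ast


def chunk_by_symbol(source: str, path: str):
    """Return one chunk per top-level function or class, with its line span."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "path": path,                 # lets the answer cite a file path
                "symbol": node.name,          # function or class name
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "text": ast.get_source_segment(source, node),
            })
    return chunks
```

Keeping the path, symbol name, and line span on each chunk is what lets an answer point at a file and function rather than at an arbitrary window of text.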

Contract-clause search for legal-ops

Enterprise AI search over a signed-contract library. Legal asks 'show me every indemnity clause capped at 12 months' fees' and the AI document search returns the exact clauses with source MSAs. Shipped for legal-ops teams as a Confluence-embedded panel.

Contract library · clause search

Clinical knowledge with PHI scoping

Clinical AI knowledge base where retrieval respects patient-record boundaries and PHI never leaks across care teams. We shipped this pattern in the clinical-triage RAG agent. Same anatomy, harder review process.

PHI scope · audit log
glean alternative · notion ai alternative · build vs buy

Custom build, Glean, Guru, Notion AI, or DIY:
when each one is the right answer.

The best AI knowledge base software for your workflow may not be ours. Sometimes the audit ends with us recommending Glean. Sometimes Notion AI. Sometimes an off-the-shelf AI knowledge management bundle wins on time-to-value. The honest comparison below is per-dimension, not per-vendor. Run it against your stack before committing a budget on either side.

Options compared
  • Custom build (you're here): GetWidget builds it on your stack
  • Glean: enterprise-search SaaS box
  • Guru: card-based KB platform
  • Notion AI: Notion's built-in AI layer
  • DIY LlamaIndex: your platform team ships it

Time to first production query: how fast you get a real answer in front of real users.
  • Custom build: 4–6 weeks after a 1–2 week audit. Eval-gated.
  • Glean: 2–3 weeks if your data is already in their connectors.
  • Guru: Hours if your knowledge is already in Guru cards.
  • Notion AI: Same-day if 80%+ of your KB is in Notion.
  • DIY LlamaIndex: 8–12 weeks if you don't already have retrieval infra.

RBAC + identity provider fit: does it match your existing access model, not a parallel one?
  • Custom build: Mirrors your IdP at query time. Custom scopes supported.
  • Glean: Strong on Okta/Azure AD; weaker on non-standard IdPs.
  • Guru: Card-level visibility; less granular than IdP-driven.
  • Notion AI: Inherits Notion permissions only. Cross-source RBAC is manual.
  • DIY LlamaIndex: You own it, and you own the maintenance.

Source coverage outside the connector list: internal tool, custom DB, private contract store. Does it ingest?
  • Custom build: Anything with an API. We write the ingestion.
  • Glean: Strong stock connectors; custom sources need their SDK.
  • Guru: Cards-only. External sources require sync jobs.
  • Notion AI: Notion-pages-only. No external corpus.
  • DIY LlamaIndex: Full control. Full build cost.

Eval methodology + grounding-rate transparency: can you see, today, whether retrieval is working?
  • Custom build: We ship Langfuse + a nightly eval suite. Numbers, not vibes.
  • Glean: Internal eval not exposed to buyers.
  • Guru: Card-hit metrics only; no grounding-rate.
  • Notion AI: No eval surface buyers can see.
  • DIY LlamaIndex: You build the eval; you read the eval.

Total 12-month cost (mid-market, ~500 seats): all-in, licence + run + integration.
  • Custom build: $10–25K pilot + $5K/mo continuous + run cost. Predictable.
  • Glean: ~$40–60/seat/year list + integration cost.
  • Guru: ~$15/seat/mo + content-ops overhead.
  • Notion AI: $10/seat/mo on top of Notion. Cheapest if you're already on Notion.
  • DIY LlamaIndex: Headcount cost. Cheap if you already have the team.

Pricing benchmarks from public list prices + recent audit work. Your numbers vary; we re-benchmark on your eval before recommending.

model stack we ship

The three models behind an AI knowledge base,
picked per stage not per vendor.

A production knowledge base is not one model. It's a routed pipeline: cheap classify and query-rewrite at the front, grounded generate in the middle, cheap embedding at the back. Default stack below; we re-pick per workflow if your eval data demands it.

knowledge base playbook

How we ship a production AI knowledge base
in 4–6 weeks, flagged + evaled.

Four stages, milestone-billed, with a walk-away point at the retrieval baseline. Most knowledge base failures happen because the team skipped the eval set or skipped retrieval tuning. Both sit in week 1 and week 2 here, not bolted on at the end.

  1. Week 1

    Audit + eval set design

    We catalog your sources (Notion, Drive, Confluence, Zendesk, your DB), sample 100–200 real questions from Slack history or the ticket archive, and design the eval set the knowledge base will be measured against. RBAC model locked against your IdP here.

    Source catalog + eval set + 90-day roadmap
  2. Week 2

    Corpus build + retrieval baseline

    Ingest your docs, chunk with the right granularity (symbol-aware for code, clause-level for contracts, paragraph for prose), embed, index in pgvector or Pinecone. Score retrieval precision and recall@10 against the eval set. Most knowledge base quality issues are retrieval issues, found here.

    Retrieval baseline: precision · recall@10 · grounding rate
    Walk-away point
  3. Weeks 3–4

    Pilot build + RBAC + flag

    Wire the full anatomy: retrieve, rerank, ground, cite, answer, log. RBAC enforced at query time against your IdP. Behind a feature flag in your repo. Audit-log every query. UI shipped to one channel first (Slack bot or web portal); the other channels follow once the eval holds.

    Production AI knowledge base live behind a flag
  4. Weeks 5–6

    Rollout + token-optimisation pass

    Shadow mode for 1 week. Roll out at 10%, 50%, 100% if the grounding rate and refusal rate hold. Token-optimisation pass post-cutover: cheap-model classify in front, prompt cache on system + tool defs, top-k trim. Most knowledge bases land at 30–40% of naive baseline cost at the same eval quality.

    Full rollout + monthly cost target + drift-watch dashboard
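
The 10% / 50% / 100% rollout in weeks 5–6 can be sketched as a simple gate. The grounding floor of 0.92 mirrors the production figure quoted at the top of this page; the refusal-rate ceiling of 0.20 is an assumed placeholder, and `next_rollout_step` is a hypothetical name. Your eval set would set the real thresholds.

```python
ROLLOUT_STEPS = [10, 50, 100]  # percent of users, from the rollout plan


def next_rollout_step(current_pct, grounding_rate, refusal_rate,
                      min_grounding=0.92, max_refusal=0.20):
    """Advance one step only while both metrics hold; otherwise stay put."""
    if grounding_rate < min_grounding or refusal_rate > max_refusal:
        return current_pct                      # hold the flag where it is
    for step in ROLLOUT_STEPS:
        if step > current_pct:
            return step                         # next larger audience
    return current_pct                          # already at 100%
```

For example, `next_rollout_step(10, grounding_rate=0.94, refusal_rate=0.10)` moves the flag to 50, while a grounding rate of 0.90 at the same point keeps it at 10.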
▸ shipped this for

Production AI knowledge bases, on the public record. Read how they shipped.

The grounding-rate number is what got us past legal review. We'd shipped two internal-search projects before this and neither could tell us, on a Tuesday, whether retrieval was working. This one can.
— Head of Knowledge Engineering · enterprise SaaS, ~600 seats
engagement models

Three ways to start.
Audit, pilot, or continuous.

Same pricing as our other AI engagements. Most clients begin with the audit to scope sources and design the eval set, run a 4–6 week pilot on the highest-ROI audience, then move to monthly to ship the next 2–3 surfaces.

1–2 weeks

Knowledge base audit

Find the AI knowledge base shape worth shipping before you commit a budget.

$3K fixed
  • Source catalog across Notion · Drive · Confluence · Zendesk · your DB
  • 100–200-question eval set sampled from real Slack or ticket history
  • Model + RAG architecture pick with token-cost projection
  • RBAC model mapped to your existing identity provider
  • 90-day knowledge base roadmap with named workflows
Most teams start here
4–6 weeks

Knowledge base pilot

One AI knowledge base shipped end-to-end, with retrieval eval data, not a demo.

$10–25K fixed price
  • Corpus build + chunking + retrieval tuning against your real questions
  • Full anatomy: retrieve · rerank · ground · cite · answer · log
  • RBAC enforced at query time against your IdP
  • Shadow-mode metrics vs your existing search or human baseline
  • Token-optimisation pass post-cutover (route · cache · top-k trim)
  • Walk-away point: if retrieval precision won't move, no phase 2
Monthly

Continuous KB team

Embedded squad shipping new sources + tuning the live AI knowledge base.

from $5K per month
  • PM + KB engineer + ops analyst, embedded
  • Monthly grounding-rate, refusal-rate, and cost-of-ownership report
  • Eval drift + retrieval precision monitoring
  • New source integrations on cadence (Confluence, Salesforce, custom)
  • Cancel any month. No annual contract
Talk to us
Your repo, your data · Claude + OpenAI + open-source · RAG-first, eval-gated · Model-agnostic, openly
honest scoping

When you should not hire us.

Three cases where the right answer is to buy something off the shelf or build it in-house. We say so in the audit before anyone signs a pilot.

  • Buy Glean if you want a no-engineering box and your data fits their connector list. Expect ~$40–60 per seat per year. Faster to live than a custom build if your IdP is Okta or Azure AD.
  • Buy Notion AI if your knowledge base is already in Notion. At $10 per seat per month on top of Notion, it's the cheapest answer when 80%+ of your wiki, runbooks, and HR docs already live there. A Notion AI alternative isn't worth scoping until you've outgrown that surface.
  • DIY with LlamaIndex if you have a 10-person platform team and a one-week deadline. You already have engineers fluent in the stack we'd use; the audit-to-pilot cycle is slower than your in-house build.
frequently asked

Questions AI knowledge base buyers ask most.
Real answers, no hedging.

What's the difference between an AI knowledge base and a regular knowledge base?
A regular knowledge base is a doc store with full-text search: Confluence, SharePoint, Notion. An AI knowledge base adds a retrieval-augmented generation layer on top. The system retrieves the top 3–5 relevant chunks from your corpus, the reply model (Sonnet 4.6 in our default stack) composes a grounded answer, and every claim is cited to a source doc. The honest difference is failure mode. A regular KB fails by returning 40 irrelevant pages. An AI knowledge base fails by either refusing (good, that's what we tune for) or by hallucinating (bad, which is what grounding, citation, and the eval suite exist to prevent). When teams ask 'do we need an AI knowledge base or just better search', the answer depends on whether your queries are noun-shaped ('show me the password-reset page') or sentence-shaped ('how do I onboard a contractor in Germany'). Sentence-shaped queries need RAG.
Should we buy Glean / Guru / Notion AI or build a custom AI knowledge base?
Honest answer: it depends on your data shape. Buy Glean if your sources are already covered by its connector list (Okta-style IdP, 50+ stock connectors) and you want a no-engineering box; expect ~$40–60/seat/year list pricing. Buy Notion AI if 80%+ of your knowledge already lives in Notion. It's the cheapest option, and a Notion AI alternative isn't needed when you're already on Notion. Buy Guru if your knowledge is card-shaped and your support team already curates Guru cards. Build a custom AI-powered knowledge base with us when retrieval has to span a private corpus, RBAC must mirror a non-standard identity provider, or your eval target needs custom scoring. If you should buy, we say so in the $3K audit. We've recommended Glean to two of the last twenty audit clients; neither needed a custom build. Looking for a Glean alternative usually means the price is the friction, not the product; we'll tell you whether a custom build is cheaper across 3 years.
How does a RAG chatbot work over our internal docs?
Six stages, every query. Retrieve: hybrid search (BM25 + dense embeddings) over your corpus, top-k 20. Rerank: bge-reranker-v2 cuts to top-k 5 with cross-encoder scoring. Ground: the system prompt instructs the model to answer only from retrieved chunks or say it doesn't know. Cite: every answer span is tied to a source doc and offset. Answer: Sonnet 4.6 composes the reply, streamed. Log: Langfuse logs the query, retrieved chunks, citation accuracy, refusal rate, and grounding rate; we eval nightly against held-out questions. The same RAG chatbot anatomy works whether the surface is a Slack bot, a web portal, or an embedded panel in Zendesk. Most knowledge base quality issues are retrieval issues, not generation issues, so we tune retrieval (chunking, top-k, reranker) before tuning prompts.
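
One common way to merge a BM25 ranking with a dense-embedding ranking is reciprocal rank fusion (RRF). The answer above does not say which fusion method is used, so treat this as an assumed technique shown for illustration; `reciprocal_rank_fusion` is a hypothetical helper and `k=60` is the conventional RRF constant, not a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60, top_k=20):
    """Fuse several ranked lists of doc ids into one list of at most top_k ids."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); docs found by both lists add up.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_k]
```

RRF works on ranks alone, so BM25 scores and cosine similarities never need to be put on a shared scale before the reranker sees the merged top-k 20.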
What does AI knowledge management cost?
Three engagement tiers, same pricing as our other AI services. A 1–2 week AI knowledge management audit is $3,000: source catalog, eval-set design, architecture and model pick, RBAC mapping, 90-day roadmap. A pilot is $10,000–$25,000 fixed price, 4–6 weeks: one production knowledge base shipped end-to-end with retrieval eval and RBAC. A continuous AI knowledge management engagement is from $5,000 per month: embedded PM, engineer, and ops analyst shipping new sources and tuning the live system. Run cost (model calls + vector DB + monitoring) lands at $300–$3,000 per month depending on query volume and corpus size. AI for knowledge management is cheaper to build than most teams expect once retrieval is right. Most of the cost ends up in connector engineering, not model calls.
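
To turn those tiers into a rough first-year figure, the arithmetic can be sketched as below. It assumes a full 12 months of the continuous engagement at its $5,000 floor and 12 months of run cost, which will overstate the total for teams that start continuous work only after the pilot; `first_year_cost` is a hypothetical helper, not a quote.

```python
def first_year_cost(pilot, run_per_month, continuous_months=12,
                    audit=3_000, continuous_per_month=5_000, run_months=12):
    """Sum audit + pilot + continuous engagement + run cost, all in USD."""
    return (audit
            + pilot
            + continuous_per_month * continuous_months
            + run_per_month * run_months)


# Low end: $10K pilot and $300/month run cost.
low = first_year_cost(pilot=10_000, run_per_month=300)      # 76600
# High end: $25K pilot and $3,000/month run cost.
high = first_year_cost(pilot=25_000, run_per_month=3_000)   # 124000
```

Setting `continuous_months=0` gives the audit-plus-pilot case: $13,000 plus run cost at the low end.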
Can we use this for customer-facing support, not just internal?
Yes, but the shape changes enough that we usually ship it as a customer-service chatbot rather than a knowledge base. Same retrieval anatomy, different audience, different latency budget (sub-2-second instead of 5+), different review process (public-only sources, no RBAC). When clients ask about AI knowledge base software for customer self-serve, we route them to the chatbot pillar: partners, not duplicates. The KB pillar owns internal Q&A, agent-assist, and enterprise search; the chatbot pillar owns customer-facing single-turn. We've shipped both off the same RAG corpus more than once: the internal version answers freely, the customer-facing version is scoped to public docs only and gated behind a confidence threshold.
How long to ship a production AI knowledge base?
Most pilots ship in 4–6 weeks after a 1–2 week audit. Realistic distribution: simple knowledge bases (single source like Notion, 1,000–5,000 docs, English-only) in 3–4 weeks. Mid-complexity AI knowledge bases (3–5 sources, 10,000+ docs, RBAC against Okta or Azure AD) in 4–6 weeks. Complex (regulated industry with PHI or PII handling, multilingual across 5+ languages, custom IdP, 50,000+ docs) in 8–10 weeks. The audit phase tells us which bucket you're in before any pilot contract. We don't quote a 30-day knowledge base for work that takes 90 days. The walk-away point is week 2: if the retrieval baseline won't hit 0.85+ recall@10 on your eval set, we stop and recommend either more corpus prep or a different approach.
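
The week-2 walk-away check reduces to a mean recall@10 over the eval set compared against the 0.85 threshold. A minimal sketch, assuming each eval question is stored as a pair of (retrieved doc ids in rank order, set of relevant doc ids); the function names are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant docs that appear in the top-k retrieved docs."""
    if not relevant:
        raise ValueError("each eval question needs at least one relevant doc")
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def mean_recall_at_k(eval_set, k=10):
    """Average recall@k across all (retrieved, relevant) pairs."""
    return sum(recall_at_k(r, rel, k) for r, rel in eval_set) / len(eval_set)


def passes_walk_away_gate(eval_set, threshold=0.85, k=10):
    """True when the retrieval baseline clears the week-2 threshold."""
    return mean_recall_at_k(eval_set, k) >= threshold
```

Two questions with recalls of 1.0 and 0.5 average to 0.75, which would fail the gate and trigger the corpus-prep conversation.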
How do you handle PII, RBAC, and access control?
Four layers. Identity-provider mirror: retrieval is scoped at query time against your existing IdP (Okta, Azure AD, Google Workspace, or custom). Users see only what they'd see in the source system; the AI knowledge base never escalates privilege. PII scrub at ingest and at query: we mask PII in the corpus before embedding and re-scrub at reply time, with allow-lists for fields the workflow legitimately needs. Audit log every query: who asked, what was retrieved, what was answered, what was cited. Stored in your warehouse, not ours. Refusal-by-default for compliance-sensitive sources: finance and legal corpora ship with a 0.95+ grounding-rate floor; sub-threshold queries refuse with an 'I don't have a grounded answer' message rather than guess. Regulated industries (healthcare, finance, legal) get a fifth layer: row-level encryption on the vector store and a separate audit-log retention policy.
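
The scrub-with-allow-list idea can be sketched with two regular expressions. This is illustrative only: the patterns cover just email addresses and US-style SSNs, production scrubbing would need far broader detection, and `scrub_pii` and `PII_PATTERNS` are hypothetical names.

```python
import re

# Assumed, deliberately narrow patterns; real PII detection needs many more.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scrub_pii(text, allow=()):
    """Mask every known PII type except those the workflow has allow-listed."""
    for label, pattern in PII_PATTERNS.items():
        if label in allow:
            continue                              # field is legitimately needed
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

The same function can run at ingest, before embedding, and again on the composed reply, with a different `allow` tuple per workflow.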
When should we NOT hire an AI knowledge management agency?
Three cases. (1) Glean fits your connectors. If your data lives in their stock connectors, your IdP is Okta or Azure AD, and you want a no-engineering box, buy Glean. We'll tell you so in the audit. (2) Notion AI covers 80%+ of your knowledge. If your wiki, runbooks, and HR docs are already in Notion and adding cross-source retrieval isn't worth the engineering, stay with Notion AI at $10/seat/month. (3) You have a 10-person platform team and a one-week deadline. If you already have engineers fluent in LlamaIndex or LangChain, pgvector, and an eval framework, the build is faster in-house than the audit-to-pilot cycle. We'll point you at the open-source stack we'd use. The $3K audit exists partly to detect these three cases before anyone signs a pilot. If the answer is 'hire someone else' we'll say so, same as we do on chatbot, agent, and integration audits.
Ready to ship

Hire an AI knowledge management agency
that ships eval data, not demos.

Book a free AI knowledge base audit. We'll catalog your sources, sample 100+ real questions from your Slack or ticket archive, recommend the right shape (internal KB, agent-assist, enterprise search, or buy-Glean), pick models per stage, map RBAC to your IdP, and project token cost. No deck, no obligation to build.

Read case studies
30 min, async or live · Token-cost projection included · Eval-set design + RBAC mapping