ai knowledge base · production

AI knowledge base, shipped.
Not bought.

Production AI knowledge base and AI knowledge management for internal Q&A, agent-assist, and AI enterprise search. RAG-grounded retrieval across Notion, Confluence, Drive, Slack, Zendesk, and your code repo. Eval-first, RBAC-aware, audit-logged. First production knowledge base live in 4 to 6 weeks behind a feature flag, on Claude Sonnet 4.6 and GPT-5 mini routed per stage.

See the anatomy

Definition

What is an AI knowledge base?

An AI knowledge base is a retrieval-augmented search and answer system over proprietary content (docs, runbooks, ticket history, training material) that returns cited answers, not just a list of links. Unlike traditional keyword search (Algolia, Elasticsearch alone) which matches lexical tokens and ranks by relevance score, an AI knowledge base embeds documents into a vector index, performs hybrid retrieval (dense + sparse), reranks with a cross-encoder, and synthesises a cited answer with an LLM. Unlike a generic chatbot, an AI knowledge base regression-tests retrieval recall and answer faithfulness against a frozen eval set, requiring Ragas faithfulness scores at or above 0.8 before answering and refusing below that threshold. Common stacks combine pgvector or Pinecone with voyage-3-large embeddings, BAAI bge-reranker-large for reranking, Claude Sonnet 4.6 or GPT-5 for synthesis, and Langfuse for groundedness scoring.
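The answer-or-refuse rule in the definition above can be sketched in a few lines. This is a minimal illustration, not the production gate: the 0.8 floor comes from the text, while the `Draft` type, the `gate` function, and the refusal wording are hypothetical names, and the faithfulness score is assumed to arrive from an external scorer such as Ragas.

```python
from dataclasses import dataclass

FAITHFULNESS_FLOOR = 0.8  # threshold quoted in the definition above

# Hypothetical refusal wording, for illustration only.
REFUSAL = "I don't have a grounded answer for that."


@dataclass
class Draft:
    text: str
    citations: list[str]   # source ids attached to the draft answer
    faithfulness: float    # assumed to come from an external scorer (e.g. Ragas)


def gate(draft: Draft, floor: float = FAITHFULNESS_FLOOR) -> str:
    """Return the draft only if it is cited and scores at or above the floor."""
    if not draft.citations or draft.faithfulness < floor:
        return REFUSAL
    return draft.text
```

The gate fails closed: a draft with no citations is refused even when its score is high.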

10,000+

documents grounded on a typical pilot corpus

0.88+

recall@10 baseline before we ship to production

92%

grounding rate on the production eval set

200+

real questions in every eval set we build

ai knowledge base · what it actually means

Six AI knowledge base shapes we ship,
each with its own eval target.

The phrase "AI knowledge base" gets used for six different products. We ship all six. Each shape has a different retrieval surface, a different eval target, and a different rollout risk. Pick the shape your audience actually needs before scoping the pilot.

Internal-docs RAG: the employee AI knowledge base

The crown-jewel internal use case. RAG-grounded retrieval over policy docs, HR handbooks, onboarding guides, and engineering wikis. Employees ask in Slack or a web portal; the AI knowledge base answers from your docs with a citation, or refuses. RBAC scoped against your identity provider so finance can't see legal-only sources and vice versa.

AI agent assist over support tickets

Sits next to a human support agent in Zendesk or Intercom. Retrieves from past tickets, the help center, and product docs; drafts the reply the agent edits and sends. The agent-assist pattern cuts handle time by roughly 25–40% without removing the human from the loop. Eval set built from your last-90-day ticket archive.

Employee Q&A on policies and finance

AI knowledge management for the highest-risk surface: HR policy, finance procedures, legal precedent. RBAC scoped at query time. Refusal-by-default when retrieval misses. Audit log every query for compliance review. We ship this with a 0.95+ grounding rate or we don't ship it.

Contract-clause search across signed contracts

Semantic AI document search across a contract library so legal can find clause precedent without re-reading 400 MSAs. We ship this for legal-ops teams pre-redlining a new vendor agreement. Stack: pgvector or Pinecone, bge-reranker-v2, clause-level chunking, source-cited replies. See our published legal contract review RAG case study for the full pattern.

Codebase Q&A for engineering teams

RAG over your monorepo, internal libraries, architecture decision records, and runbooks. Engineers ask 'where do we handle Stripe refunds?' and the AI knowledge base answers with file path, function, and the ADR explaining the choice. Shipped as a Slack bot or a CLI; same retrieval surface either way.

Customer-facing self-serve KB (sibling to chatbot)

When the same retrieval surface needs to face external customers, the shape becomes a customer-service chatbot. Sub-2-second latency, public-only sources, no RBAC. We ship that through the sibling chatbot pillar: partners, not duplicates. Cross-link rather than re-pitch.

rag chatbot anatomy · over your internal docs

How a RAG knowledge base
actually answers a query.

Six stages every production AI knowledge base query moves through, from user question to logged outcome. Skip one and you ship the demo most vendors show instead of the AI-powered knowledge base that holds up at 10,000 documents. Each stage carries its own latency budget, model pick, and failure mode.

01 Retrieve: Hybrid search. BM25 + dense over your corpus · top-k 20 (~50ms)
02 Rerank: Cut to top-k 5. bge-reranker-v2 · cross-encoder scoring (~200ms)
03 Ground: Constrain the model. System prompt: answer only from retrieved chunks (fail-closed)
04 Cite: Source per claim. Every answer span tied to a source doc + offset (audit-ready)
05 Answer: Compose the reply. Sonnet 4.6 · streamed · refusal when no grounding (~600 out tokens)
06 Log: Eval + drift watch. Langfuse · grounding rate · refusal rate · CTR (evaled nightly)

Latencies and token counts are typical production traces from shipped knowledge bases. Your eval set sets the real budgets.
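The retrieve stage merges a BM25 ranking and a dense ranking into one candidate list before reranking. The page does not say how the two lists are fused; the sketch below assumes reciprocal rank fusion, a common choice, with the usual constant `k=60`. The document ids and the function name are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=20):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by doc id so output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_k]


# Illustrative rankings from the two retrievers (hypothetical doc ids).
bm25_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d9", "d3"]
candidates = reciprocal_rank_fusion([bm25_hits, dense_hits])
# candidates == ["d1", "d3", "d9", "d7"]
```

Documents that both retrievers agree on rise to the top; the fused top-k 20 is what the reranker then cuts to 5.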

ai agent assist · enterprise search ai · use cases by audience

Six audiences, six rollout shapes,
same RAG anatomy underneath.

The same retrieval anatomy ships across very different audiences. The eval set changes, the RBAC model changes, the rollout sequence changes. We pick the audience first in the audit, then back into the architecture that serves it.

Internal employee KB

The most common shape. Employees ask onboarding, HR policy, and IT-runbook questions in Slack; the AI knowledge base answers from your docs with a citation. Adoption signal we track at week 4: queries-per-active-user. Below 2 per week and the integration's wrong, not the model.

Agent-assist for support reps

Agent-assist AI that drafts the reply for a tier-1 rep in their existing ticketing UI. Retrieves from past tickets and the help center; the rep accepts, edits, or rejects. Pairs with the customer-facing chatbot: the chatbot handles deflection, agent-assist handles the escalations.

Customer self-serve

Same retrieval surface, sub-2-second latency, public-only sources. We route this to the chatbot pillar. Different audience, different latency budget, different review process. Cross-link, don't duplicate.

Codebase Q&A

Engineers ask the AI knowledge base over the monorepo and architecture decision records. Shipped as a Slack bot or a `gh kb ask` CLI. Same RAG anatomy; chunking is symbol-aware (functions, ADR sections) rather than fixed-size.
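Symbol-aware chunking means splitting source files at function and class boundaries so each chunk carries a file path and a symbol name. A minimal sketch for Python files, using the standard-library `ast` module; the chunk dictionary layout and the sample file path are assumptions, and a real monorepo would need a parser per language.

```python
import ast


def chunk_by_symbol(source: str, path: str):
    """Return one chunk per top-level function or class in a Python file."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "path": path,
                "symbol": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # Exact source text of the symbol, ready to embed.
                "text": ast.get_source_segment(source, node),
            })
    return chunks


# Hypothetical file contents, for illustration only.
SAMPLE = '''def refund(charge_id):
    return f"refund:{charge_id}"

class Ledger:
    def post(self, entry):
        return entry
'''

chunks = chunk_by_symbol(SAMPLE, "billing/refunds.py")
```

Because every chunk keeps its path and line range, an answer to "where do we handle refunds?" can cite the file and function directly.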

Contract-clause search for legal-ops

Enterprise AI search over a signed-contract library. Legal asks 'show me every indemnity clause capped at 12 months' fees' and the AI document search returns the exact clauses with source MSAs. Shipped for legal-ops teams as a Confluence-embedded panel.

Clinical knowledge with PHI scoping

Clinical AI knowledge base where retrieval respects patient-record boundaries and PHI never leaks across care teams. We shipped this pattern in the clinical-triage RAG agent. Same anatomy, harder review process.

glean alternative · notion ai alternative · build vs buy

Custom build, Glean, Guru, Notion AI, or DIY:
when each one is the right answer.

The best AI knowledge base software for your workflow may not be us. Sometimes the audit ends with us recommending Glean. Sometimes Notion AI. Sometimes an off-the-shelf AI knowledge management bundle wins on time-to-value. The honest comparison below is per-dimension, not per-vendor. Run it against your stack before committing a budget on either side.

Options compared

Custom build (you're here): GetWidget builds it on your stack
Glean: Enterprise-search SaaS box
Guru: Card-based KB platform
Notion AI: Notion's built-in AI layer
DIY LlamaIndex: Your platform team ships it

Dimension: Time to first production query. How fast you get a real answer in front of real users.

Custom build: 4–6 weeks after a 1–2 week audit. Eval-gated.
Glean: 2–3 weeks if your data is already in their connectors.
Guru: Hours if your knowledge is already in Guru cards.
Notion AI: Same-day if 80%+ of your KB is in Notion.
DIY LlamaIndex: 8–12 weeks if you don't already have retrieval infra.

Dimension: RBAC + identity provider fit. Does it match your existing access model, not a parallel one.

Custom build: Mirrors your IdP at query time. Custom scopes supported.
Glean: Strong on Okta/Azure AD; weaker on non-standard IdPs.
Guru: Card-level visibility; less granular than IdP-driven.
Notion AI: Inherits Notion permissions only. Cross-source RBAC is manual.
DIY LlamaIndex: You own it, and you own the maintenance.

Dimension: Source coverage outside the connector list. Internal tool, custom DB, private contract store: does it ingest?

Custom build: Anything with an API. We write the ingestion.
Glean: Strong stock connectors; custom sources need their SDK.
Guru: Cards-only. External sources require sync jobs.
Notion AI: Notion-pages-only. No external corpus.
DIY LlamaIndex: Full control. Full build cost.

Dimension: Eval methodology + grounding-rate transparency. Can you see, today, whether retrieval is working?

Custom build: We ship Langfuse + a nightly eval suite. Numbers, not vibes.
Glean: Internal eval not exposed to buyers.
Guru: Card-hit metrics only; no grounding-rate.
Notion AI: No eval surface buyers can see.
DIY LlamaIndex: You build the eval; you read the eval.

Dimension: Total 12-month cost (mid-market, ~500 seats). All-in: licence + run + integration.

Custom build: Fixed-bid pilot + monthly continuous + run cost. Predictable.
Glean: ~$40–60/seat/year list + integration cost.
Guru: ~$15/seat/mo + content-ops overhead.
Notion AI: $10/seat/mo on top of Notion. Cheapest if you're already on Notion.
DIY LlamaIndex: Headcount cost. Cheap if you already have the team.

Pricing benchmarks from public list prices + recent audit work. Your numbers vary; we re-benchmark on your eval before recommending.

model stack we ship

The three models behind an AI knowledge base,
picked per stage not per vendor.

A production knowledge base is not one model. It's a routed pipeline: cheap classify and query-rewrite at the front, grounded generate in the middle, cheap embedding at the back. Default stack below; we re-pick per workflow if your eval data demands it.

Default

Claude Sonnet 4.6

Anthropic

200K context · $3 / M in · $15 / M out

Grounded reasoning · default reply model · long-context RAG

GPT-5 mini

OpenAI

128K context · $0.15 / M in · $0.60 / M out

Cheap classify · query rewrite · cheap RAG drafts

bge-large / OAI text-embed-3-large

Embeddings

1024-d / 3072-d · Self-hosted or $0.13 / M tokens

Dense retrieval · multilingual coverage · re-rank input
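To see why routing per stage matters for cost, the list prices above can be turned into a rough per-query estimate. The prices are the ones quoted for each model; the token counts (a 300-token rewrite prompt, 2,500 tokens of retrieved context, 600 output tokens) are illustrative assumptions rather than measured traces, and the dictionary keys are local labels, not API model identifiers.

```python
# USD per million tokens, taken from the model cards above.
PRICES = {
    "sonnet-4.6": {"in": 3.00, "out": 15.00},
    "gpt-5-mini": {"in": 0.15, "out": 0.60},
}


def call_cost(model, tokens_in, tokens_out):
    """Cost in USD of one model call at list price."""
    p = PRICES[model]
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000


def query_cost(rewrite_in=300, rewrite_out=50, answer_in=2500, answer_out=600):
    """One query: cheap-model rewrite, then a grounded answer (assumed sizes)."""
    rewrite = call_cost("gpt-5-mini", rewrite_in, rewrite_out)
    answer = call_cost("sonnet-4.6", answer_in, answer_out)
    return rewrite + answer


per_query = query_cost()
monthly = per_query * 20_000  # hypothetical volume of 20,000 queries per month
```

Under these assumptions a query costs roughly $0.017 and the answer call dominates, which is why trimming top-k and caching the system prompt move the bill more than changing the rewrite model does.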

knowledge base playbook

How we ship a production AI knowledge base
in 4–6 weeks, flagged + evaled.

Four stages, milestone-billed, with a walk-away point at the retrieval baseline. Most knowledge base failures happen because the team skipped the eval set or skipped retrieval tuning. Both sit in week 1 and week 2 here, not bolted on at the end.

Week 1

Audit + eval set design

We catalog your sources (Notion, Drive, Confluence, Zendesk, your DB), sample 100–200 real questions from Slack history or the ticket archive, and design the eval set the knowledge base will be measured against. RBAC model locked against your IdP here.

Source catalog + eval set + 90-day roadmap

Week 2

Corpus build + retrieval baseline

Ingest your docs, chunk with the right granularity (symbol-aware for code, clause-level for contracts, paragraph for prose), embed, index in pgvector or Pinecone. Score retrieval precision and recall@10 against the eval set. Most knowledge base quality issues are retrieval issues, found here.
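The retrieval baseline reduces to two per-question metrics averaged over the eval set. A minimal sketch follows; the eval-set row layout (`retrieved` and `relevant` lists of chunk ids) and the function names are assumptions for illustration.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Share of the relevant chunks that appear in the top k retrieved."""
    rel = set(relevant)
    if not rel:
        return 0.0
    return len(set(retrieved[:k]) & rel) / len(rel)


def precision_at_k(retrieved, relevant, k=10):
    """Share of the top k retrieved chunks that are relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    rel = set(relevant)
    return sum(1 for doc_id in top if doc_id in rel) / len(top)


def mean_recall(eval_set, k=10):
    """Average recall@k over every question in the eval set."""
    return sum(recall_at_k(row["retrieved"], row["relevant"], k)
               for row in eval_set) / len(eval_set)


# Two hypothetical eval rows: one full hit, one half hit.
eval_set = [
    {"retrieved": ["a", "b", "c"], "relevant": ["a"]},
    {"retrieved": ["x", "y", "z"], "relevant": ["x", "q"]},
]
baseline = mean_recall(eval_set)  # (1.0 + 0.5) / 2 = 0.75
```

Comparing `baseline` against the agreed recall@10 floor is the check that decides whether the pilot proceeds past week 2.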

Retrieval baseline: precision · recall@10 · grounding rate

Walk-away point

Weeks 3–4

Pilot build + RBAC + flag

Wire the full anatomy: retrieve, rerank, ground, cite, answer, log. RBAC enforced at query time against your IdP. Behind a feature flag in your repo. Audit-log every query. UI shipped to one channel first (Slack bot or web portal); the other channels follow once the eval holds.

Production AI knowledge base live behind a flag

Weeks 5–6

Rollout + token-optimisation pass

Shadow mode for 1 week. Roll out at 10%, 50%, 100% if the grounding rate and refusal rate hold. Token-optimisation pass post-cutover: cheap-model classify in front, prompt cache on system + tool defs, top-k trim. Most knowledge bases land at 30–40% of naive baseline cost at the same eval quality.
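The staged rollout can be expressed as a stable percentage flag plus a promotion rule. The sketch below assumes hash-based bucketing on a user id; the 10/50/100 stages come from the text, the 0.92 grounding floor reuses the grounding rate quoted earlier on this page, and the 0.20 refusal ceiling is a placeholder your own eval would set.

```python
import hashlib

STAGES = [10, 50, 100]  # rollout percentages from the playbook


def in_rollout(user_id: str, percent: int) -> bool:
    """Stable bucketing: a user stays enabled as the percentage grows."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def next_stage(current: int, grounding_rate: float, refusal_rate: float,
               min_grounding: float = 0.92, max_refusal: float = 0.20) -> int:
    """Advance one stage only if both metrics hold; otherwise stay put."""
    if grounding_rate < min_grounding or refusal_rate > max_refusal:
        return current
    later = [stage for stage in STAGES if stage > current]
    return later[0] if later else current
```

Hashing the user id rather than sampling per request keeps each user's experience consistent across sessions, which makes the shadow-mode comparison cleaner.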

Full rollout + monthly cost target + drift-watch dashboard

▸ shipped this for

Production AI knowledge bases, on the public record. Read how they shipped.

The grounding-rate number is what got us past legal review. We'd shipped two internal-search projects before this and neither could tell us, on a Tuesday, whether retrieval was working. This one can.

— Head of Knowledge Engineering · enterprise SaaS, ~600 seats

10K+ documents grounded · 0.92 groundedness on eval (devtools RAG, 2026-Q1)

DEVTOOLS · Anthropic

engagement models

Three ways to start.
Audit, pilot, or continuous.

Same pricing as our other AI engagements. Most clients begin with the audit to scope sources and design the eval set, run a 4–6 week pilot on the highest-ROI audience, then move to monthly to ship the next 2–3 surfaces.

1–2 weeks

Knowledge base audit

Find the AI knowledge base shape worth shipping before you commit a budget.

Fixed-fee

Source catalog across Notion · Drive · Confluence · Zendesk · your DB
100–200-question eval set sampled from real Slack or ticket history
Model + RAG architecture pick with token-cost projection
RBAC model mapped to your existing identity provider
90-day knowledge base roadmap with named workflows

Most teams start here

4–6 weeks

Knowledge base pilot

One AI knowledge base shipped end-to-end, with retrieval eval data, not a demo.

Fixed-bid, fixed price

Corpus build + chunking + retrieval tuning against your real questions
Full anatomy: retrieve · rerank · ground · cite · answer · log
RBAC enforced at query time against your IdP
Shadow-mode metrics vs your existing search or human baseline
Token-optimisation pass post-cutover (route · cache · top-k trim)
Walk-away point: if retrieval precision won't move, no phase 2

Monthly

Continuous KB team

Embedded squad shipping new sources + tuning the live AI knowledge base.

Billed per month

PM + KB engineer + ops analyst, embedded
Monthly grounding-rate, refusal-rate, and cost-of-ownership report
Eval drift + retrieval precision monitoring
New source integrations on cadence (Confluence, Salesforce, custom)
Cancel any month. No annual contract

Talk to us

Your repo, your data · Claude + OpenAI + open-source · RAG-first, eval-gated · Model-agnostic, openly

honest scoping

When you should not hire us.

Three cases where the right answer is to buy something off the shelf or build it in-house. We say so in the audit before anyone signs a pilot.

Buy Glean if you want a no-engineering box and your data fits their connector list. Expect ~$40–60 per seat per year. Faster to live than a custom build if your IdP is Okta or Azure AD.
Buy Notion AI if your knowledge base is already in Notion. At $10 per seat per month on top of Notion, it's the cheapest answer when 80%+ of your wiki, runbooks, and HR docs already live there. A Notion AI alternative isn't worth scoping until you've outgrown that surface.
DIY with LlamaIndex if you have a 10-person platform team and a one-week deadline. You already have engineers fluent in the stack we'd use; the audit-to-pilot cycle is slower than your in-house build.

frequently asked

Questions AI knowledge base buyers ask most.
Real answers, no hedging.

What's the difference between an AI knowledge base and a regular knowledge base?

A regular knowledge base is a doc store with full-text search: Confluence, SharePoint, Notion. An AI knowledge base adds a retrieval-augmented generation layer on top. The system retrieves the top 3–5 relevant chunks from your corpus, the reply model (Sonnet 4.6 in our default stack) composes a grounded answer, and every claim is cited to a source doc. The honest difference is failure mode. A regular KB fails by returning 40 irrelevant pages. An AI knowledge base fails either by refusing (good, that's what we tune for) or by hallucinating (bad, which is what grounding, citation, and the eval suite exist to prevent). When teams ask 'do we need an AI knowledge base or just better search', the answer depends on whether your queries are noun-shaped ('show me the password-reset page') or sentence-shaped ('how do I onboard a contractor in Germany'). Sentence-shaped queries need RAG.

Should we buy Glean / Guru / Notion AI or build a custom AI knowledge base?

Honest answer: it depends on your data shape. Buy Glean if your data already lives in its connector list (Okta-style IdP, 50+ stock connectors) and you want a no-engineering box; expect ~$40–60/seat/year list pricing. Buy Notion AI if 80%+ of your knowledge already lives in Notion. It's the cheapest option, and a Notion AI alternative isn't needed when you're already on Notion. Buy Guru if your knowledge is card-shaped and your support team already curates Guru cards. Build a custom AI-powered knowledge base with us when retrieval has to span a private corpus, RBAC must mirror a non-standard identity provider, or your eval target needs custom scoring. If you should buy, we say so in the discovery audit. We've recommended Glean to two of the last twenty audit clients; neither needed a custom build. Looking for a Glean alternative usually means the price is the friction, not the product; we'll tell you whether a custom build is cheaper across 3 years.

How does a RAG chatbot work over our internal docs?

Six stages, every query. Retrieve: hybrid search (BM25 + dense embeddings) over your corpus, top-k 20. Rerank: bge-reranker-v2 cuts to top-k 5 with cross-encoder scoring. Ground: the system prompt instructs the model to answer only from retrieved chunks or say it doesn't know. Cite: every answer span is tied to a source doc and offset. Answer: Sonnet 4.6 composes the reply, streamed. Log: Langfuse logs the query, retrieved chunks, citation accuracy, refusal rate, and grounding rate; we eval nightly against held-out questions. The same RAG chatbot anatomy works whether the surface is a Slack bot, a web portal, or an embedded panel in Zendesk. Most knowledge base quality issues are retrieval issues, not generation issues, so we tune retrieval (chunking, top-k, reranker) before tuning prompts.
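The ground and cite stages can be illustrated with a prompt builder and a citation check. This is a sketch under assumptions: the prompt wording, the `[n]` citation convention, and both function names are hypothetical, and the check only confirms that citations point at supplied sources, not that each claim is actually supported by them.

```python
import re


def build_prompt(question, chunks):
    """Number the retrieved chunks and instruct the model to answer only from them."""
    sources = "\n".join(
        f"[{i}] ({chunk['doc']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer only from the numbered sources below. "
        "Cite each claim as [n]. If the sources do not contain the answer, "
        "reply exactly: I don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def citations_valid(answer, n_chunks):
    """True if the answer cites at least one source and only supplied ones."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= c <= n_chunks for c in cited)


# Hypothetical retrieved chunks, for illustration only.
chunks = [
    {"doc": "handbook.md", "text": "Contractors in Germany need a signed SOW."},
    {"doc": "it-runbook.md", "text": "Laptops ship within five business days."},
]
prompt = build_prompt("How do I onboard a contractor in Germany?", chunks)
```

An answer that fails `citations_valid` would be treated as ungrounded and replaced with a refusal before it reaches the user.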

What does AI knowledge management cost?

Three engagement tiers, same pricing as our other AI services. A 1–2 week AI knowledge management audit is fixed-fee: source catalog, eval-set design, architecture and model pick, RBAC mapping, 90-day roadmap. A pilot is fixed-bid, 4–6 weeks: one production knowledge base shipped end-to-end with retrieval eval and RBAC. A continuous AI knowledge management engagement is monthly: embedded PM, engineer, and ops analyst shipping new sources and tuning the live system. Run cost (model calls + vector DB + monitoring) starts at around $300 per month and scales with query volume and corpus size. AI for knowledge management is cheaper to build than most teams expect once retrieval is right. Most of the cost ends up in connector engineering, not model calls.

Can we use this for customer-facing support, not just internal?

Yes, but the shape changes enough that we usually ship it as a customer-service chatbot rather than a knowledge base. Same retrieval anatomy, different audience, different latency budget (sub-2-second instead of 5+ seconds), different review process (public-only sources, no RBAC). When clients ask about AI knowledge base software for customer self-serve, we route them to the chatbot pillar: partners, not duplicates. The KB pillar owns internal Q&A, agent-assist, and enterprise search; the chatbot pillar owns customer-facing single-turn. We've shipped both off the same RAG corpus more than once: the internal version answers freely, and the customer-facing version is scoped to public docs only and gated behind a confidence threshold.

How long to ship a production AI knowledge base?

Most pilots ship in 4–6 weeks after a 1–2 week audit. Realistic distribution: simple knowledge bases (single source like Notion, 1,000–5,000 docs, English-only) in 3–4 weeks. Mid-complexity knowledge bases (3–5 sources, 10,000+ docs, RBAC against Okta or Azure AD) in 4–6 weeks. Complex knowledge bases (regulated industry with PHI or PII handling, multilingual across 5+ languages, custom IdP, 50,000+ docs) in 8–10 weeks. The audit phase tells us which bucket you're in before any pilot contract. We don't quote a 30-day knowledge base for work that takes 90 days. The walk-away point is week 2: if the retrieval baseline won't hit 0.85+ recall@10 on your eval set, we stop and recommend either more corpus prep or a different approach.

How do you handle PII, RBAC, and access control?

Four layers. Identity-provider mirror: retrieval is scoped at query time against your existing IdP (Okta, Azure AD, Google Workspace, or custom). Users see only what they'd see in the source system; the AI knowledge base never escalates privilege. PII scrub at ingest and at query: we mask PII in the corpus before embedding and re-scrub at reply time, with allow-lists for fields the workflow legitimately needs. Audit log every query: who asked, what was retrieved, what was answered, what was cited. Stored in your warehouse, not ours. Refusal-by-default for compliance-sensitive sources: finance and legal corpora ship with a 0.95+ grounding-rate floor; sub-threshold queries refuse with an 'I don't have a grounded answer' message rather than guess. Regulated industries (healthcare, finance, legal) get a fifth layer: row-level encryption on the vector store and a separate audit-log retention policy.
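Query-time RBAC and the audit log can be sketched together. The example assumes each chunk carries an `acl` list of group names copied from the source system and that the user's groups come from the IdP; the field names and both functions are hypothetical, and a production system would normally push the ACL filter into the vector-store query rather than filter afterwards.

```python
import json
import time


def allowed_chunks(chunks, user_groups):
    """Keep only chunks whose ACL shares at least one group with the user.

    A chunk with an empty ACL matches nobody, so the filter denies by default.
    """
    groups = set(user_groups)
    return [chunk for chunk in chunks if groups & set(chunk["acl"])]


def audit_record(user, question, retrieved, answer):
    """Serialise who asked, what was retrieved, and what was answered."""
    return json.dumps({
        "ts": time.time(),
        "user": user,
        "question": question,
        "retrieved": [chunk["id"] for chunk in retrieved],
        "answer": answer,
    })


# Hypothetical corpus slice, for illustration only.
corpus = [
    {"id": "fin-001", "acl": ["finance"], "text": "Q3 close procedure"},
    {"id": "leg-007", "acl": ["legal"], "text": "Indemnity precedent"},
    {"id": "orphan-1", "acl": [], "text": "No ACL recorded"},
]
visible = allowed_chunks(corpus, ["finance", "all-staff"])
record = audit_record("ana@example.com", "How do we close Q3?", visible, "See fin-001.")
```

Filtering before generation, not after, is what keeps a finance user's answer from ever being composed out of legal-only text.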

When should we NOT hire an AI knowledge management agency?

Three cases. (1) Glean fits your connectors. If your data lives in their stock connectors, your IdP is Okta or Azure AD, and you want a no-engineering box, buy Glean. We'll tell you so in the audit. (2) Notion AI covers 80%+ of your knowledge. If your wiki, runbooks, and HR docs are already in Notion and adding cross-source retrieval isn't worth the engineering, stay with Notion AI at $10/seat/month. (3) You have a 10-person platform team and a one-week deadline. If you already have engineers fluent in LlamaIndex or LangChain, pgvector, and an eval framework, the build is faster in-house than the audit-to-pilot cycle. We'll point you at the open-source stack we'd use. The discovery audit exists partly to detect these three cases before anyone signs a pilot. If the answer is 'hire someone else' we'll say so, same as we do on chatbot, agent, and integration audits.

Ready to ship

Hire an AI knowledge management agency
that ships eval data, not demos.

Book a free AI knowledge base audit. We'll catalog your sources, sample 100+ real questions from your Slack or ticket archive, recommend the right shape (internal KB, agent-assist, enterprise search, or buy-Glean), pick models per stage, map RBAC to your IdP, and project token cost. No deck, no obligation to build.

Read case studies

30 min, async or live · Token-cost projection included · Eval-set design + RBAC mapping

keep exploring

Related pages.
Pick where you are.

A knowledge base often connects to a sibling AI service. These pages go deeper on the adjacent decisions.

Claude RAG over product docs

Legal contract review RAG

Clinical triage RAG agent

AI Chatbot Development

AI Agent Development

AI Development

Intelligent Document Processing

AI Consulting

Legal AI

Healthcare AI

Education AI

RAG vs fine-tuning decision tool

RAG benchmark (2026-Q2)

AI engineering hub