A Quantum Symbiote Service · QTB-SVC-005

A chatbot trained on your real documents.
Every answer traceable to the source.

Not a toy demo — a deployed assistant your users can hit from day one. Vector index built from your corpus, retrieval evaluated against a real question set, hosted with auth and rate limits. Provenance preserved on every answer.

Start a build → See the 6-stage build

Turnaround: 10 days · 5-day rush available
LLM: Claude · OpenAI · open-source (your call)
Hosting: Your infrastructure or ours
Support: 30–90 days, tier-dependent

The problem

An LLM that makes up answers is not a chatbot — it's a liability.

Most "AI chatbot" builds glue an LLM onto a website and call it done. The bot sounds confident. The bot is also making things up, often about topics where being wrong matters.

The fix isn't a smarter model. The fix is grounding every answer in a real document the bot can cite — and showing that citation to the user.

Retrieval-augmented generation (RAG) does exactly that. The bot searches your real corpus first, retrieves the relevant chunks, and the LLM writes its answer from those chunks, not from training data the model half-remembers.

Every response ships with the source chunks attached. Users can verify. Your support team can audit. Hallucinations don't go to zero, but they drop hard.
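The retrieve-then-answer flow above can be sketched in a few lines. This is a minimal illustration, not the production stack: it assumes a bag-of-words cosine score standing in for a real embedding model, and `Chunk`, `retrieve`, `build_prompt`, `answer`, and the `llm` callable are hypothetical names introduced here.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # hypothetical source document name
    text: str


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return up to k chunks that share vocabulary with the question, best first."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c.text))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(question, sources):
    """Number the sources so the model can cite them as [1], [2], ..."""
    numbered = "\n".join(f"[{i}] ({c.doc_id}) {c.text}" for i, c in enumerate(sources, 1))
    return (
        "Answer using only the numbered sources below and cite them like [1].\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


def answer(question, chunks, llm):
    """llm is any callable that maps a prompt string to an answer string (assumed)."""
    sources = retrieve(question, chunks)
    if not sources:
        # Nothing relevant retrieved: decline instead of letting the model improvise.
        return {"answer": "I could not find this in the documents.", "sources": []}
    return {"answer": llm(build_prompt(question, sources)), "sources": sources}
```

The returned `sources` list is what a UI would render next to the answer, so a user can check the cited chunk against the text.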

The 6-stage build

From corpus to deployed assistant in 10 days (14 on Premium).

Two human gates. Four production stages. Retrieval gets evaluated against a real question set before anything goes live to users.

Stage 01

Discovery

Scoping call. What documents, what user questions, what hosting target. We agree on the test question set before building — so we measure what matters.

Stage 02

Ingest pipeline

Document ingestion built to re-run. Chunking strategy tuned to your content type. Provenance metadata preserved on every chunk.
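As a rough sketch of what "provenance on every chunk" can mean in practice, the example below splits a document into overlapping character windows and records where each chunk came from. The fixed-size window, the field names, and the content hash are assumptions for illustration; a real chunking strategy would be tuned to the content type.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenancedChunk:
    doc_id: str        # which document the chunk came from
    chunk_index: int   # position of the chunk within that document
    start: int         # character offsets into the original text
    end: int
    content_hash: str  # stable fingerprint, useful when the pipeline re-runs
    text: str


def chunk_document(doc_id, text, size=200, overlap=40):
    """Split text into overlapping windows, keeping provenance on each one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        piece = text[start:end]
        digest = hashlib.sha256(piece.encode("utf-8")).hexdigest()[:12]
        chunks.append(ProvenancedChunk(doc_id, len(chunks), start, end, digest, piece))
        if end == len(text):
            break
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks
```

Because the offsets and hash are derived only from the input, re-running the function on an unchanged document yields identical chunks.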

Stage 03

Vector index + retrieval

Embedding model selected (a hosted API such as OpenAI's, or open-source; your call). Vector store wired. Retrieval-and-generation chain with citation rendering.

Stage 04

UI + auth + deploy

Web UI or API endpoint depending on your scope. Auth + rate limits. Deployment to your infrastructure (or ours for Entry tier).

Stage 05

Evaluation pass

The test question set runs against the bot. Retrieval thresholds and re-rank logic tuned against actual failure modes — not hypothetical ones.

Stage 06

Handoff + support

You hit it with real questions. One revision pass. Maintenance docs, evaluation report, support window. Re-index when you add documents.

D1 · Document ingestion pipeline: Re-runnable when you add new sources. Chunking strategy documented per content type.
D2 · Vector index: Provenance metadata on every chunk. Source attribution preserved through retrieval.
D3 · RAG backend: Retrieval-and-generation chain with citation rendering. Pluggable LLM endpoint.
D4 · UI or API: Web UI with conversation history, or API endpoint if you're embedding the bot elsewhere.
D5 · Auth + rate limits: Per-user keys, daily caps, abuse protection. Reasonable defaults for your scale.
D6 · Evaluation report: Question-by-question scoring on your test set. Where the bot is strong, where it drifts, what to monitor.
D7 · Maintenance runbook: How to add new documents, re-index, version the corpus without losing the existing index.
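To make the per-user daily caps in D5 concrete, here is a minimal in-memory sketch. `DailyCap` and its `allow` method are hypothetical names, and a deployed service would more likely keep these counters in a shared store than in process memory.

```python
from collections import defaultdict
from datetime import date


class DailyCap:
    """Allow each API key at most `limit` requests per calendar day."""

    def __init__(self, limit):
        self.limit = limit
        self._counts = defaultdict(int)  # (api_key, day) -> requests seen

    def allow(self, api_key, today=None):
        day = today or date.today()
        key = (api_key, day)
        if self._counts[key] >= self.limit:
            return False  # cap reached: reject without counting the request
        self._counts[key] += 1
        return True
```

Passing `today` explicitly keeps the class easy to test; in normal use it falls back to the current date.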

What we measure

Retrieval quality is the bottleneck.

An LLM is only as good as what you feed it. Most "bad bot" complaints trace to retrieval, not generation — the model couldn't see the right chunk, so it improvised.

We evaluate retrieval against your real questions before tuning generation. Recall, rank, citation accuracy — measured per question, reported per build. The bot you ship is the bot we measured.
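Per-question recall and rank can be scored with very little code. The sketch below assumes each test question lists the chunk IDs that should be retrieved, and that `retrieve` is any callable returning ranked chunk IDs; both are illustrative assumptions. It covers recall@k and reciprocal rank only, not citation accuracy.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved_ids[:k])) / len(relevant)


def reciprocal_rank(retrieved_ids, relevant_ids):
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(test_set, retrieve, k=3):
    """Score every question separately so weak spots stay visible."""
    rows = []
    for item in test_set:
        ids = retrieve(item["question"])
        rows.append({
            "question": item["question"],
            "recall_at_k": recall_at_k(ids, item["relevant"], k),
            "reciprocal_rank": reciprocal_rank(ids, item["relevant"]),
        })
    return rows
```

Keeping one row per question, rather than a single average, is what lets a report say where the bot is strong and where it drifts.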

"Every answer ships with the source chunks it cited. Users can verify."

Pricing

Three tiers. Scale by corpus size and depth of eval.

Pick by document count, evaluation rigor, and hosting model. Up-scope at any time before kickoff.

Entry

Up to 100 documents (≤500 pages). Web UI or API. Hosted on shared infrastructure for 30 days.

$300

Fixed · 10-day delivery

Ingest pipeline + vector index
Single embedding model
Web UI or API endpoint
Basic auth + rate limits
30-day shared hosting
30-day support

Start an Entry build

Most common

Standard

Up to 1,000 documents. 20-question eval. Custom UI. Deployment to your infra.

$900

Fixed · 10-day delivery

Everything in Entry
20-question evaluation report
Retrieval tuning
Custom UI with your branding
Deploy to your infrastructure
60-day support

Start a Standard build

Premium

Up to 5,000 documents. Multi-modal ingest. Hybrid retrieval. 50-question eval. Observability.

$2,400

Fixed · 14-day delivery

Everything in Standard
Multi-modal ingest (PDF + images + tables)
Hybrid retrieval (semantic + keyword + re-ranker)
50-question evaluation report
Conversation history + feedback loop UI
Observability dashboard
90-day support + team training
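One common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion, sketched below. This is an assumption about how hybrid retrieval could be wired, not a description of the Premium stack; the function name and the constant `k=60` (a conventional default) are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one fused ranking.

    Each list contributes 1 / (k + rank) for every ID it contains, so an ID
    ranked well by several retrievers rises to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

A re-ranker model would then reorder the top of this fused list before the chunks reach the LLM.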

Start a Premium build

Add-ons (any tier)

Slack / Teams / Discord integration: +$300
Additional language (beyond English): +$200/lang
Streaming response UI w/ markdown: +$200
Document upload portal for non-tech users: +$400
Custom domain + SSL setup: +$150
Re-ranker model (Cohere Rerank or equivalent): +$250
Rush delivery (5 days): +$400

Case · QS-CASE-005

Internal toolchain MCP server.
0 context-switch roundtrips.

Context-switch roundtrips per task: 3 → 0

Triage time per ticket: 8 min → 90 s

"Spec-first" workflow adopted team-wide

The setup

A 4-person product team used Claude for code review, ticket triage, and project planning, but every interaction required manually copy-pasting context from Notion (specs), GitHub Issues (tickets), and an internal wiki. Three context-switch roundtrips per task, every task.

We built a custom MCP server exposing all three internal systems as native Claude tools: notion_read, github_issues, wiki_search with semantic top-3 retrieval.

Same architectural primitive as a RAG chatbot — retrieval-grounded responses, citation-preserving, deployed to a team's actual workflow. Built on the same retrieval stack we'd ship for any customer-facing bot.

QTB-SVC-005 + QTB-SVC-006 · Client identifier withheld

Frequently asked

Common questions before kickoff.

Which LLM does the bot run on?

Default is Claude (Sonnet or Haiku depending on volume and budget). We'll wire OpenAI, Mistral, or open-source models on request — same architecture, different inference endpoint. We pick during discovery based on your latency, cost, and accuracy targets.

How do you handle documents that change frequently?

The ingest pipeline is re-runnable. For high-frequency updates we wire a webhook or scheduled re-index (add-on). For most clients, a manual re-index button in the maintenance UI is enough — usually triggered weekly or monthly.
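A re-index does not have to re-embed everything. As a hedged sketch, the function below compares content hashes to decide which documents need re-ingesting and which should be dropped; `plan_reindex` and the hash bookkeeping are assumptions for illustration, not the actual maintenance tooling.

```python
import hashlib


def fingerprint(text):
    """Stable content hash used to detect changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(current_docs, indexed_hashes):
    """Compare the corpus with what the index last saw.

    current_docs:   doc_id -> current document text
    indexed_hashes: doc_id -> fingerprint recorded at the last index run
    Returns (to_upsert, to_delete) as sorted lists of doc IDs.
    """
    to_upsert = sorted(
        doc_id for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != fingerprint(text)
    )
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in current_docs)
    return to_upsert, to_delete
```

The same plan works whether it is triggered by a button, a schedule, or a webhook.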

What about hallucinations and wrong answers?

Retrieval-grounded responses cut the hallucination rate hard but don't zero it. Every answer ships with the source chunks it cited, so users can verify. The evaluation report tells you where the bot is most likely to drift, so you know what to monitor.

Can users see which document the answer came from?

Yes — citation rendering is built in on every tier. Click an answer, see the source chunk and the document it came from. This is non-negotiable; it's what makes the bot trustworthy.

What if the bot says something wrong in production?

Three layers of defense. Citation rendering lets users verify before trusting. The feedback loop UI (Premium) lets users flag bad answers. The observability dashboard (Premium) surfaces low-confidence answers and high-frequency questions so you know what to tune. You're not flying blind.

Who owns the deployed bot?

You do. The ingest pipeline, the vector index, the UI, the maintenance runbook: all yours, in your infrastructure (Standard and Premium). Entry tier sits on our shared hosting for 30 days, then transfers to you. No vendor lock-in.
