QS RAG
A Quantum Symbiote Service · QTB-SVC-005

A chatbot trained on your real documents.
Every answer traceable to the source.

Not a toy demo — a deployed assistant your users can hit from day one. Vector index built from your corpus, retrieval evaluated against a real question set, hosted with auth and rate limits. Provenance preserved on every answer.

Turnaround
10 days (Entry) · 14 days (Premium) · 5-day rush available
LLM
Claude · OpenAI · open-source (your call)
Hosting
Your infrastructure or ours
Support
30–90 days, tier-dependent
5,000
Documents handled at Premium tier · multi-modal ingest
50q
Question test set evaluated before ship — Premium
100%
Of answers ship with cited source chunks — every tier
0
Hidden subscriptions or per-query vendor markups
The problem

An LLM that makes up answers is not a chatbot — it's a liability.

Most "AI chatbot" builds glue an LLM onto a website and call it done. The bot sounds confident. It is also making things up, often on topics where being wrong matters.

The fix isn't a smarter model. The fix is grounding every answer in a real document the bot can cite — and showing that citation to the user.

Retrieval-augmented generation (RAG) does exactly that. The bot searches your real corpus first, retrieves the relevant chunks, and the LLM writes its answer from those chunks, not from training data the model half-remembers.

Every response ships with the source chunks attached. Users can verify. Your support team can audit. Hallucinations don't go to zero, but they drop hard.
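A minimal sketch of the retrieve-then-generate loop described above, in Python. It is illustrative, not the production stack: a bag-of-words cosine score stands in for a real embedding model and vector store, `generate` is a hypothetical callable you would point at whichever LLM endpoint you pick, and the `Chunk` type and its ID format are assumptions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # hypothetical ID format, e.g. "policy.pdf#0"
    source: str    # document the chunk came from
    text: str


def _vectorize(text: str) -> Counter:
    # Bag-of-words term counts: a stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    # Score every chunk against the question; keep the top k that overlap at all.
    q = _vectorize(question)
    scored = [(_cosine(q, _vectorize(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    # Number each retrieved chunk so the model can cite it as [1], [2], ...
    sources = "\n".join(f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(hits, start=1))
    return (
        "Answer using only the numbered sources and cite them like [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def answer(question: str, chunks: list[Chunk], generate: Callable[[str], str]) -> dict:
    hits = retrieve(question, chunks)
    text = generate(build_prompt(question, hits))
    # Return the retrieved chunks with the answer so the UI can render citations.
    return {
        "answer": text,
        "citations": [{"id": c.chunk_id, "source": c.source, "text": c.text} for c in hits],
    }
```

The point of the shape: the model is told to answer only from numbered sources, and those same chunks travel back with the answer, which is what lets a UI show the citation next to every response.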

The 6-stage build

From corpus to deployed assistant — in 10 days.

Two human gates. Four production stages. Retrieval gets evaluated against a real question set before anything goes live to users.

Stage 01

Discovery

Scoping call. What documents, what user questions, what hosting target. We agree on the test question set before building — so we measure what matters.

Stage 02

Ingest pipeline

Document ingestion built to re-run. Chunking strategy tuned to your content type. Provenance metadata preserved on every chunk.
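One way to sketch provenance-preserving, re-runnable chunking, assuming fixed-size word windows with overlap (real pipelines often split on headings or sentences instead, and the `size`/`overlap` defaults here are arbitrary). Each chunk keeps its source document and position, and its ID is a hash of source plus text, so re-running over an unchanged document reproduces the same IDs. `chunk_document` is a hypothetical name.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # stable across re-runs for unchanged text
    source: str    # provenance: originating document
    index: int     # provenance: position of the chunk within that document
    text: str


def chunk_document(source: str, text: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into overlapping word windows, each tagged with its provenance."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    # Stop once a window would add no words beyond the previous window's tail.
    for index, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        body = " ".join(words[start:start + size])
        # Hash of source + text: re-ingesting an unchanged document yields the
        # same IDs, so a re-run can upsert instead of duplicating chunks.
        digest = hashlib.sha256(f"{source}\n{body}".encode("utf-8")).hexdigest()[:12]
        chunks.append(Chunk(f"{source}#{digest}", source, index, body))
    return chunks
```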

Stage 03

Vector index + retrieval

Embedding model selected (OpenAI, open-source, or another hosted provider; your call). Vector store wired. Retrieval-and-generation chain with citation rendering.

Stage 04

UI + auth + deploy

Web UI or API endpoint depending on your scope. Auth + rate limits. Deployment to your infrastructure (or ours for Entry tier).

Stage 05

Evaluation pass

The test question set runs against the bot. Retrieval thresholds and re-rank logic tuned against actual failure modes — not hypothetical ones.

Stage 06

Handoff + support

You hit it with real questions. One revision pass. Maintenance docs, evaluation report, support window. Re-index when you add documents.

  • D1
    Document ingestion pipeline: re-runnable when you add new sources. Chunking strategy documented per content type.
  • D2
    Vector index: provenance metadata on every chunk. Source attribution preserved through retrieval.
  • D3
    RAG backend: retrieval-and-generation chain with citation rendering. Pluggable LLM endpoint.
  • D4
    UI or API: web UI with conversation history, or an API endpoint if you're embedding the bot elsewhere.
  • D5
    Auth + rate limits: per-user keys, daily caps, abuse protection. Reasonable defaults for your scale.
  • D6
    Evaluation report: question-by-question scoring on your test set. Where the bot is strong, where it drifts, what to monitor.
  • D7
    Maintenance runbook: how to add new documents, re-index, and version the corpus without losing the existing index.
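The daily caps in D5 could look something like this minimal sketch: a fixed window per API key that resets at UTC midnight. Counts live in process memory for illustration only, and the `DailyCap` name and reset rule are assumptions, not a description of the shipped service.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional


class DailyCap:
    """Per-API-key request cap over a fixed window that resets at UTC midnight."""

    def __init__(self, limit: int):
        self.limit = limit
        # (api_key, UTC date) -> requests used; old days are never pruned in this sketch.
        self._counts = defaultdict(int)

    def allow(self, api_key: str, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        window = (api_key, now.astimezone(timezone.utc).date())
        if self._counts[window] >= self.limit:
            return False  # over today's cap: the API layer would reject the request
        self._counts[window] += 1
        return True
```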
What we measure

Retrieval quality is the bottleneck.

An LLM is only as good as what you feed it. Most "bad bot" complaints trace to retrieval, not generation — the model couldn't see the right chunk, so it improvised.

We evaluate retrieval against your real questions before tuning generation. Recall, rank, citation accuracy — measured per question, reported per build. The bot you ship is the bot we measured.
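The per-question recall and rank measurements above can be sketched like this, assuming each test question is labeled with the chunk IDs that should come back. It computes recall@k and reciprocal rank within the top k, then averages them; citation accuracy needs a separate check on the generated answer and is not covered here. Function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuestionScore:
    question: str
    recall_at_k: float      # share of expected chunks that appear in the top k
    reciprocal_rank: float  # 1 / rank of the first expected chunk in the top k, else 0


def score_question(question: str, expected: set, ranked: list, k: int) -> QuestionScore:
    top_k = ranked[:k]
    recall = len(expected & set(top_k)) / len(expected) if expected else 0.0
    rr = next((1.0 / rank for rank, cid in enumerate(top_k, start=1) if cid in expected), 0.0)
    return QuestionScore(question, recall, rr)


def evaluate(test_set, retrieve, k: int = 5):
    """test_set: (question, expected chunk IDs) pairs; retrieve: question -> ranked chunk IDs."""
    if not test_set:
        raise ValueError("test set is empty")
    scores = [score_question(q, set(expected), retrieve(q), k) for q, expected in test_set]
    summary = {
        f"recall@{k}": sum(s.recall_at_k for s in scores) / len(scores),
        f"mrr@{k}": sum(s.reciprocal_rank for s in scores) / len(scores),
    }
    return scores, summary  # per-question rows for the report, plus averages
```

Keeping the per-question rows, not just the averages, is what lets a report point at the specific questions where retrieval misses.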

"Every answer ships with the source chunks it cited. Users can verify."
Pricing

Three tiers. Scale by corpus size and depth of eval.

Pick by document count, evaluation rigor, and hosting model. Up-scope at any time before kickoff.

Entry
Up to 100 documents (≤500 pages). Web UI or API. Hosted on shared infrastructure for 30 days.
$300
Fixed · 10-day delivery
  • Ingest pipeline + vector index
  • Single embedding model
  • Web UI or API endpoint
  • Basic auth + rate limits
  • 30-day shared hosting
  • 30-day support
Premium
Up to 5,000 documents. Multi-modal ingest. Hybrid retrieval. 50-question eval. Observability.
$2,400
Fixed · 14-day delivery
  • Everything in Standard
  • Multi-modal ingest (PDF + images + tables)
  • Hybrid retrieval (semantic + keyword + re-ranker)
  • 50-question evaluation report
  • Conversation history + feedback loop UI
  • Observability dashboard
  • 90-day support + team training
Add-ons (any tier)
Slack / Teams / Discord integration · +$300
Additional language (beyond English) · +$200/lang
Streaming response UI w/ markdown · +$200
Document upload portal for non-tech users · +$400
Custom domain + SSL setup · +$150
Re-ranker model (Cohere Rerank or equivalent) · +$250
Rush delivery (5 days) · +$400
Case · QS-CASE-005

Internal toolchain MCP server.
0 context-switch roundtrips.

3 → 0
Context-switch roundtrips per task
8m → 90s
Ticket triage time per ticket
1
"Spec-first" workflow adopted team-wide
The setup

A 4-person product team using Claude for code review, ticket triage, and project planning — but every Claude interaction required manually copy-pasting context from Notion (specs), GitHub Issues (tickets), and an internal wiki. Three context-switch roundtrips per task, every task.

We built a custom MCP server exposing all three internal systems as native Claude tools: notion_read, github_issues, wiki_search with semantic top-3 retrieval.

Same architectural primitive as a RAG chatbot — retrieval-grounded responses, citation-preserving, deployed to a team's actual workflow. Built on the same retrieval stack we'd ship for any customer-facing bot.

QTB-SVC-005 + QTB-SVC-006 · Client identifier withheld
Start a build

Tell us about the corpus. We'll measure retrieval feasibility before kickoff.

Submit the form. Scoping call within 24 hours. We'll review the corpus type, target users, and test questions before scoping the tier.

Submissions route to [email protected].
Frequently asked

Common questions before kickoff.

Which LLM does the bot run on?
Default is Claude (Sonnet or Haiku depending on volume and budget). We'll wire OpenAI, Mistral, or open-source models on request — same architecture, different inference endpoint. We pick during discovery based on your latency, cost, and accuracy targets.
How do you handle documents that change frequently?
The ingest pipeline is re-runnable. For high-frequency updates we wire a webhook or scheduled re-index (add-on). For most clients, a manual re-index button in the maintenance UI is enough — usually triggered weekly or monthly.
What about hallucinations and wrong answers?
Retrieval-grounded responses cut the hallucination rate hard but don't zero it. Every answer ships with the source chunks it cited, so users can verify. The evaluation report tells you where the bot is most likely to drift, so you know what to monitor.
Can users see which document the answer came from?
Yes — citation rendering is built in on every tier. Click an answer, see the source chunk and the document it came from. This is non-negotiable; it's what makes the bot trustworthy.
What if the bot says something wrong in production?
Three layers of defense. Citation rendering lets users verify before trusting. The feedback loop UI (Premium) lets users flag bad answers. The observability dashboard (Premium) surfaces low-confidence answers and high-frequency questions so you know what to tune. You're not flying blind.
Who owns the deployed bot?
You do. The ingest pipeline, the vector index, the UI, the maintenance runbook: all yours, in your infrastructure (Standard and Premium). Entry tier sits on our shared hosting for 30 days, then transfers to you. No vendor lock-in.