Solution Architecture & Technical Specification · July 2026

Grounded RAG Chatbot — Wise Compass & My Patient Advice

The system answers only from the two WordPress knowledge bases, links every answer to its source articles, reproduces Quran/Hadith byte-for-byte, and carries a server-enforced medical disclaimer on the health tenant. One Postgres, one API, two isolated tenant schemas — every hard requirement enforced structurally, not by prompt.

Corpus: 28,717 published (8,740 WC · 19,977 MPA) | Vectors: ~59,400 · halfvec 1024 | Scripture articles: 7,898 | Latency target: p95 < 4s | Residency: EU (Frankfurt)
01

Architecture

Five components: a WordPress bridge, a sync worker, one Postgres with pgvector, a Fastify API that runs the retrieve → gate → generate → verify pipeline, and an embeddable chat widget. WordPress remains the source of truth; the RAG store holds embeddings, metadata, and verbatim scripture only.

  • WordPress (MySQL): wisecompass.com · mypatientadvice.com; kb/v1/articles?modified_after; save_post webhook (HMAC)
  • Sync Worker: bulk load → 15-min poll; SHA-256 re-embed gate; nightly reconciliation; clean · chunk · extract scripture
  • Postgres 16 + pgvector: schema kb_wc · schema kb_mpa; article · chunk (halfvec 1024) · scripture (sha256) · ingest_run; HNSW m=16 · tsvector GIN; RLS backstop · PgBouncer
  • Fastify API, /v1 (EU):
      1 · hybrid retrieve (RRF) + rerank
      2 · answerability gate → "not covered"
      3 · Claude Sonnet 5 (cached prompt)
      4 · scripture substitute + hash verify
      5 · output rails + disclaimer (MPA)
      rate limits · spend breaker · Turnstile
  • Chat Widget (WP embed): fetch() + ReadableStream SSE · Cloudflare
  • Safety Sidecars (Haiku 4.5): emergency / crisis / dosage classifier; MPA sentence-entailment judge; number+unit byte-match rule
  • Supporting services: Voyage-4 embeddings (1024-dim) at ingest + query · Upstash Redis response cache · Langfuse + Sentry observability

Fig. 1 — System architecture. Scripture and disclaimers are enforced inside the API pipeline (stages 4–5), after the model and before the client.

Component summary

| Component | Technology | Responsibility |
|---|---|---|
| WP bridge | Small additions to the existing knowledgebase plugin (PHP) | Delta endpoint (modified_after, paginated) + HMAC-signed save_post / trashed_post webhook. Read-only; WP stays the source of truth. |
| Sync worker | TypeScript (Node 22), Docker | Bulk load, 15-min incremental poll with content-hash gating, nightly reconciliation. Cleaning, chunking, scripture extraction, embedding calls. |
| Storage | Postgres 16 + pgvector (halfvec), PgBouncer | Two isolated tenant schemas, each with its own HNSW + GIN indexes. RLS as defense in depth. Holds vectors, metadata, and verbatim scripture; no mirrored content beyond the cleaned text. |
| API | Fastify + zod (fastify-type-provider-zod), OpenAPI 3.1, Anthropic TS SDK | The 5-stage answer pipeline; typed-event SSE; abuse stack; per-tenant config (system prompts, disclaimers, thresholds). |
| Widget | Vanilla JS, ~15KB, tolerant reader | Embeds in both WP sites; fetch() + ReadableStream SSE; renders segments, citations, disclaimer; suggested-questions panel. |
| Sidecars | Haiku 4.5 (API), bge-reranker-v2-m3 (self-host) | Safety classification, MPA groundedness judging, cross-encoder reranking. |

Design principles

02

Data & Ingestion

Three ingestion modes share one cleaning-and-chunking core. Re-embedding is gated on a SHA-256 of the cleaned text rather than on the modified timestamp, because WordPress bumps that timestamp even on metadata-only saves.

BULK (once): JSON exports → clean + filter (publish-only · shortcodes · junk) → block-aware chunk (ref-para + blockquote atomic) → extract scripture (quote_html + sha256) → embed + insert (assert ≈59.4k chunks)
POLL (15 min): GET modified_after=watermark → sha256 of clean text changed?
  • yes → delete + reinsert chunks (one transaction · re-embed changed only)
  • no → skip (metadata-only save)
  • then advance watermark and log ingest_run
NIGHTLY: full post_id set diff (WP vs index) → cascade-delete removed articles (catches hard deletes + trash) → alert on drift > threshold (sync-status dashboard)
Optional save_post webhook lowers freshness latency to seconds; polling remains the source of truth. Live edits reflected in the index in < 20 minutes worst-case.

Fig. 2 — Ingestion: bulk load, 15-minute incremental poll with hash gating, nightly reconciliation.
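
The hash gate in the poll path can be sketched as below. This is a minimal illustration, not the worker's actual code: the `IndexedArticle` shape and the function names are hypothetical, and it assumes the stored hash is the raw SHA-256 digest of the cleaned text (matching the `content_sha256 bytea` column).

```typescript
import { createHash } from "node:crypto";

// Hypothetical subset of an `article` row that the gate needs.
interface IndexedArticle {
  postId: number;
  contentSha256: Buffer; // mirrors article.content_sha256 (bytea)
}

type GateDecision = "insert" | "re_embed" | "skip";

// SHA-256 over the cleaned text, returned as raw bytes.
function hashCleanText(cleanText: string): Buffer {
  return createHash("sha256").update(cleanText, "utf8").digest();
}

// Decide what the poller does with one article from the delta endpoint.
function reEmbedGate(cleanText: string, existing: IndexedArticle | undefined): GateDecision {
  if (existing === undefined) return "insert"; // never indexed before
  const unchanged = hashCleanText(cleanText).equals(existing.contentSha256);
  return unchanged ? "skip" : "re_embed"; // skip = metadata-only save
}
```

Only a `re_embed` decision triggers the delete-and-reinsert transaction and the embedding calls; a `skip` still lets the watermark advance.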

Cleaning & chunking rules

Postgres schema (applied to kb_wc and kb_mpa)

pg-sql
CREATE TABLE article (
  post_id         bigint PRIMARY KEY,
  title           text NOT NULL,
  slug            text NOT NULL,
  url             text NOT NULL,
  status          text NOT NULL CHECK (status = 'publish'),
  published_at    timestamptz,
  modified_at     timestamptz NOT NULL,
  author_id       bigint,
  reviewer_ids    bigint[] DEFAULT '{}',     -- MPA clinician credentials
  category_ids    int[]    DEFAULT '{}',
  clean_text      text NOT NULL,
  word_count      int,
  content_sha256  bytea NOT NULL,             -- re-embed gate
  last_indexed_at timestamptz
);
CREATE INDEX ON article USING btree (modified_at);
CREATE INDEX ON article USING gin (category_ids);

CREATE TABLE chunk (
  id           bigserial PRIMARY KEY,
  post_id      bigint NOT NULL REFERENCES article ON DELETE CASCADE,
  seq          int    NOT NULL,
  heading_path text,
  body         text   NOT NULL,
  token_count  int    NOT NULL,
  embedding    halfvec(1024) NOT NULL,        -- voyage-4
  tsv          tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
  UNIQUE (post_id, seq)
);
CREATE INDEX ON chunk USING hnsw (embedding halfvec_cosine_ops) WITH (m=16, ef_construction=64);
CREATE INDEX ON chunk USING gin (tsv);

CREATE TABLE scripture (
  id         bigserial PRIMARY KEY,
  post_id    bigint NOT NULL REFERENCES article ON DELETE CASCADE,
  chunk_id   bigint REFERENCES chunk,
  seq        int,
  ref_label  text,                             -- "Surah Al Tawbah (9), Verse 128" — from the PRECEDING paragraph
  quote_html text NOT NULL,                   -- byte-for-byte wp-block-quote HTML
  sha256     bytea NOT NULL
);

CREATE TABLE ingest_run (
  id bigserial PRIMARY KEY, kind text NOT NULL,  -- 'bulk' | 'poll' | 'reconcile'
  watermark timestamptz, upserted int, deleted int, re_embedded int,
  started_at timestamptz DEFAULT now(), finished_at timestamptz, error text
);
03

Retrieval & Generation

One hybrid SQL query per request — no separate narrowing layer. A calibrated answerability gate sits before the LLM: below threshold, the model is never called and the API returns the templated "not covered + closest articles" response.

Participants: Widget · Cloudflare · API /v1 · Haiku rails · PG + rerank · Claude Sonnet 5
  1 · POST /v1/chat (SSE): WAF · rate limit · CORS
  2 · classify (MPA): emergency? → fixed template, stop
  3 · embed query (voyage-4) · hybrid RRF CTE → top-20 → rerank
  4 · scored chunks + scripture ids
  5 · gate: below threshold → status: not_covered + closest articles
  6 · generate: cached system prompt + 6–8 chunks → prose + [[SCRIPTURE:id]], stream deltas
  7 · substitute + verify
  8 · typed SSE: prose_delta · scripture_block · citations · status · done

Fig. 3 — Request lifecycle. Two exits before the model: the safety classifier (step 2, MPA) and the answerability gate (step 5).
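
A minimal sketch of the answerability gate (step 5), under stated assumptions: the gate compares the top reranker score against a per-tenant threshold, and the context selection applies the per-article cap mentioned elsewhere in this document. The type names, the default limits, and the choice of three closest articles are hypothetical.

```typescript
interface ScoredChunk {
  chunkId: number;
  postId: number;
  rerankScore: number;
}

type GateResult =
  | { status: "answerable"; context: ScoredChunk[] }
  | { status: "not_covered"; closestPostIds: number[] };

function answerabilityGate(
  chunks: ScoredChunk[],
  threshold: number,
  maxContext = 8,
  maxPerArticle = 3,
): GateResult {
  const ranked = chunks.slice().sort((a, b) => b.rerankScore - a.rerankScore);
  const top = ranked[0];
  if (top === undefined || top.rerankScore < threshold) {
    // Below threshold: the LLM is never called; return distinct nearby articles.
    const closest: number[] = [];
    for (const c of ranked) {
      if (closest.indexOf(c.postId) === -1) closest.push(c.postId);
      if (closest.length === 3) break;
    }
    return { status: "not_covered", closestPostIds: closest };
  }
  // Above threshold: keep the best chunks, capped per article.
  const perArticle = new Map<number, number>();
  const context: ScoredChunk[] = [];
  for (const c of ranked) {
    const used = perArticle.get(c.postId) ?? 0;
    if (used >= maxPerArticle) continue;
    perArticle.set(c.postId, used + 1);
    context.push(c);
    if (context.length === maxContext) break;
  }
  return { status: "answerable", context };
}
```

Because the decision is a plain comparison on a calibrated score, it is deterministic and can be replayed against the labeled query set during threshold tuning.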

The hybrid query

pg-sql · Reciprocal Rank Fusion, tenant scoped by schema
-- $1 = query embedding, $2 = raw user query text
WITH vec AS (
  SELECT id, post_id, row_number() OVER (ORDER BY embedding <=> $1::halfvec(1024)) AS r
  FROM kb_mpa.chunk
  ORDER BY embedding <=> $1::halfvec(1024)
  LIMIT 50
),
fts AS (
  SELECT id, post_id, row_number() OVER (ORDER BY ts_rank_cd(tsv, q) DESC) AS r
  FROM kb_mpa.chunk, websearch_to_tsquery('english', $2) q
  WHERE tsv @@ q
  ORDER BY r          -- without this, LIMIT keeps an arbitrary 50 matches
  LIMIT 50
)
SELECT id, post_id, SUM(1.0 / (60 + r)) AS rrf_score   -- RRF with k = 60
FROM (SELECT * FROM vec UNION ALL SELECT * FROM fts) fused
GROUP BY id, post_id
ORDER BY rrf_score DESC
LIMIT 20;
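
For unit-testing the fusion step outside the database, the same Reciprocal Rank Fusion arithmetic can be mirrored in a few lines of TypeScript. This is a sketch only; `RankedHit` and `fuseRrf` are hypothetical names, and ranks are assumed to be 1-based as produced by `row_number()`.

```typescript
interface RankedHit {
  id: number;
  postId: number;
  rank: number; // 1-based position within its own result list
}

interface FusedHit {
  id: number;
  postId: number;
  rrfScore: number;
}

// Sum 1 / (k + rank) over every list a chunk appears in, then sort descending.
function fuseRrf(lists: RankedHit[][], k = 60, limit = 20): FusedHit[] {
  const scores = new Map<number, FusedHit>();
  for (const list of lists) {
    for (const hit of list) {
      const entry = scores.get(hit.id) ?? { id: hit.id, postId: hit.postId, rrfScore: 0 };
      entry.rrfScore += 1 / (k + hit.rank);
      scores.set(hit.id, entry);
    }
  }
  return Array.from(scores.values())
    .sort((a, b) => b.rrfScore - a.rrfScore)
    .slice(0, limit);
}
```

A chunk that ranks moderately well in both the vector and the full-text list outscores one that tops only a single list, which is the behavior the hybrid query relies on.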

The verbatim-scripture pipeline

  • Ingest: extract wp:quote + preceding reference paragraph
  • → scripture table: quote_html · sha256, byte-for-byte
  • → LLM composes prose + [[SCRIPTURE:id]]
  • → Server substitutes: stored bytes injected; hash-check all output
  • → CI byte-equality gate: sampled scripture articles; mismatch blocks deploy
Unmatched quote-shaped output → reject response → one regeneration → fall back to "not covered". The model never has license to write scripture.

Fig. 4 — Verbatim scripture is a data-layer guarantee: 7,898 Wise Compass articles (90%) contain Gutenberg blockquotes; none pass through the model.
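
The substitution and hash check could look roughly like the sketch below. It assumes the `[[SCRIPTURE:id]]` placeholder carries the numeric `scripture.id` and that `sha256` is the raw digest of `quote_html`; the `ScriptureRow` and `Segment` shapes are simplified, hypothetical versions of the real ones. Detection of "quote-shaped" text in the prose is left out because the document does not specify how it is recognised.

```typescript
import { createHash } from "node:crypto";

interface ScriptureRow {
  id: number;
  quoteHtml: string; // stored byte-for-byte at ingest
  sha256: Buffer;    // digest recorded at ingest
  refLabel: string;
}

type Segment =
  | { type: "prose"; text: string }
  | { type: "scripture"; html: string; reference: string };

function substituteScripture(modelOutput: string, rows: Map<number, ScriptureRow>): Segment[] {
  const placeholder = /\[\[SCRIPTURE:(\d+)\]\]/g;
  const segments: Segment[] = [];
  let cursor = 0;
  let m: RegExpExecArray | null;
  while ((m = placeholder.exec(modelOutput)) !== null) {
    const row = rows.get(Number(m[1]));
    if (row === undefined) throw new Error("placeholder references unknown scripture id");
    // Re-hash the stored bytes before they leave the server.
    const digest = createHash("sha256").update(row.quoteHtml, "utf8").digest();
    if (!digest.equals(row.sha256)) throw new Error("scripture hash mismatch");
    if (m.index > cursor) segments.push({ type: "prose", text: modelOutput.slice(cursor, m.index) });
    segments.push({ type: "scripture", html: row.quoteHtml, reference: row.refLabel });
    cursor = placeholder.lastIndex;
  }
  if (cursor < modelOutput.length) segments.push({ type: "prose", text: modelOutput.slice(cursor) });
  return segments;
}
```

A thrown error here corresponds to the "reject response" branch: the caller would attempt one regeneration and then fall back to "not covered".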

04

DTOs & Streams

Contracts defined once as zod schemas → OpenAPI 3.1 → generated client types. Versioned at /v1, additive-only — cached WordPress embeds cannot be force-upgraded.

typescript · zod DTOs
import { z } from 'zod';

const ChatRequest = z.object({
  message:        z.string().min(1).max(1000),
  conversationId: z.string().uuid().optional(),   // multi-turn (phase 2)
  stream:         z.boolean().default(true),
});
// tenant resolved SERVER-SIDE from X-Site-Key + Origin — never from the body

const Segment = z.discriminatedUnion('type', [
  z.object({ type: z.literal('prose'),     text: z.string() }),
  z.object({ type: z.literal('scripture'), html: z.string(),
             reference: z.string(), articleId: z.number() }),   // server-substituted, hash-verified
]);

const Citation = z.object({
  articleId: z.number(), title: z.string(), url: z.string().url(),
  author: z.string().nullable(), reviewer: z.string().nullable(),   // MPA credentials surfaced
  score: z.number(),
});

const ChatResponse = z.object({
  requestId:       z.string().uuid(),
  status:          z.enum(['answered', 'not_covered', 'safety_redirect']),
  segments:        z.array(Segment),
  citations:       z.array(Citation),      // non-empty + from retrieved set, or API returns fallback
  closestArticles: z.array(Citation),      // populated when not_covered
  disclaimer:      z.string().nullable(),  // null allowed for Wise Compass only; for MPA the server always populates it
  safetyFlags:     z.array(z.enum(['emergency','crisis','dosage_seeking'])),
  usage:           z.object({ inputTokens: z.number(), outputTokens: z.number() }),
});
// Errors: RFC 9457 application/problem+json
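
The two server-side rules stated in the comments above (citations must come from the retrieved set, and the MPA disclaimer is never null) could be enforced in a final pass such as the following. This is a sketch in plain TypeScript with simplified, hypothetical types (`Draft`, `enforceContract`); the real pipeline would operate on the full `ChatResponse`.

```typescript
type Tenant = "wc" | "mpa";
type Status = "answered" | "not_covered" | "safety_redirect";

interface Draft {
  status: Status;
  citations: { articleId: number }[];
  disclaimer: string | null;
}

function enforceContract(
  tenant: Tenant,
  draft: Draft,
  retrievedIds: ReadonlySet<number>,
  mpaDisclaimer: string,
): Draft {
  if (tenant === "mpa" && mpaDisclaimer.trim() === "") {
    throw new Error("MPA disclaimer is not configured");
  }
  // The disclaimer comes from tenant config, never from the model.
  const disclaimer = tenant === "mpa" ? mpaDisclaimer : draft.disclaimer;
  const citationsOk =
    draft.citations.length > 0 &&
    draft.citations.every((c) => retrievedIds.has(c.articleId));
  if (draft.status === "answered" && !citationsOk) {
    // An answer without valid citations is downgraded to the fallback.
    return { status: "not_covered", citations: [], disclaimer };
  }
  return { status: draft.status, citations: draft.citations, disclaimer };
}
```

Keeping this check after generation means a model that omits or invents a citation cannot produce an `answered` response.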

Streaming — typed events, tenant-specific release policy

Wise Compass: message_start → prose_delta ×n → hold at placeholder → scripture_block → prose_delta ×n → citations → status → done
MPA (medical): message_start → buffered: rails run (entailment · number+unit · disclaimer) → prose_delta burst → citations → status → done
Not covered: message_start → status: not_covered + closestArticles[] → done (LLM never called · ~$0.0002)

Fig. 5 — SSE event order per tenant. Raw model deltas never reach the client; the widget shows a "checking sources" state during the MPA buffer.
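
On the widget side, a tolerant reader over `fetch()` + `ReadableStream` needs to split the byte stream into SSE frames and ignore event types it does not recognise. The sketch below shows only that parsing step. It assumes the typed events are carried in the standard SSE `event:` field with JSON in `data:`, which the document does not state explicitly; the function names are hypothetical.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Event names taken from the stream diagram; anything else is skipped.
const KNOWN_EVENTS = ["message_start", "prose_delta", "scripture_block", "citations", "status", "done"];

// Split buffered text into complete SSE frames plus the unfinished remainder.
// Assumes "\n" line endings; the SSE format also permits "\r\n".
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // SSE default when no event: field is present
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// Tolerant reader: drop event types this widget build does not know.
function knownOnly(events: SseEvent[]): SseEvent[] {
  return events.filter((e) => KNOWN_EVENTS.indexOf(e.event) !== -1);
}
```

The caller would append each decoded chunk to `rest`, call `parseSse` again, and render only the events returned by `knownOnly`, so an older cached embed keeps working when the API adds a new event type.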

API surface

| Endpoint | Purpose |
|---|---|
| POST /v1/chat | Answer path: SSE + JSON |
| GET /v1/articles/:id | Citation hover-preview for the widget |
| POST /v1/feedback | Thumbs up/down keyed by requestId; feeds the eval set |
| POST /v1/ingest/webhook | WP save_post hook, HMAC-signed |
| GET /v1/healthz · GET /v1/admin/sync-status | Liveness + ingest watermarks/drift |
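
Verification of the HMAC-signed webhook might be sketched as follows. The document specifies only "HMAC-signed"; the use of HMAC-SHA256, hex encoding of the signature, and the function name are assumptions made for this illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if signatureHex is the HMAC-SHA256 of the raw request body.
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The signature has to be computed over the raw body bytes exactly as received, before any JSON parsing, otherwise re-serialisation differences will break verification.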
05

AI Models

| Role | Model | Price /MTok (in / out) | Why |
|---|---|---|---|
| Answer generation | claude-sonnet-5 | $3 / $15 (intro $2 / $10 until 31 Aug 2026) | Best grounding-instruction adherence per dollar. The static system prompt is prompt-cached; cache reads bill at ~0.1×. |
| Safety classifier | claude-haiku-4-5 | $1 / $5 | Pre-retrieval emergency / crisis / dosage routing, fixed signposting templates. ~650 tok/call. |
| MPA groundedness judge | claude-haiku-4-5 | $1 / $5 | Sentence-level entailment vs retrieved chunks + deterministic number+unit byte-match rule. |
| Escalation tier (optional) | claude-opus-4-8 | $5 / $25 | Only if eval shows Sonnet gaps on multi-article synthesis. Not in the base budget. |
| Embeddings | voyage-4 · 1024-dim | $0.06 /MTok (200M free tokens) | Anthropic's recommended partner. Strong on Islamic transliteration + medical vocabulary. Matryoshka → 512-dim later without re-embedding. Corpus embeds for $0 under the free tier. Fallback: OpenAI text-embedding-3-small behind a provider-agnostic interface. |
| Reranker | bge-reranker-v2-m3 (self-host) or Voyage rerank | ≈ $0 | Its calibrated score is the "not covered" gate. |

Token budget per query (single-turn)

| Component | Tokens | Notes |
|---|---|---|
| System prompt (grounding rules, placeholder protocol, tenant config) | ~1,300 | Prompt-cached: ~0.1× after first request per 5-min window |
| Retrieved context (6–8 chunks, max 2–3/article) | ~3,000 | Fresh input |
| Scripture metadata + question + formatting | ~250 | Fresh input |
| Total input | ~4,550 | of which ~1,300 cached |
| Output | ~450 | Placeholders keep scripture out of output tokens |
| Query embedding (voyage-4) | ~30 | ≈ $0.000002 (noise) |
| Multi-turn follow-ups (phase 2) | +30–60% input | History rides in messages; cache absorbs most of it |
06

Per-Query Price

| Generation model | Per query | 5k q/mo | 10k q/mo | 30k q/mo |
|---|---|---|---|---|
| claude-haiku-4-5 | $0.0056 | $28 | $56 | $168 |
| claude-sonnet-5 (intro, until Aug 2026) | $0.0112 | $56 | $112 | $336 |
| claude-sonnet-5 (standard) | $0.0167 | $84 | $167 | $501 |
| claude-opus-4-8 | $0.0279 | $140 | $279 | $837 |

Basis: ~4,550 input (1,300 cached at 0.1×) + ~450 output. Add-ons: MPA safety rails ≈ +$0.003/query on the medical tenant only; multi-turn ≈ +40% input. Three dampeners stack: prompt caching (30–50% of input spend), exact-match Redis response cache (repeat queries → $0), and the "not covered" gate (gated queries skip the LLM, ~$0.0002).
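
The per-query arithmetic can be reproduced with a small helper. This sketch applies the stated basis literally (4,550 input tokens, 1,300 of them cached at 0.1×, 450 output tokens); the function and parameter names are hypothetical. With these exact inputs the results land within roughly 1% of the table's figures (for example about $0.0169 versus the quoted $0.0167 for Sonnet 5 standard), a gap that presumably comes from rounding in the approximate token counts.

```typescript
interface ModelPrice {
  inPerMTok: number;  // USD per million input tokens
  outPerMTok: number; // USD per million output tokens
}

function perQueryUsd(
  price: ModelPrice,
  inputTokens = 4550,
  cachedTokens = 1300,
  outputTokens = 450,
  cacheReadMultiplier = 0.1,
): number {
  const freshTokens = inputTokens - cachedTokens;
  const inputCost = freshTokens * price.inPerMTok;
  const cachedCost = cachedTokens * price.inPerMTok * cacheReadMultiplier;
  const outputCost = outputTokens * price.outPerMTok;
  return (inputCost + cachedCost + outputCost) / 1e6;
}

// Monthly generation spend before the cache/gate dampeners are applied.
function monthlyUsd(price: ModelPrice, queriesPerMonth: number): number {
  return perQueryUsd(price) * queriesPerMonth;
}
```

For example, `perQueryUsd({ inPerMTok: 1, outPerMTok: 5 })` gives the Haiku 4.5 figure and `monthlyUsd({ inPerMTok: 3, outPerMTok: 15 }, 10000)` the Sonnet 5 standard figure at 10k queries per month.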

| One-time item | Cost | Notes |
|---|---|---|
| Full-corpus embedding (~24M tokens incl. overlap) | $0 – $1.43 | voyage-4 free tier → $0; list $1.28–1.43 |
| Full re-embed (model or chunking change) | < $1.50 | Sub-hour, sub-$2; never a reason to defer a fix |
07

Growth Tier & Expected Monthly Costs

The Growth tier buys three things the Lean tier (~$31/mo, single instance) doesn't have: a redundant API pair behind a load balancer, an isolated staging environment for safe releases, and bigger DB compute to keep the HNSW index fully in RAM as the corpus grows toward 170k articles.

Edge: Cloudflare (CDN · WAF · DDoS · TLS) · $0/mo
PRODUCTION · Frankfurt
  • API replica 1: Render Standard · $25
  • API replica 2: Render Standard · $25
  • Sync worker: Render Starter · $7
  • PgBouncer (txn mode) → Supabase Pro + Small compute: kb_wc · kb_mpa · PITR · failover · $30
STAGING · isolated
  • API + worker: 2× Render Starter · $14
  • Neon Postgres, scale-to-zero: idles nights/weekends · ~$8
External APIs (metered): Anthropic (Sonnet 5 + Haiku 4.5) · Voyage (embeddings + rerank) · EU inference path · DPA/SCC
Upstash Redis · $10: response cache · rate-limit state · eu-central-1
Observability · $0: Langfuse (self-host on worker box) · Sentry free · UptimeRobot free
"HA-ish": two replicas behind the platform LB + managed-DB automatic failover. True multi-AZ Postgres HA (~$180/mo) deferred until traffic justifies it.

Fig. 6 — Growth-tier topology, ~$119/mo infrastructure. Every component EU-resident.

Environments & promotion flow (staging areas)

  • DEV (local): Docker Compose (PG + API + worker) · 1k-article corpus sample fixture · Neon branch per feature (free tier) · $0/mo
  • STAGING (Frankfurt): full corpus · Neon scale-to-zero · CI gates run here on every release · smoke + regression LLM traffic · ~$22/mo + ~$15 LLM
  • PROD (Frankfurt): 2× API · Supabase Pro+Small · Upstash · zero-downtime deploys · PITR · spend circuit breaker per tenant · ~$97/mo + query LLM
Promotion path: merge → CI → 6 QA gates green (scripture byte-equality · golden set · off-topic · cross-tenant · MPA suite · chunk-count assertion)
Promotion is gate-driven: no release reaches production without all six CI gates green on staging against the full corpus.

Fig. 7 — Three staging areas. Dev costs nothing (local + free Neon branches); staging idles to near-zero when unused; prod carries the redundancy.

Growth-tier infrastructure — line items

| Component | Plan · Region | $/mo |
|---|---|---|
| Prod database: Postgres 16 + pgvector, both tenant schemas, PITR, failover | Supabase Pro + Small compute · Frankfurt | 30 |
| Staging database: full corpus copy, scale-to-zero | Neon Launch · Frankfurt | ~8 |
| Prod API: redundant pair, zero-downtime deploys, autoscaling | 2× Render Standard · Frankfurt | 50 |
| Prod sync worker | Render Starter · Frankfurt | 7 |
| Staging API + worker | 2× Render Starter · Frankfurt | 14 |
| Cache + rate-limit state | Upstash Redis fixed 250MB · eu-central-1 | 10 |
| CDN / WAF / TLS · uptime · errors · tracing | Cloudflare Free · UptimeRobot · Sentry free · Langfuse self-host | 0 |
| Infrastructure total | | ~119 |

Development-phase AI spend (one-time, during the ~6-week build)

| Item | Est. cost | Notes |
|---|---|---|
| Corpus embedding + ~5 re-embeds during chunking iteration | $0 – $8 | voyage-4 200M free tokens absorb ~9 full re-embeds; list price shown |
| Retrieval calibration + golden-set iterations (~4–5k Sonnet queries) | $60 – $90 | Threshold tuning on ~200 labeled queries/tenant × iterations |
| CI regression runs during build (~100 runs × ~50-query suite) | $60 – $90 | Mix of Haiku (rails) and Sonnet (generation) calls |
| Safety-rail tuning (Haiku classifier + entailment judge, ~5k calls) | ~$5 | Fixed-template routing verified against sensitive probes |
| AI-assisted development tooling (Claude Code, build window) | $200 – $400 | The lever behind the 6-week timeline; ~2 months of a Max-tier plan or metered API equivalent |
| Total development-phase AI spend | ~$325 – $590 | One-time; separate from the build fee |

Expected all-in monthly — Growth tier

| Line | 10k q/mo | 30k q/mo | 60k q/mo |
|---|---|---|---|
| Infrastructure (above) | $119 | $119 | ~$135 (DB compute step-up) |
| Prod generation: Sonnet 5 std, incl. MPA rails, ~25% cache/gate savings | ~$175 | ~$520 | ~$1,040 |
| Staging LLM: smoke + regression traffic | ~$15 | ~$15 | ~$20 |
| Embedding: incremental re-index traffic | ~$0 | ~$0 | ~$1 |
| Total expected monthly | ~$309 | ~$654 | ~$1,196 |

At Sonnet 5 intro pricing (through Aug 2026), subtract ~33% from the generation line. The per-tenant daily spend circuit breaker converts these projections into a contractual ceiling: when tripped, the tenant degrades to retrieval-only "closest articles" mode instead of overspending. Lean-tier reference for comparison: ~$31/mo infra, single instance, schemas-as-staging — appropriate until launch traffic is proven.
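
The per-tenant daily spend circuit breaker can be illustrated with an in-memory sketch. In the deployed system the running totals would presumably live in shared state (the document mentions per-tenant/day cost logging feeding the breaker); the class name, the key format, and the fail-closed behavior for unknown tenants are assumptions.

```typescript
type ServingMode = "full" | "retrieval_only";

class SpendBreaker {
  private readonly capsUsd: Map<string, number>;
  private readonly spentUsd = new Map<string, number>();

  constructor(capsUsd: Map<string, number>) {
    this.capsUsd = capsUsd;
  }

  // Add the cost of one completed request to the tenant's total for that day.
  record(tenant: string, day: string, usd: number): void {
    const key = `${tenant}:${day}`;
    this.spentUsd.set(key, (this.spentUsd.get(key) ?? 0) + usd);
  }

  // Once the daily cap is reached, serve "closest articles" without the LLM.
  mode(tenant: string, day: string): ServingMode {
    const cap = this.capsUsd.get(tenant) ?? 0; // unknown tenant: fail closed
    const spent = this.spentUsd.get(`${tenant}:${day}`) ?? 0;
    return spent >= cap ? "retrieval_only" : "full";
  }
}
```

Checking `mode()` before the generation call, and calling `record()` with the actual token cost afterwards, bounds the overshoot to the cost of the requests in flight when the cap is crossed.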

08

Risk Register

Every risk carries a concrete, testable defense. The four criticals are release-blocking.

Critical · LLM-rephrased Quran/Hadith: the stated project-failure condition (90% of WC articles contain scripture)

Defense — scripture never passes through the model: ingest-time extraction (quote + preceding reference paragraph), [[SCRIPTURE:id]] placeholders, server-side byte substitution, exact-hash output validation, CI byte-equality suite that blocks deploy on mismatch.

Critical · Stale or retracted medical content served after WordPress edits

Defense — 15-min modified_after poller with content-hash gating, explicit publish/unpublish/trash handling with cascade deletes, nightly full reconciliation; webhook for latency, polling as source of truth. Live edits indexed < 20 min.

Critical · Confidently-wrong answers: nearest-neighbor always returns something

Defense — deterministic gate on the reranker score, calibrated per tenant on ~200 labeled queries; below threshold the LLM is never called. Second gate: structured insufficient_context flag. Adversarial off-topic eval set as a release-blocking metric.

Critical · Cross-tenant leakage: medical chunks under the Islamic-guidance brand, or vice versa

Defense — separate schemas with independent indexes (isolation by construction), RLS backstop, tenant resolved server-side from site key + Origin, per-tenant system prompts, CI cross-tenant probe (100 cross-domain queries → zero foreign post_ids).
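
Server-side tenant resolution from the site key plus Origin could be sketched like this. The registry contents, key values, and function name are hypothetical; the point is that the request body never influences which schema is queried.

```typescript
interface TenantConfig {
  tenant: "wc" | "mpa";
  schema: "kb_wc" | "kb_mpa";
  allowedOrigins: string[];
}

// Both the site key and the Origin header must match the same tenant entry.
function resolveTenant(
  siteKey: string | undefined,
  origin: string | undefined,
  registry: Map<string, TenantConfig>,
): TenantConfig | null {
  if (siteKey === undefined || origin === undefined) return null;
  const config = registry.get(siteKey);
  if (config === undefined) return null;
  return config.allowedOrigins.indexOf(origin) !== -1 ? config : null;
}
```

A `null` result would be rejected before retrieval, so a valid Wise Compass key presented from the My Patient Advice origin (or the reverse) never reaches either schema.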

High · Index pollution: 14,531 MPA drafts, universal Elementor CTAs, tester records, 65k-word outliers

Defense — ingestion gate: published-only, shortcode stripper, junk denylist, block-boundary chunking, per-article top-k cap (max 2–3 chunks via DISTINCT ON / MMR), chunk-count assertion.

High · GDPR Article 9: free-text health queries are sensitive-category data

Defense — pre-launch DPIA, consent capture in the widget, EU residency end-to-end, pseudonymized session-scoped logs (30–90-day raw-query retention), PII scrub before embedding/LLM calls, self-hosted tracing.

High · Public widget as wallet-attack & prompt-injection surface

Defense — Cloudflare in front; per-tenant CORS allowlist; per-IP token bucket (10 req/min) + concurrent-SSE caps; 1,000-char input cap; per-tenant daily spend circuit breaker → retrieval-only mode; Turnstile after ~10 req/session; retrieved chunks framed as untrusted data, zero tools on the generation call; DTO validation that citations come from the retrieved set.
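
The per-IP token bucket (10 requests per minute) can be illustrated with the sketch below. It keeps state in memory and takes the clock as a parameter so it is easy to test; in the described architecture the bucket state would presumably sit in the shared rate-limit store. The class and method names are hypothetical.

```typescript
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerMs: number;
  private tokens: number;
  private lastMs: number;

  constructor(capacity: number, refillPerMinute: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMinute / 60000;
    this.tokens = capacity; // start full
    this.lastMs = nowMs;
  }

  // Refill according to elapsed time, then try to spend one token.
  tryTake(nowMs: number): boolean {
    const elapsed = Math.max(0, nowMs - this.lastMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

One bucket per client IP, created as `new TokenBucket(10, 10, Date.now())`, allows a burst of ten requests and then roughly one more every six seconds.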

Medium · Widget/API contract drift: cached WP embeds outliving deploys

Defense — /v1 versioning with additive-only DTO evolution; tolerant-reader widget; cache-busting embed snippet; WP page-cache exclusion rules documented per site.

09

Compliance, Safety & QA Gates

Medical tenant (MPA) rails

  • Pre-retrieval classifier (regex + Haiku 4.5): routes to emergency | crisis | dosage_seeking | normal — the first three get fixed, clinician-approved signposting templates; the LLM is never invoked.
  • Extractive-first answers — "what our articles say", verbatim passages with attribution; never personalised advice, never diagnosis.
  • Groundedness rail — sentence-level entailment (Haiku judge) + deterministic rule: any number+unit span (mg, ml, doses, weeks) must appear byte-identical in a retrieved chunk.
  • Server-appended disclaimer — non-nullable DTO field; wording owned by the client's clinical/legal review; impossible for the model to omit.
  • Buffered release — MPA answers stream only after all rails pass.
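
The deterministic number+unit rule from the groundedness rail can be sketched as follows. The unit list is limited to the four examples named above and the regular expression is an illustrative assumption, not the production pattern; the function name is hypothetical.

```typescript
// Illustrative unit list taken from the rail description (mg, ml, doses, weeks).
const NUMBER_UNIT = /\d+(?:\.\d+)?\s?(?:mg|ml|doses?|weeks?)\b/gi;

// Return every number+unit span in the answer that does not appear,
// character for character, in at least one retrieved chunk.
function ungroundedSpans(answer: string, chunks: string[]): string[] {
  const spans = answer.match(NUMBER_UNIT) ?? [];
  return spans.filter((span) => !chunks.some((chunk) => chunk.indexOf(span) !== -1));
}
```

An empty result lets the answer proceed to the entailment judge; any returned span fails the rail. Note that this plain substring test would accept "500 mg" when the chunk says "1500 mg", so a production rule would also need to check the boundaries around each match.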

GDPR package (pre-launch)

  • DPIA covering Article 9 health-query processing
  • Consent capture in the widget; privacy-policy addendum
  • EU residency: DB + API + inference path; SCC-backed Anthropic DPA, no-training confirmed
  • Pseudonymized logs, 30–90-day raw-query retention, PII scrub pre-embedding

Release-blocking QA gates (CI, run on staging)

| Gate | Pass condition |
|---|---|
| Scripture byte-equality | 100% exact match on sampled scripture articles; zero tolerance |
| Golden query set (per tenant) | Recall@8 ≥ target on client-provided questions (15–20/site: simple · overlap · out-of-scope) |
| Adversarial off-topic set | "Not covered" precision ≥ target; zero confident answers on out-of-domain probes |
| Cross-tenant probe | 100 cross-domain queries → zero foreign post_ids |
| MPA sensitive-query suite | Disclaimer on 100%; emergency templates fire on crisis probes |
| Chunk-count assertion | Pipeline lands within tolerance of sized ~59.4k |

Observability

  • Langfuse (self-hosted): per-request trace — retrieval scores, gate decision, tokens, cost — keyed by requestId
  • Sentry (errors) · UptimeRobot (uptime) · structured per-tenant/day cost logging feeding the spend circuit breaker
10

Roadmap — AI-Accelerated Build

Six weeks to full launch with AI-assisted development (a conventional team would quote 10–12). The compression comes from generating the ingestion pipeline, DTO layer, and eval harness against the real exports already in hand — no waiting on sample data.

Weeks: W0 · W1 · W2 · W3 · W4 · W5 · W6
  • P0 · Discovery & sign-offs (gate): blockers D1, C1, E1 + Zaheer briefing
  • P1 · Ingestion & storage (build): schemas, bulk load, scripture extraction
  • P2 · Retrieval & eval harness (build): hybrid query, rerank, gate calibration
  • P3 · Generation & safety rails (build): orchestrator, substitution, rails, DTO API
  • P4 · Widget, SSE & sync (build): WP bridge, abuse stack, staging deploy
  • P5 · QA & launch (launch): acceptance gates, DPIA, WC → MPA

Fig. 8 — Build timeline. Overlapping phases run as parallel tracks; the two amber bars are client-gated, not engineering-gated.

| Phase | Deliverables | Exit criterion |
|---|---|---|
| P0 · Discovery | Blocker sign-offs: disclaimer ownership (D1), public vs logged-in (C1), API account ownership (E1); Zaheer technical briefing; WP REST + app-password access confirmed | All blockers resolved in writing |
| P1 · Ingestion & storage | Tenant schemas migrated; cleaning pipeline (shortcodes, junk, published-only); block-aware chunker; scripture extraction; full bulk load + embeddings | Chunk count ≈ 59.4k; scripture table covers 7,898 WC articles; spot-audit passes |
| P2 · Retrieval & eval | Hybrid RRF query; reranker; answerability threshold calibrated on ~200 labeled queries/tenant; golden-query CI harness | Recall@8 target on golden set; "not covered" fires correctly on out-of-domain probes |
| P3 · Generation & rails | Fastify /v1 API with zod DTOs; Sonnet 5 orchestration with cached system prompt; scripture substitution + hash validator; MPA classifier, entailment rail, server disclaimer | Scripture byte-equality suite green; MPA sensitive-query suite green |
| P4 · Widget, SSE & sync | Typed-event SSE; embeddable widget (fetch + ReadableStream); WP delta endpoint + HMAC webhook (with Zaheer); poller + nightly reconciliation; abuse stack; staging environment | Live edit on WP reflected in index < 20 min; abuse limits verified under load |
| P5 · QA & launch | Full acceptance run against client test questions (H1); DPIA + consent UI; Wise Compass launch first; MPA launches after clinical sign-off of disclaimer + blocklist | All six CI gates green; client sign-off per tenant |

Sequencing note: Wise Compass launches first because its risk profile is scripture integrity, which is fully machine-verifiable. MPA follows once the human-owned items (disclaimer wording, topic blocklist, DPIA) clear clinical/legal review; those are calendar-bound, not engineering-bound.
11

Investment & Quote

One fixed price for the complete, launched system — both knowledge-base assistants, live on your sites. No hourly billing, no surprises.

Fixed-price build · one-time

$9,750

Design, build, testing, and launch of both assistants — delivered end to end.

⏱ 6–7 weeks to launch

Everything included

  • Two separate, isolated assistants — Wise Compass & My Patient Advice
  • Answers drawn only from your published articles, each linked to its source
  • Quran & Hadith reproduced word-for-word, never paraphrased
  • Medical disclaimer & safety guardrails on the health assistant
  • Embeddable chat widget for both WordPress sites
  • Automatic sync — new and edited articles picked up continuously
  • EU-hosted for GDPR · staging & production · acceptance-tested against your own questions

Fixed-price scope. Optional later enhancements — multi-turn conversation, Arabic-query tuning, high-availability — are quoted separately as Phase 2.

Payment milestones

MILESTONE 1 · 30%
$2,925
Project start

Kick-off, discovery sign-off, and secure ingestion of both knowledge bases into the search index.

MILESTONE 2 · 40%
$3,900
Working system

The assistant answers live from your content with correct source links — demonstrated to you before release.

MILESTONE 3 · 30%
$2,925
Launch acceptance

All quality & safety gates green — scripture accuracy, medical disclaimer, "not covered" handling — live on both sites.

After launch — running costs

| Item | Basis | Typical |
|---|---|---|
| Cloud infrastructure: EU-hosted database, API & sync | Fixed monthly, accounts in your name | ~$119/mo |
| AI usage: answering questions | Pay-as-you-go, ~$0.01–0.02 per question, with a spend cap you set | usage-based |
| Support & maintenance | Optional retainer: monitoring, updates, priority fixes | from $300/mo |

Quote valid 30 days. Running costs are billed to your own provider accounts for full spend visibility; we set everything up and hand over the keys.
12

Open Items

Must resolve before build

  • Blocker  Corpus discrepancy — brief claims 100k+/70k+; real exports hold 8.7k/34.5k (19,977 MPA published). All costs here use the real corpus; confirm whether exports are partial.
  • Blocker  D1 — medical disclaimer wording from the client's clinical/legal review; we inject it, we don't author it.
  • Blocker  C1 — public vs logged-in per site: drives abuse budget and MPA liability posture.
  • Blocker  E1 — API account ownership (Anthropic + Voyage): client-owned or agency-managed with pass-through billing.
  • High  B1/B2 — WP REST access + webhook: app passwords on both sites; Zaheer implements the delta endpoint and save hook.
  • High  H1 — test questions: 15–20/site incl. 3–5 verbatim-scripture and 3–5 sensitive-medical probes — the backbone of the QA gates.
  • High  D4 — MPA topic blocklist (dosages, mental-health crisis, emergencies) needs clinical sign-off.

Working assumptions

  • WordPress remains the source of truth; the RAG store holds embeddings + metadata + verbatim scripture only.
  • English-dominant queries at launch; Arabic input works via the multilingual embedder but is not quality-guaranteed until eval'd (C7).
  • Single-turn Q&A at launch; multi-turn is a phase-2 flag already present in the DTO.
  • Scales to 170k articles / ~350k vectors with no structural change — same HNSW, bigger DB compute.
  • Prices are July 2026 USD list ex-VAT; Sonnet 5 intro pricing expires 31 Aug 2026 — both figures quoted throughout.

Deliberately excluded

  • Dedicated vector DB (Pinecone/Weaviate) — $600–1,200/yr for a network hop and a dual-write problem at this scale.
  • Fine-tuning — grounding quality here is a retrieval problem, not a model problem.
  • True multi-AZ Postgres HA — deferred until traffic justifies ~$180+/mo.