A production-grade clinical AI research assistant powered by hybrid retrieval, a multi-stage pipeline, and a carefully curated knowledge base — delivering cited answers in real time.
Healthcare professionals need fast access to authoritative clinical knowledge. The body of published guidelines, research, and standards is vast and constantly evolving — impossible to navigate manually during a consultation. Generic LLMs cannot be trusted here: they hallucinate references, misinterpret clinical nuance, and lack the safety architecture required for healthcare.
The client needed a system that could ingest and index a large corpus of curated clinical content, retrieve the most relevant information for any query, and synthesize a cited answer — all while maintaining strict safety boundaries and keeping patient data entirely local.
Northorp built a clinical AI research assistant designed from the ground up for production healthcare use.
Healthcare Sector
Clinical Decision Support
Domain: Clinical decision support
Sources: Curated corpus of authoritative clinical content and published research
Constraint: Zero hallucination tolerance — every answer must cite authoritative sources
Goal: Cited answers in seconds, with safety-aware architecture
The knowledge base is locked — only the ingestion pipeline can add content. Every response is constructed from retrieved chunks with source citations. The LLM synthesises from retrieved evidence, never generating from ungrounded knowledge.
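A minimal sketch of how grounding can be enforced around the LLM call. It numbers each retrieved chunk in the prompt, then checks that every `[n]` citation in the answer points at a chunk that was actually retrieved. `Chunk`, `build_grounded_prompt` and `invalid_citations` are hypothetical names, and the prompt wording is illustrative, not the production prompt.

```python
import re
from dataclasses import dataclass
from typing import List, Set

@dataclass(frozen=True)
class Chunk:
    source: str  # hypothetical field: identifier of the authoritative source
    text: str

def build_grounded_prompt(question: str, chunks: List[Chunk]) -> str:
    # Number each retrieved chunk so the model can cite it as [1], [2], ...
    evidence = "\n".join(
        f"[{i}] ({chunk.source}) {chunk.text}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the numbered evidence below and cite each claim, e.g. [1]. "
        "If the evidence does not answer the question, say so.\n\n"
        f"Evidence:\n{evidence}\n\nQuestion: {question}"
    )

CITATION = re.compile(r"\[(\d+)\]")

def invalid_citations(answer: str, n_chunks: int) -> Set[int]:
    # Any [n] outside 1..n_chunks refers to evidence that was never retrieved.
    cited = {int(n) for n in CITATION.findall(answer)}
    return {n for n in cited if not 1 <= n <= n_chunks}
```

A non-empty result from `invalid_citations` is a signal to reject or regenerate the answer rather than show it to a clinician.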
Hybrid search runs BM25 full-text search and vector similarity search in parallel, combining results through fusion scoring with source authority boosts. A multi-stage pipeline routes each query by intent to optimise retrieval.
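Intent routing can be pictured as a lookup from a classified intent to retrieval parameters. In this sketch, the intents, keyword rules and parameter values are all hypothetical. Simple keyword matching stands in for whatever classifier the real router uses, which the case study does not specify.

```python
import re
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RetrievalStrategy:
    top_k: int            # number of chunks to pass to the synthesizer
    keyword_weight: float  # hypothetical weight for the full-text ranking
    vector_weight: float   # hypothetical weight for the semantic ranking

# Hypothetical intents and parameters, for illustration only.
STRATEGIES = {
    "dosage": RetrievalStrategy(top_k=5, keyword_weight=1.5, vector_weight=1.0),
    "guideline": RetrievalStrategy(top_k=8, keyword_weight=1.0, vector_weight=1.0),
    "general": RetrievalStrategy(top_k=10, keyword_weight=1.0, vector_weight=1.5),
}

# Keyword rules stand in for the real intent classifier; checked in insertion order.
INTENT_KEYWORDS = {
    "dosage": {"dose", "dosage", "dosing", "mg"},
    "guideline": {"guideline", "recommendation", "protocol"},
}

def classify_intent(query: str) -> str:
    words = set(re.findall(r"[a-z]+", query.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "general"

def route(query: str) -> Tuple[str, RetrievalStrategy]:
    intent = classify_intent(query)
    return intent, STRATEGIES[intent]
```

Exact-term intents such as dosing lean on keyword matching, where drug names and units matter. Open-ended questions lean on semantic similarity.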
A dedicated safety layer sits before the LLM pipeline, screening queries against defined clinical boundaries. Audit logging tracks every query with intent, sources cited, and latency. No patient-identifiable information is stored.
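A minimal sketch of a pre-LLM screen plus an audit record. The two boundary rules are hypothetical placeholders because the client's actual clinical boundaries are not published. The audit line stores intent, cited sources and latency but deliberately omits the query text, which is one assumed way of keeping patient-identifiable details out of the logs.

```python
import json
import re
import time
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical boundary rules; real rules would be defined with the client's clinicians.
BOUNDARY_RULES = [
    ("possible_patient_identifier", re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b")),
    ("harmful_intent", re.compile(r"\blethal dose\b", re.IGNORECASE)),
]

@dataclass(frozen=True)
class ScreenResult:
    allowed: bool
    reason: Optional[str] = None

def screen_query(query: str) -> ScreenResult:
    # Runs before any retrieval or LLM call; the first matching rule blocks the query.
    for reason, pattern in BOUNDARY_RULES:
        if pattern.search(query):
            return ScreenResult(allowed=False, reason=reason)
    return ScreenResult(allowed=True)

def audit_line(intent: str, sources: List[str], started: float, result: ScreenResult) -> str:
    # The query text itself is never written, so free-text patient details are not persisted.
    record = {
        "ts": time.time(),
        "intent": intent,
        "sources": sources,
        "latency_ms": round((time.perf_counter() - started) * 1000, 1),
        "allowed": result.allowed,
        "reason": result.reason,
    }
    return json.dumps(record)
```

In use, `started = time.perf_counter()` is taken when the request arrives, and `audit_line(...)` is written once the answer has finished streaming.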
A multi-stage agent pipeline — Safety → Router → Retriever → Synthesizer → Memory — powered by PostgreSQL + pgvector for hybrid search.
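The five-stage flow can be sketched as a list of functions that each take and return a shared state object, with the safety stage able to short-circuit everything after it. Every stage body here is a hypothetical placeholder. The sketch shows only the ordering and hand-off, not the real components.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class QueryState:
    query: str
    blocked: bool = False
    intent: Optional[str] = None
    chunks: List[str] = field(default_factory=list)
    answer: Optional[str] = None
    history: List[str] = field(default_factory=list)

Stage = Callable[[QueryState], QueryState]

# Each stage below is a placeholder standing in for the real component.
def safety(state: QueryState) -> QueryState:
    state.blocked = "lethal dose" in state.query.lower()
    return state

def router(state: QueryState) -> QueryState:
    state.intent = "dosage" if "dose" in state.query.lower() else "general"
    return state

def retriever(state: QueryState) -> QueryState:
    state.chunks = [f"evidence for: {state.query}"]  # stand-in for hybrid search
    return state

def synthesizer(state: QueryState) -> QueryState:
    citations = " ".join(f"[{i}]" for i in range(1, len(state.chunks) + 1))
    state.answer = f"Answer grounded in retrieved evidence {citations}"  # stand-in for the LLM
    return state

def memory(state: QueryState) -> QueryState:
    state.history.append(state.query)  # stand-in for the session store
    return state

PIPELINE: List[Stage] = [safety, router, retriever, synthesizer, memory]

def run_pipeline(query: str, stages: List[Stage] = PIPELINE) -> QueryState:
    state = QueryState(query=query)
    for stage in stages:
        state = stage(state)
        if state.blocked:
            # A blocked query never reaches routing, retrieval or the LLM.
            state.answer = "This question is outside the assistant's clinical scope."
            break
    return state
```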
Runs BM25 full-text search (PostgreSQL tsvector) and ANN vector search (pgvector HNSW) in parallel, then combines results via fusion scoring with configurable source authority boosts for optimal relevance.
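The case study does not name its fusion formula. The sketch below assumes Reciprocal Rank Fusion (RRF), a common choice for merging keyword and vector rankings, followed by a multiplicative per-source authority boost. The two input lists would be document IDs ordered by the tsvector ranking and by pgvector distance (for example `ORDER BY embedding <=> query_embedding` for cosine distance). `source_of`, `authority_boost` and the constant `k = 60` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def fuse(
    keyword_ranking: Sequence[str],
    vector_ranking: Sequence[str],
    source_of: Dict[str, str],
    authority_boost: Dict[str, float],
    k: int = 60,
) -> List[str]:
    scores: Dict[str, float] = defaultdict(float)
    # Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) for every document it contains.
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Multiply by a per-source authority factor; unconfigured sources keep a factor of 1.0.
    boosted = {
        doc_id: score * authority_boost.get(source_of[doc_id], 1.0)
        for doc_id, score in scores.items()
    }
    return sorted(boosted, key=boosted.get, reverse=True)
```

RRF uses only ranks, so it sidesteps the problem that text-rank scores and vector distances live on different scales. The boost then lets an authoritative guideline outrank a marginally more similar chunk from a lower-tier source.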
A scheduled ingestion pipeline scrapes, chunks, and embeds content from authoritative sources, running automatically on a cron schedule. Change detection tracks updates to source content and flags them at the appropriate granularity.
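A minimal sketch of two ingestion steps, assuming fixed-size word windows with overlap for chunking and SHA-256 content hashes for document-level change detection. Both techniques are assumptions, since the case study does not describe its chunker or its change detector. Embedding is omitted; only the documents reported as added or updated would need re-chunking and re-embedding.

```python
import hashlib
from typing import Dict, List

def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    # Sliding word window: consecutive chunks share `overlap` words of context.
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [
        " ".join(words[start:start + size])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]

def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def detect_changes(previous: Dict[str, str], current_docs: Dict[str, str]) -> Dict[str, List[str]]:
    # previous: doc_id -> hash from the last run; current_docs: doc_id -> freshly scraped text.
    current = {doc_id: fingerprint(text) for doc_id, text in current_docs.items()}
    return {
        "added": sorted(d for d in current if d not in previous),
        "updated": sorted(d for d in current if d in previous and previous[d] != current[d]),
        "removed": sorted(d for d in previous if d not in current),
    }
```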
Answers stream token-by-token via Server-Sent Events directly to the browser. Citations appear inline as they become available, giving clinicians immediate visibility into the supporting evidence.
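A sketch of the Server-Sent Events framing, assuming named `token`, `citation` and `done` events with JSON payloads. The event names and payload shape are hypothetical. The generator forwards each token as it arrives and emits a citation event the first time a complete `[n]` marker appears in the accumulated text, even when the marker is split across tokens.

```python
import json
import re
from typing import Dict, Iterable, Iterator

CITATION = re.compile(r"\[(\d+)\]")

def sse_event(event: str, payload: dict) -> str:
    # json.dumps escapes newlines, so each payload fits on a single "data:" line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

def stream_answer(tokens: Iterable[str], sources: Dict[int, dict]) -> Iterator[str]:
    # sources: citation number -> source metadata (hypothetical shape).
    text = ""
    announced = set()
    for token in tokens:
        text += token
        yield sse_event("token", {"text": token})
        # Re-scan the accumulated text so markers split across tokens are still caught.
        for match in CITATION.finditer(text):
            n = int(match.group(1))
            if n in sources and n not in announced:
                announced.add(n)
                yield sse_event("citation", {"n": n, **sources[n]})
    yield sse_event("done", {})
```

In FastAPI, such a generator can be returned as `StreamingResponse(generator, media_type="text/event-stream")`. In the browser, an `EventSource` receives named events through `addEventListener("citation", ...)`.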
Six Docker Compose services — PostgreSQL 16 + pgvector, LLM inference, memory store, FastAPI agent, ingestion pipeline, and Next.js frontend. Data stays local. Every query is audit-logged with latency, intent, and sources cited.
Multiple authoritative clinical sources
5 agent pipeline stages
Real-time SSE streaming responses
Zero hallucination risk
Safety: Query screening against defined clinical safety boundaries
Router: Intent classification for optimised retrieval strategy
Retriever: Hybrid BM25 + vector search with fusion scoring
Synthesizer: Assembles prompt + chunks, streams cited answer via SSE
Memory: Session store with locked knowledge base and evolving user preferences
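The memory stage's split between a locked knowledge base and evolving preferences can be sketched in-process with a read-only mapping view. `SessionMemory` and its methods are hypothetical. In a PostgreSQL deployment the lock would more naturally be enforced in the database, for example by granting the agent's role only `SELECT` on the chunk tables. That is an assumption; the case study does not say how the lock is implemented.

```python
from types import MappingProxyType
from typing import Any, Dict

class SessionMemory:
    """Hypothetical session store: preferences evolve, the knowledge base is read-only."""

    def __init__(self, knowledge_base: Dict[str, str]) -> None:
        # Copy, then expose a read-only view: agent code can read chunks but not add or edit them.
        self.knowledge_base = MappingProxyType(dict(knowledge_base))
        self._preferences: Dict[str, Dict[str, Any]] = {}

    def remember_preference(self, session_id: str, key: str, value: Any) -> None:
        # Preferences (e.g. answer length) are the only state the agent may write.
        self._preferences.setdefault(session_id, {})[key] = value

    def preferences(self, session_id: str) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate stored preferences by accident.
        return dict(self._preferences.get(session_id, {}))
```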
Northorp built a production-grade clinical AI that delivers evidence-based answers with zero hallucination risk. Your healthcare organisation could be next.