Clinical Decision Support Agent: Production AI for Healthcare

The Challenge

Evidence-Based Answers, Instantly

Healthcare professionals need fast access to authoritative clinical knowledge. The body of published guidelines, research, and standards is vast and constantly evolving — impossible to navigate manually during a consultation. Generic LLMs cannot be trusted here: they hallucinate references, misinterpret clinical nuance, and lack the safety architecture required for healthcare.

The client needed a system that could ingest and index a large corpus of curated clinical content, retrieve the most relevant information for any query, and synthesize a cited answer — all while maintaining strict safety boundaries and keeping patient data entirely local.

Northorp built a clinical AI research assistant designed from the ground up for production healthcare use.

Healthcare Sector

Clinical Decision Support

Domain: Clinical decision support

Sources: Curated corpus of authoritative clinical content and published research

Constraint: Zero hallucination tolerance — every answer must cite authoritative sources

Goal: Cited answers in seconds, with safety-aware architecture

Grounded, Cited Answers

The knowledge base is locked — only the ingestion pipeline can add content. Every response is constructed from retrieved chunks with source citations. The LLM synthesises from retrieved evidence, never generating from ungrounded knowledge.
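One way the grounding rule could be enforced is sketched below, assuming each retrieved chunk carries a source label. The names `Chunk`, `build_grounded_prompt`, and `invalid_citations` are hypothetical and are not taken from the production system. The prompt wording is illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # hypothetical label for the originating document
    text: str


def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    if not chunks:
        # Assumption: with no evidence, the sketch refuses rather than letting the model guess.
        raise ValueError("refusing to answer without retrieved evidence")
    evidence = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the numbered evidence below. "
        "Cite every claim as [n]. If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{evidence}\n\nQuestion: {question}"
    )


def invalid_citations(answer: str, chunk_count: int) -> set[int]:
    """Return cited numbers that do not correspond to any retrieved chunk."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {n for n in cited if not 1 <= n <= chunk_count}
```

A post-generation check such as `invalid_citations` only catches references to chunks that were never retrieved. It does not verify that a cited chunk actually supports the claim.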

Hybrid Retrieval Pipeline

The retriever runs BM25 full-text search and vector similarity search in parallel, then combines the results through fusion scoring with source authority boosts. A multi-stage pipeline routes each query by intent so that the retrieval strategy matches the type of question.

Safety-First Architecture

A dedicated safety layer sits before the LLM pipeline, screening queries against defined clinical boundaries. Audit logging tracks every query with intent, sources cited, and latency. No patient-identifiable information is stored.
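The screening and audit steps might look roughly like the sketch below. It is a simplified stand-in: the phrase list, the `AuditRecord` fields, and the decision to leave the raw query text out of the log are all assumptions made for illustration, and a real safety layer would likely use more than substring matching.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical boundary list; the real clinical boundaries are not described in the source.
BLOCKED_PHRASES = ("prescribe me", "diagnose my")


def screen_query(query: str) -> bool:
    """Return True when the query may proceed to the LLM pipeline."""
    lowered = query.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


@dataclass
class AuditRecord:
    intent: str
    sources: list[str]
    latency_ms: float
    allowed: bool


def audit_line(record: AuditRecord) -> str:
    """Serialise one audit entry as JSON.

    Assumption: the query text itself is not logged, so free-text
    patient details cannot leak into the audit trail.
    """
    return json.dumps(asdict(record), sort_keys=True)
```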

Technical Approach

Hybrid Retrieval on a Full Docker Stack

The agent runs as a multi-stage pipeline (Safety → Router → Retriever → Synthesizer → Memory), backed by PostgreSQL + pgvector for hybrid search.

Hybrid BM25 + Vector Search

The retriever runs BM25 full-text search (PostgreSQL tsvector) and approximate nearest-neighbour (ANN) vector search (pgvector HNSW) in parallel, then combines the two result lists via fusion scoring. Configurable source authority boosts adjust the fused ranking according to the authority of each source.
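The source does not name the fusion formula. The sketch below uses reciprocal rank fusion (RRF) as one common choice, followed by a multiplicative boost. The function name `fuse`, the constant `k = 60`, and keying the boost by chunk id rather than by source are all assumptions made to keep the example short.

```python
from collections import defaultdict
from typing import Optional


def fuse(
    bm25_ranked: list[str],
    vector_ranked: list[str],
    authority_boost: Optional[dict[str, float]] = None,
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse two ranked lists of chunk ids with RRF, then apply per-chunk boosts.

    Each list contributes 1 / (k + rank) for every chunk it contains,
    so chunks found by both searches accumulate a higher score.
    """
    authority_boost = authority_boost or {}
    scores: dict[str, float] = defaultdict(float)
    for ranking in (bm25_ranked, vector_ranked):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    boosted = {cid: s * authority_boost.get(cid, 1.0) for cid, s in scores.items()}
    # Highest score first; ties broken by chunk id for a deterministic order.
    return sorted(boosted.items(), key=lambda item: (-item[1], item[0]))
```

RRF works on ranks rather than raw scores, which sidesteps the problem that full-text scores and vector distances are not on a comparable scale.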

Automated Knowledge Ingestion

A scheduled ingestion pipeline scrapes, chunks, and embeds content from authoritative sources. Change detection tracks updates to that content and flags them at the appropriate level of granularity. The pipeline runs automatically on a cron schedule.
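Change detection of this kind is often done by comparing content digests between runs. The sketch below assumes content is split into sections with stable ids and that the previous run's digests are available. `fingerprint` and `detect_changes` are hypothetical names, and section-level granularity is only one possible choice.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable SHA-256 digest of whitespace-normalised text."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_changes(
    previous: dict[str, str], current_sections: dict[str, str]
) -> dict[str, list[str]]:
    """Compare stored digests with freshly scraped sections, keyed by section id.

    `previous` maps section id to the digest from the last run;
    `current_sections` maps section id to the newly scraped text.
    """
    current = {sid: fingerprint(text) for sid, text in current_sections.items()}
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            sid
            for sid in current.keys() & previous.keys()
            if current[sid] != previous[sid]
        ),
    }
```

Normalising whitespace before hashing means that a reflowed page with identical wording is not flagged as a change, so only the sections that genuinely differ need re-chunking and re-embedding.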

Real-Time SSE Streaming

Answers stream token-by-token via Server-Sent Events directly to the browser. Citations appear inline as they become available, giving clinicians immediate visibility into the supporting evidence.
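The framing of such a stream can be sketched as follows. The event names (`token`, `citation`, `done`) and the payload shapes are assumptions, as is the simplification that a citation marker such as `[1]` always arrives inside a single token.

```python
import json
from typing import Iterable, Iterator


def sse_event(event: str, payload: dict) -> str:
    """Format one Server-Sent Events frame.

    json.dumps escapes newlines, so the data always fits on one `data:` line.
    """
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_answer(tokens: Iterable[str], citations: dict[int, str]) -> Iterator[str]:
    """Yield a frame per token, plus a citation frame the first time [n] appears."""
    seen: set[int] = set()
    for token in tokens:
        yield sse_event("token", {"text": token})
        for n, source in citations.items():
            if n not in seen and f"[{n}]" in token:
                seen.add(n)
                yield sse_event("citation", {"id": n, "source": source})
    yield sse_event("done", {})
```

In a FastAPI service, a generator like this could be returned through `StreamingResponse` with `media_type="text/event-stream"`, and the browser could consume it with `EventSource`.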

Production Docker Deployment

The stack runs as six Docker Compose services: PostgreSQL 16 + pgvector, LLM inference, a memory store, the FastAPI agent, the ingestion pipeline, and a Next.js frontend. Data stays local. Every query is audit-logged with latency, intent, and sources cited.

Multiple authoritative clinical sources

5 agent pipeline stages

Real-time SSE streaming responses

Zero hallucination tolerance

Agent Pipeline

Five-Stage Architecture

1. Safety: Query screening against defined clinical safety boundaries

2. Router: Intent classification for optimised retrieval strategy

3. Retriever: Hybrid BM25 + vector search with fusion scoring

4. Synthesizer: Assembles prompt + chunks, streams cited answer via SSE

5. Memory: Session store with locked knowledge base and evolving user preferences
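The five stages can be thought of as functions applied in sequence to a shared state object. The sketch below shows one possible shape for that orchestration. `QueryState`, `Stage`, and `run_pipeline` are hypothetical names, and the early exit when the safety stage rejects a query is an assumption about how screening interacts with the later stages.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class QueryState:
    query: str
    allowed: bool = True  # cleared by the safety stage to stop the run
    intent: str = ""  # set by the router
    chunks: list[str] = field(default_factory=list)  # set by the retriever
    answer: str = ""  # set by the synthesizer
    trace: list[str] = field(default_factory=list)  # names of stages that ran


Stage = Callable[[QueryState], QueryState]


def run_pipeline(state: QueryState, stages: list[tuple[str, Stage]]) -> QueryState:
    """Run stages in order, stopping as soon as one marks the query as not allowed."""
    for name, stage in stages:
        state = stage(state)
        state.trace.append(name)
        if not state.allowed:
            break
    return state
```

Recording the stage names in `trace` gives the audit log a simple record of how far each query progressed.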
