Case study

Multi-LLM Router

Routes prompts across Claude, GPT, and Perplexity by task — fast and cheap.

shippedSolo buildAI AgentsFull Stack2026

Summary

Built a router that picks the right model for each task: Haiku for cheap utility work, Sonnet/Opus for reasoning, Perplexity for fresh web context. Cuts cost without losing quality.

Problem

Most teams pick one model for everything. That's expensive when Haiku could have handled the task, and slow or wrong when Sonnet would have been better. The harder problem is that you can't just "use the cheapest one" — different prompts need different context windows, freshness, and reasoning depth. You need a router that knows the difference and stays transparent about why it picked what it picked.

Approach

I built the router around task classification, not model preference. Each incoming task gets tagged — utility, reasoning, web-fresh, code — and routes to the right model: Haiku for short utility work, Sonnet/Opus for multi-step reasoning, Perplexity when the answer needs current web context. The fallback chain is explicit. If the primary model errors or rate-limits, the next-best model picks up with the same prompt and cache key. Prompt caching is tuned aggressively: system instructions and tool definitions live in the cached prefix so cache-read ratios stay high across requests, which is where the real cost wins come from. The demo is the proof. You pick a task in the UI, the router picks a model, and you see why — task tag, chosen model, fallback chain, and the actual response side-by-side. No marketing claims, just observable routing.

Architecture

UserAgentExternalStore

Task tag in, model out. Fallback chain is explicit and observable.

Result

Multi-LLM Router runs as a public demo with live model selection across Claude, GPT, and Perplexity. Cost stays low because cheap models handle cheap work; quality stays high because the hard tasks route to the right brain. Shipped, indexed, and finding its first real users.

Highlights

Per-task model selection with a transparent fallback chain
Prompt caching tuned for high cache-read ratios
A working demo, not a write-up — pick a task and watch routing

Have something similar?

Let's talk.

Get in touch