Model Routing Strategies
Part of: Effective AI Utilization
Model routing is the decision logic that determines which AI model handles a given request. Get it right and you optimize for cost, quality, and speed simultaneously. Get it wrong and you're either overpaying for simple tasks or underdelivering on complex ones.
The BrianBot Pattern: Static Config Routing
BrianBot routes through a provider map — a simple lookup that resolves a provider string and model ID to an SDK call:
providerMap = { anthropic, google, openai }

Each pipeline step (topic extraction, transcript generation, metadata, memory, companion content) has a default model assignment. The routing decision is made at config time, not request time. This is static routing — predictable, debuggable, and perfectly adequate for batch pipelines.
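A static provider map can be sketched as a plain lookup from provider name to a model resolver. This is a hypothetical sketch, not BrianBot's actual code; the model IDs and step names are illustrative assumptions.

```typescript
// Illustrative shape: a provider resolves a model ID into a reference
// the calling code can hand to the SDK. Names here are assumptions.
type ModelRef = { provider: string; modelId: string };

const providerMap: Record<string, (modelId: string) => ModelRef> = {
  anthropic: (modelId) => ({ provider: "anthropic", modelId }),
  google: (modelId) => ({ provider: "google", modelId }),
  openai: (modelId) => ({ provider: "openai", modelId }),
};

// Static, config-time assignment per pipeline step (IDs illustrative).
const stepModels: Record<string, ModelRef> = {
  topicExtraction: providerMap.anthropic("claude-haiku"),
  transcriptGeneration: providerMap.anthropic("claude-sonnet"),
};
```

Because the map is resolved at config time, a misconfigured provider or model ID fails loudly at startup rather than mid-pipeline.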
The strength of this approach is the 4-tier override hierarchy: hardcoded defaults → admin DB config → user-level overrides → show-level overrides. This means you can change models without deploying code, and different shows can use different models based on their needs.
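The 4-tier hierarchy can be sketched as a single resolution function where the highest-precedence tier that specifies a model wins. This is a minimal sketch under the precedence described above; the field names and config shape are hypothetical.

```typescript
// Each tier is an optional partial config; later (more specific) tiers win.
type ModelChoice = { model?: string };

function resolveModel(
  hardcodedDefault: string,
  adminConfig?: ModelChoice,
  userOverride?: ModelChoice,
  showOverride?: ModelChoice,
): string {
  // Precedence: show > user > admin DB > hardcoded default.
  return (
    showOverride?.model ??
    userOverride?.model ??
    adminConfig?.model ??
    hardcodedDefault
  );
}
```

With this shape, changing a show's model is a database write, not a deploy: the show-level row simply shadows everything beneath it.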
When Static Routing Isn't Enough
Static routing breaks down when you need to make routing decisions based on the content of the request. For example: routing simple classification tasks to Haiku and complex reasoning to Sonnet based on input complexity. This is dynamic routing and requires a classifier (which itself could be an AI call — turtles all the way down).
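One way to avoid the turtles is a cheap heuristic classifier: route on observable input characteristics instead of making an AI call to decide which AI call to make. The thresholds, cue words, and tier names below are illustrative assumptions, not a tested policy.

```typescript
// Hedged sketch: pick a model tier from input size and surface cues.
type Tier = "small" | "large";

function classifyComplexity(input: string): Tier {
  const words = input.trim().split(/\s+/).length;
  // Cue words that suggest multi-step reasoning (illustrative list).
  const hasReasoningCue = /\b(why|explain|compare|analyze)\b/i.test(input);
  return words > 500 || hasReasoningCue ? "large" : "small";
}
```

A heuristic like this is wrong at the margins, but it is free and fast; the AI-powered classifier is worth its latency and cost only once the heuristic's error rate measurably hurts output quality.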
A practical middle ground is task-type routing: define categories of work (extraction, generation, analysis, creative) and assign model tiers to each category. BrianBot already does this implicitly — Haiku for extraction (0.3 temp), Sonnet for generation (0.7 temp). Making this explicit and configurable is the next evolution.
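Making the implicit assignment explicit could look like a task-type table. The Haiku/Sonnet split and the 0.3/0.7 temperatures mirror the assignments described above; the table structure and the analysis/creative rows are assumptions.

```typescript
// Explicit, configurable task-type routing (model IDs illustrative).
type TaskType = "extraction" | "generation" | "analysis" | "creative";

const taskRouting: Record<TaskType, { model: string; temperature: number }> = {
  extraction: { model: "claude-haiku", temperature: 0.3 },
  generation: { model: "claude-sonnet", temperature: 0.7 },
  analysis: { model: "claude-sonnet", temperature: 0.3 },
  creative: { model: "claude-sonnet", temperature: 0.9 },
};
```

Once this table exists, it becomes the natural target for the override hierarchy: an admin or show-level config can replace a row without touching code.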
The Provider Abstraction Layer
The Vercel AI SDK gives BrianBot clean provider abstraction — swap anthropic("claude-sonnet-4-20250514") for openai("gpt-4o") and the calling code doesn't change. This is essential for any multi-provider strategy (see Multi-Provider Strategy memo).
The key design decision is where the abstraction boundary sits. BrianBot puts it in generateAIText() — a single function that every pipeline step calls. This is the right pattern: one place to add logging, error handling, token tracking, and eventually fallback logic (see Model Fallback and Resilience).
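A single-boundary wrapper in the spirit of generateAIText() might look like the sketch below. The request/response shapes are hypothetical, and callModel stands in for the actual SDK call (for instance the Vercel AI SDK's generateText), which is assumed rather than shown.

```typescript
// One entry point for every pipeline step: logging, token tracking,
// and (eventually) fallback logic all live in exactly one place.
type AIRequest = { model: string; prompt: string; temperature?: number };
type AIResponse = { text: string; tokensUsed: number };

async function generateAIText(
  req: AIRequest,
  callModel: (req: AIRequest) => Promise<AIResponse>,
): Promise<AIResponse> {
  const start = Date.now();
  try {
    const res = await callModel(req);
    console.log(
      `[ai] ${req.model} ok in ${Date.now() - start}ms, ${res.tokensUsed} tokens`,
    );
    return res;
  } catch (err) {
    console.error(`[ai] ${req.model} failed after ${Date.now() - start}ms`);
    // Fallback to an alternate provider would slot in here.
    throw err;
  }
}
```

Injecting callModel (rather than importing the SDK directly) keeps the boundary testable and makes the eventual provider-fallback change a local one.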
Routing Decision Tree
When designing model routing, work through this hierarchy:
- Is there a task-specific override? (show/user/admin config) → Use it
- What type of task is this? (extraction, generation, analysis) → Map to model tier
- What's the token budget? → Constrain model choice by context window needs
- What's the latency requirement? → Smaller models for real-time, larger for batch
- What's the cost sensitivity? → Haiku is ~10x cheaper than Sonnet per token
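The hierarchy above can be sketched as one routing function. All thresholds, tier names, and the input shape are illustrative assumptions; the point is the ordering of the checks, not the specific numbers.

```typescript
// Decision tree from the list above, in check order. Names hypothetical.
type RouteInput = {
  override?: string; // task-specific override from show/user/admin config
  taskType: "extraction" | "generation" | "analysis";
  inputTokens: number;
  realtime: boolean;
};

function routeModel(r: RouteInput): string {
  // 1. A task-specific override wins outright.
  if (r.override) return r.override;
  // 2. Task type maps to a default tier; extraction is the cheap path.
  let tier: "small" | "large" = r.taskType === "extraction" ? "small" : "large";
  // 3. Token budget: very large inputs force a long-context model.
  if (r.inputTokens > 150_000) return "long-context-model";
  // 4. Latency: real-time paths prefer the small, fast tier.
  if (r.realtime) tier = "small";
  // 5. Cost sensitivity is baked into the tier split
  //    (the small tier is roughly an order of magnitude cheaper per token).
  return tier === "small" ? "small-fast-model" : "general-model";
}
```

The ordering matters: overrides short-circuit everything so that operators always have the last word, and hard constraints (context window) are checked before soft preferences (latency, cost).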
Related: Token Optimization Playbook, Model Fallback and Resilience, Temperature and Parameter Tuning, Multi-Provider Strategy
