Model Fallback and Resilience
Part of: Effective AI Utilization — Table of Contents
The most important AI call is the one that fails. How your system responds to that failure defines its production-readiness.
BrianBot's Current State: No Safety Net
BrianBot's queue runs with attempts: 1 — one shot, no retries. The generateAIText() function has no try/catch around the provider call. When a step fails, the pipeline catches the error, marks the episode as FAILED, and moves on. No alternate model, no retry with backoff, no degraded output.
This works during development. In production, API errors are a matter of when, not if: rate limits, transient 500s, model deprecations, outages.
The Fallback Chain Pattern
Define a priority-ordered list of models per task type:
TRANSCRIPT: [claude-sonnet-4, gpt-4o, claude-haiku-4-5]
EXTRACTION: [claude-haiku-4-5, gpt-4o-mini, gemini-flash]
When the primary fails, try the next. Log which model actually served the request. This pairs naturally with the multi-provider strategy (see Multi-Provider Strategy) — your fallback should ideally cross provider boundaries so a single provider outage doesn't take you down.
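A minimal sketch of the chain walker, assuming a generic `callModel` function — the `TaskType` and `FALLBACK_CHAINS` names are illustrative, not BrianBot's actual API:

```typescript
type TaskType = "TRANSCRIPT" | "EXTRACTION";

// Priority-ordered model chains per task type.
const FALLBACK_CHAINS: Record<TaskType, string[]> = {
  TRANSCRIPT: ["claude-sonnet-4", "gpt-4o", "claude-haiku-4-5"],
  EXTRACTION: ["claude-haiku-4-5", "gpt-4o-mini", "gemini-flash"],
};

async function generateWithFallback(
  task: TaskType,
  prompt: string,
  callModel: (model: string, prompt: string) => Promise<string>,
): Promise<{ text: string; servedBy: string }> {
  let lastError: unknown;
  for (const model of FALLBACK_CHAINS[task]) {
    try {
      const text = await callModel(model, prompt);
      // Record which model actually served the request.
      return { text, servedBy: model };
    } catch (err) {
      lastError = err; // try the next model in the chain
    }
  }
  throw lastError; // every model in the chain failed
}
```

Returning `servedBy` alongside the text is what makes the "log which model served it" part cheap — the caller can emit it as a metric without re-deriving it.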
Retry vs Fallback
These are different mechanisms: retry sends the same request to the same model after a delay (handles transient errors). Fallback sends the request to a different model immediately (handles sustained failures). Use both: retry 2-3 times with exponential backoff, then fall back to the next model in the chain.
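Composed, the two mechanisms look something like this — a sketch, with hypothetical helper names (`withRetry`, `retryThenFallback`) and illustrative defaults:

```typescript
// Retry the same call with exponential backoff: 500ms, 1s, 2s, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Exhaust retries on one model before falling back to the next.
async function retryThenFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await withRetry(() => call(model), maxAttempts, baseDelayMs);
    } catch (err) {
      lastError = err; // sustained failure on this model — move down the chain
    }
  }
  throw lastError;
}
```

Note the ordering: retries are inside the per-model loop, so a transient blip never triggers a model switch, while a sustained failure burns through its retries once and then moves on.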
Circuit Breakers
If a model fails 5 times in 10 minutes, stop trying it for a cooldown period. This prevents hammering a degraded endpoint and burning through your error budget. Track failure rates per provider and auto-route around problems.
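A minimal per-model breaker can be just a failure window and a cooldown timestamp — the thresholds below mirror the "5 failures in 10 minutes" rule from the text but are illustrative, and this sketch ignores the half-open probing that production breaker libraries add:

```typescript
class CircuitBreaker {
  private failures: number[] = []; // timestamps of recent failures
  private openedAt: number | null = null;

  constructor(
    private maxFailures = 5,
    private windowMs = 10 * 60_000, // 10-minute failure window
    private cooldownMs = 5 * 60_000, // stop routing here for 5 minutes
  ) {}

  // Open breaker = skip this model and route to the next one.
  isOpen(now = Date.now()): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: close the breaker and allow traffic again.
      this.openedAt = null;
      this.failures = [];
      return false;
    }
    return true;
  }

  recordFailure(now = Date.now()): void {
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.maxFailures) this.openedAt = now;
  }

  recordSuccess(): void {
    this.failures = [];
  }
}
```

Keep one instance per model (or per provider) in a map, check `isOpen()` before each call, and the chain walker from earlier will route around degraded endpoints automatically.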
Graceful Degradation
BrianBot already does this partially for JSON parse failures — falling back to text splitting or raw content. Extend this principle: if the AI step fails entirely, can the pipeline continue with reduced quality? A transcript without topic extraction is still a transcript. A companion email without memory context is still useful.
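The shape of that degradation is worth making explicit in the return type, so downstream steps know the output is partial. A sketch — the `extractTopics` function and the `degraded` flag are assumptions, not BrianBot's actual pipeline code:

```typescript
interface TranscriptResult {
  transcript: string;
  topics: string[];
  degraded: boolean; // true when an optional enrichment step was skipped
}

async function transcribeWithOptionalTopics(
  transcript: string,
  extractTopics: (text: string) => Promise<string[]>,
): Promise<TranscriptResult> {
  try {
    const topics = await extractTopics(transcript);
    return { transcript, topics, degraded: false };
  } catch {
    // Topic extraction failed: ship the transcript anyway, flagged as
    // degraded, instead of marking the whole episode FAILED.
    return { transcript, topics: [], degraded: true };
  }
}
```

The `degraded` flag matters: it lets you alert on degradation rates and backfill the missing enrichment later, rather than silently shipping reduced-quality output.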
Related: Model Routing Strategies, AI Pipeline Design, AI Observability and Debugging, Queue and Rate Limiting for AI Workloads
