Model Fallback and Resilience
Part of: Effective AI Utilization — Table of Contents
The most important AI call is the one that fails. How your system responds to that failure defines its production-readiness.
BrianBot's Current State: No Safety Net
BrianBot's queue runs with attempts: 1 — one shot, no retries. The generateAIText() function has no try/catch around the provider call. When a step fails, the pipeline catches the error, marks the episode as FAILED, and moves on. No alternate model, no retry with backoff, no degraded output.
This works during development. In production, API errors are a matter of when, not if: rate limits, transient 500s, model deprecations, outages.
The Fallback Chain Pattern
Define a priority-ordered list of models per task type:
TRANSCRIPT: [claude-sonnet-4, gpt-4o, claude-haiku-4-5]
EXTRACTION: [claude-haiku-4-5, gpt-4o-mini, gemini-flash]
When the primary fails, try the next. Log which model actually served the request. This pairs naturally with the multi-provider strategy (see Multi-Provider Strategy) — your fallback should ideally cross provider boundaries so a single provider outage doesn't take you down.
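A minimal sketch of the chain walker, assuming a generic `callModel` function — the `TaskType` and `FALLBACK_CHAINS` names are illustrative, not BrianBot's actual API:

```typescript
type TaskType = "TRANSCRIPT" | "EXTRACTION";

// Priority-ordered model chains per task type.
const FALLBACK_CHAINS: Record<TaskType, string[]> = {
  TRANSCRIPT: ["claude-sonnet-4", "gpt-4o", "claude-haiku-4-5"],
  EXTRACTION: ["claude-haiku-4-5", "gpt-4o-mini", "gemini-flash"],
};

async function generateWithFallback(
  task: TaskType,
  prompt: string,
  callModel: (model: string, prompt: string) => Promise<string>,
): Promise<{ text: string; servedBy: string }> {
  let lastError: unknown;
  for (const model of FALLBACK_CHAINS[task]) {
    try {
      const text = await callModel(model, prompt);
      // Record which model actually served the request.
      return { text, servedBy: model };
    } catch (err) {
      lastError = err; // try the next model in the chain
    }
  }
  throw lastError; // every model in the chain failed
}
```

Returning `servedBy` alongside the text is what makes the "log which model served it" part cheap — the caller can emit it as a metric without re-deriving it.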
Retry vs Fallback
These are different mechanisms: retry sends the same request to the same model after a delay (handles transient errors). Fallback sends the request to a different model immediately (handles sustained failures). Use both: retry 2-3 times with exponential backoff, then fall back to the next model in the chain.
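Composed, the two mechanisms look something like this — a sketch, with hypothetical helper names (`withRetry`, `retryThenFallback`) and illustrative defaults:

```typescript
// Retry the same call with exponential backoff: 500ms, 1s, 2s, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Exhaust retries on one model before falling back to the next.
async function retryThenFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await withRetry(() => call(model), maxAttempts, baseDelayMs);
    } catch (err) {
      lastError = err; // sustained failure on this model — move down the chain
    }
  }
  throw lastError;
}
```

Note the ordering: retries are inside the per-model loop, so a transient blip never triggers a model switch, while a sustained failure burns through its retries once and then moves on.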
Circuit Breakers
If a model fails 5 times in 10 minutes, stop trying it for a cooldown period. This prevents hammering a degraded endpoint and burning through your error budget. Track failure rates per provider and auto-route around problems.
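A minimal per-model breaker can be just a failure window and a cooldown timestamp — the thresholds below mirror the "5 failures in 10 minutes" rule from the text but are illustrative, and this sketch ignores the half-open probing that production breaker libraries add:

```typescript
class CircuitBreaker {
  private failures: number[] = []; // timestamps of recent failures
  private openedAt: number | null = null;

  constructor(
    private maxFailures = 5,
    private windowMs = 10 * 60_000, // 10-minute failure window
    private cooldownMs = 5 * 60_000, // stop routing here for 5 minutes
  ) {}

  // Open breaker = skip this model and route to the next one.
  isOpen(now = Date.now()): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: close the breaker and allow traffic again.
      this.openedAt = null;
      this.failures = [];
      return false;
    }
    return true;
  }

  recordFailure(now = Date.now()): void {
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.maxFailures) this.openedAt = now;
  }

  recordSuccess(): void {
    this.failures = [];
  }
}
```

Keep one instance per model (or per provider) in a map, check `isOpen()` before each call, and the chain walker from earlier will route around degraded endpoints automatically.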
Graceful Degradation
BrianBot already does this partially for JSON parse failures — falling back to text splitting or raw content. Extend this principle: if the AI step fails entirely, can the pipeline continue with reduced quality? A transcript without topic extraction is still a transcript. A companion email without memory context is still useful.
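The shape of that degradation is worth making explicit in the return type, so downstream steps know the output is partial. A sketch — the `extractTopics` function and the `degraded` flag are assumptions, not BrianBot's actual pipeline code:

```typescript
interface TranscriptResult {
  transcript: string;
  topics: string[];
  degraded: boolean; // true when an optional enrichment step was skipped
}

async function transcribeWithOptionalTopics(
  transcript: string,
  extractTopics: (text: string) => Promise<string[]>,
): Promise<TranscriptResult> {
  try {
    const topics = await extractTopics(transcript);
    return { transcript, topics, degraded: false };
  } catch {
    // Topic extraction failed: ship the transcript anyway, flagged as
    // degraded, instead of marking the whole episode FAILED.
    return { transcript, topics: [], degraded: true };
  }
}
```

The `degraded` flag matters: it lets you alert on degradation rates and backfill the missing enrichment later, rather than silently shipping reduced-quality output.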
Related: Model Routing Strategies, AI Pipeline Design, AI Observability and Debugging, Queue and Rate Limiting for AI Workloads
