Streaming vs Blocking AI Calls
Part of: Effective AI Utilization — Table of Contents
BrianBot uses generateText() for every AI call: fully blocking, wait-for-complete-response. This is the right...

All memos tagged #architecture
Multi-Provider Strategy
Depending on a single AI provider is a single point of failure. BrianBot is wired for three providers (Anthropic, OpenAI,...

AI Pipeline Design
A single AI call is simple. Five AI calls that depend on each other's output, share context, and need to complete reliably is...

Prompt Architecture
Prompts are code. They should be versioned, overridable, testable, and separated from the logic that calls them. BrianBot's...

Model Fallback and Resilience
The most important AI call is the one that fails. How your system responds to that failure defines its...

Model Routing Strategies
Model routing is the decision logic that determines which AI model handles a given request. Get it right and you...

Effective AI Utilization — Table of Contents
A comprehensive guide to building production AI systems, drawn from patterns observed in BrianBot and generalized into reusable principles. Each memo...
