Mythos

Queue and Rate Limiting for AI Workloads

Part of: Effective AI Utilization — Table of Contents

AI APIs are external services with their own capacity limits. Your system's job queue is the buffer between "work to be done" and "capacity to do it."

BrianBot's Queue Design

BrianBot uses BullMQ with simple but effective settings: concurrency of 2 (two episodes processing simultaneously), rate limited to 5 jobs per minute, and single-attempt execution (no retries). The worker pulls from a Redis-backed queue and processes episodes through the full pipeline.
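As a rough sketch, the settings described above could map onto BullMQ's worker and job options as follows. The option names (`concurrency`, `limiter.max`, `limiter.duration`, `attempts`) are BullMQ's. The queue name `episodes`, the `processEpisode` function, and the `maxJobsPerHour` helper are hypothetical, since BrianBot's actual code is not shown here.

```typescript
// Settings as described in the text. Field names follow BullMQ's worker
// and job options; the values are the ones quoted for BrianBot.
const workerOptions = {
  concurrency: 2, // two episodes processed simultaneously
  limiter: { max: 5, duration: 60_000 }, // at most 5 jobs per 60,000 ms
};

const jobOptions = {
  attempts: 1, // single attempt: a failure is final, no retries
};

// Hypothetical helper: upper bound on jobs started per hour implied by
// the limiter alone (ignores how long each job actually runs).
function maxJobsPerHour(limiter: { max: number; duration: number }): number {
  return (3_600_000 / limiter.duration) * limiter.max;
}

// With BullMQ installed and Redis reachable, these objects would be
// passed roughly like this (names are assumptions, not BrianBot's code):
//   new Worker("episodes", processEpisode, { connection, ...workerOptions });
//   await queue.add("episode", { episodeId }, jobOptions);
```

The limiter caps how many jobs start per minute, while concurrency caps how many run at once. For long-running episodes, concurrency is likely to be the tighter of the two.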

The concurrency of 2 is conservative. Each episode makes 5 sequential AI calls, so a single episode can hold a worker slot for several minutes. Higher concurrency could trigger provider rate limits, especially Anthropic's per-minute token caps.
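A back-of-envelope estimate shows how concurrency interacts with a token cap. The sketch below assumes each episode has exactly one call in flight at a time, because its calls are sequential. The token count, call duration, and cap are hypothetical placeholders, not measurements from BrianBot or published provider limits.

```typescript
// Steady-state tokens per minute if every worker slot always has one
// AI call in flight (calls within an episode run sequentially).
function estimateTokensPerMinute(
  concurrency: number,
  tokensPerCall: number,
  secondsPerCall: number,
): number {
  const callsPerMinutePerSlot = 60 / secondsPerCall;
  return concurrency * callsPerMinutePerSlot * tokensPerCall;
}

// Largest concurrency that stays at or under a token-per-minute cap.
function maxConcurrencyUnderCap(
  tokensPerMinuteCap: number,
  tokensPerCall: number,
  secondsPerCall: number,
): number {
  const perSlot = estimateTokensPerMinute(1, tokensPerCall, secondsPerCall);
  return Math.floor(tokensPerMinuteCap / perSlot);
}

// Hypothetical inputs: 20,000 tokens per call, 30 seconds per call.
const tokensAtConcurrencyTwo = estimateTokensPerMinute(2, 20_000, 30); // 80,000
// Hypothetical cap of 100,000 tokens per minute.
const safeConcurrency = maxConcurrencyUnderCap(100_000, 20_000, 30); // 2
```

Under these made-up numbers, a third concurrent episode would push usage to 120,000 tokens per minute and past the cap, which is the kind of margin a conservative setting protects.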

The Rate Limiting Stack

Three layers of rate limiting matter: your queue's job rate (BrianBot: 5/min), the provider's request rate limit (varies by tier), and the provider's token-per-minute limit (often the binding constraint for heavy workloads). BrianBot controls only the first layer. Ideally, the AI abstraction layer (see Model Routing Strategies) would track provider-level limits and apply back-pressure to the queue as usage approaches them.
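One way the abstraction layer might track the token layer is a sliding-window counter that signals when the queue should slow down. This is a minimal sketch, not BrianBot's implementation: the `TokenWindow` class, its 80% default threshold, and the injected clock are all assumptions made for illustration.

```typescript
// Hypothetical sliding-window tracker for tokens spent in the last windowMs.
class TokenWindow {
  private events: { at: number; tokens: number }[] = [];
  private readonly limitPerWindow: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(
    limitPerWindow: number,
    windowMs: number = 60_000,
    now: () => number = () => Date.now(), // injectable clock for testing
  ) {
    this.limitPerWindow = limitPerWindow;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Record the token usage reported after each AI call.
  record(tokens: number): void {
    this.events.push({ at: this.now(), tokens });
  }

  // Tokens spent inside the current window; older events are dropped.
  used(): number {
    const cutoff = this.now() - this.windowMs;
    this.events = this.events.filter((e) => e.at > cutoff);
    return this.events.reduce((sum, e) => sum + e.tokens, 0);
  }

  // True once usage reaches the given fraction of the limit: the signal
  // to slow or pause the queue before the provider rejects requests.
  shouldBackOff(threshold: number = 0.8): boolean {
    return this.used() >= this.limitPerWindow * threshold;
  }
}
```

A worker could check `shouldBackOff()` before starting each AI call. With BullMQ, one plausible response is to call the worker's `pause()` and later `resume()`, though how BrianBot would wire this in is not specified.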

Queue Priority

Not all jobs are equal. A time-sensitive live episode should jump ahead of a batch reprocessing job. BrianBot doesn't implement priority queues, but BullMQ supports them natively. Priority combined with the model override system could enable fast-track processing: high-priority jobs use faster/more-expensive models, low-priority jobs use cheaper models.
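The fast-track idea can be sketched as a lookup from job kind to a queue priority and a model tier. The job kinds, priority numbers, and model labels below are hypothetical. The only BullMQ fact assumed is that its `priority` job option treats a lower number as higher priority.

```typescript
type JobKind = "live" | "standard" | "batch";

interface JobPlan {
  priority: number; // BullMQ convention: lower number = higher priority
  model: string; // hypothetical model tier label, not a real model name
}

// Hypothetical mapping: urgent work gets a fast model, bulk work a cheap one.
const plans: Record<JobKind, JobPlan> = {
  live: { priority: 1, model: "fast-expensive-model" },
  standard: { priority: 5, model: "default-model" },
  batch: { priority: 10, model: "cheap-model" },
};

function planJob(kind: JobKind): JobPlan {
  return plans[kind];
}

// Order in which a priority queue would hand out a mixed set of jobs.
function processingOrder(kinds: JobKind[]): JobKind[] {
  return [...kinds].sort((a, b) => planJob(a).priority - planJob(b).priority);
}

// With BullMQ, the plan would be applied when enqueuing, for example:
//   await queue.add("episode", { episodeId, model: plan.model },
//                   { priority: plan.priority });
```

Carrying the model choice in the job data keeps the routing decision at enqueue time, so the worker does not need to know why a job was fast-tracked.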

Dead Letter Queues

With attempts: 1, a failed BrianBot job is never retried; the only trace it leaves is a FAILED status on the episode. A dead letter queue would capture these failures for inspection, manual retry, or automated re-routing to a different model (see Model Fallback and Resilience).
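A dead letter handler can be sketched as a function that runs when a job fails and stores the failure once its attempts are exhausted. The types, the in-memory `deadLetters` array, and the `onJobFailed` name are assumptions for illustration. A real setup would more likely add the record to a second queue from a BullMQ `failed` event listener on the worker.

```typescript
interface FailedJob {
  id: string;
  data: { episodeId: string };
  attemptsMade: number;
  maxAttempts: number;
}

interface DeadLetter {
  jobId: string;
  episodeId: string;
  reason: string;
  failedAt: string; // ISO timestamp
}

// In-memory stand-in for a separate queue that holds exhausted jobs.
const deadLetters: DeadLetter[] = [];

// Returns true if the job was moved to the dead letter store.
function onJobFailed(job: FailedJob, error: Error, now: Date = new Date()): boolean {
  // Attempts remain, so the queue will retry; nothing to capture yet.
  if (job.attemptsMade < job.maxAttempts) return false;

  deadLetters.push({
    jobId: job.id,
    episodeId: job.data.episodeId,
    reason: error.message,
    failedAt: now.toISOString(),
  });
  return true;
}
```

With a single attempt, every failure lands in the store immediately. The captured `reason` is what a later step would inspect to decide between manual retry and re-routing to another model.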

Related: Model Fallback and Resilience, AI Pipeline Design, Cost Tracking and Budget Controls, Model Routing Strategies

🏷️#ai 🏷️#queue 🏷️#rate-limiting 🏷️#infrastructure 🏷️#brianbot

Created with 💜 by One Inc | Copyright 2026