
Temperature and Parameter Tuning

Part of: Effective AI Utilization — Table of Contents

Temperature is the most misunderstood AI parameter. It doesn't control "creativity"; it controls the shape of the probability distribution over the next token. Low temperature sharpens the distribution toward the highest-probability tokens (at 0, sampling becomes effectively greedy). High temperature flattens the distribution, spreading probability mass across more tokens and producing more diverse selections.
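The mechanics are easy to see in code. This is a minimal sketch of temperature-scaled softmax over a made-up set of logits (the values are illustrative, not from any real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then softmax.

    temperature < 1 sharpens the distribution (top token gains probability);
    temperature > 1 flattens it (probability spreads across more tokens).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative next-token logits
cold = softmax_with_temperature(logits, 0.3)
hot = softmax_with_temperature(logits, 1.5)
# The top token's share grows as temperature drops:
assert cold[0] > hot[0]
```

At temperature 0.3 the first token dominates; at 1.5 the same three tokens are much closer to equally likely, which is exactly the determinism-versus-diversity trade the rest of this post is about.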

BrianBot's Two-Tier Approach

BrianBot uses exactly two temperature settings: 0.3 for analytical tasks (topic extraction, metadata, memory) and 0.7 for generative tasks (transcript, companion content). This binary split is effective because it maps to a real distinction: extraction tasks have "right answers" that benefit from determinism, while generation tasks benefit from variety.
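A two-tier split like this is easy to express as a step registry. The sketch below is hypothetical (the step names and structure are illustrative, not BrianBot's actual code), but it shows the shape: every step declares which tier it belongs to, so the 0.3/0.7 decision is made once, not per call site.

```python
# Illustrative two-tier registry; step names are assumptions, not BrianBot internals.
ANALYTICAL = 0.3  # extraction tasks with "right answers"
GENERATIVE = 0.7  # content tasks that benefit from variety

PIPELINE_STEPS = {
    "topic_extraction": {"temperature": ANALYTICAL},
    "metadata":         {"temperature": ANALYTICAL},
    "memory":           {"temperature": ANALYTICAL},
    "transcript":       {"temperature": GENERATIVE},
    "companion":        {"temperature": GENERATIVE},
}

def temperature_for(step: str) -> float:
    """Look up the tier for a named pipeline step."""
    return PIPELINE_STEPS[step]["temperature"]
```

The design win is that adding a new step forces an explicit choice between the two tiers, rather than letting ad-hoc temperatures accumulate across the codebase.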

The Temperature Spectrum in Practice

0.0–0.2: Classification, structured extraction, JSON generation, yes/no decisions. You want the model's highest-confidence answer. BrianBot's extraction steps could arguably go lower.

0.3–0.5: Analytical tasks where you want consistency but not rigidity. Summarization, entity extraction, metadata generation. BrianBot's sweet spot for Haiku tasks.

0.6–0.8: Creative generation where quality matters more than novelty. Blog posts, newsletters, conversational content. BrianBot's Sonnet tasks live here.

0.9–1.0: Brainstorming, diverse option generation, creative writing where surprise is valued. Rarely appropriate for production pipelines.
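The bands above can be encoded as a small lookup table, useful as a sanity check when reviewing pipeline configs. This is a sketch of that idea; the band boundaries and descriptions come straight from the list above, and values falling in the gaps between bands (e.g. 0.25) are deliberately rejected:

```python
# Bands mirror the spectrum described in this post; gaps between bands raise.
TEMPERATURE_BANDS = [
    ((0.0, 0.2), "classification, structured extraction, JSON generation, yes/no decisions"),
    ((0.3, 0.5), "summarization, entity extraction, metadata generation"),
    ((0.6, 0.8), "blog posts, newsletters, conversational content"),
    ((0.9, 1.0), "brainstorming, diverse option generation, creative writing"),
]

def band_for(temperature: float) -> str:
    """Return the task description for the band a temperature falls in."""
    for (low, high), description in TEMPERATURE_BANDS:
        if low <= temperature <= high:
            return description
    raise ValueError(f"no band covers temperature {temperature}")
```

A check like this in CI catches the config typo where someone sets a classification step to 0.95.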

maxTokens: The Silent Cost Control

BrianBot's per-step maxTokens settings (1024–8192) act as both quality and cost controls. The key insight: set maxTokens based on observed output length, not the theoretical maximum. If your metadata step consistently produces 150-token responses, a 1024 cap is fine. But if your transcript step sometimes needs 6000 tokens and you cap at 4096, you're silently truncating output.
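One way to operationalize "set the cap from observed output length" is to derive it from logged response sizes with some headroom. This is a sketch of one possible policy, not BrianBot's actual logic (the headroom factor and power-of-two rounding are assumptions):

```python
import math

def suggest_max_tokens(observed_lengths, headroom=1.5, floor=1024):
    """Derive a maxTokens cap from observed output lengths.

    Takes the worst observed length, multiplies by a headroom factor,
    and rounds up to the next power of two, with a minimum floor.
    Illustrative policy only.
    """
    worst = max(observed_lengths)
    target = max(floor, worst * headroom)
    return 2 ** math.ceil(math.log2(target))

# A metadata step that consistently produces ~150-token responses:
assert suggest_max_tokens([140, 150, 160]) == 1024
# A transcript step that sometimes needs 6000 tokens gets real headroom:
assert suggest_max_tokens([3000, 4500, 6000]) == 16384
```

Whatever policy you pick, also watch the API's stop signal: most providers report why generation ended (e.g. Anthropic's `stop_reason` of `"max_tokens"`), which is the direct evidence that a cap truncated output.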

The Parameters You're Not Tuning

Beyond temperature and maxTokens, most providers support top_p (nucleus sampling), frequency_penalty, and presence_penalty. For production pipelines, the default values are almost always correct. The exception: if you're seeing repetitive output, a small frequency penalty (0.1–0.3) can help.
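In practice that means leaving these parameters out of requests entirely, and adding a penalty only as a targeted response to an observed problem. A minimal sketch of that policy (the function and values are illustrative, not a specific SDK call):

```python
def sampling_params(repetitive_output_seen: bool) -> dict:
    """Build request parameters, starting from provider defaults.

    top_p, frequency_penalty, and presence_penalty are omitted unless
    there is a concrete reason to set them. Illustrative policy only.
    """
    params = {"temperature": 0.7, "max_tokens": 2048}
    if repetitive_output_seen:
        # Small nudge within the 0.1-0.3 range suggested above.
        params["frequency_penalty"] = 0.2
    return params
```

Keeping the defaults implicit also makes diffs meaningful: any penalty that shows up in config is there because someone diagnosed repetition, not because a template carried it along.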

Related: Model Routing Strategies, Token Optimization Playbook, Prompt Architecture

🏷️#ai 🏷️#parameters 🏷️#temperature 🏷️#brianbot

Created with 💜 by One Inc | Copyright 2026