AI Gateway
The AI Gateway is the provider-agnostic abstraction every Polyant component uses to call a large language model. The supervisor does not import @ai-sdk/openai. The memory extractor does not import @ai-sdk/anthropic. They both call chat({ tier: "standard", messages, ... }) and the gateway picks the right provider, the right model, and threads logging, tracing, and cost accounting through the call.
This page covers the tier model, the per-instance provider override, the embedding exception, the call-type tagging used for cost analytics, and the structured logging that lands every call in ai_logs.
Tier abstraction, not model lock-in
Every call site declares a workload class, not a model:
- fast: cheap classification work. Memory extraction, webhook event matching, title generation, conversation summarisation.
- standard: the main conversation tier. The supervisor uses this for user-facing turns.
- heavy: deep reasoning. Used sparingly today; reserved for tasks where the operator opts into reasoning models (o3, Claude Opus, etc.).
The mapping from tier to (provider, model) lives in packages/engine/src/ai-gateway/config.ts. The default table:
| provider | fast | standard | heavy |
|---|---|---|---|
| openai | gpt-4o-mini | gpt-4o | o3 |
| anthropic | claude-haiku-4-5-20251001 | claude-sonnet-4-5-20250929 | claude-opus-4-6 |
| bedrock | amazon.nova-lite-v1:0 | anthropic.claude-sonnet-4-... | anthropic.claude-opus-4-... |
A deployment owner can swap the entire fleet to Anthropic by setting the instance’s aiProvider to anthropic, or pin a single tier to a specific model — and no application code changes. Every workload that asked for standard continues to ask for standard.
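The lookup described above can be sketched as a nested record keyed by provider and tier. This is a minimal illustration, not the contents of config.ts: the shape of the table and the exact signature of resolveModel() are assumptions, and the Bedrock row is left out because its model ids are truncated in the table above.

```typescript
// Sketch only: the real table lives in packages/engine/src/ai-gateway/config.ts,
// and its exact shape is an assumption here.
type Tier = "fast" | "standard" | "heavy";
type Provider = "openai" | "anthropic";

type TierMapping = Record<Tier, string>;

// Model ids copied from the default table above (Bedrock omitted).
const TIER_TABLE: Record<Provider, TierMapping> = {
  openai: { fast: "gpt-4o-mini", standard: "gpt-4o", heavy: "o3" },
  anthropic: {
    fast: "claude-haiku-4-5-20251001",
    standard: "claude-sonnet-4-5-20250929",
    heavy: "claude-opus-4-6",
  },
};

// Look up the model for a workload class on a given provider.
function resolveModel(provider: Provider, tier: Tier): string {
  return TIER_TABLE[provider][tier];
}

console.log(resolveModel("openai", "standard")); // "gpt-4o"
console.log(resolveModel("anthropic", "fast")); // "claude-haiku-4-5-20251001"
```

Because callers only ever name a tier, changing a cell in this table changes the model for every caller of that tier without touching the call sites.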
Per-instance override — provider AND specific model
Each agent has two columns on the instances row that drive model selection: provider (string: openai / anthropic / bedrock) and model (string: any model id the provider supports, e.g. gpt-4.1-mini, claude-sonnet-4-5-20250929). Both are editable from the admin panel under Settings → AI provider for the agent. Either can be left null.
How the gateway combines them with the caller’s tier:

    final.provider = request.provider (instance row) ?? DEFAULT_PROVIDER ("openai")
    final.model    = request.model (instance row) ?? resolveModel(provider, tier)

The exact model the user picked wins over the tier table: tier becomes a label-only hint when model is set. If model is null, the tier table picks the model (the matrix above). So:
- Operator sets provider="anthropic", model="claude-sonnet-4-5-20250929" → the supervisor’s standard call goes to that exact Sonnet build, regardless of what the tier table says.
- Operator sets provider="anthropic" and leaves model null → the supervisor’s standard call resolves to whatever the tier table says is Anthropic’s standard today.
- Operator leaves both null → the gateway falls back to DEFAULT_PROVIDER ("openai") and the tier table.
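The three cases above follow from two nullish-coalescing steps. The sketch below is one way to express that rule. The InstanceRow type and the resolveForInstance helper are hypothetical names introduced for illustration, and the real resolveCallConfig(request, options) may be structured differently.

```typescript
type Tier = "fast" | "standard" | "heavy";
type Provider = "openai" | "anthropic";

// Hypothetical view of the two instance columns described above.
interface InstanceRow {
  provider: Provider | null;
  model: string | null;
}

const DEFAULT_PROVIDER: Provider = "openai";

// Default tier table from this page (Bedrock omitted: its ids are truncated above).
const TIER_TABLE: Record<Provider, Record<Tier, string>> = {
  openai: { fast: "gpt-4o-mini", standard: "gpt-4o", heavy: "o3" },
  anthropic: {
    fast: "claude-haiku-4-5-20251001",
    standard: "claude-sonnet-4-5-20250929",
    heavy: "claude-opus-4-6",
  },
};

function resolveModel(provider: Provider, tier: Tier): string {
  return TIER_TABLE[provider][tier];
}

// An explicit model wins; otherwise the tier table decides.
function resolveForInstance(instance: InstanceRow, tier: Tier) {
  const provider = instance.provider ?? DEFAULT_PROVIDER;
  const model = instance.model ?? resolveModel(provider, tier);
  return { provider, model, tier };
}

// Provider set, model left null: the tier table picks Anthropic's standard model.
console.log(resolveForInstance({ provider: "anthropic", model: null }, "standard"));
// Both null: falls back to the default provider and the tier table.
console.log(resolveForInstance({ provider: null, model: null }, "standard"));
```

Note that the tier is still returned when a model is pinned. In that case it no longer influences model choice and only serves as a label.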
Where the explicit model is and isn’t honoured
This is the part the resolution rule alone hides: not every call site passes the instance’s model. The supervisor does, for the user-facing turn. Background workloads do not.
| Call site | Tier | Passes provider | Passes model | Net effect |
|---|---|---|---|---|
| Supervisor — user turn | standard | yes | yes | exact instance model used |
| Sub-agent via spawnTask | standard | yes | yes | inherits parent’s model |
| Memory extractor (extractMemories) | fast | yes | no | tier-table model for that provider |
| Title generator (generateConversationTitle) | fast | yes | no | tier-table model |
| Summary updater (updateSummary) | fast | yes | no | tier-table model |
| Event matcher (Room, Webhook) | fast | yes | no | tier-table model |
The asymmetry is intentional: a cheap classifier should stay cheap. If you pin Sonnet on your agent, you do not want every memory-extraction pass to also use Sonnet — that would silently 10× the cost of fire-and-forget background work. The tier table keeps fast-tier calls on the cheap model for the chosen provider, even when the user-facing turn runs on the premium model the operator picked.
If you genuinely need background work to follow the explicit model, the right knob is the tier table in ai-gateway/config.ts, not the instance row.
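To make the asymmetry concrete, the sketch below encodes the call-site table as data and computes the model each site ends up with on an Anthropic instance. The CALL_SITES keys, the CallSite type, and the effectiveModel helper are hypothetical. In the real code the difference presumably comes from whether each caller passes the instance's model at all, not from a lookup table like this one.

```typescript
type Tier = "fast" | "standard";

// Anthropic fast/standard entries from the default tier table on this page.
const ANTHROPIC_TIERS: Record<Tier, string> = {
  fast: "claude-haiku-4-5-20251001",
  standard: "claude-sonnet-4-5-20250929",
};

interface CallSite {
  tier: Tier;
  passesModel: boolean; // does this caller forward the instance's explicit model?
}

type SiteName =
  | "supervisorUserTurn"
  | "spawnTask"
  | "extractMemories"
  | "generateConversationTitle"
  | "updateSummary"
  | "eventMatcher";

// One entry per row of the call-site table above.
const CALL_SITES: Record<SiteName, CallSite> = {
  supervisorUserTurn: { tier: "standard", passesModel: true },
  spawnTask: { tier: "standard", passesModel: true },
  extractMemories: { tier: "fast", passesModel: false },
  generateConversationTitle: { tier: "fast", passesModel: false },
  updateSummary: { tier: "fast", passesModel: false },
  eventMatcher: { tier: "fast", passesModel: false },
};

// A pinned model only applies where the call site forwards it.
function effectiveModel(site: CallSite, pinnedModel: string | null): string {
  const requested = site.passesModel ? pinnedModel : null;
  return requested ?? ANTHROPIC_TIERS[site.tier];
}

const pinned = "claude-opus-4-6";
console.log(effectiveModel(CALL_SITES.supervisorUserTurn, pinned)); // "claude-opus-4-6"
console.log(effectiveModel(CALL_SITES.extractMemories, pinned)); // "claude-haiku-4-5-20251001"
```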
Capability gates apply to the resolved model
A few features (thinkingEnabled, extended reasoning) are only available on specific model builds. config-resolver.ts runs an isThinkingCapable(provider, effectiveModel) check using the instance’s explicit model — or the tier-table standard model if none is set — and silently drops the flag when the resolved model does not support it. A stale thinkingEnabled=true after switching to a non-capable model has no runtime effect; nothing crashes, the feature just turns off.
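A gate of this kind can be sketched as a pure function over the resolved model. The allow-list below is purely illustrative: this page does not list which models config.ts treats as thinking-capable, and the applyThinkingGate helper is a hypothetical name.

```typescript
type Provider = "openai" | "anthropic" | "bedrock";

// Illustrative allow-list only; the real capability data lives in config.ts
// and may contain different entries.
const THINKING_CAPABLE = new Set<string>(["openai:o3", "anthropic:claude-opus-4-6"]);

function isThinkingCapable(provider: Provider, model: string): boolean {
  return THINKING_CAPABLE.has(`${provider}:${model}`);
}

// Drop the flag silently when the resolved model cannot honour it.
function applyThinkingGate(
  provider: Provider,
  effectiveModel: string,
  thinkingEnabled: boolean,
): boolean {
  return thinkingEnabled && isThinkingCapable(provider, effectiveModel);
}

// A stale thinkingEnabled=true on a non-capable model simply resolves to false.
console.log(applyThinkingGate("openai", "gpt-4o-mini", true)); // false
console.log(applyThinkingGate("openai", "o3", true)); // true
```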
Per-instance secrets
Per-instance API keys live encrypted in instance_secrets (openai_api_key, anthropic_api_key, aws_access_key_id + aws_secret_access_key + aws_region). The provider adapters read these from the resolved config and never touch process env directly. Without the right key, the call fails fast at the adapter — there is no fallback to a process-wide key.
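The fail-fast behaviour can be sketched as a check that runs before any network call. The secret names come from the paragraph above; the requireSecrets helper and the plain-object representation of decrypted secrets are assumptions made for this example.

```typescript
type Provider = "openai" | "anthropic" | "bedrock";

// Secret names as listed for instance_secrets above.
const REQUIRED_SECRETS: Record<Provider, string[]> = {
  openai: ["openai_api_key"],
  anthropic: ["anthropic_api_key"],
  bedrock: ["aws_access_key_id", "aws_secret_access_key", "aws_region"],
};

// Hypothetical helper: validates the instance's decrypted secrets for a provider.
// It deliberately never reads process.env, so there is no process-wide fallback.
function requireSecrets(
  provider: Provider,
  secrets: Record<string, string | undefined>,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const name of REQUIRED_SECRETS[provider]) {
    const value = secrets[name];
    if (!value) {
      throw new Error(`Missing instance secret "${name}" for provider "${provider}"`);
    }
    resolved[name] = value;
  }
  return resolved;
}

// An instance that only has an OpenAI key cannot call Anthropic.
try {
  requireSecrets("anthropic", { openai_api_key: "sk-example" });
} catch (err) {
  console.log((err as Error).message);
}
```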
Why tier abstraction earns its keep
The same supervisor binary serves many instances at once. Some run on Anthropic, some on OpenAI, some on Bedrock. Some operators upgrade their standard tier from gpt-4o to gpt-4.1 overnight. None of these changes touch supervisor code, prompt templates, or tool implementations.
The tier names also document intent. A grep for tier: "fast" instantly tells you which call paths are cost-sensitive (extractors, classifiers) versus user-facing (standard). When a new optimisation lands — caching, batching, a cheaper model — the operator can roll it out to every “fast” caller in one config change.
Embeddings are an exception
Embeddings always go through OpenAI. packages/engine/src/memory/embedder.ts hard-codes text-embedding-3-small (1536d). Anthropic has no embedding API; Bedrock’s Titan embeddings have not been wired in. The practical implication: even an Anthropic-only deployment requires the per-instance openai_api_key secret. Without it, memory extraction and knowledge ingestion fail.
The chat/streaming path still uses the instance’s configured provider — only the embedding step is fixed to OpenAI.
Call-type tagging: conversation vs. service
Every gateway call carries a callType in ChatCallOptions:
- "conversation": user-visible turns. The supervisor handling an inbound message.
- "service": background work. Memory extraction, title generation, webhook event matching, room event matching, summarisation.
The tag flows through to ai_logs.call_type. Analytics queries can then split billable conversation cost from infrastructure service cost — answering questions like “what does it cost me to keep memory enabled for this instance?” without confusing the per-conversation cost reports.
The supervisor’s main path passes "conversation". Every fire-and-forget extractor passes "service". The convention is enforced by code review, not by the type system.
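The split that call_type enables can be sketched as a small aggregation over log rows. The AiLogRow shape and the costByCallType helper are assumptions for illustration; a real report would more likely be a GROUP BY query against ai_logs.

```typescript
type CallType = "conversation" | "service";

// Hypothetical in-memory view of the ai_logs columns relevant to this report.
interface AiLogRow {
  instanceId: string;
  callType: CallType;
  estimatedCostUsd: number;
}

// Sum estimated cost per call type for one instance.
function costByCallType(rows: AiLogRow[], instanceId: string): Record<CallType, number> {
  const totals: Record<CallType, number> = { conversation: 0, service: 0 };
  for (const row of rows) {
    if (row.instanceId === instanceId) {
      totals[row.callType] += row.estimatedCostUsd;
    }
  }
  return totals;
}

// Made-up rows: the "service" total is what background work costs this instance.
const sample: AiLogRow[] = [
  { instanceId: "inst-a", callType: "conversation", estimatedCostUsd: 0.5 },
  { instanceId: "inst-a", callType: "service", estimatedCostUsd: 0.125 },
  { instanceId: "inst-b", callType: "service", estimatedCostUsd: 0.25 },
];
console.log(costByCallType(sample, "inst-a"));
```

Since the convention is enforced only by review, a union type like CallType at least rules out misspelled tags, although it cannot tell whether a caller picked the right one.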
Logging: every call lands in ai_logs
After every successful chat() or chatStream(), the gateway writes one row into ai_logs (packages/engine/src/ai-gateway/logger.ts):
| column | meaning |
|---|---|
| provider | openai, anthropic, bedrock |
| model | resolved model id |
| tier | fast / standard / heavy |
| thinking | whether extended thinking / reasoning was requested |
| prompt_tokens, completion_tokens, total_tokens | usage straight from the SDK |
| estimated_cost_usd | per the cost table in config.ts |
| duration_ms | wall-clock latency of the call |
| step_count | number of tool-call steps (for agentic loops) |
| conversation_id, instance_id | correlation keys |
| call_type | conversation or service |
The logger buffers in memory and flushes every 10 entries or every 5s. On flush failure entries are re-queued; the buffer is capped at 1000 to bound memory.
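The buffering policy described above (flush at 10 entries or every 5 seconds, re-queue on failure, cap at 1000) can be sketched as follows. This is not the AILogger from logger.ts: the class name, the synchronous sink, and the choice to drop the oldest entries once the cap is reached are all assumptions. A real database write would be asynchronous.

```typescript
interface AiLogEntry {
  provider: string;
  model: string;
  totalTokens: number;
}

// Stand-in for the database write; throws to signal a failed flush.
type Sink = (batch: AiLogEntry[]) => void;

class BufferedLogger {
  private buffer: AiLogEntry[] = [];
  private timer: ReturnType<typeof setInterval> | undefined = undefined;
  private readonly sink: Sink;
  private readonly batchSize: number;
  private readonly intervalMs: number;
  private readonly maxBuffered: number;

  constructor(sink: Sink, batchSize = 10, intervalMs = 5_000, maxBuffered = 1_000) {
    this.sink = sink;
    this.batchSize = batchSize;
    this.intervalMs = intervalMs;
    this.maxBuffered = maxBuffered;
  }

  // Start the time-based flush; call stop() on shutdown.
  start(): void {
    if (this.timer === undefined) {
      this.timer = setInterval(() => this.flush(), this.intervalMs);
    }
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  // Size-based flush: triggers once the buffer reaches batchSize.
  log(entry: AiLogEntry): void {
    this.buffer.push(entry);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    try {
      this.sink(batch);
    } catch (_err) {
      // Re-queue the failed batch, keeping only the newest maxBuffered entries.
      this.buffer = batch.concat(this.buffer).slice(-this.maxBuffered);
    }
  }

  get pending(): number {
    return this.buffer.length;
  }
}
```

The cap trades completeness for bounded memory: during a long outage some log rows are lost, but the process cannot grow without limit.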
ai_logs is the source of truth for the admin panel’s cost dashboard and the per-instance analytics pages. It is distinct from pipeline_traces, which records end-to-end pipeline latency including non-LLM phases (context prep, tool building, persistence) — see Architecture.
How it works

    +--------------------------------------------------------+
    | caller declares workload class                         |
    |   supervisor.run()      tier: "standard"               |
    |   extractMemories()     tier: "fast", call=service     |
    |   webhookMatcher()      tier: "fast", call=service     |
    |   titleGenerator()      tier: "fast", call=service     |
    +---------------------------+----------------------------+
                                |
                                v
    +--------------------------------------------------------+
    | AI Gateway (ai-gateway/index.ts)                       |
    |   resolveCallConfig(request, options)                  |
    |   provider <- request.provider | instance | default    |
    |   modelId  <- request.model | resolveModel()           |
    |   buildLangSmithProviderOptions(...)                   |
    +---------------------------+----------------------------+
                                |
            +-------------------+-------------------+
            v                   v                   v
     OpenAIProvider     AnthropicProvider    BedrockProvider
    (providers/openai.ts) (...anthropic.ts)  (...bedrock.ts)
            |                   |                   |
            +-------------------+-------------------+
                                |
                                v
            Vercel AI SDK (generateText / streamText)
                                |
                                v
                logAndRecordUsage()
                  pipelineLog.llmResponse(...)
                  estimateCost(provider, model, tokens)
                  aiLogger.log({ provider, model, tier, tokens,
                                 cost, durationMs, conversationId,
                                 instanceId, callType })
                                |
                                v
      ai_logs table (buffered flush every 10 entries / 5s)

Code reference
- packages/engine/src/ai-gateway/index.ts: chat(), chatStream(), resolveCallConfig(), logAndRecordUsage().
- packages/engine/src/ai-gateway/config.ts: Per-provider tier mapping, cost table, resolveModel(), estimateCost(), isThinkingCapable().
- packages/engine/src/ai-gateway/types.ts: ChatRequest, ChatResponse, ChatStreamResult, TierMapping, ProviderAdapter.
- packages/engine/src/ai-gateway/providers/openai.ts: OpenAI adapter (Vercel AI SDK @ai-sdk/openai).
- packages/engine/src/ai-gateway/providers/anthropic.ts: Anthropic adapter (@ai-sdk/anthropic).
- packages/engine/src/ai-gateway/providers/bedrock.ts: Bedrock adapter (@ai-sdk/amazon-bedrock).
- packages/engine/src/ai-gateway/logger.ts: ai_logs schema + buffered AILogger.
- packages/engine/src/ai-gateway/langsmith.ts: LangSmith tracing provider options.
- packages/engine/src/instances/config-resolver.ts: Per-instance provider + secrets resolution (30s TTL cache).
- packages/engine/src/memory/embedder.ts: The OpenAI-only embeddings exception.
See also
- Architecture: where the gateway sits in the request pipeline.
- Memory: uses tier: "fast" for extraction.
- Agents: the supervisor consumes the gateway via tier: "standard".
- Tools: some tools (e.g. verifyDocument) also call the gateway through their ToolContext.