
AI Gateway

The AI Gateway is the provider-agnostic abstraction every Polyant component uses to call a large language model. The supervisor does not import @ai-sdk/openai. The memory extractor does not import @ai-sdk/anthropic. They both call chat({ tier: "standard", messages, ... }) and the gateway picks the right provider, the right model, and threads logging, tracing, and cost accounting through the call.

This page covers the tier model, the per-instance provider override, the embedding exception, the call-type tagging used for cost analytics, and the structured logging that lands every call in ai_logs.

Tier abstraction, not model lock-in

Every call site declares a workload class, not a model:

  • fast — cheap classification work. Memory extraction, webhook event matching, title generation, conversation summarisation.
  • standard — the main conversation tier. The supervisor uses this for user-facing turns.
  • heavy — deep reasoning. Used sparingly today; reserved for tasks where the operator opts into reasoning models (o3, Claude Opus, etc.).

The mapping from tier to (provider, model) lives in packages/engine/src/ai-gateway/config.ts. The default table:

| provider | fast | standard | heavy |
| --- | --- | --- | --- |
| openai | gpt-4o-mini | gpt-4o | o3 |
| anthropic | claude-haiku-4-5-20251001 | claude-sonnet-4-5-20250929 | claude-opus-4-6 |
| bedrock | amazon.nova-lite-v1:0 | anthropic.claude-sonnet-4-... | anthropic.claude-opus-4-... |

A deployment owner can swap the entire fleet to Anthropic by setting the instance’s aiProvider to anthropic, or pin a single tier to a specific model — and no application code changes. Every workload that asked for standard continues to ask for standard.
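As a rough illustration of the tier lookup, the sketch below models the table as a nested record. It is not the contents of config.ts: the record shape and the error handling are assumptions, and the Bedrock row is left out because its model ids are truncated in the table above. Only the resolveModel(provider, tier) name and argument order come from this page.

```typescript
// Sketch only: the real table lives in packages/engine/src/ai-gateway/config.ts
// and its exact shape is not shown on this page.
type Provider = "openai" | "anthropic" | "bedrock";
type Tier = "fast" | "standard" | "heavy";

// Model ids copied from the default table. Bedrock is omitted on purpose
// because its ids are truncated in the documentation.
const TIER_TABLE: Partial<Record<Provider, Record<Tier, string>>> = {
  openai: { fast: "gpt-4o-mini", standard: "gpt-4o", heavy: "o3" },
  anthropic: {
    fast: "claude-haiku-4-5-20251001",
    standard: "claude-sonnet-4-5-20250929",
    heavy: "claude-opus-4-6",
  },
};

// Maps a workload class to a concrete model id for the given provider.
function resolveModel(provider: Provider, tier: Tier): string {
  const row = TIER_TABLE[provider];
  if (!row) {
    // Assumption: an unmapped provider is an error rather than a silent fallback.
    throw new Error(`no tier mapping for provider "${provider}" in this sketch`);
  }
  return row[tier];
}
```

Because call sites only ever name a tier, changing a cell in a table like this one is enough to move every caller of that tier to a different model.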

Per-instance override — provider AND specific model

Each agent has two columns on the instances row that drive model selection: provider (string: openai / anthropic / bedrock) and model (string: any model id the provider supports, e.g. gpt-4.1-mini, claude-sonnet-4-5-20250929). Both are editable from the admin panel under Settings → AI provider for the agent. Either can be left null.

How the gateway combines them with the caller’s tier:

final.provider = request.provider (instance row) ?? DEFAULT_PROVIDER ("openai")
final.model    = request.model (instance row)    ?? resolveModel(provider, tier)

The exact model the user picked wins over the tier table — tier becomes a label-only hint when model is set. If model is null, the tier table picks the model (the matrix above). So:

  • Operator sets provider="anthropic", model="claude-sonnet-4-5-20250929" → the supervisor’s standard call goes to that exact Sonnet build, regardless of what the tier table says.
  • Operator sets provider="anthropic" and leaves model null → the supervisor’s standard call resolves to whatever the tier table says is Anthropic’s standard today.
  • Operator leaves both null → the gateway falls back to DEFAULT_PROVIDER (“openai”) and the tier table.
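The three cases above follow directly from two nullish-coalescing fallbacks. The sketch below is a minimal model of that rule, not the gateway's resolveCallConfig(): the names TierRequest, lookupTierModel, and resolveProviderAndModel are hypothetical, and the stand-in table carries only the standard-tier ids needed for the examples.

```typescript
type Tier = "fast" | "standard" | "heavy";

// Hypothetical request shape: provider and model come from the instance row
// and may both be null.
interface TierRequest {
  tier: Tier;
  provider?: string | null;
  model?: string | null;
}

const DEFAULT_PROVIDER = "openai";

// Stand-in for the tier table, reduced to two providers for illustration.
const STANDIN_TABLE: Record<string, Record<Tier, string>> = {
  openai: { fast: "gpt-4o-mini", standard: "gpt-4o", heavy: "o3" },
  anthropic: {
    fast: "claude-haiku-4-5-20251001",
    standard: "claude-sonnet-4-5-20250929",
    heavy: "claude-opus-4-6",
  },
};

function lookupTierModel(provider: string, tier: Tier): string {
  const row = STANDIN_TABLE[provider];
  if (!row) throw new Error(`unknown provider "${provider}" in this sketch`);
  return row[tier];
}

// An explicit model wins; otherwise the tier table decides. The tier is
// carried along either way so it can still be logged as a label.
function resolveProviderAndModel(request: TierRequest): { provider: string; model: string; tier: Tier } {
  const provider = request.provider ?? DEFAULT_PROVIDER;
  const model = request.model ?? lookupTierModel(provider, request.tier);
  return { provider, model, tier: request.tier };
}
```

Note that the provider fallback runs first, so a null provider with a null model resolves against the default provider's row of the table.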

Where the explicit model is and isn’t honoured

This is the part the resolution rule alone hides: not every call site passes the instance’s model. The supervisor does, for the user-facing turn. Background workloads do not.

| Call site | Tier | Passes provider | Passes model | Net effect |
| --- | --- | --- | --- | --- |
| Supervisor (user turn) | standard | yes | yes | exact instance model used |
| Sub-agent via spawnTask | standard | yes | yes | inherits parent’s model |
| Memory extractor (extractMemories) | fast | yes | no | tier-table model for that provider |
| Title generator (generateConversationTitle) | fast | yes | no | tier-table model |
| Summary updater (updateSummary) | fast | yes | no | tier-table model |
| Event matcher (Room, Webhook) | fast | yes | no | tier-table model |

The asymmetry is intentional: a cheap classifier should stay cheap. If you pin Sonnet on your agent, you do not want every memory-extraction pass to also use Sonnet — that would silently 10× the cost of fire-and-forget background work. The tier table keeps fast-tier calls on the cheap model for the chosen provider, even when the user-facing turn runs on the premium model the operator picked.

If you genuinely need background work to follow the explicit model, the right knob is the tier table in ai-gateway/config.ts, not the instance row.
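One way to picture the asymmetry is as two request builders that differ only in whether they forward the instance's model. This is a sketch under assumptions: InstanceAiSettings, GatewayRequest, and both builder functions are hypothetical names, not code from the supervisor or the extractors.

```typescript
type Tier = "fast" | "standard" | "heavy";

// Hypothetical view of the two instance columns described above.
interface InstanceAiSettings {
  provider: string | null;
  model: string | null;
}

// Hypothetical request shape handed to the gateway.
interface GatewayRequest {
  tier: Tier;
  callType: "conversation" | "service";
  provider: string | null;
  model?: string | null;
}

// User-facing turn: forwards both columns, so a pinned model is honoured.
function buildSupervisorRequest(instance: InstanceAiSettings): GatewayRequest {
  return {
    tier: "standard",
    callType: "conversation",
    provider: instance.provider,
    model: instance.model,
  };
}

// Background work: forwards the provider only, so the tier table picks
// the fast model for that provider.
function buildBackgroundRequest(instance: InstanceAiSettings): GatewayRequest {
  return {
    tier: "fast",
    callType: "service",
    provider: instance.provider,
  };
}
```

With an instance pinned to a Sonnet build, the first builder carries that model id through while the second leaves model unset, which is what keeps extraction on the cheap tier.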

Capability gates apply to the resolved model

A few features (thinkingEnabled, extended reasoning) are only available on specific model builds. config-resolver.ts runs an isThinkingCapable(provider, effectiveModel) check using the instance’s explicit model — or the tier-table standard model if none is set — and silently drops the flag when the resolved model does not support it. A stale thinkingEnabled=true after switching to a non-capable model has no runtime effect; nothing crashes, the feature just turns off.
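The gate can be sketched as a pure function over the resolved model. The isThinkingCapable(provider, effectiveModel) name and arguments come from this page; everything else is an assumption, in particular the allowlist, which is illustrative and does not claim to match the list in config.ts.

```typescript
// Illustrative allowlist only. Which builds the real isThinkingCapable()
// accepts is not stated on this page.
const THINKING_CAPABLE = new Set<string>([
  "anthropic/claude-sonnet-4-5-20250929",
  "anthropic/claude-opus-4-6",
]);

function isThinkingCapable(provider: string, effectiveModel: string): boolean {
  return THINKING_CAPABLE.has(`${provider}/${effectiveModel}`);
}

// Hypothetical slice of the instance settings relevant to the gate.
interface ThinkingSettings {
  provider: string;
  model: string | null;
  thinkingEnabled: boolean;
}

// Returns the flag that should actually reach the provider call. A stale
// thinkingEnabled=true on a non-capable model collapses to false here
// instead of raising an error.
function effectiveThinking(settings: ThinkingSettings, tierStandardModel: string): boolean {
  const effectiveModel = settings.model ?? tierStandardModel;
  return settings.thinkingEnabled && isThinkingCapable(settings.provider, effectiveModel);
}
```

The second argument stands in for the tier-table standard model, which is used only when the instance has no explicit model.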

Per-instance secrets

Per-instance API keys live encrypted in instance_secrets (openai_api_key, anthropic_api_key, aws_access_key_id + aws_secret_access_key + aws_region). The provider adapters read these from the resolved config and never touch process env directly. Without the right key, the call fails fast at the adapter — there is no fallback to a process-wide key.
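A minimal sketch of the fail-fast behaviour, assuming the adapter receives already-decrypted secrets as a plain map. The secret names are the ones listed above; requireSecrets and the InstanceSecrets type are hypothetical, and the function deliberately never reads the process environment.

```typescript
type Provider = "openai" | "anthropic" | "bedrock";

// Hypothetical shape: decrypted instance_secrets keyed by secret name.
type InstanceSecrets = Partial<Record<string, string>>;

const REQUIRED_SECRETS: Record<Provider, string[]> = {
  openai: ["openai_api_key"],
  anthropic: ["anthropic_api_key"],
  bedrock: ["aws_access_key_id", "aws_secret_access_key", "aws_region"],
};

// Returns the secrets an adapter needs, or throws before any network call.
// There is intentionally no fallback to a process-wide key.
function requireSecrets(provider: Provider, secrets: InstanceSecrets): Record<string, string> {
  const resolved: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of REQUIRED_SECRETS[provider]) {
    const value = secrets[name];
    if (value) {
      resolved[name] = value;
    } else {
      missing.push(name);
    }
  }
  if (missing.length > 0) {
    throw new Error(`missing instance secret(s) for ${provider}: ${missing.join(", ")}`);
  }
  return resolved;
}
```

Failing here, rather than at the provider's API, keeps a misconfigured instance from silently billing against someone else's key.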

Why tier abstraction earns its keep

The same supervisor binary serves many instances at once. Some run on Anthropic, some on OpenAI, some on Bedrock. Some operators upgrade their standard tier from gpt-4o to gpt-4.1 overnight. None of these changes touch supervisor code, prompt templates, or tool implementations.

The tier names also document intent. A grep for tier: "fast" instantly tells you which call paths are cost-sensitive (extractors, classifiers) versus user-facing (standard). When a new optimisation lands — caching, batching, a cheaper model — the operator can roll it out to every “fast” caller in one config change.

Embeddings are an exception

Embeddings always go through OpenAI. packages/engine/src/memory/embedder.ts hard-codes text-embedding-3-small (1536d). Anthropic has no embedding API; Bedrock’s Titan embeddings have not been wired in. The practical implication: even an Anthropic-only deployment requires the per-instance openai_api_key secret. Without it, memory extraction and knowledge ingestion fail.

The chat/streaming path still uses the instance’s configured provider — only the embedding step is fixed to OpenAI.
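The practical consequence for deployment checklists can be written down as a small helper that lists the secrets an instance needs. This is a sketch under assumptions: secretsNeeded and its usesEmbeddings parameter are hypothetical, introduced only to show that the embedding requirement is independent of the chat provider.

```typescript
type ChatProvider = "openai" | "anthropic" | "bedrock";

const CHAT_SECRETS: Record<ChatProvider, string[]> = {
  openai: ["openai_api_key"],
  anthropic: ["anthropic_api_key"],
  bedrock: ["aws_access_key_id", "aws_secret_access_key", "aws_region"],
};

// Embeddings are fixed to OpenAI regardless of the chat provider.
const EMBEDDING_SECRETS: string[] = ["openai_api_key"];

// Lists every secret name an instance needs, without duplicates.
function secretsNeeded(chatProvider: ChatProvider, usesEmbeddings: boolean): string[] {
  const names = new Set<string>(CHAT_SECRETS[chatProvider]);
  if (usesEmbeddings) {
    for (const name of EMBEDDING_SECRETS) names.add(name);
  }
  return Array.from(names);
}
```

For an Anthropic instance with memory enabled, the helper returns both anthropic_api_key and openai_api_key; for an OpenAI instance the single key covers both paths.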

Call-type tagging: conversation vs. service

Every gateway call carries a callType in ChatCallOptions:

  • "conversation" — user-visible turns. The supervisor handling an inbound message.
  • "service" — background work. Memory extraction, title generation, webhook event matching, room event matching, summarisation.

The tag flows through to ai_logs.call_type. Analytics queries can then separate billable conversation cost from infrastructure service cost, answering questions like “what does it cost me to keep memory enabled for this instance?” without distorting the per-conversation cost reports.

The supervisor’s main path passes "conversation". Every fire-and-forget extractor passes "service". The convention is enforced by code review, not by the type system.
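To make the analytics split concrete, the sketch below aggregates cost per call type for one instance. It works over in-memory rows rather than a database query, and it assumes a row shape limited to three of the ai_logs columns; costByCallType and AiLogCostRow are hypothetical names.

```typescript
type CallType = "conversation" | "service";

// Hypothetical projection of ai_logs: only the columns this sketch needs.
interface AiLogCostRow {
  instance_id: string;
  call_type: CallType;
  estimated_cost_usd: number;
}

// Sums estimated cost per call type for a single instance.
function costByCallType(rows: AiLogCostRow[], instanceId: string): Record<CallType, number> {
  const totals: Record<CallType, number> = { conversation: 0, service: 0 };
  for (const row of rows) {
    if (row.instance_id !== instanceId) continue;
    totals[row.call_type] += row.estimated_cost_usd;
  }
  return totals;
}
```

The service total is the figure that answers the memory-cost question, and it only stays meaningful if every background caller actually passes "service". Typing callType as a union, as here, catches misspelled tags but not a background caller that passes "conversation".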

Logging: every call lands in ai_logs

After every successful chat() or chatStream(), the gateway writes one row into ai_logs (packages/engine/src/ai-gateway/logger.ts):

| column | meaning |
| --- | --- |
| provider | openai, anthropic, bedrock |
| model | resolved model id |
| tier | fast / standard / heavy |
| thinking | whether extended thinking / reasoning was requested |
| prompt_tokens, completion_tokens, total_tokens | usage straight from the SDK |
| estimated_cost_usd | per the cost table in config.ts |
| duration_ms | wall-clock latency of the call |
| step_count | number of tool-call steps (for agentic loops) |
| conversation_id, instance_id | correlation keys |
| call_type | conversation or service |

The logger buffers in memory and flushes every 10 entries or every 5s. On flush failure entries are re-queued; the buffer is capped at 1000 to bound memory.
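The buffering policy can be sketched as a small class plus a flush helper. This is not the AILogger in logger.ts: LogBuffer and flushBuffer are hypothetical names, the timer wiring is left to the caller, and dropping the oldest entries when the cap is hit is an assumption, since the page does not say which entries are discarded.

```typescript
const FLUSH_SIZE = 10; // flush once this many entries are buffered
const MAX_BUFFER = 1_000; // hard cap on buffered entries

class LogBuffer<T> {
  private entries: T[] = [];

  // Buffers one entry. Returns true when the size threshold is reached
  // and the caller should flush.
  add(entry: T): boolean {
    this.entries.push(entry);
    this.trim();
    return this.entries.length >= FLUSH_SIZE;
  }

  // Hands back everything buffered and empties the buffer.
  drain(): T[] {
    const batch = this.entries;
    this.entries = [];
    return batch;
  }

  // Puts a failed batch back ahead of newer entries, then enforces the cap.
  requeue(batch: T[]): void {
    this.entries = batch.concat(this.entries);
    this.trim();
  }

  size(): number {
    return this.entries.length;
  }

  // Assumption: when over the cap, the oldest entries are dropped.
  private trim(): void {
    if (this.entries.length > MAX_BUFFER) {
      this.entries = this.entries.slice(this.entries.length - MAX_BUFFER);
    }
  }
}

// Writes the current batch; on failure the batch goes back into the buffer.
async function flushBuffer<T>(buffer: LogBuffer<T>, write: (rows: T[]) => Promise<void>): Promise<void> {
  const batch = buffer.drain();
  if (batch.length === 0) return;
  try {
    await write(batch);
  } catch {
    buffer.requeue(batch);
  }
}
```

A caller would invoke flushBuffer whenever add() returns true and also from a 5-second interval timer, so that a quiet instance still gets its rows written. The cap means a prolonged database outage loses log rows instead of growing memory without bound.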

ai_logs is the source of truth for the admin panel’s cost dashboard and the per-instance analytics pages. It is distinct from pipeline_traces, which records end-to-end pipeline latency including non-LLM phases (context prep, tool building, persistence) — see Architecture.

How it works

+----------------------------------------------------+
| caller declares workload class                     |
|  supervisor.run()     tier: "standard"             |
|  extractMemories()    tier: "fast", call=service   |
|  webhookMatcher()     tier: "fast", call=service   |
|  titleGenerator()     tier: "fast", call=service   |
+--------------------------+-------------------------+
                           |
                           v
+----------------------------------------------------+
| AI Gateway (ai-gateway/index.ts)                   |
|  resolveCallConfig(request, options)               |
|  provider <- request.provider | instance | default |
|  modelId  <- request.model | resolveModel()        |
|  buildLangSmithProviderOptions(...)                |
+--------------------------+-------------------------+
                           |
       +-------------------+-------------------+
       v                   v                   v
OpenAIProvider     AnthropicProvider    BedrockProvider
(providers/openai.ts) (...anthropic.ts) (...bedrock.ts)
       |                   |                   |
       +---------+---------+-------------------+
                 |
                 v
    Vercel AI SDK (generateText / streamText)
                 |
                 v
    logAndRecordUsage()
      pipelineLog.llmResponse(...)
      estimateCost(provider, model, tokens)
      aiLogger.log({ provider, model, tier, tokens, cost,
                     durationMs, conversationId, instanceId,
                     callType })
                 |
                 v
    ai_logs table (buffered flush every 10 entries / 5s)

Code reference

  • packages/engine/src/ai-gateway/index.ts: chat(), chatStream(), resolveCallConfig(), logAndRecordUsage().
  • packages/engine/src/ai-gateway/config.ts: per-provider tier mapping, cost table, resolveModel(), estimateCost(), isThinkingCapable().
  • packages/engine/src/ai-gateway/types.ts: ChatRequest, ChatResponse, ChatStreamResult, TierMapping, ProviderAdapter.
  • packages/engine/src/ai-gateway/providers/openai.ts: OpenAI adapter (Vercel AI SDK @ai-sdk/openai).
  • packages/engine/src/ai-gateway/providers/anthropic.ts: Anthropic adapter (@ai-sdk/anthropic).
  • packages/engine/src/ai-gateway/providers/bedrock.ts: Bedrock adapter (@ai-sdk/amazon-bedrock).
  • packages/engine/src/ai-gateway/logger.ts: ai_logs schema + buffered AILogger.
  • packages/engine/src/ai-gateway/langsmith.ts: LangSmith tracing provider options.
  • packages/engine/src/instances/config-resolver.ts: per-instance provider + secrets resolution (30s TTL cache).
  • packages/engine/src/memory/embedder.ts: the OpenAI-only embeddings exception.

See also

  • Architecture — where the gateway sits in the request pipeline.
  • Memory — uses tier: "fast" for extraction.
  • Agents — the supervisor consumes the gateway via tier: "standard".
  • Tools — some tools (e.g. verifyDocument) also call the gateway through their ToolContext.