OpenAI-Compatible API
Polyant exposes its assistants through an HTTP API that mirrors a subset of OpenAI’s Chat Completions API. Any client that speaks that protocol — the openai Python SDK, openai-node, LangChain, LiteLLM, etc. — can call Polyant by changing the base URL.
Base URL
http://<engine-host>:4000/v1

In a default local install, that is http://localhost:4000/v1.
Endpoints
GET /v1/models
Lists every active instance as a model:
{
  "object": "list",
  "data": [
    {
      "id": "demo-bot",
      "object": "model",
      "created": 1730390400,
      "owned_by": "polyant"
    },
    {
      "id": "support-bot",
      "object": "model",
      "created": 1730491200,
      "owned_by": "polyant"
    }
  ]
}

The id is the instance slug. Disabled instances are filtered out.
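As a small illustration of consuming this response, the sketch below pulls the instance slugs out of a decoded `GET /v1/models` body. The function name `list_instance_slugs` is hypothetical, and the sample payload is the one shown above; fetching the body over HTTP is left out so the snippet stays self-contained.

```python
from typing import Any, Dict, List


def list_instance_slugs(models_response: Dict[str, Any]) -> List[str]:
    """Return the instance slugs from a decoded GET /v1/models body."""
    if models_response.get("object") != "list":
        raise ValueError("expected a list envelope")
    # Each entry's "id" is the instance slug; it is also the value to pass
    # as "model" when calling POST /v1/chat/completions.
    return [
        entry["id"]
        for entry in models_response.get("data", [])
        if entry.get("object") == "model"
    ]


sample_models = {
    "object": "list",
    "data": [
        {"id": "demo-bot", "object": "model", "created": 1730390400, "owned_by": "polyant"},
        {"id": "support-bot", "object": "model", "created": 1730491200, "owned_by": "polyant"},
    ],
}

print(list_instance_slugs(sample_models))
```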
POST /v1/chat/completions
The conversation endpoint. Request body:
{
  "model": "demo-bot",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false
}

Response (non-streaming): a standard Chat Completion shape.
{
  "id": "chatcmpl-9f3a8b2c1d4e5f6a7b8c9d0e",
  "object": "chat.completion",
  "created": 1730491200,
  "model": "demo-bot",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

The id field is generated server-side as chatcmpl-<24 hex chars> to match the OpenAI shape. The only emitted finish_reason value is "stop": tool calls are orchestrated internally by the Supervisor and are never surfaced to the API caller, so tool_calls, length, content_filter, etc. are not returned.
Caveat — usage is always zero. The field is included in the response envelope to keep clients that read it from crashing, but Polyant does not currently aggregate per-request token counts at this layer. Treat the values as placeholders.
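The response conventions described above can be sketched in a few lines of client-side Python. This is only an illustration under stated assumptions: `make_completion_id` shows one way an id of the documented shape could be produced (the engine's actual generator is not shown in this document), and `reply_text` and `real_usage` are hypothetical helper names. `real_usage` returns `None` whenever every counter is zero, which is how a caller might treat the placeholder values.

```python
import re
import secrets
from typing import Any, Dict, Optional

# Documented shape: "chatcmpl-" followed by 24 hexadecimal characters.
COMPLETION_ID = re.compile(r"^chatcmpl-[0-9a-fA-F]{24}$")


def make_completion_id() -> str:
    """Illustrative only: 12 random bytes rendered as 24 hex characters."""
    return "chatcmpl-" + secrets.token_hex(12)


def reply_text(completion: Dict[str, Any]) -> str:
    """Return the assistant text from a non-streaming completion."""
    return completion["choices"][0]["message"]["content"]


def real_usage(completion: Dict[str, Any]) -> Optional[Dict[str, int]]:
    """Return usage only if it holds non-zero counts, otherwise None."""
    usage = completion.get("usage") or {}
    return usage if any(usage.values()) else None


sample_completion = {
    "id": "chatcmpl-9f3a8b2c1d4e5f6a7b8c9d0e",
    "object": "chat.completion",
    "created": 1730491200,
    "model": "demo-bot",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}

print(reply_text(sample_completion))
print(real_usage(sample_completion))
```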
Response (streaming, "stream": true): Server-Sent Events with data: lines containing JSON deltas, terminated by data: [DONE].
Authentication
GET /v1/models requires authentication — either a session JWT (admin panel users) or a per-instance API key via Authorization: Bearer sk-.... It is not an unauthenticated discovery endpoint; unauthenticated calls return HTTP 401.
POST /v1/chat/completions follows per-instance rules. If the instance has authEnabled = false, no header is required. Anyone who can reach the engine can talk to the instance — useful for local development, dangerous in production.
If the instance has authEnabled = true, every request must include:
Authorization: Bearer <auth-api-key>

The expected key is the value configured in the instance’s Settings tab under Auth API key (stored AES-256-GCM encrypted in instance_secrets.auth_api_key). The token from the Authorization: Bearer header is compared against the decrypted value using timingSafeEqual to prevent timing attacks (see packages/engine/src/server/openai/openai.controller.ts:191-201).
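To illustrate the per-instance check described above, here is a Python analogue of the constant-time comparison. It is not the engine's code (which is TypeScript and uses `timingSafeEqual`); it uses the standard library's `hmac.compare_digest`, and the names `extract_bearer_token` and `is_authorized` are hypothetical. Decryption of the stored key is assumed to have happened already.

```python
import hmac
from typing import Optional


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(
    authorization_header: Optional[str],
    expected_key: str,
    auth_enabled: bool = True,
) -> bool:
    """Mirror the documented rule: open when authEnabled is false, else compare keys."""
    if not auth_enabled:
        return True  # no header required for this instance
    token = extract_bearer_token(authorization_header)
    if token is None:
        return False
    # compare_digest takes time independent of where the inputs differ,
    # which avoids leaking key contents through response timing.
    return hmac.compare_digest(token.encode("utf-8"), expected_key.encode("utf-8"))


print(is_authorized("Bearer sk-myinstancekey", "sk-myinstancekey"))
print(is_authorized("Bearer sk-wrong", "sk-myinstancekey"))
```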
Errors
Errors are returned as the standard NestJS exception envelope — not the OpenAI {message, type, code} shape:
{
  "statusCode": 401,
  "message": "Invalid API key",
  "error": "Unauthorized"
}

| HTTP | Cause |
|---|---|
| 400 | Request body fails Zod validation |
| 401 | authEnabled=true and missing/wrong Bearer token, or unknown model |
| 429 | Rate limit exceeded |
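Because the error body differs from the OpenAI shape, a client that inspects errors itself may want a small adapter. The sketch below parses the envelope shown above into an exception; `PolyantApiError` and `parse_error` are hypothetical names, and the handling of a list-valued `message` is an assumption about how validation errors might be reported, not something this document confirms.

```python
import json


class PolyantApiError(Exception):
    """Client-side representation of the NestJS-style error envelope."""

    def __init__(self, status_code: int, message: str, error: str) -> None:
        super().__init__(f"{status_code} {error}: {message}")
        self.status_code = status_code
        self.message = message
        self.error = error

    @property
    def retryable(self) -> bool:
        # Only the rate-limit status from the table above is worth retrying.
        return self.status_code == 429


def parse_error(body: str) -> PolyantApiError:
    """Turn a JSON error body into a PolyantApiError."""
    payload = json.loads(body)
    message = payload.get("message", "")
    if isinstance(message, list):
        # Assumption: some validation errors may carry several messages.
        message = "; ".join(str(item) for item in message)
    return PolyantApiError(
        status_code=int(payload.get("statusCode", 0)),
        message=str(message),
        error=str(payload.get("error", "")),
    )


err = parse_error('{"statusCode": 401, "message": "Invalid API key", "error": "Unauthorized"}')
print(err, err.retryable)
```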
Rate limit
/v1/chat/completions is throttled at 20 requests per minute per client IP, enforced by @Throttle({ default: { limit: 20, ttl: 60_000 } }) on the controller (packages/engine/src/server/openai/openai.controller.ts:63). Exceeding the budget returns HTTP 429. For higher volume, rate-limit upstream at the proxy (NGINX, Traefik, Cloudflare) and raise the engine ceiling accordingly.
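A client that wants to stay under this budget can pace itself before sending. The following is a minimal client-side sketch, not the engine's throttler: it keeps a sliding window of send times and reports how long to wait. The class name `SlidingWindowLimiter` is hypothetical, and the exact windowing the server applies may differ, so treat the defaults (20 requests per 60 seconds) as a conservative mirror of the documented limit.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Track recent sends and say how long to wait before the next one."""

    def __init__(
        self,
        limit: int = 20,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._sent: Deque[float] = deque()

    def seconds_until_allowed(self) -> float:
        """Return 0.0 if a request may be sent now, else the wait in seconds."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Call right after a request is sent."""
        self._sent.append(self.clock())


limiter = SlidingWindowLimiter()
wait = limiter.seconds_until_allowed()
if wait > 0:
    time.sleep(wait)
limiter.record()
```

Injecting `clock` keeps the limiter testable without real sleeps; in production code the default `time.monotonic` is usually what you want.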
Streaming
Streaming follows OpenAI’s SSE protocol exactly:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]

The first chunk carries role: "assistant" (not shown in the abbreviated sample above); subsequent chunks carry only content deltas; the last chunk carries finish_reason.
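For callers not using an SDK, the stream can be consumed with a small parser. The sketch below assumes the lines have already been read from the HTTP response (for example, line by line from a streaming request) and yields only the text deltas; `iter_content_deltas` is a hypothetical helper name, and the sample lines are the ones shown above.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE 'data:' lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample_stream = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_content_deltas(sample_stream)))
```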
Worked example: cURL
Non-streaming:
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-myinstancekey" \
  -d '{
    "model": "demo-bot",
    "messages": [{"role": "user", "content": "Summarize Polyant in one sentence."}]
  }'

Streaming:
curl -N http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-myinstancekey" \
  -d '{
    "model": "demo-bot",
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a story about an AI assistant."}]
  }'

Worked example: Python
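from openai import OpenAI

# Point the official SDK at the Polyant engine instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="sk-myinstancekey",  # the instance's Auth API key
)

# "model" is the instance slug, as listed by GET /v1/models.
resp = client.chat.completions.create(
    model="demo-bot",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Worked example: TypeScript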
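import OpenAI from "openai";

// Point the official SDK at the Polyant engine instead of api.openai.com.
const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "sk-myinstancekey", // the instance's Auth API key
});

// "model" is the instance slug, as listed by GET /v1/models.
const resp = await client.chat.completions.create({
  model: "demo-bot",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(resp.choices[0].message.content);

What the API does not support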
- Function calling on the request side. Tools are configured per instance from the admin panel, not in the request payload.
- Multiple system messages. The system prompt is composed server-side from the eight Prompts sections.
- Image input via the API. Multimodal input arrives only through channel adapters today.
- Embeddings or moderations. Those endpoints are not implemented.
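When porting an existing OpenAI client, it can help to flag these features before a request is sent. The sketch below is a client-side check only; how the engine itself responds to such fields is not stated here. `unsupported_features` is a hypothetical helper, and the field names it inspects (`tools`, `functions`, `tool_choice`, and `image_url` content parts) come from the OpenAI request format, not from Polyant.

```python
from typing import Any, Dict, List


def unsupported_features(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable notes for request features Polyant does not support."""
    problems: List[str] = []

    # Tools are configured per instance in the admin panel, not per request.
    if any(key in payload for key in ("tools", "functions", "tool_choice")):
        problems.append("request-side function calling")

    messages = payload.get("messages", [])

    # The system prompt is composed server-side.
    system_count = sum(1 for m in messages if m.get("role") == "system")
    if system_count > 1:
        problems.append("multiple system messages")

    # OpenAI-style multimodal content is a list of typed parts.
    for message in messages:
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url"
            for part in content
        ):
            problems.append("image input")
            break

    return problems


request = {
    "model": "demo-bot",
    "messages": [{"role": "user", "content": "Hello!"}],
}
print(unsupported_features(request))
```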