OpenAI-Compatible API

Polyant exposes its assistants through an HTTP API that mirrors a subset of OpenAI’s Chat Completions API. Any client that speaks that protocol — the openai Python SDK, openai-node, LangChain, LiteLLM, etc. — can call Polyant by changing the base URL.

Base URL


http://<engine-host>:4000/v1

In a default local install, that is http://localhost:4000/v1.

Endpoints

`GET /v1/models`

Lists every active instance as a model:


{
  "object": "list",
  "data": [
    {
      "id": "demo-bot",
      "object": "model",
      "created": 1730390400,
      "owned_by": "polyant"
    },
    {
      "id": "support-bot",
      "object": "model",
      "created": 1730491200,
      "owned_by": "polyant"
    }
  ]
}

The id is the instance slug. Disabled instances are filtered out.
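As a rough sketch of how a client might consume this response, the helper below pulls the instance slugs out of a payload shaped like the one above. The name `model_ids` is illustrative and not part of Polyant.

```python
from typing import Any


def model_ids(payload: dict[str, Any]) -> list[str]:
    """Return the instance slugs from a /v1/models response body."""
    # Each entry in "data" describes one active instance; "id" is its slug.
    return [
        entry["id"]
        for entry in payload.get("data", [])
        if entry.get("object") == "model"
    ]


sample = {
    "object": "list",
    "data": [
        {"id": "demo-bot", "object": "model", "created": 1730390400, "owned_by": "polyant"},
        {"id": "support-bot", "object": "model", "created": 1730491200, "owned_by": "polyant"},
    ],
}
print(model_ids(sample))  # ['demo-bot', 'support-bot']
```

Any of the returned slugs can be passed as `model` to the chat completions endpoint.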

`POST /v1/chat/completions`

The conversation endpoint. Request body:


{
  "model": "demo-bot",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false
}

Response (non-streaming): a standard Chat Completion shape.


{
  "id": "chatcmpl-9f3a8b2c1d4e5f6a7b8c9d0e",
  "object": "chat.completion",
  "created": 1730491200,
  "model": "demo-bot",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

The id field is generated server-side as chatcmpl-<24 hex chars> to match the OpenAI shape. The only emitted finish_reason value is "stop" — tool calls are orchestrated internally by the Supervisor and are never surfaced to the API caller, so tool_calls, length, content_filter, etc. are not returned.

Caveat: usage is always zero. The field is present in the response envelope so that clients that read it do not crash, but Polyant does not currently aggregate per-request token counts at this layer. Treat the values as placeholders, not as measurements.
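A client that records token usage may want to tell the placeholder apart from a real count. The helper below is a minimal sketch of one way to do that; `reported_usage` is a hypothetical name, and treating an all-zero block as "unknown" is a client-side convention assumed here, not something the API defines.

```python
from typing import Any, Optional


def reported_usage(response: dict[str, Any]) -> Optional[dict[str, int]]:
    """Return the usage block, or None when it is missing or all zeros."""
    usage = response.get("usage") or {}
    keys = ("prompt_tokens", "completion_tokens", "total_tokens")
    counts = {key: int(usage.get(key, 0)) for key in keys}
    # All-zero counts are the placeholder described above, not a measurement.
    return counts if any(counts.values()) else None
```

Callers can then skip cost accounting when the function returns `None` instead of logging zero tokens.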

Response (streaming, "stream": true): Server-Sent Events with data: lines containing JSON deltas, terminated by data: [DONE].

Authentication

GET /v1/models requires authentication — either a session JWT (admin panel users) or a per-instance API key via Authorization: Bearer sk-.... It is not an unauthenticated discovery endpoint; unauthenticated calls return HTTP 401.

POST /v1/chat/completions follows per-instance rules. If the instance has authEnabled = false, no header is required. Anyone who can reach the engine can talk to the instance — useful for local development, dangerous in production.

If the instance has authEnabled = true, every request must include:


Authorization: Bearer <auth-api-key>

The expected key is the value configured in the instance’s Settings tab under Auth API key (stored AES-256-GCM encrypted in instance_secrets.auth_api_key). The token from the Authorization: Bearer header is compared against the decrypted value using timingSafeEqual to prevent timing attacks (see packages/engine/src/server/openai/openai.controller.ts:191-201).
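To illustrate the check described above, here is a Python analogue of the comparison. It is not the engine's code (the controller is TypeScript and uses `timingSafeEqual`); the function names are hypothetical, and `hmac.compare_digest` stands in as Python's constant-time comparison.

```python
import hmac
from typing import Optional


def bearer_token(header: Optional[str]) -> Optional[str]:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def key_matches(header: Optional[str], expected_key: str) -> bool:
    """Compare the presented token to the expected key in constant time."""
    token = bearer_token(header)
    if token is None:
        return False
    # hmac.compare_digest plays the role that timingSafeEqual plays in Node.
    return hmac.compare_digest(token.encode(), expected_key.encode())
```

A constant-time comparison avoids leaking how many leading characters of a guessed key were correct through response timing.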

Errors

Errors are returned as the standard NestJS exception envelope — not the OpenAI {message, type, code} shape:


{
  "statusCode": 401,
  "message": "Invalid API key",
  "error": "Unauthorized"
}

| HTTP status | Cause |
| --- | --- |
| 400 | Request body fails Zod validation |
| 401 | `authEnabled=true` and the Bearer token is missing or wrong, or the `model` is unknown |
| 429 | Rate limit exceeded |
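Because the envelope differs from OpenAI's, clients that parse error bodies by hand may need a small adapter. The sketch below assumes the body is the JSON shown above; the handling of a list-valued `message` and of non-JSON bodies (for example from a proxy in front of the engine) is a defensive assumption, not documented Polyant behavior. `error_message` is a hypothetical helper name.

```python
import json
from typing import Any


def error_message(status: int, body: str) -> str:
    """Build a readable message from a NestJS-style error body."""
    try:
        payload: Any = json.loads(body)
    except json.JSONDecodeError:
        return f"HTTP {status}: {body.strip() or 'no body'}"
    if not isinstance(payload, dict):
        return f"HTTP {status}: {payload}"
    message = payload.get("message", "")
    # Assumption: validation failures may send a list of messages, not a string.
    if isinstance(message, list):
        message = "; ".join(str(item) for item in message)
    label = payload.get("error", "Error")
    return f"HTTP {payload.get('statusCode', status)} {label}: {message}"


body = '{"statusCode": 401, "message": "Invalid API key", "error": "Unauthorized"}'
print(error_message(401, body))  # HTTP 401 Unauthorized: Invalid API key
```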

Rate limit

/v1/chat/completions is throttled at 20 requests per minute per client IP, enforced by @Throttle({ default: { limit: 20, ttl: 60_000 } }) on the controller (packages/engine/src/server/openai/openai.controller.ts:63). Exceeding the budget returns HTTP 429. For higher volume, rate-limit upstream at the proxy (NGINX, Traefik, Cloudflare) and raise the limit in that @Throttle decorator to match.
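On the client side, a 429 is usually handled by waiting and retrying. The following is a minimal sketch of exponential backoff, not a Polyant feature: `call_with_backoff`, its parameters, and the `(status, body)` tuple are all illustrative stand-ins for whatever HTTP client is in use.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status, body); a stand-in for a real response type


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry with exponential backoff while it returns 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        # Return on any non-429 status, or when the attempts are used up.
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * (2 ** attempt))  # 3s, 6s, 12s with the defaults
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without waiting.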

Streaming

Streaming follows OpenAI’s SSE protocol exactly:


data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Hel"}}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]

The first chunk carries role: "assistant"; subsequent chunks carry only content deltas; the last chunk carries finish_reason.
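For clients that read the stream without an SDK, the chunk sequence described above can be reassembled with a few lines of parsing. This is a sketch under the assumption that the input is an iterable of decoded text lines; `iter_content` is a hypothetical helper, and the sample lines are modeled on the example above.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


stream = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Hel"}}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_content(stream)))  # Hello
```

The final chunk has an empty delta, so it contributes no text; the loop ends on `[DONE]` and does not wait for the connection to close.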

Worked example: cURL

Non-streaming:


curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-myinstancekey" \
  -d '{
    "model": "demo-bot",
    "messages": [{"role": "user", "content": "Summarize Polyant in one sentence."}]
  }'

Streaming:


curl -N http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-myinstancekey" \
  -d '{
    "model": "demo-bot",
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a story about an AI assistant."}]
  }'

Worked example: Python


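from openai import OpenAI

# Point the SDK at the Polyant engine instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="sk-myinstancekey",  # the instance's Auth API key
)

# "model" is the instance slug, as listed by GET /v1/models.
resp = client.chat.completions.create(
    model="demo-bot",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)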

Worked example: TypeScript


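import OpenAI from "openai";

// Point the SDK at the Polyant engine instead of api.openai.com.
const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "sk-myinstancekey", // the instance's Auth API key
});

// "model" is the instance slug, as listed by GET /v1/models.
// Top-level await requires an ES module context.
const resp = await client.chat.completions.create({
  model: "demo-bot",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(resp.choices[0].message.content);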

What the API does not support

Function calling on the request side. Tools are configured per instance from the admin panel, not in the request payload.
Multiple system messages. The system prompt is composed server-side from the eight Prompts sections.
Image input via the API. Multimodal input arrives only through channel adapters today.
Embeddings or moderations. Those endpoints are not implemented.
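When migrating an existing OpenAI integration, it can help to flag requests that rely on these features before they are sent. The pre-flight check below is a client-side sketch only: this section does not say whether Polyant rejects or ignores such fields, and `unsupported_features` and its field list (taken from OpenAI's request schema) are assumptions for illustration.

```python
from typing import Any


def unsupported_features(payload: dict[str, Any]) -> list[str]:
    """List request features that the section above marks as unsupported."""
    found = []
    tool_keys = ("tools", "tool_choice", "functions", "function_call")
    if any(key in payload for key in tool_keys):
        found.append("function calling")
    messages = payload.get("messages", [])
    if sum(1 for m in messages if m.get("role") == "system") > 1:
        found.append("multiple system messages")
    for message in messages:
        content = message.get("content")
        # OpenAI-style multimodal content is a list of typed parts.
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url"
            for part in content
        ):
            found.append("image input")
            break
    return found
```

An empty list means the payload uses only the request fields documented on this page.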