Knowledge Base
The knowledge base is a per-instance corpus of documents an agent can consult during a turn. It is how you give a Polyant instance the things it should know — company policies, product datasheets, regulations, onboarding guides, runbooks — without baking them into the prompt. Unlike memory, which is auto-extracted from conversations, knowledge is content you (or the agent itself) put there deliberately.
Knowledge content lives in PostgreSQL — there is no filesystem source of truth. Content enters through two paths: admin-panel upload and the agent-driven writeKnowledge tool. Both go through the same pipeline: the engine chunks the content on sentence boundaries, embeds each chunk with OpenAI’s text-embedding-3-small, and persists everything in two tables: knowledge_documents (the parent record) and knowledge_chunks (the searchable units).
What goes in the knowledge base
The knowledge base is intentionally generic. Polyant does not impose a content shape — anything UTF-8 fits. Two broad uses emerge:
- Static reference docs: markdown notes, exported PDFs (after extraction), policy text, glossaries, product spec sheets. These are uploaded once via the admin panel or seed scripts and rarely change. Operators own this content.
- Dynamic agent-written facts: the agent can call the writeKnowledge tool to persist things like “user prefers replies in Italian”, “user works at Acme Srl”, or short summaries of completed projects. This complements the automatic memory store: memory captures conversational facts as they pass through the supervisor; knowledge captures structured notes the agent commits intentionally, by filename.
The two channels are deliberately separate. Memory deduplicates by cosine similarity and is unpredictable about what survives an extraction pass. Knowledge is addressable by filename and mutable in place, and the agent can call getKnowledge on a specific document by name when it needs a verbatim copy.
Tools the agent sees
When knowledgeEnabled is true on the instance, three tools are wired into the supervisor:
- searchKnowledge: query the corpus by natural language and get back the top-N relevant chunks with their source filename. This is the default retrieval path; the agent reaches for it whenever it needs information it does not already have in context.
- getKnowledge: fetch a document by exact filename. Used when the agent already knows which doc to read (e.g. an earlier searchKnowledge returned "refund-policy.md").
- writeKnowledge: create, overwrite, or append a document by filename. After the write, a fire-and-forget pass re-chunks and re-embeds the document in the background. The document is immediately readable via getKnowledge, but new chunks become searchable only after the reindex completes.
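The timing difference between getKnowledge and searchKnowledge after a write can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the two maps stand in for the Postgres tables, the `mode` parameter and the `Sketch` function names are hypothetical, and splitting on newlines stands in for the real chunk-and-embed step.

```typescript
// In-memory stand-ins for the document body and the searchable chunks.
const sketchDocuments = new Map<string, string>();
const sketchSearchable = new Map<string, string[]>();

type SketchWriteMode = "overwrite" | "append";

// Placeholder for the background chunk + embed pass.
async function reindexSketch(filename: string): Promise<void> {
  // Yield once so the work happens after the write call has returned.
  await Promise.resolve();
  const content = sketchDocuments.get(filename) ?? "";
  sketchSearchable.set(
    filename,
    content.split(/\n+/).filter((line) => line.length > 0),
  );
}

// Stores the body synchronously, then starts the reindex without awaiting it.
// The promise is returned only so a caller or test can observe completion.
function writeKnowledgeSketch(
  filename: string,
  content: string,
  mode: SketchWriteMode = "overwrite",
): Promise<void> {
  const previous = sketchDocuments.get(filename);
  const next =
    mode === "append" && previous !== undefined
      ? `${previous}\n${content}`
      : content;
  sketchDocuments.set(filename, next);
  // Fire-and-forget: failures are swallowed here instead of failing the write.
  return reindexSketch(filename).catch(() => undefined);
}

// Reads the stored body directly, independent of indexing state.
function getKnowledgeSketch(filename: string): string | undefined {
  return sketchDocuments.get(filename);
}
```

Right after `writeKnowledgeSketch` returns, `getKnowledgeSketch` already sees the new body, while `sketchSearchable` is only populated once the returned promise settles.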
When knowledgeEnabled is false, none of these tools are registered for that instance — they do not appear in the supervisor’s tool list at all. The gate lives in buildTools() and skips every registered tool whose category === "knowledge".
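The gate described above amounts to a filter over the tool registry. The following is a rough sketch, not the code in buildTools(): the `SketchTool` and `SketchInstance` shapes, the registry contents, and the "memory" category are hypothetical; only the `category === "knowledge"` check and the `knowledgeEnabled` flag come from the text.

```typescript
// Hypothetical shapes for illustration only.
interface SketchTool {
  name: string;
  category: string;
}

interface SketchInstance {
  knowledgeEnabled: boolean;
}

// Stand-in for the real tool registry.
const sketchRegistry: SketchTool[] = [
  { name: "searchKnowledge", category: "knowledge" },
  { name: "getKnowledge", category: "knowledge" },
  { name: "writeKnowledge", category: "knowledge" },
  { name: "searchMemory", category: "memory" },
];

// Returns the tools the supervisor would see for one instance:
// knowledge tools are dropped entirely when the flag is off.
function buildToolsSketch(
  instance: SketchInstance,
  tools: SketchTool[] = sketchRegistry,
): SketchTool[] {
  return tools.filter(
    (tool) => instance.knowledgeEnabled || tool.category !== "knowledge",
  );
}
```

Filtering at build time, rather than rejecting calls at run time, means a disabled instance never advertises the tools to the model in the first place.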
Chunking, multilingual-aware
chunker.ts splits text on sentence boundaries with a target chunk size around 2000 characters (~500 tokens) and a 200-character overlap. The splitter is multilingual-aware: a curated set of abbreviations (Dr., Dott., Prof., Sig.ra, Ing., Avv., plus English equivalents) does not trigger a sentence break, so an Italian document like “Il Dr. Rossi ha confermato l’appuntamento” stays in one chunk rather than fragmenting at every honorific.
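To make the splitting rule concrete, here is a simplified sketch of abbreviation-aware sentence splitting followed by size-bounded packing with overlap. It is an approximation under stated assumptions, not the contents of chunker.ts: the abbreviation set is a partial, illustrative list (the English entries are guesses), and the function names and the whitespace-token approach are hypothetical.

```typescript
// Illustrative subset; the real abbreviation list is not reproduced here.
const SKETCH_ABBREVIATIONS = new Set([
  "dr.", "dott.", "prof.", "ing.", "avv.", "mr.", "mrs.",
]);

// Splits on whitespace tokens; a token ending in . ! or ? closes a sentence
// unless it is a known abbreviation.
function splitSentencesSketch(text: string): string[] {
  const sentences: string[] = [];
  let current: string[] = [];
  for (const token of text.split(/\s+/).filter((t) => t.length > 0)) {
    current.push(token);
    const endsSentence =
      /[.!?]$/.test(token) && !SKETCH_ABBREVIATIONS.has(token.toLowerCase());
    if (endsSentence) {
      sentences.push(current.join(" "));
      current = [];
    }
  }
  if (current.length > 0) sentences.push(current.join(" "));
  return sentences;
}

// Packs whole sentences into chunks of roughly targetSize characters.
// Each new chunk is seeded with the last `overlap` characters of the previous
// one (this simple tail slice may start mid-word). A single sentence longer
// than targetSize becomes its own oversized chunk.
function chunkTextSketch(
  text: string,
  targetSize = 2000,
  overlap = 200,
): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentencesSketch(text)) {
    if (current.length > 0 && current.length + 1 + sentence.length > targetSize) {
      chunks.push(current);
      current = overlap > 0 ? current.slice(-overlap) : "";
    }
    current = current.length > 0 ? `${current} ${sentence}` : sentence;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

With this sketch, “Il Dr. Rossi ha confermato l’appuntamento.” is treated as a single sentence because "Dr." is in the abbreviation set.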
Each chunk gets its own row in knowledge_chunks with a vector(1536) embedding, a chunkIndex, and a back-pointer to its parent knowledge_documents row.
Retrieval
searchKnowledge today is a pure pgvector search: the query is embedded, then cosine similarity against the chunks table returns the top-N matches, joined back to their parent document’s filename. The retrieval helper lives in packages/engine/src/knowledge/search.ts.
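As a hedged illustration of that query path, the sketch below shows one plausible SQL shape and an in-memory equivalent of the ranking. The SQL is an assumption, not the query in search.ts: table and column names follow the schema listed under "How it works", the parameter order is invented, and `<=>` is pgvector's cosine-distance operator. The `Sketch` types and functions are hypothetical.

```typescript
// Assumed query shape. Ordering by cosine distance ascending puts the most
// similar chunks first; 1 - distance gives cosine similarity.
const SEARCH_SQL_SKETCH = `
  SELECT c.content, c.chunk_index, d.filename,
         1 - (c.embedding <=> $1::vector) AS similarity
  FROM knowledge_chunks c
  JOIN knowledge_documents d ON d.id = c.document_id
  WHERE d.instance_id = $2
  ORDER BY c.embedding <=> $1::vector
  LIMIT $3
`;

interface StoredChunkSketch {
  filename: string;
  chunkIndex: number;
  content: string;
  embedding: number[];
}

interface ScoredChunkSketch {
  filename: string;
  chunkIndex: number;
  content: string;
  similarity: number;
}

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosineSimilaritySketch(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// In-memory equivalent of the ranking: score every chunk, keep the top N.
function topChunksSketch(
  queryEmbedding: number[],
  chunks: StoredChunkSketch[],
  limit: number,
): ScoredChunkSketch[] {
  return chunks
    .map((chunk) => ({
      filename: chunk.filename,
      chunkIndex: chunk.chunkIndex,
      content: chunk.content,
      similarity: cosineSimilaritySketch(queryEmbedding, chunk.embedding),
    }))
    .sort((left, right) => right.similarity - left.similarity)
    .slice(0, limit);
}
```

The in-memory version scans every chunk, which is only meant to show the ranking logic; in the database the same ordering is done by pgvector.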
For consistency across the retrieval surfaces, see Hybrid Search — that page describes the RRF + pgvector + FTS algorithm used by searchMemory. The knowledge surface uses the same embedding model (OpenAI text-embedding-3-small, 1536d) and the same simple Postgres FTS configuration when keyword search is enabled.
Lifecycle
Documents land in the same knowledge_documents table regardless of how they were created. The source column records which path each one came from:
- Upload (source = "upload"): admin panel or POST /api/instances/:slug/knowledge. The document is created with status = "uploading", then ingestion runs and flips it to processing → ready (or error). This is the path the admin UI’s drag-and-drop uses, and the canonical way to ship a curated corpus into a new instance.
- Agent-authored (source = "agent"): the writeKnowledge tool creates or appends documents under filenames chosen by the agent. A fire-and-forget reindex runs after every write. This is what the agent uses to persist user-volunteered facts that don’t fit the conversational shape of memory.
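The status flow for an uploaded document can be written down as a small transition table. This is a sketch under assumptions: the status names come from the text, but the helper names are hypothetical, treating "error" as reachable only from "processing" is a guess, and how a ready document re-enters processing during a reindex is not modelled.

```typescript
type SketchStatus = "uploading" | "processing" | "ready" | "error";

// Only the transitions stated in the text are listed.
const SKETCH_TRANSITIONS: Record<SketchStatus, SketchStatus[]> = {
  uploading: ["processing"],
  processing: ["ready", "error"],
  ready: [],
  error: [],
};

// Returns the next status, or throws if the move is not allowed.
function transitionSketch(current: SketchStatus, next: SketchStatus): SketchStatus {
  if (SKETCH_TRANSITIONS[current].indexOf(next) === -1) {
    throw new Error(`invalid transition: ${current} -> ${next}`);
  }
  return next;
}

// Runs a processing step and records every status the document passes through.
function runIngestionSketch(processStep: () => void): SketchStatus[] {
  const history: SketchStatus[] = ["uploading"];
  let status = transitionSketch("uploading", "processing");
  history.push(status);
  try {
    processStep();
    status = transitionSketch(status, "ready");
  } catch {
    status = transitionSketch(status, "error");
  }
  history.push(status);
  return history;
}
```

A step that throws leaves the document in "error" rather than stuck in "processing", which is the property the status column is there to expose.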
Other lifecycle events:
- Reindex: any writeKnowledge or admin-side edit reuses the same ingestion path: previous chunks are deleted, the new content is re-chunked and re-embedded.
- Delete: removing a document cascades to all its chunks.
- Disable: flipping knowledgeEnabled = false on the instance leaves the data intact but hides the tools from the supervisor.
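The reindex and delete semantics above can be mimicked with a small in-memory model. This is an illustrative sketch only: the state shape and function names are hypothetical, the chunker and embedder are injected stubs, and in Postgres the cascade would normally be handled by the database rather than by application code like this.

```typescript
interface SketchChunkRow {
  documentId: string;
  chunkIndex: number;
  content: string;
  embedding: number[];
}

interface SketchKnowledgeState {
  documents: Map<string, string>; // documentId -> raw content
  chunks: SketchChunkRow[];
}

// Replaces all chunks of one document: old rows are dropped first, then the
// current content is re-chunked and re-embedded.
function reindexDocumentSketch(
  state: SketchKnowledgeState,
  documentId: string,
  chunker: (text: string) => string[],
  embed: (text: string) => number[],
): void {
  const content = state.documents.get(documentId);
  if (content === undefined) return;
  state.chunks = state.chunks.filter((row) => row.documentId !== documentId);
  chunker(content).forEach((piece, chunkIndex) => {
    state.chunks.push({
      documentId,
      chunkIndex,
      content: piece,
      embedding: embed(piece),
    });
  });
}

// Removes the document and, emulating the cascade, every chunk that points to it.
function deleteDocumentSketch(state: SketchKnowledgeState, documentId: string): void {
  state.documents.delete(documentId);
  state.chunks = state.chunks.filter((row) => row.documentId !== documentId);
}
```

Deleting old chunks before inserting new ones keeps stale text from an earlier version of the document out of search results.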
How it works
    upload (admin panel | writeKnowledge tool)
             |
             v
    chunker.ts (sentence boundaries, multilingual abbrev list)
             |
             v
    embed-each-chunk
      OpenAI text-embedding-3-small, 1536 dim
             |
             v
    persist
      knowledge_documents { id, instance_id, filename, raw_content, status }
      knowledge_chunks    { id, document_id, content, embedding, chunk_index }
             |
             v
    retrieval (per turn, only if knowledgeEnabled=true)
             |
    +--------+----------------------------+
    |                                     |
    v                                     v
    searchKnowledge(query)                getKnowledge(filename)
      embed query                           fetch raw_content
      pgvector cosine search                return document body
      top-N chunks + filenames

Code reference
- packages/engine/src/knowledge/schema.ts: knowledge_documents, knowledge_chunks, knowledge_document_status enum.
- packages/engine/src/knowledge/chunker.ts: Sentence splitter with the multilingual abbreviation set.
- packages/engine/src/knowledge/ingestion.ts: processDocument() (chunk + embed + persist + status transitions).
- packages/engine/src/knowledge/store.ts: Document CRUD (upsertAgentDocument, appendAgentDocument, searchByVector).
- packages/engine/src/knowledge/search.ts: searchKnowledge() query path.
- packages/engine/src/agents/tools/search-knowledge.tool.ts: Tool wrapper consumed by the supervisor.
- packages/engine/src/agents/tools/get-knowledge.tool.ts: Filename-based fetch.
- packages/engine/src/agents/tools/write-knowledge.tool.ts: Write/append + fire-and-forget reindex.
- packages/engine/src/agents/supervisor/index.ts: buildTools() gates knowledge tools on knowledgeEnabled.
See also
- Memory — sister concept: auto-extracted conversational facts vs. agent-written knowledge.
- Hybrid Search — the retrieval algorithm shared with memory.
- Tools — registry and per-instance enablement.
- Knowledge admin UI — upload and manage knowledge documents.