Memory

Memory is what makes a Polyant instance feel less like a stateless chatbot and more like an assistant that remembers you. Memory is separate from conversation history: facts extracted from a chat in February can resurface in an unrelated chat in May. Memory is per-instance, opt-in via the memoryEnabled flag, and stored entirely in PostgreSQL: there is no external memory service and no hosted vector database.

This page covers how memories are extracted, how they are embedded and stored, how they are retrieved via hybrid search, and the trade-offs baked into the design.

Extraction is automatic and conditional

After every supervisor response, the engine runs a fire-and-forget extraction pass:

Load the last fifteen messages of the current conversation.
Build a transcript and pass it to a fast-tier LLM call with a strict extraction prompt.
The LLM returns a JSON array of { content, category, importance } objects (or an empty array).
Each fact is embedded and upserted into the memories table.

Extraction only runs when the instance has memoryEnabled = true. If the flag is off, the entire extraction step is skipped.

The extraction prompt is deliberately strict:

Facts must be standalone third-person sentences (understandable without conversation context).
Relative dates (“tomorrow”, “next Monday”) must be converted to absolute dates based on today’s date, which is interpolated into the prompt at call time.
Facts are written in the same language as the conversation.
Greetings, filler, questions without answers, and assistant messages are explicitly excluded.

Because extraction is fire-and-forget, it never blocks the user-visible response. If extraction fails, the error is logged but the conversation continues normally.
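The fire-and-forget behavior described above can be sketched as follows. This is a minimal illustration, not the engine's code: scheduleExtraction, runInBackground, and ErrorLogger are hypothetical names, and the only behavior taken from the text is that extraction is skipped when memoryEnabled is false, is never awaited, and logs failures instead of propagating them.

```typescript
// Hypothetical helper names; only the behavior mirrors the description above.
type ErrorLogger = (message: string, error: unknown) => void;

function runInBackground(task: () => Promise<void>, logError: ErrorLogger): void {
  // Deliberately not awaited: the caller returns before the task finishes.
  // Promise.resolve().then(task) also turns a synchronous throw into a rejection,
  // so every failure ends up in the catch handler below.
  void Promise.resolve()
    .then(task)
    .catch((error: unknown) => logError("memory extraction failed", error));
}

function scheduleExtraction(
  memoryEnabled: boolean,
  extract: () => Promise<void>,
  logError: ErrorLogger,
): boolean {
  if (!memoryEnabled) return false; // flag off: the whole step is skipped
  runInBackground(extract, logError);
  return true; // scheduled, but not yet finished
}
```

Because the promise chain always ends in a catch handler, a failed extraction cannot surface as an unhandled rejection in the request path.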

The exact extraction prompt

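This is the full system prompt sent to the fast-tier LLM on every extraction pass. It is verbatim from packages/engine/src/memory/extractor.ts, except that the ${dateStr} and ${dayName} template variables, which are filled in at call time, are shown here as the placeholders {YYYY-MM-DD} and {DayName}: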


You are a memory extraction system. Your task is to extract important facts, preferences, decisions, and events from a conversation.

TODAY'S DATE: {YYYY-MM-DD} ({DayName})

RULES:
- Extract ONLY concrete, factual information worth remembering long-term
- Each fact must be a standalone sentence (understandable without context)
- Write facts from a third-person perspective about "the user"
- CRITICAL: Always convert relative dates to absolute dates. "tomorrow" → "{YYYY-MM-DD}" + 1 day, "today" → "{YYYY-MM-DD}", "next Monday" → the actual date, etc. Every temporal reference must include the concrete date (e.g. "The user has a meeting on 2026-02-23 at 10:00")
- Write each fact in the SAME LANGUAGE as the conversation (if the user speaks Italian, write in Italian; if English, write in English)
- Categorize each fact: preference, fact, event, relationship, decision, general
- Rate importance 1-10 (10 = critical life fact, 1 = trivial)
- Do NOT extract: greetings, filler, questions without answers, assistant responses
- Do NOT extract facts that are purely about the current task/request being performed
- If nothing worth extracting, return an empty array
- Respond ONLY with a JSON array, no markdown fences, no explanation

OUTPUT FORMAT (strict JSON array):
[{"content": "...", "category": "...", "importance": N}]

The user message is the rendered conversation transcript: the last 15 messages joined with \n, each prefixed by User: or Assistant:. Nothing else is sent, neither earlier history nor system context beyond the prompt above. The extractor is intentionally isolated from the supervisor’s working memory so the same conversation never gets two contradictory “memories” of itself.
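A minimal sketch of how the two inputs to the extraction call could be assembled, under stated assumptions: dateStr and dayName are the variable names mentioned above, but dateHeader, buildTranscript, ChatMessage, and the use of UTC for the date are illustrative choices, not taken from extractor.ts.

```typescript
// Hypothetical message shape; the real engine type may differ.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

const DAY_NAMES = [
  "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
];
const TRANSCRIPT_WINDOW = 15; // "last 15 messages" from the text above

// Renders the date line of the prompt. Assumption: UTC is used;
// the real extractor may use a different time zone.
function dateHeader(now: Date): string {
  const dateStr = now.toISOString().slice(0, 10); // YYYY-MM-DD
  const dayName = DAY_NAMES[now.getUTCDay()];
  return `TODAY'S DATE: ${dateStr} (${dayName})`;
}

// Keeps only the most recent messages and prefixes each with its speaker.
function buildTranscript(messages: ChatMessage[]): string {
  return messages
    .slice(-TRANSCRIPT_WINDOW)
    .map((m) => `${m.role === "user" ? "User" : "Assistant"}: ${m.content}`)
    .join("\n");
}
```

For a conversation of 20 messages, buildTranscript drops the first five and returns 15 lines, one per message.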

Categories and importance

Each fact carries two pieces of metadata:

Category — one of six fixed labels: preference, fact, event, relationship, decision, general. The closed set is defined as the ExtractedFact["category"] union in packages/engine/src/memory/types.ts:25; the DB column itself (memories.category, see packages/engine/src/memory/schema.ts:19) is a free text field with default "general", so the constraint is enforced by the extractor LLM prompt and the TypeScript type — not by a Postgres enum. These labels drive UI grouping in the admin panel and can filter memory queries.
Importance — an integer from 1 to 10. 10 is a critical life fact (“the user has a peanut allergy”), 1 is trivial. Importance does not gate inclusion in retrieval today but is recorded for future scoring tweaks.
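Since the category constraint lives in the prompt and the TypeScript type rather than in the database, the LLM output is worth validating before it is stored. The sketch below shows one way to do that. It is an assumption-laden illustration: parseFacts and isCategory are hypothetical names, the ExtractedFact shape is inferred from the prompt's output format, and the fallbacks (unknown category becomes "general", missing importance becomes 5, values clamped to 1-10) are choices made for this example, not documented engine behavior.

```typescript
const CATEGORIES = [
  "preference", "fact", "event", "relationship", "decision", "general",
] as const;
type Category = (typeof CATEGORIES)[number];

// Shape inferred from the prompt's output format; the real type may differ.
interface ExtractedFact {
  content: string;
  category: Category;
  importance: number;
}

function isCategory(value: unknown): value is Category {
  return CATEGORIES.some((c) => c === value);
}

// Parses the raw LLM reply, dropping malformed items instead of throwing.
function parseFacts(raw: string): ExtractedFact[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // not JSON at all: treat as "nothing extracted"
  }
  if (!Array.isArray(parsed)) return [];

  const facts: ExtractedFact[] = [];
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const { content, category, importance } = item as Record<string, unknown>;
    if (typeof content !== "string" || content.trim() === "") continue;
    // Assumed fallback: a mid-scale importance when the model omits it.
    const score =
      typeof importance === "number" && Number.isFinite(importance) ? importance : 5;
    facts.push({
      content: content.trim(),
      // Mirrors the column default described above.
      category: isCategory(category) ? category : "general",
      importance: Math.min(10, Math.max(1, Math.round(score))),
    });
  }
  return facts;
}
```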

Storage: pgvector + FTS

The memories table holds the extracted facts:

instance_id — partition key. Memories never cross instances.
content — the fact, as written by the LLM.
category, importance — metadata.
embedding — vector(1536), generated by OpenAI’s text-embedding-3-small.
source_conversation_id — back-pointer for audit.
created_at, updated_at.

Conversation messages live in a separate table (conversation_messages) with a generated tsvector column for PostgreSQL full-text search. The column uses PostgreSQL's simple text search configuration, which applies no stemming and no language-specific stop words, so content in any language is indexed the same way.

Embeddings always use OpenAI. This is hard-coded in embedder.ts. Even when an instance is configured to use Anthropic or Bedrock as its LLM provider, the per-instance openai_api_key secret is required because Anthropic does not offer an embedding API and Bedrock’s Titan embeddings have not been wired in. The extraction LLM, however, uses the instance’s configured provider via the AI gateway.
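The split between LLM provider and embedding provider can be illustrated with a small configuration check. This is a sketch under assumptions: InstanceConfig, EmbeddingConfig, and resolveEmbeddingConfig are hypothetical names, and only the model name, the 1536 dimensions, and the openai_api_key secret come from the text above.

```typescript
type LlmProvider = "openai" | "anthropic" | "bedrock";

// Hypothetical shapes, for illustration only.
interface InstanceConfig {
  llmProvider: LlmProvider;
  secrets: Record<string, string | undefined>;
}

interface EmbeddingConfig {
  provider: "openai";
  model: string;
  dimensions: number;
  apiKey: string;
}

const EMBEDDING_MODEL = "text-embedding-3-small";
const EMBEDDING_DIMENSIONS = 1536;

// The embedding provider does not depend on llmProvider: the OpenAI key
// is required even for Anthropic or Bedrock instances.
function resolveEmbeddingConfig(instance: InstanceConfig): EmbeddingConfig {
  const apiKey = instance.secrets["openai_api_key"];
  if (!apiKey) {
    throw new Error(
      `openai_api_key is required for embeddings (LLM provider: ${instance.llmProvider})`,
    );
  }
  return {
    provider: "openai",
    model: EMBEDDING_MODEL,
    dimensions: EMBEDDING_DIMENSIONS,
    apiKey,
  };
}
```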

Deduplication

Before inserting a new memory, the engine runs a cosine-similarity check against existing memories in the same instance. If a near-duplicate scores at or above DEDUP_SIMILARITY_THRESHOLD (configurable via config.memory.dedupSimilarityThreshold, default 0.90), the existing memory is updated in place: the original created_at is preserved and only content and updated_at change. Otherwise the new memory is inserted.

This keeps the memory store from accumulating slightly-rephrased duplicates after every conversation while still allowing genuine new facts through.
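The dedup rule can be sketched in memory as follows. The real check runs against pgvector inside PostgreSQL; this example replaces that with a plain array scan purely to make the decision logic visible. StoredMemory, cosineSimilarity, and this upsertMemory signature are assumptions for illustration, and the comparison uses >= as shown in the diagram further down this page.

```typescript
// Hypothetical in-memory stand-in for a row of the memories table.
interface StoredMemory {
  id: number;
  content: string;
  embedding: number[];
  createdAt: Date;
  updatedAt: Date;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function upsertMemory(
  store: StoredMemory[],
  content: string,
  embedding: number[],
  now: Date,
  threshold = 0.9, // default taken from the text above
): "updated" | "inserted" {
  // Find the most similar existing memory.
  let best: StoredMemory | undefined;
  let bestScore = -Infinity;
  for (const memory of store) {
    const score = cosineSimilarity(memory.embedding, embedding);
    if (score > bestScore) {
      best = memory;
      bestScore = score;
    }
  }
  if (best !== undefined && bestScore >= threshold) {
    // Near-duplicate: keep createdAt, change only content and updatedAt.
    best.content = content;
    best.updatedAt = now;
    return "updated";
  }
  store.push({ id: store.length + 1, content, embedding, createdAt: now, updatedAt: now });
  return "inserted";
}
```

With toy two-dimensional vectors, [1, 0] and [0.99, 0.05] have a similarity of roughly 0.999 and are merged, while [1, 0] and [0, 1] have a similarity of 0 and produce a second row.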

Retrieval via hybrid search

The supervisor’s searchMemory tool retrieves memories by fusing two backends — pgvector cosine similarity over the memories table (semantic), and PostgreSQL FTS over conversation_messages (keyword) — with Reciprocal Rank Fusion. The merge keeps the strengths of both: semantic recall when the user paraphrases, keyword precision for proper nouns and dates.

For the full algorithm (RRF formula, the k = 60 constant, FTS configuration, graceful degradation), see Hybrid Search.
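As a quick orientation before reading that page, here is a minimal sketch of Reciprocal Rank Fusion with k = 60 and zero-based ranks, so the top hit of each list contributes 1 / (60 + 0 + 1). The function name, the FusedHit type, and the use of plain string ids are assumptions for this example; how results from the two tables are keyed and shaped in hybrid-search.ts is not described here.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

// Each input list is ordered best-first; rank is the zero-based position.
function reciprocalRankFusion(rankedLists: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      const previous = scores.get(id) ?? 0;
      scores.set(id, previous + 1 / (k + rank + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

An item that appears in both lists accumulates two terms, which is why it can outrank the top hit of either list alone. If one backend returns nothing, the fusion simply reduces to the other backend's ordering.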

The default 06-memory prompt tells the LLM to call searchMemory for any question that references past information — preferences, decisions, events, appointments — but to skip it for greetings and generic queries without historical context.

How it works


+----------------------------+
| Supervisor finishes turn   |
+--------------+-------------+
               |
               | (fire-and-forget, only if memoryEnabled)
               v
+----------------------------+        +-------------------------+
| extractMemories()          | -----> | LLM tier=fast            |
|   load last 15 messages    |        |   strict JSON output     |
|   build transcript         |        |   today's date injected  |
+--------------+-------------+        +------------+------------+
               |                                   |
               v                                   |
        facts: [{content, category, importance}]   |
               |                                   |
               v                                   |
+----------------------------+                     |
| generateEmbeddings()       | <-------------------+
|   OpenAI text-embed-3-small|
|   1536 dimensions          |
+--------------+-------------+
               |
               v
+------------------------------------------+
| upsertMemory()                           |
|   cosine sim >= 0.90  -> update existing |
|   otherwise           -> insert new      |
+--------------+---------------------------+
               |
               v
+----------------------------+
| memories table (pgvector)  |
+----------------------------+


Retrieval:
                       searchMemory(query)
                              |
                +-------------+--------------+
                v                            v
       pgvector cosine sim          PostgreSQL FTS
       over memories                over conversation_messages
       top 20                       top 20
                |                            |
                +-------------+--------------+
                              v
                     Reciprocal Rank Fusion
                     score = Σ 1 / (60 + rank + 1)
                              |
                              v
                       HybridSearchResult[]

Code reference

packages/engine/src/memory/schema.ts — memories table with vector(1536).
packages/engine/src/memory/extractor.ts — Extraction prompt (with absolute-date conversion) and tier: "fast" call.
packages/engine/src/memory/embedder.ts — Hard-coded text-embedding-3-small.
packages/engine/src/memory/memory-store.ts — Upsert with cosine-similarity dedup against DEDUP_SIMILARITY_THRESHOLD.
packages/engine/src/memory/hybrid-search.ts — RRF fusion with k = 60.
packages/engine/src/conversations/store.ts — FTS over conversation_messages (simple config).
packages/engine/src/agents/tools/search-memory.tool.ts — The supervisor-facing tool.
packages/engine/src/agents/tools/save-memory.tool.ts — Explicit user-driven save.