Hybrid Search

Hybrid search is the retrieval algorithm Polyant uses when a tool needs to find information that may live anywhere: in extracted memories, raw conversation history, or knowledge base chunks. It runs two retrievers in parallel and fuses their rankings with Reciprocal Rank Fusion (RRF). The fused ranking usually beats either retriever alone because the two are complementary: one catches meaning, the other catches exact strings.

This page covers the two backends, the RRF fusion formula, why Postgres FTS is configured with simple, and the trade-offs of the design.

Two retrievers, one ranking

The two backends answer fundamentally different questions:

pgvector cosine similarity — semantic. “What is conceptually close to this query?” Embeddings (OpenAI text-embedding-3-small, 1536d) map “the user’s working hours” near “opening times” near “shift schedule”. This catches paraphrase, synonymy, multilingual restatements.
PostgreSQL Full-Text Search — lexical. “What contains these literal tokens?” ts_rank() over a generated tsvector catches exact identifiers (“invoice 12345”), product codes, surnames, and rare words that embeddings smear together because they are out-of-distribution for the embedding model.

Either alone misses obvious results. Vector search blurs identifiers: the embeddings of “invoice 12345” and “invoice 12346” are nearly identical, so the wrong invoice can rank as high as the right one. Keyword search loses semantic matches: a query about “hours” returns nothing if the document only says “opening times”. Combined, both classes of query work.

The RRF fusion formula

For each document d that appears in either ranked list, its fused score is:


score(d) = Σ over backends b such that d appears in b:
              1 / (k + rank_in_b(d) + 1)

with k = 60 (the conventional RRF constant) and rank_in_b(d) counted from zero, so the + 1 keeps the top result at 1 / (k + 1). A document ranked first in a list (rank 0) contributes 1 / 61 ≈ 0.0164; the tenth (rank 9) contributes 1 / 70 ≈ 0.0143; documents missing from a list contribute 0 for that backend. Documents present in both backends sum their two contributions, so cross-backend agreement is rewarded.

RRF is rank-based, not score-based. It does not try to compare a cosine similarity (in [-1, 1]) to a ts_rank value (no fixed upper bound; it varies with document length and tsvector density). It just compares positions. That makes the fusion robust without per-backend score calibration.

k = 60 is the value Cormack, Clarke and Buettcher recommended in the original RRF paper and the one Polyant uses unchanged in packages/engine/src/memory/hybrid-search.ts.
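The formula above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the code in hybrid-search.ts: the RankedItem type, the reciprocalRankFusion name, and the use of an id field to match the same document across backends are all assumptions made for the example.

```typescript
// Illustrative sketch only; names and types are hypothetical.
interface RankedItem {
  id: string; // assumed stable key for matching a document across backends
}

const RRF_K = 60;

// Fuse any number of ranked lists (best match first) into a single ranking.
function reciprocalRankFusion(
  lists: RankedItem[][],
  k: number = RRF_K
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  lists.forEach((list) => {
    list.forEach((item, index) => {
      // index is zero-based, so the top result contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1);
      scores.set(item.id, (scores.get(item.id) ?? 0) + contribution);
    });
  });
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const vectorHits: RankedItem[] = [{ id: "a" }, { id: "b" }, { id: "c" }];
const keywordHits: RankedItem[] = [{ id: "b" }, { id: "d" }];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
// "b" appears in both lists (1/62 + 1/61), so it ranks above "a" (1/61 alone).
```

Because only positions enter the sum, the two backends never need to agree on a score scale.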

Backend fetch sizes and top-K

The current memory implementation:

Each backend is asked for max(limit * 2, 20) candidates. With the default limit = 10 that yields 20 per side.
After fusion the top limit (default 10) are returned.

The over-fetch matters because agreement between backends can outweigh a high position in a single list. A document in 15th place in both lists scores 2 / 75 ≈ 0.0267, well above the 1 / 61 ≈ 0.0164 earned by a document that tops only one list. Asking each backend for only limit candidates would cut such documents off before fusion ever sees them.
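The fetch-size rule and the case for over-fetching can be checked with a little arithmetic. The sketch below uses hypothetical helper names (fetchLimitFor, rrfContribution) and assumes the zero-based rank convention with k = 60 described on this page.

```typescript
// Candidates requested from each backend: max(limit * 2, 20).
function fetchLimitFor(limit: number): number {
  return Math.max(limit * 2, 20);
}

// Contribution of one backend for a zero-based position, with k = 60.
function rrfContribution(index: number, k: number = 60): number {
  return 1 / (k + index + 1);
}

const requestedLimit = 10;
const perBackend = fetchLimitFor(requestedLimit); // 20 candidates per side

// A document in 15th place (index 14) in BOTH lists scores 2 / 75.
const agreedScore = 2 * rrfContribution(14);
// A document that tops only ONE list scores 1 / 61.
const singleTopScore = rrfContribution(0);
// The agreed-upon document wins, but only if both backends returned it,
// which requires fetching past position 10 on each side.
const agreementWins = agreedScore > singleTopScore;
```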

`simple` FTS configuration is deliberate

PostgreSQL’s FTS configuration is set to 'simple', not 'english' or 'italian'. The simple configuration does no stemming and applies no language-specific stopword list. The trade-offs are explicit:

aspect	`simple` (current)	language-specific (e.g. `english`)
stemming	none: “running” ≠ “runs”	yes: “running” matches “runs” (irregular forms such as “ran” still do not match)
stopwords	none: “the” stays in the index	dropped: “the” is removed
multilingual	works the same for any language	wrong for any language other than the configured one
identifiers	preserved as-is (good for ticket IDs)	sometimes mangled

Polyant instances span Italian, English, German, and mixed-language content (for example, a Slack channel where users code-switch mid-sentence). A language-specific config would silently degrade searches in every language except the one chosen. simple gives up some recall compared with a well-tuned English configuration, but it behaves the same way in every language, and the vector backend already provides semantic recall, including across morphological variants.
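To make the role of the simple configuration concrete, here is a hedged sketch of what a keyword query against conversation_messages might look like. Only websearch_to_tsquery('simple', ...), ts_rank(), and the table name come from this page; the column names (search_vector, instance_id, content, created_at) and the buildKeywordQuery helper are assumptions, and the real query in store.ts may differ.

```typescript
// A parameterized SQL statement plus its bound values.
interface SqlQuery {
  text: string;
  values: Array<string | number>;
}

// Build an FTS query that parses the user input with the 'simple' configuration.
// Column names are hypothetical placeholders for illustration.
function buildKeywordQuery(
  instanceId: string,
  query: string,
  fetchLimit: number
): SqlQuery {
  const text = [
    "SELECT content, created_at,",
    "       ts_rank(search_vector, websearch_to_tsquery('simple', $2)) AS rank",
    "FROM conversation_messages",
    "WHERE instance_id = $1",
    "  AND search_vector @@ websearch_to_tsquery('simple', $2)",
    "ORDER BY rank DESC",
    "LIMIT $3",
  ].join("\n");
  // User input is passed as a bound parameter, never interpolated into the SQL.
  return { text, values: [instanceId, query, fetchLimit] };
}

const keywordQuery = buildKeywordQuery("instance-123", "invoice 12345", 20);
```

The query must use the same configuration as the generated tsvector column: a column built with simple and queried with english would stem the query terms but not the indexed ones, and matches would be missed.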

What hybrid search powers today

The same hybrid algorithm is used by the supervisor’s memory tool:

searchMemory — fuses pgvector over the memories table with FTS over conversation_messages. Memories are LLM-extracted facts; conversation messages are raw turns. Combined, the retriever catches both “what the user told me” (memory) and “the exact phrase they used” (FTS).

The knowledge surface (searchKnowledge) currently performs a pure pgvector search over knowledge_chunks. See Knowledge Base for the full retrieval description.

How it works


query
  |
  +---------------------------+---------------------------------+
  |                                                             |
  v                                                             v
  embed via generateEmbedding()                  websearch_to_tsquery('simple', query)
  OpenAI text-embedding-3-small                  over a generated tsvector column
  |                                                             |
  v                                                             v
  pgvector cosine search                         ts_rank() FTS search
  searchByVector(emb, instanceId, fetchLimit)    conversationStore.searchByKeyword(...)
  top-N candidates (semantic)                    top-N candidates (lexical)
  |                                                             |
  +---------------------------+---------------------------------+
                              |
                              v
                     Reciprocal Rank Fusion
                     k = 60
                     for each result r at position i in backend b:
                         score(r) += 1 / (k + i + 1)
                              |
                              v
                  sort desc by fused score, slice(0, limit)
                              |
                              v
                     HybridSearchResult[]
                       { content, type, score, source, createdAt }

Failure modes and graceful degradation

Each backend is wrapped in its own try / catch inside hybridSearch(). If the vector backend throws (e.g. the embedding call fails because the per-instance openai_api_key is missing), the keyword backend’s results are still returned. The reverse also holds: if FTS fails, vector results still come back. Hybrid search degrades to a single retriever rather than failing the whole tool call.

If both backends fail, the function returns an empty array. The supervisor sees no results and either tells the user it does not know, or attempts another tool.
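The degradation behaviour described above can be sketched as follows. This is an illustration under assumptions, not the body of hybridSearch(): the Backend type and the runSafely and searchBothBackends helpers are hypothetical names.

```typescript
// A backend is any async function returning a ranked list of results.
type Backend<T> = () => Promise<T[]>;

// Run one backend; a failure yields an empty list instead of a rejection.
async function runSafely<T>(backend: Backend<T>): Promise<T[]> {
  try {
    return await backend();
  } catch (_err) {
    // A real implementation would probably log the error here.
    return [];
  }
}

// Start both backends in parallel; each is isolated from the other's failure.
async function searchBothBackends<T>(
  vector: Backend<T>,
  keyword: Backend<T>
): Promise<{ semantic: T[]; lexical: T[] }> {
  const [semantic, lexical] = await Promise.all([
    runSafely(vector),
    runSafely(keyword),
  ]);
  return { semantic, lexical };
}

// Example: the vector backend fails, the keyword backend succeeds.
const failingVector: Backend<string> = async () => {
  throw new Error("missing openai_api_key");
};
const workingKeyword: Backend<string> = async () => ["invoice 12345"];
const degraded = searchBothBackends(failingVector, workingKeyword);
```

Feeding an empty list into the fusion step is harmless: the missing backend simply contributes 0 to every document, and if both lists are empty the fused result is an empty array.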

Code reference

packages/engine/src/memory/hybrid-search.ts — hybridSearch(), RRF with k = 60, parallel backend calls.
packages/engine/src/memory/memory-store.ts — searchByVector(), pgvector cosine query.
packages/engine/src/memory/embedder.ts — generateEmbedding(), OpenAI text-embedding-3-small.
packages/engine/src/conversations/store.ts — searchByKeyword(), FTS with simple config.
packages/engine/src/conversations/schema.ts — Generated tsvector column on conversation_messages.
packages/engine/src/agents/tools/search-memory.tool.ts — Supervisor-facing wrapper.
packages/engine/src/knowledge/search.ts — Pure pgvector path used by searchKnowledge.