Hybrid Search
Hybrid search is the retrieval algorithm Polyant uses when a tool needs to find information that may live anywhere — across extracted memories, raw conversation history, or knowledge base chunks. It runs two retrievers in parallel and fuses their rankings with Reciprocal Rank Fusion (RRF). The result outperforms either retriever alone because the two are complementary: one catches meaning, the other catches exact strings.
This page covers the two backends, the RRF fusion formula, why Postgres FTS is configured with simple, and the trade-offs of the design.
Two retrievers, one ranking
The two backends answer fundamentally different questions:
- pgvector cosine similarity (semantic): “What is conceptually close to this query?” Embeddings (OpenAI text-embedding-3-small, 1536d) map “the user’s working hours” near “opening times” near “shift schedule”. This catches paraphrase, synonymy, and multilingual restatements.
- PostgreSQL Full-Text Search (lexical): “What contains these literal tokens?” ts_rank() over a generated tsvector catches exact identifiers (“invoice 12345”), product codes, surnames, and rare words that embeddings smear together because they are out-of-distribution for the embedding model.
Either alone misses obvious results. Vector search loses identifiers: the embeddings of “invoice 12345” and “invoice 12346” are nearly identical, so similarity cannot reliably separate them. Keyword search loses semantic matches: a query about “hours” returns nothing if the doc only says “opening times”. Combined, both classes of query work.
The RRF fusion formula
For each document d that appears in either ranked list, its fused score is:
score(d) = Σ over backends b such that d appears in b: 1 / (k + rank_in_b(d) + 1)
with k = 60 (the conventional RRF constant) and rank_in_b(d) the zero-based position of d in backend b's list. The top result in a list (position 0) contributes 1 / 61 ≈ 0.0164; the result at position 10 (the eleventh) contributes 1 / 71 ≈ 0.0141; documents missing from a list contribute 0 for that backend. Documents present in both backends sum their two contributions, so cross-backend agreement is rewarded.
RRF is rank-based, not score-based. It does not try to compare a cosine similarity (bounded, in [-1, 1]) to a ts_rank (unbounded, depends on document length and tsvector density). It only compares positions. That makes the fusion robust without per-backend score calibration.
k = 60 is the value Cormack, Clarke and Buettcher recommended in the original RRF paper and the one Polyant uses unchanged in packages/engine/src/memory/hybrid-search.ts.
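The formula can be sketched in a few lines of TypeScript. This is a minimal illustration, not the code in hybrid-search.ts: the names RankedItem, RRF_K and fuseWithRrf are assumptions, and documents are identified by a plain string id.

```typescript
// Illustrative RRF sketch. RankedItem, RRF_K and fuseWithRrf are hypothetical
// names chosen for this example, not identifiers from hybrid-search.ts.
interface RankedItem {
  id: string;
}

const RRF_K = 60;

function fuseWithRrf(
  lists: RankedItem[][],
  limit: number,
  k: number = RRF_K,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, position) => {
      // position is zero-based, so the top result contributes 1 / (k + 1).
      const contribution = 1 / (k + position + 1);
      scores.set(item.id, (scores.get(item.id) ?? 0) + contribution);
    });
  }
  // Highest fused score first, then keep the top `limit`.
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// m1 and m3 appear in both lists; m2 appears only in the vector list.
const vectorHits: RankedItem[] = [{ id: "m1" }, { id: "m2" }, { id: "m3" }];
const keywordHits: RankedItem[] = [{ id: "m3" }, { id: "m1" }];
const fusedExample = fuseWithRrf([vectorHits, keywordHits], 10);
```

In this example m2 sits above m3 in the vector list, yet m3 ends up ahead of it after fusion because it collects a contribution from both lists.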
Backend fetch sizes and top-K
The current memory implementation:
- Each backend is asked for max(limit * 2, 20) candidates. With the default limit = 10 that yields 20 per side.
- After fusion the top limit (default 10) are returned.
The over-fetch matters because fusion can promote a document that neither backend places in its own top limit. A document at zero-based position 11 in both lists scores 2 / 72 ≈ 0.0278, more than a document at position 0 in only one list (1 / 61 ≈ 0.0164). Asking each backend for only limit candidates would cut such cross-backend hits before fusion ever sees them.
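The arithmetic behind the over-fetch can be checked directly. The sketch below assumes only the max(limit * 2, 20) rule and k = 60 from the text; the helper names candidateFetchLimit and rrfContribution are made up for the example.

```typescript
// Hypothetical helpers; only the max(limit * 2, 20) rule and k = 60 come from the text.
function candidateFetchLimit(limit: number): number {
  return Math.max(limit * 2, 20);
}

const FUSION_K = 60;
const rrfContribution = (position: number): number => 1 / (FUSION_K + position + 1);

const defaultFetchLimit = candidateFetchLimit(10);

// Best possible score for a document found by only one backend (position 0).
const bestSingleBackendScore = rrfContribution(0);

// Worst possible score for a document found by both backends
// (last fetched position in each list).
const worstDualBackendScore = 2 * rrfContribution(defaultFetchLimit - 1);

// True when any document present in both lists outranks any document present in one.
const agreementAlwaysWins = worstDualBackendScore > bestSingleBackendScore;
```

With the default limit, the worst dual-backend score (2 / 80) still exceeds the best single-backend score (1 / 61), so agreement between backends always wins. That guarantee only holds while each backend returns fewer than 62 candidates; with larger fetch sizes a deep match in both lists can fall below a top match in one.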
simple FTS configuration is deliberate
PostgreSQL’s FTS configuration is set to 'simple', not 'english' or 'italian'. The simple configuration does no stemming and applies no language-specific stopword list. The trade-offs are explicit:
| aspect | simple (current) | language-specific (e.g. english) |
|---|---|---|
| stemming | none (“running” ≠ “runs”) | yes (“running” matches “runs”, both stem to “run”) |
| stopwords | none (“the” stays in the index) | dropped (“the” is removed) |
| multilingual | works the same for any language | wrong for any language other than the configured one |
| identifiers | preserved as-is (good for ticket IDs) | sometimes mangled |
Polyant instances span Italian, English, German, and mixed-language content (a Slack channel where users code-switch mid-sentence). A language-specific config would silently break searches in every language except the one chosen. simple is worse than perfect English search, but it is consistently correct everywhere — and the vector backend already provides semantic recall, including across morphological variants.
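As a rough sketch of what a keyword query with the simple configuration could look like, the builder below produces a parameterized SQL statement. The function name and the column names (search_vector, instance_id, content, created_at) are assumptions for illustration; only websearch_to_tsquery('simple', ...), ts_rank() and the conversation_messages table come from this page.

```typescript
// Sketch only: buildKeywordQuery and the column names are hypothetical,
// not taken from conversations/store.ts.
interface KeywordSqlQuery {
  text: string;
  values: Array<string | number>;
}

function buildKeywordQuery(
  query: string,
  instanceId: string,
  fetchLimit: number,
): KeywordSqlQuery {
  const text = [
    "SELECT id, content, created_at,",
    "       ts_rank(search_vector, websearch_to_tsquery('simple', $1)) AS rank",
    "FROM conversation_messages",
    "WHERE instance_id = $2",
    // The query must be parsed with the same configuration used to build the tsvector.
    "  AND search_vector @@ websearch_to_tsquery('simple', $1)",
    "ORDER BY rank DESC",
    "LIMIT $3",
  ].join("\n");
  return { text, values: [query, instanceId, fetchLimit] };
}

const keywordQueryExample = buildKeywordQuery("invoice 12345", "instance-1", 20);
```

The point to note is that the configuration name appears on the query side as well: a tsvector built with simple and a tsquery parsed with english would disagree about stemming and silently miss matches.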
What hybrid search powers today
Hybrid search currently backs one tool, the supervisor’s memory tool:
- searchMemory: fuses pgvector over the memories table with FTS over conversation_messages. Memories are LLM-extracted facts; conversation messages are raw turns. Combined, the retriever catches both “what the user told me” (memory) and “the exact phrase they used” (FTS).
The knowledge surface (searchKnowledge) currently performs a pure pgvector search over knowledge_chunks. See Knowledge Base for the full retrieval description.
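For comparison, a pure pgvector cosine query might be built as follows. This is a sketch under assumptions: the function names and the embedding, instance_id and content columns are hypothetical, while the <=> operator (cosine distance) and the '[x,y,z]' text form for vectors are standard pgvector features.

```typescript
// Sketch only: buildVectorQuery, toVectorLiteral and the column names are
// hypothetical, not taken from knowledge/search.ts.
interface VectorSqlQuery {
  text: string;
  values: Array<string | number>;
}

// pgvector accepts a vector in the text form '[0.1,0.2,...]'.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

function buildVectorQuery(
  embedding: number[],
  instanceId: string,
  limit: number,
): VectorSqlQuery {
  const text = [
    "SELECT id, content,",
    // <=> is cosine distance, so 1 - distance is cosine similarity.
    "       1 - (embedding <=> $1::vector) AS similarity",
    "FROM knowledge_chunks",
    "WHERE instance_id = $2",
    "ORDER BY embedding <=> $1::vector",
    "LIMIT $3",
  ].join("\n");
  return { text, values: [toVectorLiteral(embedding), instanceId, limit] };
}

const vectorQueryExample = buildVectorQuery([0.1, 0.2, 0.3], "instance-1", 10);
```

Ordering by the distance expression itself, ascending, is what lets pgvector use an approximate index if one exists on the column.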
How it works
                                    query
                                      |
              +-----------------------+-----------------------+
              |                                               |
              v                                               v
embed via generateEmbedding()                   websearch_to_tsquery('simple', query)
OpenAI text-embedding-3-small                   over a generated tsvector column
              |                                               |
              v                                               v
pgvector cosine search                          ts_rank() FTS search
searchByVector(emb, instanceId, fetchLimit)     conversationStore.searchByKeyword(...)
top-N candidates (semantic)                     top-N candidates (lexical)
              |                                               |
              +-----------------------+-----------------------+
                                      |
                                      v
                           Reciprocal Rank Fusion
                                   k = 60
                for each result r at position i in backend b:
                    score(r) += 1 / (k + i + 1)
                                      |
                                      v
                  sort desc by fused score, slice(0, limit)
                                      |
                                      v
                            HybridSearchResult[]
                 { content, type, score, source, createdAt }
Failure modes and graceful degradation
Each backend is wrapped in its own try / catch inside hybridSearch(). If the vector backend throws (e.g. the embedding call fails because the per-instance openai_api_key is missing), the keyword backend’s results are still returned. The reverse also holds: if FTS fails, vector results still come back. Hybrid search degrades to a single retriever rather than failing the whole tool call.
If both backends fail, the function returns an empty array. The supervisor sees no results and either tells the user it does not know, or attempts another tool.
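The degradation pattern can be sketched as below. The names BackendHit, SearchBackend, runSafely and searchBothBackends are illustrative assumptions; the actual hybridSearch() may structure its try / catch differently.

```typescript
// Hypothetical shapes for illustration; not the types used in hybrid-search.ts.
interface BackendHit {
  id: string;
}
type SearchBackend = () => Promise<BackendHit[]>;

async function runSafely(backend: SearchBackend): Promise<BackendHit[]> {
  try {
    return await backend();
  } catch {
    // A failed backend contributes an empty list instead of rejecting the whole search.
    return [];
  }
}

async function searchBothBackends(
  vector: SearchBackend,
  keyword: SearchBackend,
): Promise<[BackendHit[], BackendHit[]]> {
  // Both calls start immediately, so the backends run in parallel.
  return Promise.all([runSafely(vector), runSafely(keyword)]);
}

// Simulated backends: the vector side fails, the keyword side succeeds.
const failingVectorBackend: SearchBackend = async () => {
  throw new Error("missing openai_api_key");
};
const workingKeywordBackend: SearchBackend = async () => [{ id: "c1" }];
```

Calling searchBothBackends(failingVectorBackend, workingKeywordBackend) resolves to an empty vector list plus the keyword hits, which fusion then ranks as a single-retriever result. If both backends fail, both lists are empty and the fused result is an empty array.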
Code reference
- packages/engine/src/memory/hybrid-search.ts: hybridSearch(), RRF with k = 60, parallel backend calls.
- packages/engine/src/memory/memory-store.ts: searchByVector(), pgvector cosine query.
- packages/engine/src/memory/embedder.ts: generateEmbedding(), OpenAI text-embedding-3-small.
- packages/engine/src/conversations/store.ts: searchByKeyword(), FTS with simple config.
- packages/engine/src/conversations/schema.ts: generated tsvector column on conversation_messages.
- packages/engine/src/agents/tools/search-memory.tool.ts: supervisor-facing wrapper.
- packages/engine/src/knowledge/search.ts: pure pgvector path used by searchKnowledge.
See also
- Memory: extraction, storage, and the searchMemory consumer.
- Knowledge Base: the other retrieval surface.
- AI Gateway: embeddings always go through OpenAI.
- Architecture: where retrieval sits in the request pipeline.