Semantic Search · Vector Embeddings · Technical Deep-Dive · pgvector

How Semantic Search Powers AI Memory: Vector Embeddings Explained

A technical deep-dive into how vector embeddings and semantic search enable AI tools to find past decisions by meaning, not keywords. Learn about 768-dimensional vectors, cosine similarity, and pgvector.

Jonas Leite · March 14, 2026 · 9 min read

The Problem with Keyword Search

Traditional search systems match exact words. If you search for "authentication decision" in a keyword-based system, it looks for documents containing those exact terms. But what if your team discussed "login flow architecture" or "session management approach" or "auth module refactor"? A keyword search misses all of these — even though they are semantically about the same topic.

This is a critical limitation when building memory systems for AI coding tools. Developers do not use consistent terminology. The same concept gets described differently across sessions, tools, and contexts. A memory system that relies on keyword matching will miss the majority of relevant past decisions.

Semantic search solves this by understanding meaning, not just matching words.

What Are Vector Embeddings?

A vector embedding is a numerical representation of text in high-dimensional space. Instead of storing text as a string of characters, an embedding model converts it into an array of floating-point numbers — typically 768 or 1536 dimensions.

Each dimension captures some aspect of the text's meaning. While individual dimensions are not human-interpretable, the overall vector encodes rich semantic information: topic, sentiment, technical domain, abstraction level, and relationships to other concepts.

For example, the text "We decided to use PostgreSQL for the user database because we need complex relational queries" gets converted into a 768-dimensional vector. The text "The team chose Postgres as the primary data store due to requirements for JOIN-heavy operations" produces a different vector — but one that is very close in the 768-dimensional space, because the meaning is similar.

Modern embedding models like OpenAI's text-embedding-3-small and Google's text-embedding-004 are trained on massive text corpora to produce vectors where semantic similarity maps to geometric proximity. Two texts about the same topic will have vectors pointing in roughly the same direction, regardless of the specific words used.

How Cosine Similarity Works

Once you have vector embeddings, you need a way to measure how similar two vectors are. The standard metric is cosine similarity — a measure of the angle between two vectors in high-dimensional space.

Cosine similarity ranges from -1 (exactly opposite meaning) to 1 (exactly same meaning), with 0 indicating no relationship. In practice, most embedding models produce values between 0 and 1 for natural language text.

The formula is straightforward: cosine similarity equals the dot product of two vectors divided by the product of their magnitudes. What makes this powerful is that it measures directional similarity, not magnitude. A short sentence and a long paragraph about the same topic will have high cosine similarity even though their vectors have different magnitudes.
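The formula can be written directly in a few lines of NumPy. The two vectors below are toy 4-dimensional stand-ins for real 768-dimensional embeddings, chosen so that one is exactly twice the other — same direction, different magnitude:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the vector magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for real 768-dimensional embeddings.
short_sentence = np.array([0.2, 0.8, 0.1, 0.5])
long_paragraph = np.array([0.4, 1.6, 0.2, 1.0])  # same direction, twice the magnitude

print(cosine_similarity(short_sentence, long_paragraph))   # ≈ 1.0 (same meaning)
print(cosine_similarity(short_sentence, -short_sentence))  # ≈ -1.0 (opposite direction)
```

Note that doubling every component leaves the similarity at 1.0: magnitude drops out of the formula, which is exactly why a short sentence and a long paragraph about the same topic can score as near-identical.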

For AI memory systems, a cosine similarity threshold of 0.7 to 0.8 typically indicates strong semantic relevance. This means you can search your context history with a natural language query and find relevant past decisions even when the original discussion used completely different terminology.

Keyword Search vs Semantic Search: A Real Example

Consider a developer searching their context history for past decisions about error handling. Here is how keyword search and semantic search compare:

Query: "How should we handle errors in the API?"

Keyword search looks for documents containing "handle," "errors," and "API." It finds:

  • A document titled "API Error Handling Standards" (exact match)
  • A commit message mentioning "handle validation errors in API endpoint"

It misses:

  • A discussion about "fault tolerance strategy for backend services"
  • A decision to "use Result types instead of throwing exceptions"
  • An architecture review noting "the retry mechanism for failed upstream calls"

Semantic search converts the query into a vector and finds the closest vectors in the database. It returns all five documents above, ranked by relevance — because all five are semantically about error handling in backend systems, even though they use different words.

Query: "Why did we choose this database?"

Keyword search finds almost nothing useful — "choose," "this," and "database" are either too common or too vague for keyword matching.

Semantic search understands the intent — you are looking for a past decision about database selection — and returns the context snapshots from the session where your team evaluated PostgreSQL vs MongoDB vs DynamoDB, even though those discussions never used the phrase "choose this database."
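The contrast can be illustrated with a toy in-memory ranking. The vectors here are hand-made 3-dimensional stand-ins for illustration only — a real system would use model-generated 768-dimensional embeddings — but the ranking logic is the same:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made toy vectors; a real system would get these from an embedding model.
documents = {
    "API Error Handling Standards":                    np.array([0.9, 0.1, 0.0]),
    "fault tolerance strategy for backend services":   np.array([0.8, 0.3, 0.1]),
    "use Result types instead of throwing exceptions": np.array([0.7, 0.2, 0.2]),
    "Q3 marketing plan":                               np.array([0.0, 0.1, 0.9]),
}

query = np.array([0.85, 0.2, 0.05])  # "How should we handle errors in the API?"

ranked = sorted(documents.items(),
                key=lambda kv: cosine_similarity(query, kv[1]),
                reverse=True)

for title, vec in ranked:
    print(f"{cosine_similarity(query, vec):.3f}  {title}")
```

The three error-handling documents score high despite sharing almost no words with the query, while the unrelated document scores near zero — which is the whole point of searching by direction in embedding space rather than by term overlap.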

pgvector: Vector Search at Scale

Storing and searching millions of vectors efficiently requires specialized database infrastructure. pgvector is a PostgreSQL extension that adds vector data types and similarity search operators directly to Postgres.

Why pgvector matters for AI memory systems:

Native PostgreSQL integration: If your application already uses PostgreSQL (and most do), pgvector lets you store vectors alongside your relational data. No separate vector database needed. Context snapshots can live in the same database as user accounts, workspace configurations, and activity logs.

HNSW indexing: pgvector supports Hierarchical Navigable Small World (HNSW) indexes — a graph-based approximate nearest neighbor algorithm that keeps query times in the low milliseconds even on millions of vectors. An HNSW index on 768-dimensional vectors with 100,000 entries typically returns results in under 5 milliseconds.

Exact and approximate search: For smaller datasets, pgvector supports exact nearest neighbor search (brute-force comparison against all vectors). For larger datasets, HNSW indexes trade a small amount of accuracy for dramatic speed improvements — typically 95-99% recall at 100x faster query times.

Filtering with vector search: Because pgvector lives inside PostgreSQL, you can combine vector similarity search with standard SQL filtering. Search for "authentication decisions" but only within a specific workspace, after a certain date, or tagged with a specific project. This hybrid query capability is essential for multi-tenant AI memory systems.
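A minimal sketch of what this looks like in SQL, assuming the pgvector extension is available. The table and column names are illustrative, not Swylink's actual schema:

```sql
-- Enable the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE context_snapshots (
    id           bigserial PRIMARY KEY,
    workspace_id bigint      NOT NULL,
    created_at   timestamptz NOT NULL DEFAULT now(),
    summary      text        NOT NULL,
    embedding    vector(768) NOT NULL
);

-- HNSW index using cosine distance.
CREATE INDEX ON context_snapshots
    USING hnsw (embedding vector_cosine_ops);

-- Hybrid query: vector similarity combined with ordinary SQL filters.
-- <=> is pgvector's cosine-distance operator (1 - cosine similarity).
SELECT id, summary, 1 - (embedding <=> $1) AS similarity
FROM context_snapshots
WHERE workspace_id = $2
  AND created_at > now() - interval '90 days'
ORDER BY embedding <=> $1
LIMIT 10;
```

The `WHERE` clause runs in the same query as the similarity ranking — the hybrid filtering described above is just standard SQL plus one operator.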

How Swylink Uses Semantic Search

Swylink's context intelligence layer uses 768-dimensional vector embeddings to power its memory search. Here is the pipeline:

1. Context capture: When an AI tool saves a context snapshot through Swylink's MCP tools, the snapshot includes a summary, key decisions, topic tags, and files changed. This structured text gets concatenated into a single document.

2. Embedding generation: The document is passed through an embedding model that produces a 768-dimensional vector. This vector is stored in PostgreSQL alongside the structured metadata using pgvector.

3. Search: When any AI tool needs past context — triggered by the user asking a question or the AI proactively seeking background — the search query gets embedded into the same 768-dimensional space. pgvector performs a cosine similarity search against all stored vectors in that workspace, returning the most semantically relevant context snapshots.

4. Re-ranking: The top results are further ranked by recency, decision importance, and tag relevance to produce a final set of context snapshots that get injected into the AI's current conversation.
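The four steps above can be sketched end-to-end in Python. The `embed()` function here is a deterministic stub standing in for a real embedding model (it does not produce semantically meaningful vectors), and the recency weight in the re-ranking step is a made-up illustrative value, not Swylink's actual ranking formula:

```python
import hashlib
from dataclasses import dataclass, field

import numpy as np

def embed(text: str, dims: int = 768) -> np.ndarray:
    """Stub for a real embedding model: a deterministic unit vector per text."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dims)
    return v / np.linalg.norm(v)

@dataclass
class Snapshot:
    summary: str
    decisions: list
    tags: list
    age_days: float
    embedding: np.ndarray = field(default=None)

def capture(snapshot: Snapshot) -> Snapshot:
    # Steps 1-2: concatenate the structured fields, then embed the document.
    document = " ".join([snapshot.summary, *snapshot.decisions, *snapshot.tags])
    snapshot.embedding = embed(document)
    return snapshot

def search(query: str, store: list, top_k: int = 3) -> list:
    # Step 3: embed the query into the same space and score by cosine similarity.
    q = embed(query)
    scored = [(float(np.dot(q, s.embedding)), s) for s in store]  # unit vectors: dot = cosine
    # Step 4: re-rank with a toy recency bonus (illustrative weight only).
    reranked = sorted(scored,
                      key=lambda pair: pair[0] + 0.1 / (1 + pair[1].age_days),
                      reverse=True)
    return [s for _, s in reranked[:top_k]]
```

In production the stub would be replaced by a real embedding model and the brute-force scoring loop by a pgvector query, but the shape of the pipeline — concatenate, embed, rank by similarity, re-rank by metadata — is the same.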

This entire pipeline executes in under 100 milliseconds for typical workspaces, meaning context search adds negligible latency to AI interactions.

The Mathematics of Meaningful Memory

What makes semantic search transformative for AI memory is that it mirrors how humans actually recall information. You do not remember past decisions by the exact words used — you remember them by meaning, association, and relevance to your current situation.

A 768-dimensional embedding space is rich enough to capture nuanced semantic relationships. Vectors for "microservices vs monolith" and "service architecture decision" will be close together. Vectors for "PostgreSQL index optimization" and "database performance tuning" will cluster nearby. The geometry of the embedding space naturally organizes knowledge by topic and relationship.

This is why semantic search is the foundation of effective AI memory. When your AI tool asks "what context do we have about the payment system?" it is not searching for the word "payment" — it is finding every past decision, discussion, and architectural choice that is semantically related to payment processing. That includes decisions about Stripe integration, transaction retry logic, idempotency keys, and webhook handling — even if none of those conversations ever used the word "payment."

The combination of high-dimensional vector embeddings, cosine similarity, and pgvector creates a memory system that actually understands what you meant, not just what you typed. For AI coding tools, this is the difference between a forgetful assistant and a knowledgeable partner.
