Ep. 01·January 15, 2024·Cody Feda

Vector Database

Load-Bearing

A database that finds things by meaning, not by exact match

tl;dr

A vector database stores data as high-dimensional numerical arrays (vectors) that encode semantic meaning, enabling searches that return conceptually similar results rather than exact string matches - the foundational infrastructure behind most modern AI retrieval systems.

What it actually is

A vector database is a database optimized for storing and querying vectors - ordered lists of floating-point numbers, typically with a few hundred to a few thousand dimensions. Each number in the list represents some learned feature of the data it encodes.

When you run a photo through an image model or a sentence through a language model, the output is one of these vectors. Two photos of golden retrievers will produce vectors that are numerically close to each other. A photo of a golden retriever and a photo of a poodle will be slightly further apart. A photo of a golden retriever and a photo of a spreadsheet will be very far apart.

Vector databases exploit this geometry. Instead of asking "does this record equal this query?", they ask "which records are closest to this query?" - a fundamentally different operation called nearest-neighbor search, which at scale is performed approximately (approximate nearest neighbor, or ANN, search).
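The "closeness" in question is usually cosine similarity. Here is a minimal sketch using toy 4-dimensional vectors - real embeddings have hundreds or thousands of dimensions, and these values are made up purely for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, chosen so similar things
# point in similar directions).
golden_1    = [0.90, 0.80, 0.10, 0.05]
golden_2    = [0.85, 0.75, 0.15, 0.05]
poodle      = [0.70, 0.90, 0.25, 0.10]
spreadsheet = [0.05, 0.10, 0.90, 0.80]

print(cosine_similarity(golden_1, golden_2))     # highest: same breed
print(cosine_similarity(golden_1, poodle))       # high: both dogs
print(cosine_similarity(golden_1, spreadsheet))  # lowest: unrelated
```

The ordering of the three scores mirrors the golden retriever example above: same breed beats different breed, and anything dog-shaped beats a spreadsheet.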

Why it matters for AI

Large language models have limited context windows and no persistent memory. If you want a chatbot that knows about your company's internal documentation, you can't just paste 10,000 pages into the prompt. Instead, you:

  1. Convert all your documents into vectors (embeddings) and store them in a vector database
  2. When a user asks a question, convert that question into a vector
  3. Query the database for the most semantically similar documents
  4. Stuff those documents into the LLM prompt as context

This pattern - called Retrieval-Augmented Generation (RAG) - is how nearly every production AI assistant works under the hood. The vector database is the memory. Without it, the LLM is stateless.
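The four steps can be sketched end to end. Everything here is hypothetical scaffolding: `embed` stands in for a real embedding model (it just counts words from a tiny fixed vocabulary), and the "database" is a plain list scanned by cosine similarity:

```python
import math

# Hypothetical stand-in for a real embedding model: a word-count vector
# over a tiny fixed vocabulary. Real embeddings are dense, learned vectors.
VOCAB = ["password", "reset", "account", "expense", "reports", "friday", "month"]

def embed(text):
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# 1. Embed the documents and store them.
docs = ["reset your password from the account settings page",
        "expense reports are due on the first friday of each month"]
store = [(doc, embed(doc)) for doc in docs]

# 2-3. Embed the question and retrieve the most similar document.
question = "how do I reset my password"
q_vec = embed(question)
best_doc, _ = max(store, key=lambda item: cosine(q_vec, item[1]))

# 4. Stuff the retrieved document into the LLM prompt as context.
prompt = f"Context: {best_doc}\n\nQuestion: {question}"
```

A real system would embed with a model API, store in one of the databases discussed below, and retrieve the top k documents rather than just one - but the shape of the loop is the same.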

How the search actually works

Most vector databases use an algorithm called HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index). These are approximate algorithms - they don't guarantee finding the mathematically closest vector, but they find something close enough, orders of magnitude faster than checking every record.

For a million vectors of 1,536 dimensions (OpenAI's ada-002 embedding size), a brute-force exact search would require roughly 3 billion floating-point operations per query. HNSW gets you to the answer in a fraction of a millisecond by building a layered graph structure that prunes the search space aggressively.
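To see where that bill comes from, here is the exact-search baseline in miniature - sizes scaled down so pure Python finishes quickly, with everything else hypothetical (random vectors standing in for real embeddings):

```python
import random

random.seed(0)
DIM, N = 1536, 1000   # scaled down from the million-vector example

db = [[random.random() for _ in range(DIM)] for _ in range(N)]
query = [random.random() for _ in range(DIM)]

def squared_distance(a, b):
    # One comparison costs about 2 * DIM floating-point ops
    # (a multiply and an add per dimension).
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Exact nearest neighbor: a full scan, roughly N * DIM * 2 flops per
# query. An ANN index like HNSW avoids this by visiting only a tiny,
# intelligently chosen fraction of the stored vectors.
nearest = min(range(N), key=lambda i: squared_distance(db[i], query))
```

Scale `N` back up to a million and the per-query cost of the full scan is what makes the graph-based shortcut worth its indexing overhead.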

The trade-off is recall - the fraction of queries for which the true nearest neighbor is actually found - which you tune indirectly through index and search parameters. At 95% recall, you're finding the true nearest neighbor 95% of the time. For most applications, this is more than acceptable.
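Recall is measured empirically, by comparing approximate results against exact ground truth. The "index" below is a deliberately crude stand-in (it scans a random half of the data - nothing like HNSW), but the measurement loop is the same one used to benchmark real indexes:

```python
import random

random.seed(1)
DIM, N, QUERIES = 32, 2000, 50

db = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_nn(q):
    """Ground truth: exhaustive scan."""
    return min(range(N), key=lambda i: dist(db[i], q))

def approx_nn(q, fraction=0.5):
    """Toy stand-in for an ANN index: scan only a random subset.
    Real indexes choose their candidates far more intelligently."""
    candidates = random.sample(range(N), int(N * fraction))
    return min(candidates, key=lambda i: dist(db[i], q))

queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(QUERIES)]
recall = sum(exact_nn(q) == approx_nn(q) for q in queries) / QUERIES
print(f"recall@1: {recall:.2f}")
```

For this toy index, recall hovers around the sampling fraction. A real index's recall is pushed toward 1.0 by spending more time per query (e.g., HNSW's search-breadth parameter), which is exactly the speed-vs-accuracy dial the paragraph above describes.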

Popular implementations

Pinecone is the managed cloud-native option - zero infrastructure, pay by usage. Weaviate and Qdrant are open-source and can run on your own hardware. pgvector is a PostgreSQL extension that adds vector search to a database you may already have. Chroma is popular for local development and experimentation.

The choice depends on scale. For a prototype or small application, pgvector in your existing Postgres instance is often the right answer. For millions of vectors with low-latency requirements, a dedicated system like Pinecone or Qdrant earns its complexity.

Load-Bearing verdict

Vector databases aren't a buzzword - they're doing real structural work. If you remove the vector database from a RAG system, the system stops functioning. The AI can no longer retrieve context. It has no memory. It becomes a stateless text transformer again. That's the definition of load-bearing: take it out and things fall down.
