
LangChain RAG: Vector Search and Optimization

v20260423
langchain-embeddings-search
Master the complexities of building robust Retrieval Augmented Generation (RAG) pipelines using LangChain. This guide addresses common pitfalls in vector search, such as score semantics differences (L2 vs. Cosine), embedding dimension mismatches, and chunking issues for code/Markdown. Learn best practices for selecting between FAISS, Pinecone, Chroma, and PGVector, normalizing scores, and implementing hybrid keyword+vector search for maximum retrieval quality.
LangChain Embeddings and Vector Search (Python)

Overview

FAISS.similarity_search_with_score() returns L2 distance — lower is better. Pinecone.similarity_search_with_score() returns cosine similarity — higher is better. Swap the vector store and that if score > 0.8 filter silently flips meaning: it now keeps the garbage and drops the good results. This is pain-catalog entry P12, and it is the single most common reason a "we migrated from FAISS to Pinecone for scale" project loses retrieval quality overnight.

The sibling gotchas:

  • P13 — RecursiveCharacterTextSplitter default separators break inside code fences, so RAG over Markdown docs truncates code examples mid-function
  • P14 — Embedding-dim mismatch crashes at insert time (after 10 minutes of processing), not at VectorStore.__init__; the failure surfaces as "dim mismatch: 1536 != 3072" with no earlier warning
  • P15 — Cohere/Jina reranker scores are within-query relative, so a 0.34 top-1 is not worse than a 0.92 top-1 on a different query; filtering by threshold is the wrong heuristic

This skill walks through embedding model selection, vector store creation with the version-safe dim guard, score normalization, hybrid keyword+vector search, and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x, langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, pinecone-client. Pain-catalog anchors: P12, P13, P14, P15, P49, P50.

Prerequisites

  • Python 3.10+
  • langchain-core >= 1.0, < 2.0 and langchain-community >= 1.0, < 2.0
  • Embedding provider: pip install langchain-openai (text-embedding-3-small/large)
  • Vector store: pip install faiss-cpu OR pip install langchain-pinecone
  • Provider API keys: OPENAI_API_KEY, PINECONE_API_KEY

Instructions

Step 1 — Initialize embeddings with an explicit dim

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=1536,  # explicit: text-embedding-3-large would be 3072 and must match the index
)

# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"

Swapping models (-small 1536 → -large 3072) is a migration, not a swap. Plan it — back-fill the index, not just the config.
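A minimal back-fill sketch (assuming docs holds the reloaded corpus; "index_v2" is a placeholder path):

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Re-embed the full corpus with the new model into a NEW index
new_embeddings = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=3072)

# Guard before the long back-fill, not after it fails mid-insert (P14)
assert len(new_embeddings.embed_query("test")) == 3072, "embedding dim drifted"

new_store = FAISS.from_documents(docs, embedding=new_embeddings)
new_store.save_local("index_v2")  # cut traffic over only after the back-fill completes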

Step 2 — Choose a vector store

Store               | Score metric                        | Latency (1M vectors) | When to use
FAISS               | L2 distance (lower = better)        | ~5ms                 | Local dev, < 1M vectors, in-process
Chroma              | Cosine similarity (higher = better) | ~10ms                | Small multi-user, persistent local
PGVector            | Cosine by default (higher = better) | ~20ms                | Existing Postgres, transactional needs
PineconeVectorStore | Cosine similarity (higher = better) | ~50ms (hosted)       | > 1M vectors, multi-tenant, managed

from langchain_community.vectorstores import FAISS

store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR

vs.

from langchain_pinecone import PineconeVectorStore

store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR

See Vector Store Comparison for the feature matrix and the migration gotchas.

Step 3 — Normalize scores before any threshold filter

Write a normalizer at the retriever boundary, so downstream code never sees raw store-specific scores:

def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")

Now score > 0.7 means the same thing regardless of backend. See Score Semantics for the per-store derivation.
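A sketch of that boundary (retrieve_normalized is a hypothetical helper name, shown here against the FAISS store from Step 2):

def retrieve_normalized(store, query: str, k: int = 5, store_type: str = "faiss_l2"):
    """Return (doc, similarity) pairs with similarity always in [0, 1]."""
    raw = store.similarity_search_with_score(query, k=k)
    return [(doc, normalize(score, store_type)) for doc, score in raw]

# Downstream code can now threshold safely, whatever the backend
hits = [(d, s) for d, s in retrieve_normalized(store, "query") if s > 0.7]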

Step 4 — Chunk text with language-aware splitters

from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)

# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)
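Usage is identical for both (raw_docs here stands in for whatever your loader produced):

chunks = md_splitter.split_documents(raw_docs)
print(f"{len(raw_docs)} docs -> {len(chunks)} chunks")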

PDF pipelines have their own pain: PyPDFLoader splits by page, tearing tables in half (P49). Use PyMuPDFLoader or UnstructuredPDFLoader for documents with tables.
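A sketch of the swap ("report.pdf" is a placeholder; PyMuPDFLoader additionally needs pip install pymupdf):

from langchain_community.document_loaders import PyMuPDFLoader

# Still one Document per page, but with better layout extraction than PyPDFLoader;
# for full table reconstruction, reach for UnstructuredPDFLoader instead
docs = PyMuPDFLoader("report.pdf").load()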

Step 5 — Hybrid search (keyword + vector)

Pure vector search misses exact-match keywords (product SKUs, error codes, function names). Combine BM25 + vector:

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})

ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)

See Hybrid Search for the eval harness and the weight-tuning procedure.
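Invocation is the same as for any retriever ("error code E1234" is a made-up exact-match token):

for doc in ensemble.invoke("error code E1234"):
    # BM25 catches the literal token; vector search catches paraphrases
    print(doc.metadata.get("source"), doc.page_content[:80])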

Step 6 — Rerank by rank, not by score

from langchain_cohere import CohereRerank

reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(
    documents=candidates, query=query,
)
# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n]  — trust the rank order

Filter by rank (keep the top k), not by threshold. Per-query calibration is possible but rarely worth the engineering cost.
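To wire the reranker into a chain, LangChain's ContextualCompressionRetriever wraps it around a base retriever (a sketch reusing the ensemble from Step 5; verify the import path against your pinned versions):

from langchain.retrievers import ContextualCompressionRetriever

# top_n=3 on the reranker already enforces filter-by-rank; no threshold anywhere
rerank_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=ensemble,
)
final_docs = rerank_retriever.invoke("query")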

Output

  • Embeddings initialized with dim assertion at startup
  • Vector store chosen from the comparison matrix with score-semantics awareness
  • Score normalizer applied at retriever boundary (no raw scores downstream)
  • Language-aware text splitter that respects code fences and PDF structure
  • Hybrid retriever combining BM25 and vector with tuned weights
  • Reranker filtering by rank, not threshold

Error Handling

Error                                                | Cause                                                 | Fix
PineconeApiException: dim mismatch: 1536 != 3072     | Changed embedding model without reindexing (P14)      | Create a new index with the new dim; migrate in a background job
Retrieval quality drops after FAISS→Pinecone swap    | Score semantics flipped (P12)                         | Apply normalize() at the boundary; retune threshold on eval set
RAG answers misquote tables                          | PyPDFLoader tore table across pages (P49)             | Switch to PyMuPDFLoader or UnstructuredPDFLoader
RAG retrieval drops code examples mid-function       | RecursiveCharacterTextSplitter broke code fence (P13) | Use from_language(Language.MARKDOWN/PYTHON)
Cohere reranker top-1 score < 0.5                    | Scores are per-query relative (P15)                   | Filter by rank (reranked[:k]), not threshold
WebBaseLoader returns 403 / Cloudflare interstitial  | Default User-Agent flagged as bot (P50)               | Pass header_template={"User-Agent": "Mozilla/5.0 ..."}; respect robots.txt
ValueError: expected str instance, NoneType found    | Empty document content at embed time                  | Filter docs = [d for d in docs if d.page_content.strip()] before embedding
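For the P50 row, a sketch of the header fix (the URL and User-Agent string are placeholders; keep the UA honest and respect robots.txt):

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://example.com/docs",
    header_template={"User-Agent": "Mozilla/5.0 (compatible; MyRAGBot/1.0)"},
)
docs = loader.load()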

Examples

Building a RAG retriever with hybrid search

End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI text-embedding-3-small, index in FAISS for local dev, wrap in an EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.

See Hybrid Search for the full builder and the weight-tuning procedure on a golden set.
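A condensed sketch of that pipeline ("docs/" is a placeholder corpus path; the weights are the starting point, not tuned values):

from langchain.retrievers import EnsembleRetriever
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

raw_docs = DirectoryLoader("docs/", glob="**/*.md", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
).split_documents(raw_docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=1536)
store = FAISS.from_documents(chunks, embedding=embeddings)

bm25 = BM25Retriever.from_documents(chunks)
bm25.k = 5
retriever = EnsembleRetriever(
    retrievers=[bm25, store.as_retriever(search_kwargs={"k": 5})],
    weights=[0.4, 0.6],
)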

Migrating from FAISS to Pinecone without quality regression

The three gotchas: (a) the score semantics flip (P12), (b) the migration requires a full re-embed unless the embedding model and dim are unchanged, (c) threshold filters must be retuned on the new score scale.

See Vector Store Comparison for the migration checklist.

Per-tenant vector stores without leakage

Use Pinecone namespaces or PGVector row-level security. Construct the retriever per-request with the tenant ID — never bind a retriever at import time (P33).

See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.
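A minimal per-request sketch with Pinecone namespaces (tenant_id comes from your auth layer; never cache the retriever across tenants):

from langchain_pinecone import PineconeVectorStore

def tenant_retriever(tenant_id: str):
    # Constructed per request: the namespace scopes every query to this tenant
    store = PineconeVectorStore(
        index_name="prod", embedding=embeddings, namespace=tenant_id,
    )
    return store.as_retriever(search_kwargs={"k": 5})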

Info

Name: langchain-embeddings-search
Version: v20260423
Size: 12.74 KB
Updated At: 2026-04-28