
LangChain RAG: Vector Search and Optimization

v20260423
langchain-embeddings-search
Master the complexities of building robust Retrieval Augmented Generation (RAG) pipelines using LangChain. This guide addresses common pitfalls in vector search, such as score semantics differences (L2 vs. Cosine), embedding dimension mismatches, and chunking issues for code/Markdown. Learn best practices for selecting between FAISS, Pinecone, Chroma, and PGVector, normalizing scores, and implementing hybrid keyword+vector search for maximum retrieval quality.
LangChain Embeddings and Vector Search (Python)

Overview

FAISS.similarity_search_with_score() returns L2 distance — lower is better. Pinecone.similarity_search_with_score() returns cosine similarity — higher is better. Swap the vector store and that if score > 0.8 filter silently flips meaning: it now keeps the garbage and drops the good results. This is pain-catalog entry P12, and it is the single most common reason a "we migrated from FAISS to Pinecone for scale" project loses retrieval quality overnight.

The sibling gotchas:

  • P13 — RecursiveCharacterTextSplitter default separators break inside code fences, so RAG over Markdown docs truncates code examples mid-function
  • P14 — Embedding-dim mismatch crashes at insert time (after 10 minutes of processing), not at VectorStore.__init__; the failure surfaces as "dim mismatch: 1536 != 3072" with no earlier warning
  • P15 — Cohere/Jina reranker scores are within-query relative, so a 0.34 top-1 is not worse than a 0.92 top-1 on a different query; filtering by threshold is the wrong heuristic

This skill walks through embedding model selection, vector store creation with the version-safe dim guard, score normalization, hybrid keyword+vector search, and rerankers with the correct filter-by-rank pattern. Pin: langchain-core 1.0.x, langchain-community 1.0.x, langchain-openai 1.0.x, faiss-cpu, pinecone-client. Pain-catalog anchors: P12, P13, P14, P15, P49, P50.

Prerequisites

  • Python 3.10+
  • langchain-core >= 1.0, < 2.0 and langchain-community >= 1.0, < 2.0
  • Embedding provider: pip install langchain-openai (text-embedding-3-small/large)
  • Vector store: pip install faiss-cpu OR pip install langchain-pinecone
  • Provider API keys: OPENAI_API_KEY, PINECONE_API_KEY

Instructions

Step 1 — Initialize embeddings with an explicit dim

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=1536,  # explicit: text-embedding-3-large would be 3072 and must match the index
)

# Assert dim at startup (prevents P14)
assert len(embeddings.embed_query("test")) == 1536, "embedding dim drifted"

Swapping models (-small 1536 → -large 3072) is a migration, not a swap. Plan it — back-fill the index, not just the config.
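A minimal back-fill sketch (assuming docs holds the reloaded corpus; "index_v2" is a placeholder path):

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Re-embed the full corpus with the new model into a NEW index
new_embeddings = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=3072)

# Guard before the long back-fill, not after it fails mid-insert (P14)
assert len(new_embeddings.embed_query("test")) == 3072, "embedding dim drifted"

new_store = FAISS.from_documents(docs, embedding=new_embeddings)
new_store.save_local("index_v2")  # cut traffic over only after the back-fill completes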

Step 2 — Choose a vector store

Store               | Score metric                        | Latency (1M vectors) | When to use
FAISS               | L2 distance (lower = better)        | ~5ms                 | Local dev, < 1M vectors, in-process
Chroma              | Cosine similarity (higher = better) | ~10ms                | Small multi-user, persistent local
PGVector            | Cosine by default (higher = better) | ~20ms                | Existing Postgres, transactional needs
PineconeVectorStore | Cosine similarity (higher = better) | ~50ms (hosted)       | > 1M vectors, multi-tenant, managed

from langchain_community.vectorstores import FAISS

store = FAISS.from_documents(docs, embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# FAISS: [(doc, 0.31), (doc, 0.42), ...] — LOWER IS MORE SIMILAR

vs.

from langchain_pinecone import PineconeVectorStore

store = PineconeVectorStore(index_name="prod", embedding=embeddings)
results = store.similarity_search_with_score("query", k=5)
# Pinecone: [(doc, 0.91), (doc, 0.87), ...] — HIGHER IS MORE SIMILAR

See Vector Store Comparison for the feature matrix and the migration gotchas.

Step 3 — Normalize scores before any threshold filter

Write a normalizer at the retriever boundary, so downstream code never sees raw store-specific scores:

def normalize(score: float, store_type: str) -> float:
    """Return similarity in [0, 1] where 1 = identical, 0 = unrelated."""
    if store_type == "faiss_l2":
        return 1.0 / (1.0 + score)  # collapse L2 distance into similarity
    if store_type in {"pinecone", "chroma", "pgvector"}:
        return max(0.0, min(1.0, score))  # already similarity, clamp just in case
    raise ValueError(f"Unknown store type: {store_type}")

Now score > 0.7 means the same thing regardless of backend. See Score Semantics for the per-store derivation.
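A sketch of that boundary (retrieve_normalized is a hypothetical helper name, shown here against the FAISS store from Step 2):

def retrieve_normalized(store, query: str, k: int = 5, store_type: str = "faiss_l2"):
    """Return (doc, similarity) pairs with similarity always in [0, 1]."""
    raw = store.similarity_search_with_score(query, k=k)
    return [(doc, normalize(score, store_type)) for doc, score in raw]

# Downstream code can now threshold safely, whatever the backend
hits = [(d, s) for d, s in retrieve_normalized(store, "query") if s > 0.7]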

Step 4 — Chunk text with language-aware splitters

from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

# BAD — breaks inside Markdown code fences (P13)
bad = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# GOOD — respects Markdown structure
md_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
)

# For Python source files
py_splitter = RecursiveCharacterTextSplitter.from_language(
    Language.PYTHON, chunk_size=1500, chunk_overlap=150,
)
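Usage is identical for both (raw_docs here stands in for whatever your loader produced):

chunks = md_splitter.split_documents(raw_docs)
print(f"{len(raw_docs)} docs -> {len(chunks)} chunks")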

PDF pipelines have their own pain: PyPDFLoader splits by page, tearing tables in half (P49). Use PyMuPDFLoader or UnstructuredPDFLoader for documents with tables.
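A sketch of the swap ("report.pdf" is a placeholder; PyMuPDFLoader additionally needs pip install pymupdf):

from langchain_community.document_loaders import PyMuPDFLoader

# Still one Document per page, but with better layout extraction than PyPDFLoader;
# for full table reconstruction, reach for UnstructuredPDFLoader instead
docs = PyMuPDFLoader("report.pdf").load()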

Step 5 — Hybrid search (keyword + vector)

Pure vector search misses exact-match keywords (product SKUs, error codes, function names). Combine BM25 + vector:

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs)
bm25.k = 5
vector = store.as_retriever(search_kwargs={"k": 5})

ensemble = EnsembleRetriever(
    retrievers=[bm25, vector],
    weights=[0.4, 0.6],  # tune on your eval set
)

See Hybrid Search for the eval harness and the weight-tuning procedure.
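Invocation is the same as for any retriever ("error code E1234" is a made-up exact-match token):

for doc in ensemble.invoke("error code E1234"):
    # BM25 catches the literal token; vector search catches paraphrases
    print(doc.metadata.get("source"), doc.page_content[:80])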

Step 6 — Rerank by rank, not by score

from langchain_cohere import CohereRerank

reranker = CohereRerank(top_n=3, model="rerank-v3.5")
reranked = reranker.compress_documents(
    documents=candidates, query=query,
)
# reranked[0].metadata["relevance_score"] is query-relative — 0.34 may be the best
# WRONG: [d for d in reranked if d.metadata["relevance_score"] > 0.5]
# RIGHT: reranked[:top_n]  — trust the rank order

Filter by rank (keep the top k), not by threshold. Per-query calibration is possible but rarely worth the engineering cost.
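To wire the reranker into a chain, LangChain's ContextualCompressionRetriever wraps it around a base retriever (a sketch reusing the ensemble from Step 5; verify the import path against your pinned versions):

from langchain.retrievers import ContextualCompressionRetriever

# top_n=3 on the reranker already enforces filter-by-rank; no threshold anywhere
rerank_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=ensemble,
)
final_docs = rerank_retriever.invoke("query")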

Output

  • Embeddings initialized with dim assertion at startup
  • Vector store chosen from the comparison matrix with score-semantics awareness
  • Score normalizer applied at retriever boundary (no raw scores downstream)
  • Language-aware text splitter that respects code fences and PDF structure
  • Hybrid retriever combining BM25 and vector with tuned weights
  • Reranker filtering by rank, not threshold

Error Handling

Error                                                | Cause                                                 | Fix
PineconeApiException: dim mismatch: 1536 != 3072     | Changed embedding model without reindexing (P14)      | Create a new index with the new dim; migrate in a background job
Retrieval quality drops after FAISS→Pinecone swap    | Score semantics flipped (P12)                         | Apply normalize() at the boundary; retune threshold on eval set
RAG answers misquote tables                          | PyPDFLoader tore table across pages (P49)             | Switch to PyMuPDFLoader or UnstructuredPDFLoader
RAG retrieval drops code examples mid-function       | RecursiveCharacterTextSplitter broke code fence (P13) | Use from_language(Language.MARKDOWN/PYTHON)
Cohere reranker top-1 score < 0.5                    | Scores are per-query relative (P15)                   | Filter by rank (reranked[:k]), not threshold
WebBaseLoader returns 403 / Cloudflare interstitial  | Default User-Agent flagged as bot (P50)               | Pass header_template={"User-Agent": "Mozilla/5.0 ..."}; respect robots.txt
ValueError: expected str instance, NoneType found    | Empty document content at embed time                  | Filter docs = [d for d in docs if d.page_content.strip()] before embedding
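For the P50 row, a sketch of the header fix (the URL and User-Agent string are placeholders; keep the UA honest and respect robots.txt):

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://example.com/docs",
    header_template={"User-Agent": "Mozilla/5.0 (compatible; MyRAGBot/1.0)"},
)
docs = loader.load()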

Examples

Building a RAG retriever with hybrid search

End-to-end: load Markdown docs with language-aware chunking, embed with OpenAI text-embedding-3-small, index in FAISS for local dev, wrap in an EnsembleRetriever with BM25 at 0.4 weight and vector at 0.6.

See Hybrid Search for the full builder and the weight-tuning procedure on a golden set.
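A condensed sketch of that pipeline ("docs/" is a placeholder corpus path; the weights are the starting point, not tuned values):

from langchain.retrievers import EnsembleRetriever
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

raw_docs = DirectoryLoader("docs/", glob="**/*.md", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter.from_language(
    Language.MARKDOWN, chunk_size=1000, chunk_overlap=100,
).split_documents(raw_docs)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=1536)
store = FAISS.from_documents(chunks, embedding=embeddings)

bm25 = BM25Retriever.from_documents(chunks)
bm25.k = 5
retriever = EnsembleRetriever(
    retrievers=[bm25, store.as_retriever(search_kwargs={"k": 5})],
    weights=[0.4, 0.6],
)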

Migrating from FAISS to Pinecone without quality regression

The three gotchas: (a) the score semantics flip (P12), (b) the migration requires a full re-embed unless the embedding model and dim are unchanged, (c) threshold filters must be retuned on the new score scale.

See Vector Store Comparison for the migration checklist.

Per-tenant vector stores without leakage

Use Pinecone namespaces or PGVector row-level security. Construct the retriever per-request with the tenant ID — never bind a retriever at import time (P33).

See the pack's langchain-enterprise-rbac skill for the tenant-isolation pattern.
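A minimal per-request sketch with Pinecone namespaces (tenant_id comes from your auth layer; never cache the retriever across tenants):

from langchain_pinecone import PineconeVectorStore

def tenant_retriever(tenant_id: str):
    # Constructed per request: the namespace scopes every query to this tenant
    store = PineconeVectorStore(
        index_name="prod", embedding=embeddings, namespace=tenant_id,
    )
    return store.as_retriever(search_kwargs={"k": 5})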

Info

Name: langchain-embeddings-search
Version: v20260423
Size: 12.74 KB
Updated At: 2026-04-28