Before tuning, establish baselines: use exact k-NN search as ground truth and compare it against approximate HNSW results. Target >95% recall@K for production.
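The baseline comparison reduces to one metric. A minimal recall@K helper, in pure Python with toy IDs (no Qdrant connection assumed):

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact top-k that the approximate search recovered."""
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)

# Toy example: IDs from exact KNN vs HNSW for one query
exact = [101, 7, 42, 13, 99]
approx = [101, 42, 7, 55, 13]
print(recall_at_k(exact, approx, k=5))  # 0.8 -> below the 0.95 target
```

Run this per query and average; a mean below the target means HNSW parameters (not the model) are the first suspect.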
Use when: results are irrelevant or missing expected matches and you need to isolate the cause.
Search API: exact=true to bypass HNSW approximation.
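A sketch of the search request body with the exact flag set; the vector values and limit are placeholders:

```python
# Qdrant search request body with HNSW bypassed.
# Brute-force KNN: slow, but gives ground-truth results for comparison.
exact_request = {
    "vector": [0.1, 0.2, 0.3],   # query embedding (placeholder)
    "limit": 10,
    "params": {"exact": True},   # skip the HNSW index entirely
}
```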
Payload filtering and sparse vector search are different things. Metadata (dates, categories, tags) goes in payload for filtering. Text content goes in sparse vectors for search.
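The split looks like this in a single point; the sparse vector name "text" and all field names are illustrative assumptions:

```python
# One point: metadata lives in payload (filterable), text content lives
# in a named sparse vector (searchable). Field names are hypothetical.
point = {
    "id": 1,
    "payload": {                 # filter on these; don't search them
        "category": "faq",
        "published": "2024-01-15",
        "tags": ["billing", "refund"],
    },
    "vector": {
        "text": {                # sparse representation of the document text
            "indices": [102, 4051, 77880],
            "values": [0.61, 0.33, 0.12],
        }
    },
}
```

Filtering on payload narrows candidates cheaply; searching the sparse vector ranks by textual relevance. Mixing the two up (e.g. stuffing text into payload and filtering on it) is a common cause of missing matches.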
Use when: exact search returns good results but HNSW approximation misses them.
Search params: hnsw_ef at query time.
HNSW config: ef_construct (200+ for high quality).
HNSW config: m (16 default, 32 for high recall).
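The three knobs above split into index-time and query-time settings. A sketch with the values suggested here (concrete numbers are starting points, not prescriptions):

```python
# Index-time HNSW settings: applied when the collection is (re)built.
hnsw_config = {
    "m": 32,              # links per node: higher recall, more memory
    "ef_construct": 200,  # build-time beam width: higher-quality graph
}

# Query-time setting: tunable per request without rebuilding the index.
search_params = {
    "hnsw_ef": 128,       # search beam width; must be >= requested limit
}
```

Raising hnsw_ef is the cheapest experiment, since it needs no reindexing; change m or ef_construct only if query-time tuning plateaus below the recall target.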
Quantization: Binary quantization requires rescoring; without it, quality loss is severe. Use oversampling (3-5x minimum for binary) to recover recall, and always test quantization impact on your own data before production.
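A sketch of the query-time quantization settings this implies, using the lower bound of the suggested oversampling range:

```python
# Query-time settings that recover recall under binary quantization.
quantization_params = {
    "rescore": True,      # re-rank candidates with original full-precision vectors
    "oversampling": 3.0,  # fetch 3x `limit` candidates before rescoring
}
```

With oversampling at 3.0 and limit=10, the engine pulls ~30 quantized candidates, then rescores them against the full-precision vectors to pick the final 10.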
Use when: exact search also returns bad results.
Hosted inference: Test the top 3 MTEB models on 100-1000 sample queries and measure recall@10. Domain-specific models often outperform general ones.
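The model comparison is the same recall computation averaged over the sample queries. A self-contained sketch with toy data standing in for each candidate model's retrieved results:

```python
def mean_recall_at_k(ground_truth, retrieved, k=10):
    """Average recall@k over sample queries.

    ground_truth: {query: set of relevant ids}
    retrieved:    {query: ranked id list returned by one candidate model}
    """
    scores = []
    for query, relevant in ground_truth.items():
        hits = len(relevant & set(retrieved[query][:k]))
        scores.append(hits / len(relevant))
    return sum(scores) / len(scores)

# Toy comparison of two candidate embedding models on the same queries
truth = {"q1": {1, 2, 3}, "q2": {7, 8}}
model_a = {"q1": [1, 2, 9, 3], "q2": [7, 5, 8]}
model_b = {"q1": [9, 5, 1], "q2": [4, 6, 2]}
print(mean_recall_at_k(truth, model_a))  # 1.0
print(mean_recall_at_k(truth, model_b))  # ~0.17
```

In practice `retrieved` comes from indexing the same corpus once per model and replaying the sample queries; the toy dicts above stand in for that step.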
Use when: exact search also returns bad results and the model choice has been confirmed by the user.
Optimize search according to the advanced search-strategies skill.
Never set hnsw_ef lower than the number of results requested (guaranteed bad recall).
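A defensive guard for this pitfall, as a sketch (the helper name is illustrative, not part of any client API):

```python
def effective_hnsw_ef(hnsw_ef, limit):
    """hnsw_ef below the requested limit guarantees missed results:
    the search beam cannot even hold `limit` candidates. Clamp it up."""
    return max(hnsw_ef, limit)

print(effective_hnsw_ef(hnsw_ef=10, limit=50))   # 50 (clamped up)
print(effective_hnsw_ef(hnsw_ef=128, limit=10))  # 128 (unchanged)
```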