
Query Result Volume Scaling Optimization

v20260420
qdrant-scaling-query-volume
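When a query needs to fetch a large number of results from multiple shards, this mechanism makes data transfer more efficient. Instead of having every shard return the full query limit, it computes a smaller, optimized per-shard limit from Poisson distribution statistics and then merges the results. This greatly reduces the data transferred between shards, improving the performance and stability of large-scale vector search while keeping result accuracy high.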
Overview

Scaling for Query Volume

Problem: When a query has a large limit (e.g. 1000) and there are multiple shards (e.g. 10), the naive approach asks each shard for the full 1000 results, so 10,000 scored points are transferred and merged. This is wasteful. Auto-sharding distributes data randomly, so on average each shard holds only about 1000 / 10 = 100 of the final top results.
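The arithmetic behind the problem statement can be written out directly. This is a minimal illustration, not Qdrant code. The helper names `naive_transfer` and `expected_share` are hypothetical, and the code assumes points are spread uniformly at random across shards.

```python
def naive_transfer(limit: int, num_shards: int) -> int:
    """Scored points moved to the merger when every shard returns the full limit."""
    return limit * num_shards


def expected_share(limit: int, num_shards: int) -> float:
    """Average number of the final top-`limit` results held by a single shard,
    assuming points are distributed uniformly at random across shards."""
    return limit / num_shards


print(naive_transfer(1000, 10))  # 10000 points transferred and merged
print(expected_share(1000, 10))  # 100.0 of them per shard can reach the final result
```

Each shard therefore sends roughly ten times more points than it can contribute to the final result on average.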

Core idea

Instead of asking every shard for the full limit, ask each shard for a smaller limit computed via Poisson distribution statistics, then merge. This is safe because auto-sharding guarantees random, independent data distribution.
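Below is a minimal sketch of how such a per-shard limit could be derived. It assumes the number of the global top `limit + offset` points that land in one shard follows Poisson(λ = (limit + offset) / shards). It takes the smallest k whose cumulative probability reaches 99.9%, multiplies k by the 1.2 safety factor, and caps the result at `limit + offset`. The helper names (`poisson_quantile`, `per_shard_limit`) and the exact order in which the threshold and safety factor are combined are assumptions, not Qdrant's actual implementation.

```python
import math

SAFETY_FACTOR = 1.2        # "1.2x safety factor" from the text
POISSON_THRESHOLD = 0.999  # "99.9% Poisson threshold" from the text


def poisson_quantile(lam: float, threshold: float) -> int:
    """Smallest k such that P(X <= k) >= threshold for X ~ Poisson(lam), lam > 0."""
    log_lam = math.log(lam)
    cdf = 0.0
    k = 0
    while True:
        # Evaluate the pmf in log space so exp(-lam) does not underflow for large lam.
        cdf += math.exp(k * log_lam - lam - math.lgamma(k + 1))
        if cdf >= threshold:
            return k
        k += 1


def per_shard_limit(total: int, num_shards: int) -> int:
    """Hypothetical helper: how many results to request from each shard.

    `total` is limit + offset of the original request.
    """
    if num_shards <= 1 or total <= 0:
        return total
    # Each top point lands in a given shard with probability 1 / num_shards,
    # so the per-shard count is approximately Poisson(total / num_shards).
    lam = total / num_shards
    k = poisson_quantile(lam, POISSON_THRESHOLD)
    # Pad with the safety factor, but never ask for more than the full request.
    return min(math.ceil(k * SAFETY_FACTOR), total)
```

For example, `per_shard_limit(1000, 10)` comes out well below 1000 but above the average share of 100. The safety factor and the 99.9% quantile leave headroom for shards that happen to hold more than their average share.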

When it activates

  • More than 1 shard
  • Auto-sharding is in use (all queried shards share the same shard key)
  • The request's limit + offset >= SHARD_QUERY_SUBSAMPLING_LIMIT (128)
  • The query is not exact
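The four activation conditions can be read as a single predicate. This is a hypothetical sketch. The function `subsampling_applies` and the way shards are described (one shard-key entry per queried shard, `None` when no key is set) are illustrative assumptions. Only `SHARD_QUERY_SUBSAMPLING_LIMIT = 128` comes from the text.

```python
from typing import Optional, Sequence

SHARD_QUERY_SUBSAMPLING_LIMIT = 128  # threshold named in the text


def subsampling_applies(
    shard_keys: Sequence[Optional[str]],
    limit: int,
    offset: int,
    exact: bool,
) -> bool:
    """Hypothetical check mirroring the listed conditions.

    `shard_keys` holds the shard key of each queried shard (None if unset).
    """
    return (
        len(shard_keys) > 1                                 # more than one shard
        and len(set(shard_keys)) == 1                       # all shards share one shard key
        and limit + offset >= SHARD_QUERY_SUBSAMPLING_LIMIT
        and not exact                                       # exact queries are never subsampled
    )
```

Note that the threshold applies to `limit + offset`, so a request with a small limit but a large offset can still qualify.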

Key tradeoff

The strategy trades a small probability of slightly incomplete results for a large reduction in inter-shard data transfer, especially for high-limit queries across many shards. The 1.2x safety factor and the 99.9% Poisson threshold keep the error rate very low, comparable to the inaccuracies already introduced by approximate vector indexes such as HNSW.
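A Monte Carlo simulation is one way to check this tradeoff empirically. The sketch assumes each of the true top `limit + offset` points lands in a uniformly random shard. A query counts as incomplete when any shard holds more of those points than the per-shard limit it was asked to return. `estimate_incomplete_rate` is a hypothetical helper, and the per-shard limit is passed in rather than computed.

```python
import random


def estimate_incomplete_rate(
    total: int,
    num_shards: int,
    per_shard_limit: int,
    trials: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate how often some shard holds more of the global top `total`
    points than `per_shard_limit`, under uniform random placement."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    failures = 0
    for _ in range(trials):
        counts = [0] * num_shards
        for _ in range(total):
            counts[rng.randrange(num_shards)] += 1
        # A shard that truncates its share of the true top points makes the merge incomplete.
        if max(counts) > per_shard_limit:
            failures += 1
    return failures / trials
```

For instance, `estimate_incomplete_rate(1000, 10, 160)` estimates the failure rate for an illustrative per-shard limit of 160 on 10 shards. Raising the limit toward the full 1000 drives the estimate to zero, at the cost of transferring more points.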

Info
Category Programming & Development
Name qdrant-scaling-query-volume
Version v20260420
Size 1.37KB
Updated 2026-04-28