ArrowSpace 谱向量检索技术

v20260626

arrowspace

一种基于图拉普拉斯特征的谱向量检索工具，结合语义相似度与结构角色，解决传统余弦或L2距离无法捕获嵌入空间潜在结构的问题。

向量检索谱分析图拉普拉斯嵌入向量数据科学 Python

239 次下载

概览

ArrowSpace

Spectral vector search that augments nearest-neighbour search with graph Laplacian features. Computes a Laplacian over the item graph and uses the Rayleigh quotient to produce a λτ (lambda-tau) score per item, enabling search that respects both semantic similarity and structural role.

When to Use This Skill

Cosine or L2 similarity misses latent structure in your embeddings
You want graph-based retrieval with spectral awareness
You need to characterise the spectral properties of an embedding space
You are building RAG pipelines where contextual role matters alongside semantic content

How It Works

Step 1: Install and import

pip install arrowspace

from arrowspace import ArrowSpaceBuilder
import numpy as np

Step 2: Prepare your data

Pass an (N, d) float64 NumPy array of embedding vectors:

items = np.array([[0.1, 0.2, 0.3],
                  [0.0, 0.5, 0.1],
                  [0.9, 0.1, 0.0]], dtype=np.float64)

Step 3: Configure graph parameters

graph_params = {"eps": 0.2, "k": 6, "topk": 3, "p": 2.0, "sigma": 1.0}
builder = ArrowSpaceBuilder(items, graph_params=graph_params)
aspace = builder.build()

Step 4: Query

lambdas = aspace.lambdas()           # array indexed by insertion order
sorted_res = aspace.lambdas_sorted()  # (score, index) pairs ascending

Higher λτ values indicate items that are both semantically close and structurally central.

Examples

Example 1: Basic spectral retrieval

items = np.random.randn(100, 64).astype(np.float64)
builder = ArrowSpaceBuilder(items, graph_params={"eps": 0.5, "k": 10, "topk": 5, "p": 2.0, "sigma": None})
aspace = builder.build()
scores = aspace.lambdas()
top_indices = np.argsort(scores)[-5:]

Example 2: Compare spectral vs cosine ranking

from sklearn.metrics.pairwise import cosine_similarity
cos_sim = cosine_similarity(items)
cosine_order = np.argsort(cos_sim[0])[::-1]
spectral_order = np.argsort(aspace.lambdas())[::-1]

Best Practices

✅ Normalise embeddings to unit norm before passing to ArrowSpace
✅ Start with eps proportional to 1/sqrt(dim) and tune from there
✅ Use k between 3 and 25 depending on dataset size (rule: N/50)
✅ Set sigma=None to auto-select kernel width from distance distribution
❌ Don't use with fewer than 10 items (graph structure is not meaningful)
❌ Don't use for real-time streaming data (ArrowSpace is batch-oriented)

Limitations

This skill does not replace environment-specific validation, testing, or expert review.
ArrowSpace is batch-oriented and not designed for real-time indexing of streaming data.

Common Pitfalls

Problem: eps is too small, producing a disconnected graph Solution: Increase eps, or set it proportional to 1/sqrt(embedding_dim)
Problem: k is too large, producing a dense graph with washed-out spectral features Solution: Keep k ≤ 25 for most datasets

Related Skills

vector-database-engineer — General vector database expertise
embedding-strategies — Embedding model selection and chunking
similarity-search-patterns — Semantic search implementation patterns
hybrid-search-implementation — Combined semantic + keyword search

信息

Category 数据科学

Name arrowspace

版本 v20260626

大小 3.79KB

Source sickn33/antigravity-awesome-skills

更新时间 2026-06-27