
Weaviate Vector Database Operations

v20260703
weaviate
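This skill provides comprehensive access to Weaviate vector databases. Use it to run semantic searches, hybrid keyword queries, schema inspection, data exploration, collection creation, and data imports (CSV, JSON, PDF). It suits scenarios that require querying data by concept, retrieving specific raw objects, or managing the lifecycle of Weaviate collections.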

Weaviate Database Operations

This skill provides comprehensive access to Weaviate vector databases including search operations, natural language queries, schema inspection, data exploration, filtered fetching, collection creation, and data imports.

When to Use This Skill

  • Use when the user needs to inspect Weaviate collections, schemas, or data distribution.
  • Use when running semantic, hybrid, keyword, filtered, or Query Agent searches against Weaviate.
  • Use when importing CSV, JSON, JSONL, or PDF data into a Weaviate collection.
  • Use when creating example data or a collection for a Weaviate-backed workflow.

Weaviate Cloud Instance

If the user does not have an instance yet, direct them to the Weaviate Cloud console to register and create a free sandbox instance.

Environment Variables

Required:

  • WEAVIATE_URL - Your Weaviate Cloud cluster URL
  • WEAVIATE_API_KEY - Your Weaviate API key

External Provider Keys (auto-detected): Set only the keys your collections use. Refer to Environment Requirements for more information.
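
A pre-flight check for the two required variables can be sketched in Python as follows. This is only an illustration: the helper name `missing_weaviate_vars` is hypothetical and is not part of the skill's scripts, and the check only confirms that the variables are non-empty, not that the credentials are valid.

```python
import os

# The two variables listed as required above.
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY")


def missing_weaviate_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_weaviate_vars()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Weaviate connection variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try without touching the real environment.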

Script Index

Search & Query

  • Query Agent - Ask Mode: Use when the user wants a direct answer to a question based on collection data. The Query Agent synthesizes information from one or more collections and returns a structured response with source citations (collection name and object ID).
  • Query Agent - Search Mode: Use when the user wants to explore or browse raw objects across one or more collections. Unlike ask mode, this returns the actual data objects rather than a synthesized answer.
  • Hybrid Search: Default choice for most searches. Provides a good balance of semantic understanding and exact keyword matching. Use this when you are unsure which search type to pick.
  • Semantic Search: Use for finding conceptually similar content regardless of exact wording. Best when the intent matters more than specific keywords.
  • Keyword Search: Use for finding exact terms, IDs, SKUs, or specific text patterns. Best when precise keyword matching is needed rather than semantic similarity.

Collection Management

  • List Collections: Use to discover what collections exist in the Weaviate instance. This should typically be the first step before performing any search or data operation.
  • Get Collection Details: Use to understand a collection's schema — its properties, data types, vectorizer configuration, replication factor, and multi-tenancy status. Helpful before running searches or imports.
  • Explore Collection: Use to analyze data distribution, top values, and inspect actual content in a collection. Helpful for understanding what data looks like before querying.
  • Create Collection: Use to create new collections with custom schemas before importing data. Do not specify a vectorizer unless the user explicitly requests one (the default text2vec_weaviate is used).

Data Operations

  • Fetch and Filter: Use to retrieve specific objects by ID or strictly filtered subsets of data. Best for precise data retrieval rather than search.
  • Import Data: Use this when the user asks to import, load, or ingest a file (CSV, JSON, JSONL, PDF) into a collection.
  • Create Example Data: Use to create example data for immediate use of other skills when no data is available or the user requests toy data.

Recommendations

  1. Start by listing collections if you don't know what's available:

    uv run scripts/list_collections.py
    
  2. If no collections or data are available, or the user asks for toy data, offer to create example data. Otherwise continue.

    uv run scripts/example_data.py
    
  3. Get collection details to understand the schema:

    uv run scripts/get_collection.py --name "COLLECTION_NAME"
    
  4. Explore collection data to see values and statistics:

    uv run scripts/explore_collection.py "COLLECTION_NAME"
    
  5. Create a collection if importing a new CSV, JSON, or JSONL file — the collection must exist before importing:

    uv run scripts/create_collection.py CollectionName \
      --properties '[{"name": "title", "data_type": "text"}, {"name": "body", "data_type": "text"}]'
    

    Do not specify a vectorizer unless the user explicitly requests one.

  6. Import data into an existing collection:

    uv run scripts/import.py "data.csv" --collection "CollectionName"
    

    For PDF imports, the collection is created automatically — skip step 5.

  7. Choose the right search type:

    • Get AI-powered answers with source citations across multiple collections → ask.py
    • Get raw objects from multiple collections → query_search.py
    • General search → hybrid_search.py (default)
    • Conceptual similarity → semantic_search.py
    • Exact terms/IDs → keyword_search.py
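
Steps 5 to 7 can be sketched as a few small Python helpers. This is a minimal illustration rather than part of the skill: the function names and the `need` labels (`"answer"`, `"browse"`, and so on) are hypothetical. Only the script names, the `--properties` JSON shape, and the rule that PDF imports skip collection creation come from the steps above.

```python
import json
import shlex
from pathlib import Path

# File types that need an existing collection before import (step 5).
PRECREATE_SUFFIXES = {".csv", ".json", ".jsonl"}

# Hypothetical labels for the choices in step 7; the script names are from the list above.
SEARCH_SCRIPTS = {
    "answer": "ask.py",
    "browse": "query_search.py",
    "general": "hybrid_search.py",
    "conceptual": "semantic_search.py",
    "exact": "keyword_search.py",
}


def create_collection_command(name, properties):
    """Build the step 5 command line from (property_name, data_type) pairs."""
    payload = json.dumps([{"name": n, "data_type": t} for n, t in properties])
    argv = ["uv", "run", "scripts/create_collection.py", name, "--properties", payload]
    return shlex.join(argv)  # quotes the JSON payload for a POSIX shell


def needs_existing_collection(file_path):
    """Return True if the collection must be created before importing this file."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".pdf":
        return False  # PDF imports create the collection automatically
    if suffix in PRECREATE_SUFFIXES:
        return True
    raise ValueError(f"Unsupported import format: {file_path}")


def pick_search_script(need=None):
    """Map a need label to a search script, defaulting to hybrid search."""
    return SEARCH_SCRIPTS.get(need, "hybrid_search.py")


if __name__ == "__main__":
    print(create_collection_command("CollectionName", [("title", "text"), ("body", "text")]))
    print(needs_existing_collection("data.csv"))
    print(pick_search_script("exact"))
```

Building the JSON with `json.dumps` and quoting it with `shlex.join` avoids hand-escaping the property list, which is the part of step 5 most prone to shell quoting mistakes.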

Output Formats

All scripts support:

  • Markdown tables (default and recommended)
  • JSON (--json flag)
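
When the output is consumed by another program rather than read by a person, the `--json` flag is the more convenient choice. The sketch below assumes that a script run with `--json` prints a single JSON document to standard output and that the flag may follow the other arguments; neither detail is confirmed by the text above. The helper names `json_command` and `run_json` are hypothetical.

```python
import json
import subprocess


def json_command(script, *args):
    """Build the argument list for running a script with JSON output."""
    return ["uv", "run", f"scripts/{script}", *args, "--json"]


def run_json(script, *args):
    """Run a script and parse its standard output as JSON (assumed format)."""
    result = subprocess.run(
        json_command(script, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

For example, `run_json("list_collections.py")` would return the parsed listing if the assumptions hold. The structure of that result depends on the script and is not specified here.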

Error Handling

Common errors:

  • WEAVIATE_URL not set → Set the environment variable
  • Collection not found → Use list_collections.py to see available collections
  • Authentication error → Check API keys for both Weaviate and vectorizer providers
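
The mapping from error to remedy above can be sketched as a lookup. The substrings matched here are assumptions about how the scripts word their error messages, and `hint_for` is a hypothetical helper, so treat this as an illustration of the idea rather than a description of actual script output.

```python
# (substring to look for, suggested remedy), checked in order.
# The substrings are assumed wordings, not confirmed script output.
ERROR_HINTS = (
    ("weaviate_url", "Set the WEAVIATE_URL environment variable."),
    ("not found", "Run list_collections.py to see available collections."),
    ("authentication", "Check the API keys for both Weaviate and the vectorizer provider."),
)


def hint_for(message):
    """Return a remedy for a known error message, or None if unrecognized."""
    lowered = message.lower()
    for needle, hint in ERROR_HINTS:
        if needle in lowered:
            return hint
    return None
```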

Limitations

  • This skill requires a reachable Weaviate instance and valid credentials before live operations can succeed.
  • Data import, collection creation, and query-agent operations can change or expose user data; confirm the target instance and collection before running scripts.
  • The included scripts are Weaviate-focused and do not replace broader data-governance, backup, or production migration procedures.
Info
Category Data Science
Name weaviate
Version v20260703
Size 51.03KB
Updated 2026-07-04