Snowflake SQL, data pipelines, Cortex AI, and Snowpark Python development. Covers the colon-prefix rule, semi-structured data, MERGE upserts, Dynamic Tables, Streams+Tasks, Cortex AI functions, agent specs, performance tuning, and security hardening.
Originally contributed by James Cha-Earley — enhanced and integrated by the claude-skills team.
# Generate a MERGE upsert template
python scripts/snowflake_query_helper.py merge --target customers --source staging_customers --key customer_id --columns name,email,updated_at
# Generate a Dynamic Table template
python scripts/snowflake_query_helper.py dynamic-table --name cleaned_events --warehouse transform_wh --lag "5 minutes"
# Generate RBAC grant statements
python scripts/snowflake_query_helper.py grant --role analyst_role --database analytics --schemas public,staging --privileges SELECT,USAGE
snake_case for all identifiers. Avoid double-quoted identifiers -- they force case-sensitive names that require constant quoting.WITH clauses) over nested subqueries.CREATE OR REPLACE for idempotent DDL.SELECT * in production. Snowflake's columnar storage scans only referenced columns, so explicit lists reduce I/O.In SQL stored procedures (BEGIN...END blocks), variables and parameters must use the colon : prefix inside SQL statements. Without it, Snowflake treats them as column identifiers and raises "invalid identifier" errors.
-- WRONG: missing colon prefix
SELECT name INTO result FROM users WHERE id = p_id;
-- CORRECT: colon prefix on both variable and parameter
SELECT name INTO :result FROM users WHERE id = :p_id;
This applies to DECLARE variables, LET variables, and procedure parameters when used inside SELECT, INSERT, UPDATE, DELETE, or MERGE.
src:customer.name::STRING. Always cast with ::TYPE.null is stored as the string "null". Use STRIP_NULL_VALUE = TRUE on load.SELECT f.value:name::STRING FROM my_table, LATERAL FLATTEN(input => src:items) f;
MERGE INTO target t USING source s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.name = s.name, t.updated_at = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN INSERT (id, name, updated_at) VALUES (s.id, s.name, CURRENT_TIMESTAMP());
See
references/snowflake_sql_and_pipelines.mdfor deeper SQL patterns and anti-patterns.
| Approach | When to Use |
|---|---|
| Dynamic Tables | Declarative transformations. Default choice. Define the query, Snowflake handles refresh. |
| Streams + Tasks | Imperative CDC. Use for procedural logic, stored procedure calls, complex branching. |
| Snowpipe | Continuous file loading from cloud storage (S3, GCS, Azure). |
CREATE OR REPLACE DYNAMIC TABLE cleaned_events
TARGET_LAG = '5 minutes'
WAREHOUSE = transform_wh
AS
SELECT event_id, event_type, user_id, event_timestamp
FROM raw_events
WHERE event_type IS NOT NULL;
Key rules:
TARGET_LAG progressively: tighter at the top of the DAG, looser downstream.SELECT * breaks on upstream schema changes -- use explicit column lists.CREATE OR REPLACE STREAM raw_stream ON TABLE raw_events;
CREATE OR REPLACE TASK process_events
WAREHOUSE = transform_wh
SCHEDULE = 'USING CRON 0 */1 * * * America/Los_Angeles'
WHEN SYSTEM$STREAM_HAS_DATA('raw_stream')
AS INSERT INTO cleaned_events SELECT ... FROM raw_stream;
-- Tasks start SUSPENDED. You MUST resume them.
ALTER TASK process_events RESUME;
See
references/snowflake_sql_and_pipelines.mdfor DT debugging queries and Snowpipe patterns.
| Function | Purpose |
|---|---|
AI_COMPLETE |
LLM completion (text, images, documents) |
AI_CLASSIFY |
Classify text into categories (up to 500 labels) |
AI_FILTER |
Boolean filter on text or images |
AI_EXTRACT |
Structured extraction from text/images/documents |
AI_SENTIMENT |
Sentiment score (-1 to 1) |
AI_PARSE_DOCUMENT |
OCR or layout extraction from documents |
AI_REDACT |
PII removal from text |
Deprecated names (do NOT use): COMPLETE, CLASSIFY_TEXT, EXTRACT_ANSWER, PARSE_DOCUMENT, SUMMARIZE, TRANSLATE, SENTIMENT, EMBED_TEXT_768.
Stage path and filename are separate arguments:
-- WRONG: single combined argument
TO_FILE('@stage/file.pdf')
-- CORRECT: two arguments
TO_FILE('@db.schema.mystage', 'invoice.pdf')
Agent specs use a JSON structure with top-level keys: models, instructions, tools, tool_resources.
$spec$ delimiter (not $$).models must be an object, not an array.tool_resources is a separate top-level key, not nested inside tools.See
references/cortex_ai_and_agents.mdfor full agent spec examples and Cortex Search patterns.
from snowflake.snowpark import Session
import os
session = Session.builder.configs({
"account": os.environ["SNOWFLAKE_ACCOUNT"],
"user": os.environ["SNOWFLAKE_USER"],
"password": os.environ["SNOWFLAKE_PASSWORD"],
"role": "my_role", "warehouse": "my_wh",
"database": "my_db", "schema": "my_schema"
}).create()
collect() / show().collect() on large DataFrames. Process server-side with DataFrame operations.-- Dynamic table materialization (streaming/near-real-time marts):
{{ config(materialized='dynamic_table', snowflake_warehouse='transforming', target_lag='1 hour') }}
-- Incremental materialization (large fact tables):
{{ config(materialized='incremental', unique_key='event_id') }}
-- Snowflake-specific configs (combine with any materialization):
{{ config(transient=true, copy_grants=true, query_tag='team_daily') }}
{{ this }} without {% if is_incremental() %} guard.dynamic_table materialization for streaming or near-real-time marts.ALTER TABLE t ADD SEARCH OPTIMIZATION ON EQUALITY(col);
AUTO_SUSPEND = 60, AUTO_RESUME = TRUE.SHOW GRANTS OF ROLE ACCOUNTADMIN;
Surface these issues without being asked when you notice them in context:
SELECT * in Dynamic Tables -- flag as a schema-change time bomb.CLASSIFY_TEXT, SUMMARIZE, etc.) -- suggest the current AI_* equivalents.| Error | Cause | Fix |
|---|---|---|
| "Object does not exist" | Wrong database/schema context or missing grants | Fully qualify names (db.schema.table), check grants |
| "Invalid identifier" in procedure | Missing colon prefix on variable | Use :variable_name inside SQL statements |
| "Numeric value not recognized" | VARIANT field not cast | Cast explicitly: src:field::NUMBER(10,2) |
| Task not running | Forgot to resume after creation | ALTER TASK task_name RESUME; |
| DT refresh failing | Schema change upstream or tracking disabled | Use explicit columns, verify change tracking |
| TO_FILE error | Combined path as single argument | Split into two args: TO_FILE('@stage', 'file.pdf') |
TARGET_LAG = '5 minutes' that filters nulls, casts types, deduplicatesSECURE VIEW for the BI tool / API layersnowflake_query_helper.py grant to generate RBAC statementsSELECT AI_CLASSIFY(text_col, ['bug', 'feature', 'question']) FROM table LIMIT 10;
AI_CLASSIFY on new rows automaticallySELECT * FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY()) WHERE STATE = 'FAILED' ORDER BY SCHEDULED_TIME DESC;
SELECT * FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY('my_dt')) ORDER BY REFRESH_END_TIME DESC;
SHOW STREAMS; -- check stale_after column
references/troubleshooting.md for error-specific fixes| Anti-Pattern | Why It Fails | Better Approach |
|---|---|---|
SELECT * in Dynamic Tables |
Schema changes upstream break the DT silently | Use explicit column lists |
| Missing colon prefix in procedures | "Invalid identifier" runtime error | Always use :variable_name in SQL blocks |
| Single warehouse for all workloads | Contention between load, transform, and query | Separate warehouses per workload type |
| Hardcoded credentials in Snowpark | Security risk, breaks in CI/CD | Use os.environ[] or key pair auth |
collect() on large DataFrames |
Pulls entire result set to client memory | Process server-side with DataFrame operations |
| Nested subqueries instead of CTEs | Unreadable, hard to debug, Snowflake optimizes CTEs better | Use WITH clauses |
| Using deprecated Cortex functions | CLASSIFY_TEXT, SUMMARIZE etc. will be removed |
Use AI_CLASSIFY, AI_COMPLETE etc. |
Tasks without WHEN SYSTEM$STREAM_HAS_DATA |
Task runs on schedule even with no new data, wasting credits | Add the WHEN clause for stream-driven tasks |
| Double-quoted identifiers | Forces case-sensitive names across all queries | Use snake_case unquoted identifiers |
| Skill | Relationship |
|---|---|
engineering/sql-database-assistant |
General SQL patterns — use for non-Snowflake databases |
engineering/database-designer |
Schema design — use for data modeling before Snowflake implementation |
engineering-team/senior-data-engineer |
Broader data engineering — pipelines, Spark, Airflow, data quality |
engineering-team/senior-data-scientist |
Analytics and ML — use alongside Snowpark for feature engineering |
engineering-team/senior-devops |
CI/CD for Snowflake deployments (Terraform, GitHub Actions) |
| Document | Contents |
|---|---|
references/snowflake_sql_and_pipelines.md |
SQL patterns, MERGE templates, Dynamic Table debugging, Snowpipe, anti-patterns |
references/cortex_ai_and_agents.md |
Cortex AI functions, agent spec structure, Cortex Search, Snowpark |
references/troubleshooting.md |
Error reference, debugging queries, common fixes |