Snowflake Production Readiness Checklist

v20260423

snowflake-prod-checklist

This comprehensive checklist guides the deployment of Snowflake data pipelines and integrations into a production environment. It covers critical areas such as authentication best practices, warehouse configuration, data pipeline readiness, access control, monitoring setup, and disaster recovery planning. Use it when preparing for a go-live or validating the stability of your production Snowflake setup.

Snowflake Data Warehouse Production Deployment Monitoring ETL Analytics Checklist

Get Skill

294 downloads

Overview

Snowflake Production Checklist

Overview

Complete checklist for deploying Snowflake data pipelines and integrations to production.

Prerequisites

Staging environment validated
Production Snowflake account configured
Resource monitors in place
Monitoring infrastructure ready

Pre-Deployment Checklist

Authentication & Secrets

Service accounts use key pair auth (not password)
Private keys stored in secret manager (not files/env vars)
Key rotation procedure documented and tested
Network policy applied to production account
Connection parameters use production account identifier

Warehouse Configuration

Production warehouses created with appropriate sizing
Auto-suspend configured (60-300s based on workload)
Auto-resume enabled
Resource monitors with credit quotas and alerts
Separate warehouses for ETL, analytics, and dashboard workloads

-- Production warehouse setup
CREATE WAREHOUSE IF NOT EXISTS PROD_ETL_WH
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 120
  AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS PROD_ANALYTICS_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

-- Resource monitor with alerts
CREATE OR REPLACE RESOURCE MONITOR prod_monitor
  WITH CREDIT_QUOTA = 1000
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND
    ON 110 PERCENT DO SUSPEND_IMMEDIATE;

ALTER WAREHOUSE PROD_ETL_WH SET RESOURCE_MONITOR = prod_monitor;
ALTER WAREHOUSE PROD_ANALYTICS_WH SET RESOURCE_MONITOR = prod_monitor;

Data Pipeline Readiness

All tasks resumed and running on schedule
Streams not stale (check with SHOW STREAMS)
Snowpipe notifications configured and verified
COPY INTO error handling set (ON_ERROR = 'CONTINUE' or 'SKIP_FILE')
Data retention set appropriately (DATA_RETENTION_TIME_IN_DAYS)

Query & Performance

Critical queries tested at production data volume
Clustering keys set on large tables (>1TB)
Statement timeout configured per warehouse
Result caching enabled (USE_CACHED_RESULT = TRUE)

-- Set statement timeout for production
ALTER WAREHOUSE PROD_ETL_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;
ALTER WAREHOUSE PROD_ANALYTICS_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 600;

-- Enable query result caching (default is ON)
ALTER ACCOUNT SET USE_CACHED_RESULT = TRUE;

Access Control

RBAC hierarchy follows Snowflake best practices
No users have ACCOUNTADMIN as default role
Service accounts have minimal required privileges
Object ownership assigned to functional roles

-- Verify no one defaults to ACCOUNTADMIN
SELECT name, default_role
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS
WHERE default_role = 'ACCOUNTADMIN' AND disabled = 'false';

Monitoring & Alerting

Query failure alerts configured
Warehouse credit consumption dashboards
Task failure notifications
Login failure monitoring

-- Create alert for task failures (Snowflake Alerts feature)
CREATE OR REPLACE ALERT task_failure_alert
  WAREHOUSE = PROD_ANALYTICS_WH
  SCHEDULE = '5 MINUTE'
  IF (EXISTS (
    SELECT *
    FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
      SCHEDULED_TIME_RANGE_START => DATEADD(minutes, -10, CURRENT_TIMESTAMP())
    ))
    WHERE state = 'FAILED'
  ))
  THEN
    CALL SYSTEM$SEND_EMAIL(
      'prod_notifications',
      'oncall@company.com',
      'Snowflake Task Failure',
      'One or more tasks failed in the last 10 minutes. Check TASK_HISTORY.'
    );

ALTER ALERT task_failure_alert RESUME;

Disaster Recovery

Time Travel retention set (Enterprise: up to 90 days)
Database replication configured for critical databases
Failover tested to secondary account/region
Backup procedure documented

-- Enable 14-day Time Travel on production tables
ALTER TABLE prod_db.core.orders SET DATA_RETENTION_TIME_IN_DAYS = 14;

-- Enable database replication
ALTER DATABASE prod_db ENABLE REPLICATION TO ACCOUNTS myorg.secondary_account;

Health Check Query

-- Run this before and after deployment
SELECT 'Warehouses' AS check_type,
       COUNT(*) AS count,
       COUNT_IF(state = 'STARTED') AS active
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSES())
UNION ALL
SELECT 'Tasks', COUNT(*), COUNT_IF(state = 'started')
FROM TABLE(INFORMATION_SCHEMA.TASKS())
UNION ALL
SELECT 'Streams', COUNT(*), COUNT_IF(stale = FALSE)
FROM TABLE(INFORMATION_SCHEMA.STREAMS())
UNION ALL
SELECT 'Pipes', COUNT(*), COUNT_IF(is_autoingest_enabled = 'true')
FROM TABLE(INFORMATION_SCHEMA.PIPES());

Rollback Procedure

-- Use Time Travel to revert a table
CREATE OR REPLACE TABLE prod_db.core.orders
  CLONE prod_db.core.orders AT (TIMESTAMP => '2026-03-21 12:00:00'::TIMESTAMP_NTZ);

-- Suspend problematic tasks
ALTER TASK transform_orders SUSPEND;

-- Revert warehouse changes
ALTER WAREHOUSE PROD_ETL_WH SET WAREHOUSE_SIZE = 'MEDIUM';

Error Handling

Alert	Condition	Severity
Task failure	`state = 'FAILED'` in TASK_HISTORY	P1
Stream stale	`stale = TRUE` in SHOW STREAMS	P1
Credit quota >90%	Resource monitor trigger	P2
Query queue >5min	`avg_queued_load > 0` sustained	P2
Login failures spike	>10 failures/hour	P2

Resources

Next Steps

For version upgrades, see snowflake-upgrade-migration.

Info

Category Data Science

Name snowflake-prod-checklist

Version v20260423

Size 6.34KB

Source jeremylongshore/claude-code-plugins-plus-skills

Updated At 2026-04-28