
Optimize ClickHouse Cloud Costs

v20260423
clickhouse-cost-tuning
A comprehensive guide to reducing expenses on ClickHouse Cloud. Learn advanced techniques for storage optimization (compression, TTL), compute resource tuning (scaling, thread limits), and query efficiency improvements. Use this when managing data budgets or analyzing high ClickHouse billing charges.

ClickHouse Cost Tuning

Overview

Reduce ClickHouse Cloud costs through storage optimization, compression tuning, TTL policies, compute scaling, and query efficiency improvements.

Prerequisites

  • ClickHouse Cloud account with billing access
  • Understanding of current data volumes and query patterns

Instructions

Step 1: Understand ClickHouse Cloud Pricing

Component   Pricing Model           Key Driver
Compute     Per hour, per replica   vCPU + memory tier
Storage     Per GB-month            Compressed data on disk
Network     Per GB egress           Query result sizes
Backups     Per GB stored           Backup retention

Key insight: ClickHouse bills on compressed storage, and ClickHouse compresses extremely well (often 10-20x). Your cost driver is usually compute, not storage.
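
To put a rough number on the storage line, multiply the compressed footprint by your plan's per-GB-month rate. A minimal sketch; the 0.10 USD/GB-month figure is a placeholder, so substitute the rate from your ClickHouse Cloud plan:

-- Rough monthly storage cost estimate (0.10 USD/GB-month is a placeholder rate)
SELECT
    formatReadableSize(sum(bytes_on_disk)) AS compressed_total,
    round(sum(bytes_on_disk) / 1e9 * 0.10, 2) AS est_usd_per_month
FROM system.parts
WHERE active;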

Step 2: Analyze Storage Usage

-- Storage cost breakdown by table
SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS raw_size,
    round(sum(data_uncompressed_bytes) / sum(bytes_on_disk), 1) AS compression_ratio,
    sum(rows) AS total_rows,
    count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;

-- Storage by column (find bloated columns)
SELECT
    table,
    column,
    type,
    formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes)) AS raw,
    round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND database = 'analytics'
GROUP BY table, column, type
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 30;
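
It also helps to see where the bytes sit per partition, since partitions are the unit you later expire with TTL or drop outright. A sketch against the analytics.events table used in the later steps:

-- Storage by partition (useful before deciding on TTL or DROP PARTITION)
SELECT
    partition,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    sum(rows) AS total_rows,
    count() AS parts
FROM system.parts
WHERE active AND database = 'analytics' AND table = 'events'
GROUP BY partition
ORDER BY partition;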

Step 3: Improve Compression

-- Check current codec per column
SELECT name, type, compression_codec
FROM system.columns
WHERE database = 'analytics' AND table = 'events';

-- Apply better codecs to large columns
ALTER TABLE analytics.events
    MODIFY COLUMN properties String CODEC(ZSTD(3));  -- JSON blobs

ALTER TABLE analytics.events
    MODIFY COLUMN created_at DateTime CODEC(DoubleDelta, ZSTD);  -- Timestamps

ALTER TABLE analytics.events
    MODIFY COLUMN user_id UInt64 CODEC(Delta, ZSTD);  -- Sequential IDs

-- Re-encode existing parts with the new codecs now (OPTIMIZE ... FINAL rewrites the whole
-- table and costs compute; otherwise new codecs apply only as parts merge naturally)
OPTIMIZE TABLE analytics.events FINAL;

-- Check new compression ratio
SELECT
    column,
    formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
    round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND database = 'analytics' AND table = 'events'
GROUP BY column ORDER BY sum(column_data_compressed_bytes) DESC;
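
Beyond codecs, low-cardinality string columns usually shrink further (and scan faster) as LowCardinality(String). A sketch, assuming the event_type column used in the PREWHERE example in Step 6; check the distinct count first:

-- Worthwhile when the distinct count stays small (roughly under ~10K values)
SELECT uniqExact(event_type) FROM analytics.events;

ALTER TABLE analytics.events
    MODIFY COLUMN event_type LowCardinality(String);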

Step 4: TTL for Data Lifecycle

-- Expire old data automatically (reduces storage)
ALTER TABLE analytics.events
    MODIFY TTL created_at + INTERVAL 90 DAY;

-- Move old data to a cheaper storage volume (requires a storage policy defining 'hot' and 'cold' volumes)
ALTER TABLE analytics.events
    MODIFY TTL
        created_at + INTERVAL 30 DAY TO VOLUME 'hot',
        created_at + INTERVAL 90 DAY TO VOLUME 'cold',
        created_at + INTERVAL 365 DAY DELETE;

-- Drop entire partitions manually (the fastest way to delete bulk data)
ALTER TABLE analytics.events
    DROP PARTITION '202401';   -- January 2024, assuming PARTITION BY toYYYYMM(created_at)

-- Check which tables have a TTL (the TTL clause is part of the table definition)
SELECT database, name, create_table_query
FROM system.tables
WHERE database = 'analytics' AND create_table_query LIKE '%TTL%';
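
Before committing to a retention window, preview how much data the TTL would actually expire. A minimal sketch for the 90-day example above:

-- Rows that a 90-day TTL would remove
SELECT count() AS rows_to_expire, min(created_at) AS oldest_row
FROM analytics.events
WHERE created_at < now() - INTERVAL 90 DAY;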

Step 5: Compute Cost Reduction

-- ClickHouse Cloud: Scale compute dynamically
-- Configure in Cloud Console:
-- - Auto-scaling: min 2 / max 8 replicas
-- - Idle timeout: 5 minutes (auto-suspend when no queries)
-- - Use "Development" tier for staging environments

-- Reduce per-query compute consumption
SET max_threads = 4;                  -- Use fewer cores per query
SET max_memory_usage = 5000000000;    -- 5GB cap per query

-- Server-side async inserts (batches small inserts, so fewer parts to write and merge)
SET async_insert = 1;
SET async_insert_max_data_size = 10000000;  -- Flush at 10MB
SET async_insert_busy_timeout_ms = 5000;    -- or every 5 seconds
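
Caps like max_threads and max_memory_usage are easier to choose when they are based on observed usage. A sketch that reads peak memory and thread counts for the last week from the query log:

-- Size per-query limits from actual usage over the last 7 days
SELECT
    formatReadableSize(quantile(0.95)(memory_usage)) AS p95_memory,
    formatReadableSize(max(memory_usage)) AS max_memory,
    round(avg(length(thread_ids)), 1) AS avg_threads
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 7 DAY;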

Step 6: Query Efficiency = Lower Costs

-- Find the most expensive queries (by data scanned)
SELECT
    normalized_query_hash,
    count() AS executions,
    formatReadableSize(sum(read_bytes)) AS total_read,
    round(avg(query_duration_ms)) AS avg_ms,
    any(substring(query, 1, 200)) AS sample
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 7 DAY
GROUP BY normalized_query_hash
ORDER BY sum(read_bytes) DESC
LIMIT 20;

-- Use materialized views to avoid repeated full scans
-- Instead of: SELECT count() FROM events WHERE toDate(created_at) = today()
-- Pre-compute on insert:
-- CREATE MATERIALIZED VIEW daily_counts_mv TO daily_counts AS
--   SELECT toDate(created_at) AS date, count() AS cnt FROM events GROUP BY date;
-- Then read the small daily_counts table instead of scanning events (full sketch below)
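
Spelled out, the pre-aggregation sketched in the comments above needs a target table plus the materialized view that feeds it. A minimal sketch; the names follow the comments, and the engine and keys are assumptions to adapt to your schema:

-- Target table: one (partial) row per day, summed on merge
CREATE TABLE analytics.daily_counts
(
    date Date,
    cnt UInt64
)
ENGINE = SummingMergeTree
ORDER BY date;

-- Populated automatically on every insert into events
CREATE MATERIALIZED VIEW analytics.daily_counts_mv TO analytics.daily_counts AS
SELECT toDate(created_at) AS date, count() AS cnt
FROM analytics.events
GROUP BY date;

-- Dashboards read the tiny table instead of scanning events
-- (sum() because rows for a day may not be merged yet)
SELECT sum(cnt) FROM analytics.daily_counts WHERE date = today();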

-- Use PREWHERE to read less data
SELECT user_id, properties FROM analytics.events
PREWHERE event_type = 'purchase'    -- Evaluated first; other columns are read only for matching rows
WHERE created_at >= today() - 7;
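
To see how much a query would read before paying for it, EXPLAIN ESTIMATE reports the parts, rows, and marks the engine expects to touch:

EXPLAIN ESTIMATE
SELECT user_id, properties FROM analytics.events
PREWHERE event_type = 'purchase'
WHERE created_at >= today() - 7;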

Step 7: Monitor Costs

// Track query costs in your application
import { createClient } from '@clickhouse/client';

type QueryCost = { readRows: number; readBytes: number; durationMs: number };

async function queryWithCostTracking<T>(
  client: ReturnType<typeof createClient>,
  sql: string,
): Promise<{ rows: T[]; cost: QueryCost }> {
  const start = Date.now();
  const rs = await client.query({ query: sql, format: 'JSONEachRow' });
  const rows = (await rs.json<T>()) as T[];
  const durationMs = Date.now() - start;

  // The X-ClickHouse-Summary header reports what the server actually read for this query
  const rawSummary = rs.response_headers['x-clickhouse-summary'];
  const summary = rawSummary ? JSON.parse(String(rawSummary)) : {};
  const cost: QueryCost = {
    readRows: Number(summary.read_rows ?? 0),
    readBytes: Number(summary.read_bytes ?? 0),
    durationMs,
  };

  // Log for cost analysis
  console.log({ query: sql.slice(0, 100), ...cost });

  return { rows, cost };
}
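
The same accounting can also be done server-side from system.query_log, without instrumenting every client. A sketch that groups the last week's reads by user (result_bytes is only a rough proxy for egress):

-- Weekly cost review by user
SELECT
    user,
    count() AS queries,
    formatReadableSize(sum(read_bytes)) AS data_read,
    formatReadableSize(sum(result_bytes)) AS data_returned,
    round(sum(query_duration_ms) / 1000) AS total_seconds
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 7 DAY
GROUP BY user
ORDER BY sum(read_bytes) DESC;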

Cost Optimization Checklist

  • Compression codecs applied to large columns (ZSTD, Delta, DoubleDelta)
  • TTL configured for data expiration
  • Auto-scaling and idle suspension enabled (Cloud)
  • Development/staging on smaller tiers
  • Materialized views for dashboard queries
  • max_threads limited for non-critical queries
  • async_insert enabled for high-frequency small inserts
  • Monthly cost review with system.query_log analysis

Error Handling

Issue                  Cause               Solution
Storage growing fast   No TTL, no drops    Add TTL or schedule partition drops
High compute bill      Full-scan queries   Add materialized views, fix ORDER BY
Egress charges         Large result sets   Add LIMIT, use aggregations
Idle compute cost      No auto-suspend     Enable idle timeout in Cloud console

Next Steps

For architecture patterns, see clickhouse-reference-architecture.
