

ClickHouse Analytics System

A Claude Code skill for building high-performance analytics with ClickHouse. Covers ClickHouse-specific schema design, query optimization, materialized views, data ingestion patterns, and integration with Node.js/TypeScript applications for real-time analytics and data engineering workloads.

When to Use This Skill

Choose ClickHouse Analytics when:

  • You need real-time analytics on large datasets (billions of rows)
  • You want to set up ClickHouse for event tracking, logs, or time-series data
  • You need to optimize ClickHouse queries for sub-second response times
  • You want to implement materialized views for pre-aggregated analytics
  • You're integrating ClickHouse with a Node.js/TypeScript application

Consider alternatives when:

  • You need transactional database features (use PostgreSQL)
  • You want managed analytics (use BigQuery or Redshift)
  • You need full-text search (use Elasticsearch)

Quick Start

```bash
# Start ClickHouse (Docker)
docker run -d --name clickhouse -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server

# Install the skill
claude install cc-skill-clickhouse-system

# Create analytics tables
claude "Design a ClickHouse schema for tracking page views: user_id, page_url, timestamp, device_type, country"

# Build a dashboard query
claude "Write a ClickHouse query for a dashboard: daily active users, page views per day, and top pages for the last 30 days"
```

Core Concepts

ClickHouse vs Traditional Databases

| Feature | ClickHouse | PostgreSQL |
|---|---|---|
| Architecture | Column-oriented | Row-oriented |
| Best For | Analytics, aggregations | Transactions, CRUD |
| Insert Speed | Millions of rows/sec (batched) | Thousands of rows/sec |
| Query Speed | Billions of rows in seconds | Millions of rows |
| Updates/Deletes | Limited, async | Full ACID support |
| Joins | Limited (use denormalization) | Full JOIN support |

Table Engine Selection

| Engine | Use Case | Features |
|---|---|---|
| MergeTree | Primary analytics tables | Sorted, partitioned, indexed |
| ReplacingMergeTree | Deduplication by version | Last version wins |
| AggregatingMergeTree | Pre-aggregated data | Materialized view target |
| SummingMergeTree | Running sums | Auto-summing on merge |
| CollapsingMergeTree | Mutable analytics | Sign-based row collapsing |

Schema Design Pattern

```sql
CREATE TABLE page_views (
    user_id     UInt64,
    event_time  DateTime,
    page_url    String,
    device_type LowCardinality(String),
    country     LowCardinality(String),
    session_id  UUID,
    duration_ms UInt32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time)
TTL event_time + INTERVAL 90 DAY
SETTINGS index_granularity = 8192;
```

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | string | "merge_tree" | Table engine: merge_tree, replacing, aggregating |
| partition_by | string | "monthly" | Partition granularity: daily, monthly, yearly |
| ttl_days | number | 90 | Data retention in days |
| replication | boolean | false | Enable replicated tables |
| client | string | "clickhouse-js" | Node.js client: clickhouse-js (the official client, published as @clickhouse/client) |
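The options above could be modeled as a typed configuration object with the documented defaults. The sketch below is illustrative only; the interface and helper names are assumptions, not part of the published skill:

```typescript
// Hypothetical shape of the skill's configuration (names are assumptions).
type Engine = "merge_tree" | "replacing" | "aggregating";
type Partition = "daily" | "monthly" | "yearly";

interface SkillConfig {
  engine: Engine;
  partition_by: Partition;
  ttl_days: number;
  replication: boolean;
  client: string;
}

// Defaults taken from the configuration table above.
const DEFAULTS: SkillConfig = {
  engine: "merge_tree",
  partition_by: "monthly",
  ttl_days: 90,
  replication: false,
  client: "clickhouse-js",
};

// Merge user overrides onto the documented defaults.
function resolveConfig(overrides: Partial<SkillConfig> = {}): SkillConfig {
  return { ...DEFAULTS, ...overrides };
}
```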

Best Practices

  1. Use LowCardinality for string columns with few unique values — Country codes, device types, status enums — wrap these in LowCardinality(String) for 5-10x compression and faster queries. Don't use it for high-cardinality columns like user IDs.

  2. Batch inserts, never insert one row at a time — ClickHouse is optimized for bulk inserts (1,000+ rows per insert). Single-row inserts create too many parts and degrade performance. Buffer events and flush in batches.

  3. Order the ORDER BY clause by query patterns — The ORDER BY in table creation determines the primary index. Put your most frequently filtered column first. If you always query by user_id and event_time, use ORDER BY (user_id, event_time).

  4. Use materialized views for dashboards — Pre-aggregate data into materialized views for dashboard queries. A materialized view that counts daily events is queried in milliseconds, while aggregating billions of raw events takes seconds.

  5. Partition by time for data lifecycle — Partition tables by month or day using PARTITION BY toYYYYMM(event_time). This enables efficient data expiration (TTL), partition-level operations, and faster queries on time ranges.
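Practice 2 (batching) can be sketched as a small in-memory buffer that flushes once a batch threshold is reached. This is a minimal sketch, assuming a caller-supplied flush callback rather than any particular client library:

```typescript
// Minimal event buffer: collects rows and flushes them in bulk.
// The flush callback is an assumption -- wire it to your client's bulk insert.
class EventBuffer<T> {
  private rows: T[] = [];

  constructor(
    private readonly batchSize: number,
    private readonly flushFn: (rows: T[]) => void,
  ) {}

  push(row: T): void {
    this.rows.push(row);
    if (this.rows.length >= this.batchSize) this.flush();
  }

  flush(): void {
    if (this.rows.length === 0) return;
    const batch = this.rows;
    this.rows = [];
    this.flushFn(batch); // one bulk insert instead of many single-row inserts
  }
}
```

In a real service you would also flush on a timer (e.g. every few seconds) and on shutdown, so partial batches are not lost.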

Common Issues

Inserts are slow — You're likely inserting one row at a time. Batch inserts with at least 1,000 rows per batch. Use async insert mode (async_insert=1) for lower-latency single-row inserts that ClickHouse batches internally.
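With the official @clickhouse/client, the settings mentioned above are passed per insert via `clickhouse_settings`. The sketch below builds the argument object for `client.insert()`; the table name and row shape are illustrative assumptions:

```typescript
// Build the argument object for client.insert() with async inserts enabled.
// Passing clickhouse_settings per call is the @clickhouse/client mechanism;
// the PageView row shape and table name here are illustrative assumptions.
interface PageView {
  user_id: number;
  page_url: string;
  event_time: string;
}

function asyncInsertParams(table: string, rows: PageView[]) {
  return {
    table,
    values: rows,
    format: "JSONEachRow" as const,
    clickhouse_settings: {
      async_insert: 1,          // server buffers and batches small inserts
      wait_for_async_insert: 1, // acknowledge only after data is written
    },
  };
}

// Usage (requires a running ClickHouse server):
//   const client = createClient({ url: "http://localhost:8123" });
//   await client.insert(asyncInsertParams("page_views", rows));
```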

"Too many parts" error — Too many small inserts create excessive parts. ClickHouse merges parts in the background, but if inserts outpace merges, you hit this error. Reduce insert frequency, increase batch size, and check merge performance.

Queries are slow despite ClickHouse being fast — Check that your query uses the primary key columns. Queries that don't filter on ORDER BY columns scan all data. Use EXPLAIN to verify that primary key pruning is working.
