# ClickHouse Analytics System
A Claude Code skill for building high-performance analytics with ClickHouse. Covers ClickHouse-specific schema design, query optimization, materialized views, data ingestion patterns, and integration with Node.js/TypeScript applications for real-time analytics and data engineering workloads.
## When to Use This Skill
Choose ClickHouse Analytics when:
- You need real-time analytics on large datasets (billions of rows)
- You want to set up ClickHouse for event tracking, logs, or time-series data
- You need to optimize ClickHouse queries for sub-second response times
- You want to implement materialized views for pre-aggregated analytics
- You're integrating ClickHouse with a Node.js/TypeScript application
Consider alternatives when:
- You need transactional database features (use PostgreSQL)
- You want managed analytics (use BigQuery or Redshift)
- You need full-text search (use Elasticsearch)
## Quick Start
```bash
# Start ClickHouse (Docker)
docker run -d --name clickhouse -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server

# Install the skill
claude install cc-skill-clickhouse-system

# Create analytics tables
claude "Design a ClickHouse schema for tracking page views: user_id, page_url, timestamp, device_type, country"

# Build a dashboard query
claude "Write a ClickHouse query for a dashboard: daily active users, page views per day, and top pages for the last 30 days"
```
## Core Concepts
### ClickHouse vs Traditional Databases
| Feature | ClickHouse | PostgreSQL |
|---|---|---|
| Architecture | Column-oriented | Row-oriented |
| Best For | Analytics, aggregations | Transactions, CRUD |
| Insert Speed | Millions/sec (batch) | Thousands/sec |
| Query Speed | Billions of rows in seconds | Millions of rows |
| Updates/Deletes | Limited, async | Full ACID support |
| Joins | Limited (use denormalization) | Full JOIN support |
### Table Engine Selection
| Engine | Use Case | Features |
|---|---|---|
| MergeTree | Primary analytics tables | Sorted, partitioned, indexed |
| ReplacingMergeTree | Deduplication by version | Last version wins |
| AggregatingMergeTree | Pre-aggregated data | Materialized view target |
| SummingMergeTree | Running sums | Auto-summing on merge |
| CollapsingMergeTree | Mutable analytics | Sign-based row collapsing |
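
A minimal sketch of the deduplication pattern from the table above, using a hypothetical `user_profiles` table (the table and column names are illustrative, not part of the skill):

```sql
-- ReplacingMergeTree keeps the row with the highest `updated_at`
-- for each ORDER BY key, but only after background merges run.
CREATE TABLE user_profiles (
    user_id    UInt64,
    email      String,
    plan       LowCardinality(String),
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- FINAL forces merge semantics at query time for guaranteed dedup
-- (at a read-performance cost):
SELECT * FROM user_profiles FINAL WHERE user_id = 42;
```

Because merges are asynchronous, queries without `FINAL` (or an `argMax`-style aggregation) may still see superseded versions of a row.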
### Schema Design Pattern
```sql
CREATE TABLE page_views (
    user_id     UInt64,
    event_time  DateTime,
    page_url    String,
    device_type LowCardinality(String),
    country     LowCardinality(String),
    session_id  UUID,
    duration_ms UInt32
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time)
TTL event_time + INTERVAL 90 DAY
SETTINGS index_granularity = 8192;
```
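
A sketch of a dashboard query over the `page_views` table above, matching the Quick Start example (daily active users and page views for the last 30 days):

```sql
SELECT
    toDate(event_time) AS day,
    uniq(user_id)      AS daily_active_users,
    count()            AS page_views
FROM page_views
WHERE event_time >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
```

`uniq()` is an approximate distinct count; use `uniqExact()` when exact numbers matter more than speed.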
## Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| engine | string | "merge_tree" | Table engine: merge_tree, replacing, aggregating |
| partition_by | string | "monthly" | Partition granularity: daily, monthly, yearly |
| ttl_days | number | 90 | Data retention in days |
| replication | boolean | false | Enable replicated tables |
| client | string | "clickhouse-js" | Client library: clickhouse-js (the official @clickhouse/client package) |
## Best Practices
- **Use LowCardinality for string columns with few unique values** — Country codes, device types, status enums — wrap these in `LowCardinality(String)` for 5-10x compression and faster queries. Don't use it for high-cardinality columns like user IDs.
- **Batch inserts, never insert one row at a time** — ClickHouse is optimized for bulk inserts (1,000+ rows per insert). Single-row inserts create too many parts and degrade performance. Buffer events and flush in batches.
- **Order the ORDER BY clause by query patterns** — The `ORDER BY` in table creation determines the primary index. Put your most frequently filtered column first. If you always query by `user_id` and `event_time`, use `ORDER BY (user_id, event_time)`.
- **Use materialized views for dashboards** — Pre-aggregate data into materialized views for dashboard queries. A materialized view that counts daily events is queried in milliseconds, while aggregating billions of raw events takes seconds.
- **Partition by time for data lifecycle** — Partition tables by month or day using `PARTITION BY toYYYYMM(event_time)`. This enables efficient data expiration (TTL), partition-level operations, and faster queries on time ranges.
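
The batching practice above can be sketched as a small insert buffer. This is a hypothetical helper, not part of any ClickHouse client: the flush function is injected, so in production it would wrap a bulk insert call from your client library.

```typescript
// Collect events and flush them in batches, so ClickHouse receives
// one bulk insert instead of many single-row inserts.
type FlushFn<T> = (rows: T[]) => Promise<void>;

class EventBuffer<T> {
  private rows: T[] = [];

  constructor(
    private readonly flushFn: FlushFn<T>,
    private readonly maxRows = 1000, // flush threshold; tune per workload
  ) {}

  async add(row: T): Promise<void> {
    this.rows.push(row);
    if (this.rows.length >= this.maxRows) await this.flush();
  }

  async flush(): Promise<void> {
    if (this.rows.length === 0) return;
    const batch = this.rows;
    this.rows = []; // swap before awaiting so new events keep buffering
    await this.flushFn(batch);
  }
}
```

A real deployment would also flush on a timer (so a quiet stream doesn't hold events forever) and call `flush()` on shutdown.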
## Common Issues
**Inserts are slow** — You're likely inserting one row at a time. Batch inserts with at least 1,000 rows per batch. Use async insert mode (`async_insert=1`) for lower-latency single-row inserts that ClickHouse batches internally.
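
A sketch of enabling async inserts per-query against the `page_views` table from the schema above (the literal values are illustrative):

```sql
-- async_insert = 1 lets the server buffer and batch small inserts
-- internally; wait_for_async_insert = 1 makes the client wait until
-- the buffered data is actually flushed to the table.
INSERT INTO page_views
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (42, now(), '/home', 'mobile', 'US', generateUUIDv4(), 120);
```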
**"Too many parts" error** — Too many small inserts create excessive parts. ClickHouse merges parts in the background, but if inserts outpace merges, you hit this error. Reduce insert frequency, increase batch size, and check merge performance.
**Queries are slow despite ClickHouse being fast** — Check that your query filters on the primary key columns. Queries that don't filter on `ORDER BY` columns scan all data. Use `EXPLAIN` to verify that primary key pruning is working.
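
A sketch of checking primary key pruning on the `page_views` table defined above:

```sql
-- With indexes = 1, the plan reports how many parts and granules
-- the PrimaryKey index selected out of the initial total. A query
-- filtering on the leading ORDER BY column should prune heavily.
EXPLAIN indexes = 1
SELECT count()
FROM page_views
WHERE user_id = 42 AND event_time >= now() - INTERVAL 7 DAY;
```

If the selected granule count is close to the total, the filter isn't using the primary index and the table's `ORDER BY` may not match your query patterns.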