Ultimate Loki Framework
Deploy and operate Grafana Loki for cost-effective log aggregation. Includes Docker Compose and Kubernetes Helm deployments, LogQL query patterns, Grafana Alloy collection pipelines, alerting rules, and a configuration reference.
Overview
Grafana Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system designed to be cost-effective and easy to operate. Inspired by Prometheus, Loki indexes only metadata labels rather than the full text of log lines, making it 10-100x more storage-efficient than traditional log aggregation systems like Elasticsearch. Log data is compressed and stored in chunks in object storage (S3, GCS, Azure Blob) or on the local filesystem, while a small index tracks label sets and time ranges. Loki integrates natively with Grafana for visualization and alerting, uses LogQL (a PromQL-inspired query language) for searching and filtering logs, and collects logs via Grafana Alloy (the successor to Promtail), Fluentd, Fluent Bit, or any OpenTelemetry-compatible agent. Whether you are running a Kubernetes cluster or bare-metal servers, Loki provides a unified logging backend that scales from single-node deployments to multi-tenant clusters handling terabytes of logs daily.
When to Use
- Kubernetes log aggregation: Collect and query logs from all pods with automatic label extraction from Kubernetes metadata.
- Cost-effective log storage: When Elasticsearch or Splunk costs are prohibitive and you do not need full-text indexing on every log line.
- Prometheus-native teams: If your team already uses Prometheus and Grafana, Loki provides a natural extension for logs with familiar concepts.
- Multi-tenant logging: Serve multiple teams or customers from a single Loki cluster with tenant isolation.
- Alerting on log patterns: Define alert rules based on log patterns, error rates, or specific log line content using LogQL.
- Correlation with metrics and traces: Link logs to Prometheus metrics and Tempo traces in Grafana dashboards for unified observability.
Quick Start
Docker Compose Deployment
```yaml
# docker-compose.yaml
version: "3.8"
services:
  loki:
    image: grafana/loki:3.3.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/config.yaml
  alloy:
    image: grafana/alloy:latest
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
      - /var/log:/var/log:ro
    command: run /etc/alloy/config.alloy
    depends_on:
      - loki
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
volumes:
  loki-data:
  grafana-data:
```
Minimal Loki Configuration
```yaml
# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  ring:
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
```
Send Test Logs
```bash
# Push a log entry via the Loki API
curl -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d '{
    "streams": [{
      "stream": {"app": "test", "env": "dev"},
      "values": [
        ["'"$(date +%s)000000000"'", "Hello from Loki!"]
      ]
    }]
  }'

# Query logs via LogQL
curl -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={app="test"}' \
  --data-urlencode 'limit=10'
```
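To query these logs from Grafana without manual setup, Loki can be provisioned as a data source. A minimal sketch using Grafana's provisioning file format, assuming the `loki` service name from the Compose file above; mount it under `/etc/grafana/provisioning/datasources/` in the `grafana` container:

```yaml
# grafana-datasources.yaml (service name assumed from the Compose file above)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true
```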
Core Concepts
LogQL Query Language
LogQL is Loki's query language, inspired by PromQL. It operates on log streams selected by labels:
```logql
# Basic stream selection
{app="frontend", env="production"}

# Filter log lines containing "error"
{app="frontend"} |= "error"

# Exclude lines containing "healthcheck"
{app="frontend"} != "healthcheck"

# Regex filter
{app="frontend"} |~ "status=(4|5)\\d{2}"

# Parse structured logs with logfmt
{app="api"} | logfmt | duration > 500ms

# Parse JSON logs
{app="api"} | json | response_code >= 400

# Combine filters and parsing
{app="api", env="production"} |= "error" | json | line_format "{{.timestamp}} [{{.level}}] {{.message}}"

# Metric queries (rates and aggregations)
rate({app="api"} |= "error" [5m])

# Top 5 apps by error rate
topk(5, sum by(app) (rate({env="production"} |= "error" [5m])))

# Quantile of parsed durations
quantile_over_time(0.95, {app="api"} | logfmt | unwrap duration [5m])
```
Grafana Alloy Configuration
```alloy
// alloy-config.alloy

// Collect logs from files
local.file_match "logs" {
  path_targets = [
    {__path__ = "/var/log/app/*.log", app = "myapp", env = "production"},
  ]
}

loki.source.file "files" {
  targets    = local.file_match.logs.targets
  forward_to = [loki.write.default.receiver]
}

// Kubernetes pod log collection
discovery.kubernetes "pods" {
  role = "pod"
}

loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.process.pipeline.receiver]
}

// Processing pipeline: parse and label
loki.process "pipeline" {
  stage.json {
    expressions = {level = "level", msg = "message"}
  }
  stage.labels {
    values = {level = ""}
  }
  forward_to = [loki.write.default.receiver]
}

// Write to Loki
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```
Kubernetes Helm Deployment
```bash
# Add Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Loki in monolithic mode (recommended for < 100 GB/day)
helm install loki grafana/loki \
  --namespace loki --create-namespace \
  --set loki.auth_enabled=false \
  --set singleBinary.replicas=1 \
  --set loki.storage.type=s3 \
  --set loki.storage.s3.endpoint=s3.amazonaws.com \
  --set loki.storage.s3.region=us-east-1 \
  --set loki.storage.s3.bucketnames=my-loki-bucket \
  --set loki.storage.s3.access_key_id=$AWS_ACCESS_KEY \
  --set loki.storage.s3.secret_access_key=$AWS_SECRET_KEY

# Install Alloy for log collection
helm install alloy grafana/alloy \
  --namespace loki \
  --set alloy.configMap.content="$(cat alloy-config.alloy)"
```
Alert Rules
```yaml
# loki-rules.yaml
groups:
  - name: application-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({app="api"} |= "error" [5m])) by (app)
            /
          sum(rate({app="api"} [5m])) by (app)
            > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate for {{ $labels.app }}"
      - alert: SlowResponses
        expr: |
          quantile_over_time(0.95,
            {app="api"} | logfmt | unwrap duration [5m]
          ) > 2000
        for: 10m
        labels:
          severity: warning
```
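For Loki's ruler to evaluate these rules and fire alerts, it needs a rule storage location and an Alertmanager endpoint. A minimal sketch to add to loki-config.yaml, assuming a local rules directory and an Alertmanager reachable at http://alertmanager:9093 (both are assumptions for this example):

```yaml
# Hypothetical ruler block for loki-config.yaml; paths and the
# Alertmanager URL are assumptions, not part of the quick-start stack.
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules      # rule files live under /loki/rules/<tenant>/
  rule_path: /loki/rules-temp     # scratch directory for the ruler
  alertmanager_url: http://alertmanager:9093
  enable_api: true
```

With `auth_enabled: false`, rule files are read from the single-tenant directory (the `fake` tenant), e.g. /loki/rules/fake/loki-rules.yaml.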
Configuration Reference
| Parameter | Description | Recommended Value |
|---|---|---|
| `auth_enabled` | Multi-tenant authentication | `false` (single-tenant), `true` (multi-tenant) |
| `schema_config.store` | Index store type | `tsdb` (recommended for v3+) |
| `schema_config.object_store` | Chunk storage backend | `s3`, `gcs`, `azure`, `filesystem` |
| `limits_config.ingestion_rate_mb` | Per-tenant ingestion rate limit | 16 MB/s |
| `limits_config.ingestion_burst_size_mb` | Burst ingestion limit | 32 MB |
| `limits_config.max_query_length` | Maximum query time range | 721h (~30 days) |
| `limits_config.reject_old_samples_max_age` | Reject logs older than this | 168h (7 days) |
| `compactor.retention_enabled` | Enable log retention/deletion | `true` |
| `limits_config.retention_period` | How long to keep logs | 744h (31 days) |
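The retention and query-limit parameters above slot into the same loki-config.yaml used in the quick start. A minimal sketch with illustrative values (the compactor working directory is an assumed path, and `delete_request_store` is required by recent Loki releases when retention is enabled; verify the defaults for your version):

```yaml
# Retention and query limits added to loki-config.yaml (illustrative values)
compactor:
  working_directory: /loki/compactor   # assumed path
  retention_enabled: true
  delete_request_store: filesystem     # match your object store backend

limits_config:
  retention_period: 744h               # 31 days, applied per tenant
  max_query_length: 721h               # ~30-day maximum query range
```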
Deployment Modes
| Mode | Scale | When to Use |
|---|---|---|
| Monolithic (single binary) | < 100 GB/day | Small to medium deployments |
| Simple Scalable | 100 GB - 1 TB/day | Read/write path separation |
| Microservices | > 1 TB/day | Large-scale production |
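For Simple Scalable mode, the grafana/loki Helm chart exposes separate read, write, and backend replica counts. A hedged sketch of a values file (key names follow recent chart versions and should be verified against the chart you install):

```yaml
# values-simple-scalable.yaml (illustrative; verify keys against your chart version)
deploymentMode: SimpleScalable
write:
  replicas: 3      # ingestion path
read:
  replicas: 3      # query path
backend:
  replicas: 3      # compactor, ruler, index gateway
singleBinary:
  replicas: 0      # disable the monolithic deployment
```

Apply it with `helm install loki grafana/loki --namespace loki -f values-simple-scalable.yaml`.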
Best Practices
- Use labels sparingly: Loki indexes labels, not log content. Keep cardinality low (under 100,000 unique label combinations). Use log pipeline filters in LogQL instead of creating high-cardinality labels like user IDs or request IDs.
- Structure your logs as JSON: Emit logs as JSON from your applications. This enables LogQL's `| json` parser to extract fields at query time without requiring additional labels.
- Use the TSDB index store: For new deployments, always use `store: tsdb` with `schema: v13`. This is significantly more efficient than the older BoltDB index.
- Configure retention policies: Enable compactor retention to automatically delete old logs. Without retention, storage grows unbounded and costs escalate.
- Deploy Grafana Alloy instead of Promtail: Alloy is the actively maintained log collector that replaces Promtail. It supports the same features plus OpenTelemetry ingestion and processing pipelines.
- Separate read and write paths at scale: For deployments exceeding 100 GB/day, use Simple Scalable mode to independently scale readers and writers.
- Use object storage in production: Filesystem storage is suitable only for development. In production, use S3, GCS, or Azure Blob Storage for durability and scalability.
- Set ingestion limits per tenant: Configure `ingestion_rate_mb` and `ingestion_burst_size_mb` to prevent runaway logging from one application or tenant from impacting the entire cluster (see the sketch after this list).
- Correlate logs with traces and metrics: Use Grafana's data source correlation feature to link Loki logs with Tempo traces and Prometheus metrics for unified observability dashboards.
- Monitor Loki itself: Loki exposes Prometheus metrics on `/metrics`. Monitor ingestion rate, query latency, and storage usage to detect issues before they impact log availability.
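As referenced above, per-tenant limits can be set in Loki's runtime overrides file instead of globally. A minimal sketch, assuming two hypothetical tenant IDs and that Loki loads this file via the `runtime_config` section (or the `-runtime-config.file` flag):

```yaml
# runtime-overrides.yaml (tenant IDs are hypothetical)
overrides:
  team-payments:
    ingestion_rate_mb: 8
    ingestion_burst_size_mb: 16
  team-search:
    ingestion_rate_mb: 32
    ingestion_burst_size_mb: 64
    max_query_length: 168h
```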
Troubleshooting
Logs not appearing in Grafana
Verify Loki is up and ready with `curl http://localhost:3100/ready`. Check Alloy logs for push errors. Ensure the Grafana data source URL points to the correct Loki endpoint. Confirm the labels in your LogQL query match the labels attached to the streams being pushed.
Query timeout on large time ranges
Reduce the query time range or add more specific label selectors to narrow the stream. Enable query splitting via `split_queries_by_interval` so the query frontend breaks long ranges into parallel sub-queries. Consider increasing the `max_query_length` limit if legitimate queries need longer ranges.
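The relevant knobs live in `limits_config`. A hedged sketch with illustrative values (defaults vary by release, so check the documentation for your version):

```yaml
# Query-splitting and range limits (illustrative values)
limits_config:
  split_queries_by_interval: 30m   # query frontend splits long ranges into 30m slices
  max_query_length: 2160h          # allow up to ~90-day query ranges
  max_query_parallelism: 32        # cap concurrent split sub-queries
```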
High cardinality warnings
Review your labels and remove any with high cardinality (unique values > 10,000). Common offenders are request IDs, user IDs, and IP addresses. Move these to structured log fields and query them with LogQL parsers.
Out-of-order log entries rejected
Set `unordered_writes: true` (a per-tenant setting in `limits_config`, enabled by default in recent releases) if your log sources cannot guarantee ordering. Alternatively, increase `max_chunk_age` in the ingester config to tolerate larger timestamp variations.
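A hedged sketch of the corresponding configuration (values illustrative):

```yaml
# Tolerating out-of-order and late log entries (illustrative values)
limits_config:
  unordered_writes: true   # accept out-of-order entries within the active chunk window

ingester:
  max_chunk_age: 2h        # widens the window in which late entries are still accepted
```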
Storage costs growing unexpectedly
Enable retention in the compactor. Review ingestion rates per tenant with the `loki_distributor_bytes_received_total` metric. Identify noisy applications and add rate limits or reduce their log verbosity.