Ultimate Loki Framework


Overview

Grafana Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system designed to be cost-effective and easy to operate. Inspired by Prometheus, Loki indexes only metadata labels rather than the full text of log lines, making it 10-100x more storage-efficient than traditional log aggregation systems like Elasticsearch. Log data is compressed and stored in chunks in object storage (S3, GCS, Azure Blob) or on the local filesystem, while a small index tracks label sets and time ranges. Loki integrates natively with Grafana for visualization and alerting, uses LogQL (a PromQL-inspired query language) for searching and filtering logs, and collects logs via Grafana Alloy (the successor to Promtail), Fluentd, Fluent Bit, or any OpenTelemetry-compatible agent. Whether you are running a Kubernetes cluster or bare-metal servers, Loki provides a unified logging backend that scales from single-node deployments to multi-tenant clusters handling terabytes of logs daily.

When to Use

  • Kubernetes log aggregation: Collect and query logs from all pods with automatic label extraction from Kubernetes metadata.
  • Cost-effective log storage: When Elasticsearch or Splunk costs are prohibitive and you do not need full-text indexing on every log line.
  • Prometheus-native teams: If your team already uses Prometheus and Grafana, Loki provides a natural extension for logs with familiar concepts.
  • Multi-tenant logging: Serve multiple teams or customers from a single Loki cluster with tenant isolation.
  • Alerting on log patterns: Define alert rules based on log patterns, error rates, or specific log line content using LogQL.
  • Correlation with metrics and traces: Link logs to Prometheus metrics and Tempo traces in Grafana dashboards for unified observability.

Quick Start

Docker Compose Deployment

```yaml
# docker-compose.yaml
version: "3.8"

services:
  loki:
    image: grafana/loki:3.3.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/config.yaml

  alloy:
    image: grafana/alloy:latest
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
      - /var/log:/var/log:ro
    command: run /etc/alloy/config.alloy
    depends_on:
      - loki

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki

volumes:
  loki-data:
  grafana-data:
```

Minimal Loki Configuration

```yaml
# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  ring:
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
```

Send Test Logs

```bash
# Push a log entry via the Loki API
curl -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d '{
    "streams": [{
      "stream": {"app": "test", "env": "dev"},
      "values": [
        ["'"$(date +%s)000000000"'", "Hello from Loki!"]
      ]
    }]
  }'

# Query logs via LogQL
curl -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={app="test"}' \
  --data-urlencode 'limit=10'
```

Core Concepts

LogQL Query Language

LogQL is Loki's query language, inspired by PromQL. It operates on log streams selected by labels:

```logql
# Basic stream selection
{app="frontend", env="production"}

# Filter log lines containing "error"
{app="frontend"} |= "error"

# Exclude lines containing "healthcheck"
{app="frontend"} != "healthcheck"

# Regex filter
{app="frontend"} |~ "status=(4|5)\\d{2}"

# Parse structured logs with logfmt
{app="api"} | logfmt | duration > 500ms

# Parse JSON logs
{app="api"} | json | response_code >= 400

# Combine filters and parsing
{app="api", env="production"} |= "error" | json
  | line_format "{{.timestamp}} [{{.level}}] {{.message}}"

# Metric queries (rates and aggregations)
rate({app="api"} |= "error" [5m])

# Top 5 apps by error rate
topk(5, sum by (app) (rate({env="production"} |= "error" [5m])))

# 95th-percentile of parsed durations
quantile_over_time(0.95, {app="api"} | logfmt | unwrap duration [5m])
```
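For logs that are neither JSON nor logfmt, LogQL also offers the `pattern` parser, which extracts fields positionally from a line. A sketch, assuming an nginx-style access log (the field names and log shape here are illustrative):

```logql
# Given lines shaped like:
#   10.0.0.1 - GET /api/users 200 153ms
# extract fields positionally, then filter on the parsed status
{app="nginx"} | pattern "<ip> - <method> <path> <status> <duration>" | status >= 400
```

Use `<_>` in a pattern to skip a token you do not need. Because parsing happens at query time, none of these extracted fields consume index cardinality.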

Grafana Alloy Configuration

```alloy
// alloy-config.alloy

// Collect logs from files
local.file_match "logs" {
  path_targets = [
    {__path__ = "/var/log/app/*.log", app = "myapp", env = "production"},
  ]
}

loki.source.file "files" {
  targets    = local.file_match.logs.targets
  forward_to = [loki.write.default.receiver]
}

// Kubernetes pod log collection
discovery.kubernetes "pods" {
  role = "pod"
}

loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.process.pipeline.receiver]
}

// Processing pipeline: parse and label
loki.process "pipeline" {
  stage.json {
    expressions = {level = "level", msg = "message"}
  }
  stage.labels {
    values = {level = ""}
  }
  forward_to = [loki.write.default.receiver]
}

// Write to Loki
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```

Kubernetes Helm Deployment

```bash
# Add the Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Loki in monolithic mode (recommended for < 100 GB/day)
helm install loki grafana/loki \
  --namespace loki --create-namespace \
  --set loki.auth_enabled=false \
  --set singleBinary.replicas=1 \
  --set loki.storage.type=s3 \
  --set loki.storage.s3.endpoint=s3.amazonaws.com \
  --set loki.storage.s3.region=us-east-1 \
  --set loki.storage.s3.bucketnames=my-loki-bucket \
  --set loki.storage.s3.access_key_id=$AWS_ACCESS_KEY \
  --set loki.storage.s3.secret_access_key=$AWS_SECRET_KEY

# Install Alloy for log collection
helm install alloy grafana/alloy \
  --namespace loki \
  --set alloy.configMap.content="$(cat alloy-config.alloy)"
```

Alert Rules

```yaml
# loki-rules.yaml
groups:
  - name: application-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({app="api"} |= "error" [5m])) by (app)
            /
          sum(rate({app="api"} [5m])) by (app)
            > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate for {{ $labels.app }}"
      - alert: SlowResponses
        expr: |
          quantile_over_time(0.95,
            {app="api"} | logfmt | unwrap duration [5m]
          ) > 2000
        for: 10m
        labels:
          severity: warning
```

Configuration Reference

| Parameter | Description | Recommended Value |
|---|---|---|
| auth_enabled | Multi-tenant authentication | false (single-tenant), true (multi-tenant) |
| schema_config.store | Index store type | tsdb (recommended for v3+) |
| schema_config.object_store | Chunk storage backend | s3, gcs, azure, filesystem |
| limits_config.ingestion_rate_mb | Per-tenant ingestion rate limit | 16 MB/s |
| limits_config.ingestion_burst_size_mb | Burst ingestion limit | 32 MB |
| limits_config.max_query_length | Maximum query time range | 721h |
| limits_config.reject_old_samples_max_age | Reject logs older than this | 168h (7 days) |
| compactor.retention_enabled | Enable log retention/deletion | true |
| limits_config.retention_period | How long to keep logs | 744h (31 days) |
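The retention rows above work together across two sections of the config file. A minimal sketch, with illustrative values (in Loki 3.x the compactor also needs a delete_request_store when retention is enabled; check the configuration reference for your version):

```yaml
# Illustrative retention settings: keep logs for 31 days
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem

limits_config:
  retention_period: 744h
```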

Deployment Modes

| Mode | Scale | When to Use |
|---|---|---|
| Monolithic (single binary) | < 100 GB/day | Small to medium deployments |
| Simple Scalable | 100 GB - 1 TB/day | Read/write path separation |
| Microservices | > 1 TB/day | Large-scale production |
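As a rough illustration of the Simple Scalable row, the grafana/loki Helm chart can select the mode and scale the read and write paths independently. The keys below match recent (6.x) chart versions but vary between releases, so treat this as a sketch and verify against the chart's values.yaml:

```yaml
# Illustrative Helm values for Simple Scalable mode (chart keys vary by version)
deploymentMode: SimpleScalable
write:
  replicas: 3   # ingestion path: scale with log volume
read:
  replicas: 2   # query path: scale with query load
backend:
  replicas: 2   # compactor, ruler, and other background targets
```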

Best Practices

  1. Use labels sparingly: Loki indexes labels, not log content. Keep cardinality low (under 100,000 unique label combinations). Use log pipeline filters in LogQL instead of creating high-cardinality labels like user IDs or request IDs.

  2. Structure your logs as JSON: Emit logs as JSON from your applications. This enables LogQL's | json parser to extract fields at query time without requiring additional labels.

  3. Use the TSDB index store: For new deployments, always use store: tsdb with schema: v13. This is significantly more efficient than the older BoltDB index.

  4. Configure retention policies: Enable compactor retention to automatically delete old logs. Without retention, storage grows unbounded and costs escalate.

  5. Deploy Grafana Alloy instead of Promtail: Alloy is the actively maintained log collector that replaces Promtail. It supports the same features plus OpenTelemetry ingestion and processing pipelines.

  6. Separate read and write paths at scale: For deployments exceeding 100GB/day, use Simple Scalable mode to independently scale readers and writers.

  7. Use object storage in production: Filesystem storage is suitable only for development. In production, use S3, GCS, or Azure Blob Storage for durability and scalability.

  8. Set ingestion limits per tenant: Configure ingestion_rate_mb and ingestion_burst_size_mb to prevent runaway logging from one application or tenant from impacting the entire cluster.

  9. Correlate logs with traces and metrics: Use Grafana's data source correlation feature to link Loki logs with Tempo traces and Prometheus metrics for unified observability dashboards.

  10. Monitor Loki itself: Loki exposes Prometheus metrics on /metrics. Monitor ingestion rate, query latency, and storage usage to detect issues before they impact log availability.
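Practice 2 above (emit JSON so LogQL's | json parser can extract fields at query time) can be sketched with Python's standard library alone; the field names here are illustrative, not a required schema:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object that Loki's | json parser can read."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request completed")
# Emits a line like: {"level": "info", "logger": "api", "message": "request completed"}
```

A query such as `{app="api"} | json | level="error"` can then filter on these fields without any of them being indexed as labels.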

Troubleshooting

Logs not appearing in Grafana

Verify that Loki is up with curl http://localhost:3100/ready. Check the Alloy logs for push errors. Ensure the Grafana data source URL points to the correct Loki endpoint, and confirm that the labels in your LogQL query match those attached at ingestion.

Query timeout on large time ranges

Reduce the query time range or add more specific label selectors to narrow the stream. Enable query splitting via split_queries_by_interval in the query frontend. Consider increasing the max_query_length limit.

High cardinality warnings

Review your labels and remove any with high cardinality (more than roughly 10,000 unique values). Common offenders are request IDs, user IDs, and IP addresses. Move these into structured log fields and query them with LogQL parsers.

Out-of-order log entries rejected

Enable unordered_writes: true (found under limits_config in recent Loki releases, where it is the default) if your log sources cannot guarantee ordering. Alternatively, increase max_chunk_age to tolerate minor timestamp variations.
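A minimal sketch of those two knobs (section names per recent Loki releases; verify against the configuration reference for your version):

```yaml
limits_config:
  unordered_writes: true   # accept out-of-order entries within the active chunk window

ingester:
  max_chunk_age: 2h        # widen the window of timestamps a chunk will accept
```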

Storage costs growing unexpectedly

Enable retention in the compactor. Review ingestion rates per tenant with the loki_distributor_bytes_received_total metric. Identify noisy applications and add rate limits or reduce their log verbosity.
