Ultimate Loki Framework


Overview

Grafana Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system designed to be cost-effective and easy to operate. Inspired by Prometheus, Loki indexes only metadata labels rather than the full text of log lines, making it 10-100x more storage-efficient than traditional log aggregation systems like Elasticsearch. Log data is compressed and stored in chunks in object storage (S3, GCS, Azure Blob) or on the local filesystem, while a small index tracks label sets and time ranges. Loki integrates natively with Grafana for visualization and alerting, uses LogQL (a PromQL-inspired query language) for searching and filtering logs, and collects logs via Grafana Alloy (the successor to Promtail), Fluentd, Fluent Bit, or any OpenTelemetry-compatible agent. Whether you are running a Kubernetes cluster or bare-metal servers, Loki provides a unified logging backend that scales from single-node deployments to multi-tenant clusters handling terabytes of logs daily.

When to Use

  • Kubernetes log aggregation: Collect and query logs from all pods with automatic label extraction from Kubernetes metadata.
  • Cost-effective log storage: When Elasticsearch or Splunk costs are prohibitive and you do not need full-text indexing on every log line.
  • Prometheus-native teams: If your team already uses Prometheus and Grafana, Loki provides a natural extension for logs with familiar concepts.
  • Multi-tenant logging: Serve multiple teams or customers from a single Loki cluster with tenant isolation.
  • Alerting on log patterns: Define alert rules based on log patterns, error rates, or specific log line content using LogQL.
  • Correlation with metrics and traces: Link logs to Prometheus metrics and Tempo traces in Grafana dashboards for unified observability.

Quick Start

Docker Compose Deployment

```yaml
# docker-compose.yaml
version: "3.8"

services:
  loki:
    image: grafana/loki:3.3.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/config.yaml

  alloy:
    image: grafana/alloy:latest
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
      - /var/log:/var/log:ro
    command: run /etc/alloy/config.alloy
    depends_on:
      - loki

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki

volumes:
  loki-data:
  grafana-data:
```

Minimal Loki Configuration

```yaml
# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  ring:
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
```

Send Test Logs

```bash
# Push a log entry via the Loki API
curl -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d '{
    "streams": [{
      "stream": {"app": "test", "env": "dev"},
      "values": [
        ["'"$(date +%s)000000000"'", "Hello from Loki!"]
      ]
    }]
  }'

# Query logs via LogQL
curl -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={app="test"}' \
  --data-urlencode 'limit=10'
```

Core Concepts

LogQL Query Language

LogQL is Loki's query language, inspired by PromQL. It operates on log streams selected by labels:

```logql
# Basic stream selection
{app="frontend", env="production"}

# Filter log lines containing "error"
{app="frontend"} |= "error"

# Exclude lines containing "healthcheck"
{app="frontend"} != "healthcheck"

# Regex filter
{app="frontend"} |~ "status=(4|5)\\d{2}"

# Parse structured logs with logfmt
{app="api"} | logfmt | duration > 500ms

# Parse JSON logs
{app="api"} | json | response_code >= 400

# Combine filters and parsing
{app="api", env="production"} |= "error" | json
  | line_format "{{.timestamp}} [{{.level}}] {{.message}}"

# Metric queries (rates and aggregations)
rate({app="api"} |= "error" [5m])

# Top 5 apps by error rate
topk(5, sum by (app) (rate({env="production"} |= "error" [5m])))

# 95th-percentile of parsed durations
quantile_over_time(0.95, {app="api"} | logfmt | unwrap duration [5m])
```
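For logs that are neither JSON nor logfmt, LogQL also offers the `pattern` parser, which extracts fields positionally from a line. A sketch, assuming an nginx-style access log (the field names and log shape here are illustrative):

```logql
# Given lines shaped like:
#   10.0.0.1 - GET /api/users 200 153ms
# extract fields positionally, then filter on the parsed status
{app="nginx"} | pattern "<ip> - <method> <path> <status> <duration>" | status >= 400
```

Use `<_>` in a pattern to skip a token you do not need. Because parsing happens at query time, none of these extracted fields consume index cardinality.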

Grafana Alloy Configuration

```alloy
// alloy-config.alloy

// Collect logs from files
local.file_match "logs" {
  path_targets = [
    {__path__ = "/var/log/app/*.log", app = "myapp", env = "production"},
  ]
}

loki.source.file "files" {
  targets    = local.file_match.logs.targets
  forward_to = [loki.write.default.receiver]
}

// Kubernetes pod log collection
discovery.kubernetes "pods" {
  role = "pod"
}

loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.process.pipeline.receiver]
}

// Processing pipeline: parse and label
loki.process "pipeline" {
  stage.json {
    expressions = {level = "level", msg = "message"}
  }
  stage.labels {
    values = {level = ""}
  }
  forward_to = [loki.write.default.receiver]
}

// Write to Loki
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```

Kubernetes Helm Deployment

```bash
# Add the Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Loki in monolithic mode (recommended for < 100 GB/day)
helm install loki grafana/loki \
  --namespace loki --create-namespace \
  --set loki.auth_enabled=false \
  --set singleBinary.replicas=1 \
  --set loki.storage.type=s3 \
  --set loki.storage.s3.endpoint=s3.amazonaws.com \
  --set loki.storage.s3.region=us-east-1 \
  --set loki.storage.s3.bucketnames=my-loki-bucket \
  --set loki.storage.s3.access_key_id=$AWS_ACCESS_KEY \
  --set loki.storage.s3.secret_access_key=$AWS_SECRET_KEY

# Install Alloy for log collection
helm install alloy grafana/alloy \
  --namespace loki \
  --set alloy.configMap.content="$(cat alloy-config.alloy)"
```

Alert Rules

```yaml
# loki-rules.yaml
groups:
  - name: application-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({app="api"} |= "error" [5m])) by (app)
            /
          sum(rate({app="api"} [5m])) by (app)
            > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate for {{ $labels.app }}"
      - alert: SlowResponses
        expr: |
          quantile_over_time(0.95,
            {app="api"} | logfmt | unwrap duration [5m]
          ) > 2000
        for: 10m
        labels:
          severity: warning
```

Configuration Reference

| Parameter | Description | Recommended Value |
|---|---|---|
| auth_enabled | Multi-tenant authentication | false (single-tenant), true (multi-tenant) |
| schema_config.store | Index store type | tsdb (recommended for v3+) |
| schema_config.object_store | Chunk storage backend | s3, gcs, azure, filesystem |
| limits_config.ingestion_rate_mb | Per-tenant ingestion rate limit | 16 MB/s |
| limits_config.ingestion_burst_size_mb | Burst ingestion limit | 32 MB |
| limits_config.max_query_length | Maximum query time range | 721h |
| limits_config.reject_old_samples_max_age | Reject logs older than this | 168h (7 days) |
| compactor.retention_enabled | Enable log retention/deletion | true |
| limits_config.retention_period | How long to keep logs | 744h (31 days) |
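The retention rows above work together across two sections of the config file. A minimal sketch, with illustrative values (in Loki 3.x the compactor also needs a delete_request_store when retention is enabled; check the configuration reference for your version):

```yaml
# Illustrative retention settings: keep logs for 31 days
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem

limits_config:
  retention_period: 744h
```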

Deployment Modes

| Mode | Scale | When to Use |
|---|---|---|
| Monolithic (single binary) | < 100 GB/day | Small to medium deployments |
| Simple Scalable | 100 GB - 1 TB/day | Read/write path separation |
| Microservices | > 1 TB/day | Large-scale production |
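As a rough illustration of the Simple Scalable row, the grafana/loki Helm chart can select the mode and scale the read and write paths independently. The keys below match recent (6.x) chart versions but vary between releases, so treat this as a sketch and verify against the chart's values.yaml:

```yaml
# Illustrative Helm values for Simple Scalable mode (chart keys vary by version)
deploymentMode: SimpleScalable
write:
  replicas: 3   # ingestion path: scale with log volume
read:
  replicas: 2   # query path: scale with query load
backend:
  replicas: 2   # compactor, ruler, and other background targets
```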

Best Practices

  1. Use labels sparingly: Loki indexes labels, not log content. Keep cardinality low (under 100,000 unique label combinations). Use log pipeline filters in LogQL instead of creating high-cardinality labels like user IDs or request IDs.

  2. Structure your logs as JSON: Emit logs as JSON from your applications. This enables LogQL's | json parser to extract fields at query time without requiring additional labels.

  3. Use the TSDB index store: For new deployments, always use store: tsdb with schema: v13. This is significantly more efficient than the older BoltDB index.

  4. Configure retention policies: Enable compactor retention to automatically delete old logs. Without retention, storage grows unbounded and costs escalate.

  5. Deploy Grafana Alloy instead of Promtail: Alloy is the actively maintained log collector that replaces Promtail. It supports the same features plus OpenTelemetry ingestion and processing pipelines.

  6. Separate read and write paths at scale: For deployments exceeding 100GB/day, use Simple Scalable mode to independently scale readers and writers.

  7. Use object storage in production: Filesystem storage is suitable only for development. In production, use S3, GCS, or Azure Blob Storage for durability and scalability.

  8. Set ingestion limits per tenant: Configure ingestion_rate_mb and ingestion_burst_size_mb to prevent runaway logging from one application or tenant from impacting the entire cluster.

  9. Correlate logs with traces and metrics: Use Grafana's data source correlation feature to link Loki logs with Tempo traces and Prometheus metrics for unified observability dashboards.

  10. Monitor Loki itself: Loki exposes Prometheus metrics on /metrics. Monitor ingestion rate, query latency, and storage usage to detect issues before they impact log availability.
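Practice 2 above (emit JSON so LogQL's | json parser can extract fields at query time) can be sketched with Python's standard library alone; the field names here are illustrative, not a required schema:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object that Loki's | json parser can read."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request completed")
# Emits a line like: {"level": "info", "logger": "api", "message": "request completed"}
```

A query such as `{app="api"} | json | level="error"` can then filter on these fields without any of them being indexed as labels.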

Troubleshooting

Logs not appearing in Grafana

Verify that Loki is up with curl http://localhost:3100/ready. Check the Alloy logs for push errors. Ensure the Grafana data source URL points to the correct Loki endpoint, and confirm that the labels in your LogQL query match those attached at ingestion.

Query timeout on large time ranges

Reduce the query time range or add more specific label selectors to narrow the stream. Enable query splitting via split_queries_by_interval in the query frontend. Consider increasing the max_query_length limit.

High cardinality warnings

Review your labels and remove any with high cardinality (more than roughly 10,000 unique values). Common offenders are request IDs, user IDs, and IP addresses. Move these into structured log fields and query them with LogQL parsers.

Out-of-order log entries rejected

Enable unordered_writes: true (found under limits_config in recent Loki releases, where it is the default) if your log sources cannot guarantee ordering. Alternatively, increase max_chunk_age to tolerate minor timestamp variations.
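A minimal sketch of those two knobs (section names per recent Loki releases; verify against the configuration reference for your version):

```yaml
limits_config:
  unordered_writes: true   # accept out-of-order entries within the active chunk window

ingester:
  max_chunk_age: 2h        # widen the window of timestamps a chunk will accept
```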

Storage costs growing unexpectedly

Enable retention in the compactor. Review ingestion rates per tenant with the loki_distributor_bytes_received_total metric. Identify noisy applications and add rate limits or reduce their log verbosity.
