Expert Elasticsearch Observability

Leverage Elasticsearch, ES|QL, and the Elastic Stack for logs, metrics, APM traces, and security event analysis at scale.

When to Use This Agent

Choose this agent when you need to:

Write or optimize ES|QL and DSL queries for log analysis, APM trace investigation, or SIEM alert triage
Design ILM policies, data streams, and ingest pipelines balancing query performance with storage cost
Implement semantic vector search or hybrid RAG patterns using Elasticsearch with ELSER or embedding models

Consider alternatives when:

Your stack is Dynatrace, Datadog, or Splunk and you need vendor-specific guidance
You need full ML training pipelines rather than inference-time search and retrieval

Quick Start

Configuration


name: expert-elasticsearch-observability
type: agent
category: observability

Example Invocation


claude agent:invoke expert-elasticsearch-observability "Find top 10 error-producing services in the last 24 hours"

Example Output

ES|QL Query - Top Errors (24h)
FROM logs-*
| WHERE @timestamp > NOW() - 24 hours AND log.level == "ERROR"
| STATS error_count = COUNT(*), unique_errors = COUNT_DISTINCT(error.message)
    BY service.name
| SORT error_count DESC
| LIMIT 10

Results:
  service.name         error_count  unique_errors
  payment-gateway      2,847        14
  user-auth-service    1,203        8
  inventory-sync       891          22
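
The query shown in the example output can also be assembled programmatically. The following is a minimal Python sketch, assuming the cluster exposes the ES|QL `_query` endpoint and accepts a JSON body with a `query` string. The helper name `build_top_errors_request` and its parameters are illustrative and are not part of the agent itself.

```python
import json


def build_top_errors_request(index_pattern: str = "logs-*",
                             hours: int = 24,
                             limit: int = 10) -> dict:
    """Build a JSON body for POST /_query (hypothetical helper)."""
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    # Each list entry is one stage of the ES|QL pipe.
    query = "\n".join([
        f"FROM {index_pattern}",
        f'| WHERE @timestamp > NOW() - {hours} hours AND log.level == "ERROR"',
        "| STATS error_count = COUNT(*), "
        "unique_errors = COUNT_DISTINCT(error.message) BY service.name",
        "| SORT error_count DESC",
        f"| LIMIT {limit}",
    ])
    return {"query": query}


if __name__ == "__main__":
    # Print the body; sending it to a cluster is left to the caller.
    print(json.dumps(build_top_errors_request(), indent=2))
```

Keeping the time window and row limit as arguments makes it easy to narrow the query when a broad one proves too expensive.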

Core Concepts

Elastic Observability Overview

Aspect	Details
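ES|QL	Piped query language for ad-hoc investigation: start with FROM, then filter with WHERE and aggregate with STATS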
Data streams	Time-series index abstraction with automatic rollover and append-only semantics for logs, metrics, traces
Ingest pipelines	Server-side processing chains (grok, dissect, date, geoip, enrich) for parsing and enriching before indexing
APM integration	Elastic APM agents collect distributed traces, transaction metrics, and errors viewable in Kibana APM UI
Vector search	ELSER sparse encoder and dense-vector kNN search enabling hybrid scoring for RAG and relevance tuning

Elastic Data Flow Architecture

+----------------+     +------------------+     +----------------+
| Data Sources   | --> | Ingest Pipeline  | --> | Data Streams   |
| (Beats, Agents |     | (Grok, Enrich,   |     | (logs-*, metrics|
|  APM, Fleet)   |     |  Transform)      |     |  -*, traces-*) |
+----------------+     +------------------+     +----------------+
        |                       |                       |
        v                       v                       v
+----------------+     +------------------+     +----------------+
| ILM Policies   | --> | ES|QL / DSL      | --> | Kibana Dash,   |
| (Hot/Warm/Cold)|     | Queries          |     | Alerts, APM UI |
+----------------+     +------------------+     +----------------+
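
As an illustration of the ingest stage in the diagram above, here is a hedged Python sketch that builds an ingest pipeline definition using the grok, date, and geoip processors. The log line layout, the field names (`event_time`, `client.ip`, and so on), and the helper name are assumptions chosen for the example, not something the agent prescribes.

```python
def build_log_pipeline() -> dict:
    """Return an ingest pipeline body (illustrative field names)."""
    return {
        "description": "Parse simple access-style log lines and add GeoIP data",
        "processors": [
            # Split the raw line into structured fields.
            {"grok": {
                "field": "message",
                "patterns": [
                    "%{TIMESTAMP_ISO8601:event_time} %{IP:client.ip} "
                    "%{WORD:http.request.method} %{URIPATHPARAM:url.path}"
                ],
            }},
            # Parse the extracted timestamp; the date processor writes
            # to @timestamp by default.
            {"date": {"field": "event_time", "formats": ["ISO8601"]}},
            # Enrich with location data at index time.
            {"geoip": {
                "field": "client.ip",
                "target_field": "client.geo",
                "ignore_missing": True,
            }},
        ],
    }
```

The returned dictionary is shaped as the body of a pipeline creation request; testing it against sample documents with the pipeline simulate API before attaching it to a data stream is a sensible precaution.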

Configuration

Parameter	Type	Default	Description
elasticsearch_url	string	-	Cluster endpoint URL
api_key	string	-	API key with read access to data streams and indices
default_index_pattern	string	logs-*	Default data stream or index pattern for queries
ilm_hot_retention	string	7d	Duration on hot-tier nodes before warm transition
esql_row_limit	int	10000	Maximum rows returned by ES|QL queries
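
One way to carry the parameters from the table above in code is a small typed container. This is a sketch only: the class name `AgentConfig` and the validation rules are assumptions, while the field names and defaults mirror the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical holder for the agent parameters listed above."""
    elasticsearch_url: str                 # no default: must be supplied
    api_key: str                           # no default: must be supplied
    default_index_pattern: str = "logs-*"
    ilm_hot_retention: str = "7d"
    esql_row_limit: int = 10000

    def __post_init__(self) -> None:
        # Basic sanity checks; real deployments may need stricter ones.
        if not self.elasticsearch_url.startswith(("http://", "https://")):
            raise ValueError("elasticsearch_url must be an http(s) URL")
        if self.esql_row_limit <= 0:
            raise ValueError("esql_row_limit must be positive")


if __name__ == "__main__":
    cfg = AgentConfig("https://localhost:9200", api_key="example-key")
    print(cfg.default_index_pattern, cfg.esql_row_limit)
```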

Best Practices

Prefer ES|QL for exploration - Pipe syntax is more readable than nested JSON DSL for ad-hoc investigation. Start with FROM, pipe through WHERE, then STATS.
Design mappings before ingestion - Define explicit field types in component templates before the first documents land. Dynamic mapping on documents with many distinct or unpredictable field names causes mapping explosions.
Implement tiered ILM - Hot SSD for 0-7 days serves real-time alerting, warm and cold tiers hold data in between, and the frozen tier for 30-365 days satisfies compliance retention at minimal cost.
Use runtime fields for schema evolution - Define runtime fields instead of reindexing historical data when new log formats arrive, enabling schema changes without downtime.
Enrich at ingest time - Join lookup data (GeoIP, asset inventory) during indexing to enable faster aggregations and eliminate repeated query-time join overhead.
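
The tiered ILM practice above could be expressed as a policy body along these lines. This is a minimal sketch: the rollover thresholds, the warm-phase action, and the snapshot repository name `my-snapshot-repo` are placeholders assumed for illustration, and the 7d / 30d / 365d ages follow the ranges mentioned in the list.

```python
def build_tiered_ilm_policy(hot_retention: str = "7d",
                            frozen_after: str = "30d",
                            delete_after: str = "365d",
                            snapshot_repository: str = "my-snapshot-repo") -> dict:
    """Return an ILM policy body with hot, warm, frozen, and delete phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    # Rollover thresholds are assumed values; tune per workload.
                    "actions": {"rollover": {
                        "max_age": "1d",
                        "max_primary_shard_size": "50gb",
                    }},
                },
                "warm": {
                    "min_age": hot_retention,
                    "actions": {"set_priority": {"priority": 50}},
                },
                "frozen": {
                    "min_age": frozen_after,
                    # The repository name is a placeholder, not a real repo.
                    "actions": {"searchable_snapshot": {
                        "snapshot_repository": snapshot_repository,
                    }},
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }
```

For indices that roll over, phase ages are measured from rollover rather than from index creation, so the effective retention is slightly longer than the configured `min_age` values.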

Common Issues

Mapping explosion - Uncontrolled dynamic mapping exceeds the default limit of 1,000 fields per index (index.mapping.total_fields.limit). Set dynamic to "strict" or "runtime" and define explicit mappings.
ES|QL memory limits - Aggregating millions of unique values can trip circuit breakers. Add tighter time filters or reduce cardinality with EVAL bucketing.
Stalled ILM transitions - Misconfigured policies leave indices stuck in a phase. Check the ILM explain API (GET <index>/_ilm/explain) and verify that target tiers have sufficient disk.
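
To guard against the mapping explosion issue above, a mapping can be declared strict and checked before it is deployed. The sketch below assumes a small set of illustrative fields, and its counter is only an approximation of how Elasticsearch counts fields toward the limit (it counts object fields, leaf fields, and multi-fields). The names `build_strict_mappings` and `count_mapped_fields` are hypothetical.

```python
FIELD_LIMIT = 1000  # default value of index.mapping.total_fields.limit


def build_strict_mappings() -> dict:
    """Return a mappings body that rejects unmapped fields."""
    return {
        "dynamic": "strict",
        "properties": {
            "@timestamp": {"type": "date"},
            "message": {"type": "text"},
            "log": {"properties": {"level": {"type": "keyword"}}},
            "service": {"properties": {"name": {"type": "keyword"}}},
            "error": {"properties": {"message": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 1024}},
            }}},
        },
    }


def count_mapped_fields(properties: dict) -> int:
    """Approximate the field count: objects, leaves, and multi-fields."""
    total = 0
    for spec in properties.values():
        total += 1                                   # the field or object itself
        total += len(spec.get("fields", {}))         # multi-fields
        total += count_mapped_fields(spec.get("properties", {}))  # children
    return total


if __name__ == "__main__":
    used = count_mapped_fields(build_strict_mappings()["properties"])
    print(f"{used} of {FIELD_LIMIT} fields used")
```

Running a check like this in CI against component templates gives early warning well before an index approaches the limit.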
