Expert Elasticsearch Observability
Enterprise-grade expert assistant agent for writing and debugging Elasticsearch queries and configuration. Includes structured workflows, validation checks, and reusable patterns for security analysis.
Leverage Elasticsearch, ES|QL, and the Elastic Stack for logs, metrics, APM traces, and security event analysis at scale.
When to Use This Agent
Choose this agent when you need to:
- Write or optimize ES|QL and DSL queries for log analysis, APM trace investigation, or SIEM alert triage
- Design ILM policies, data streams, and ingest pipelines balancing query performance with storage cost
- Implement semantic vector search or hybrid RAG patterns using Elasticsearch with ELSER or embedding models
Consider alternatives when:
- Your stack is Dynatrace, Datadog, or Splunk and you need vendor-specific guidance
- You need full ML training pipelines rather than inference-time search and retrieval
Quick Start
Configuration
name: expert-elasticsearch-observability
type: agent
category: observability
Example Invocation
claude agent:invoke expert-elasticsearch-observability "Find top 10 error-producing services in the last 24 hours"
Example Output
ES|QL Query - Top Errors (24h)
FROM logs-*
| WHERE @timestamp > NOW() - 24 hours AND log.level == "ERROR"
| STATS error_count = COUNT(*), unique_errors = COUNT_DISTINCT(error.message)
BY service.name
| SORT error_count DESC
| LIMIT 10
Results:
| service.name | error_count | unique_errors |
|---|---|---|
| payment-gateway | 2,847 | 14 |
| user-auth-service | 1,203 | 8 |
| inventory-sync | 891 | 22 |
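As a rough sketch of how the query above could be sent programmatically, the following builds a JSON body for the ES|QL `_query` endpoint and flattens the column/value response into rows. The function names and the sample response are illustrative assumptions, not part of the agent; the response shape (`columns` plus `values`) follows the ES|QL REST API, and no network call is made here.

```python
def build_top_errors_request(hours: int = 24, limit: int = 10) -> dict:
    """Build a request body for POST /_query (hypothetical helper)."""
    hours, limit = int(hours), int(limit)  # keep interpolated values numeric
    query = "\n".join([
        "FROM logs-*",
        f'| WHERE @timestamp > NOW() - {hours} hours AND log.level == "ERROR"',
        "| STATS error_count = COUNT(*), unique_errors = COUNT_DISTINCT(error.message) BY service.name",
        "| SORT error_count DESC",
        f"| LIMIT {limit}",
    ])
    return {"query": query}


def rows_as_dicts(response: dict) -> list:
    """Turn an ES|QL response ({"columns": [...], "values": [...]}) into dicts."""
    names = [col["name"] for col in response["columns"]]
    return [dict(zip(names, row)) for row in response["values"]]


# Assumed sample response, shaped like the example output above.
sample_response = {
    "columns": [
        {"name": "error_count", "type": "long"},
        {"name": "unique_errors", "type": "long"},
        {"name": "service.name", "type": "keyword"},
    ],
    "values": [
        [2847, 14, "payment-gateway"],
        [1203, 8, "user-auth-service"],
        [891, 22, "inventory-sync"],
    ],
}

for row in rows_as_dicts(sample_response):
    print(row["service.name"], row["error_count"], row["unique_errors"])
```

Validating `hours` and `limit` as integers before interpolation keeps user input out of the query text; for string values, the API's `params` field would likely be the safer choice.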
Core Concepts
Elastic Observability Overview
| Aspect | Details |
|---|---|
| ES\|QL | Piped query language (FROM, WHERE, STATS) for ad-hoc exploration of logs, metrics, and traces |
| Data streams | Time-series index abstraction with automatic rollover and append-only semantics for logs, metrics, traces |
| Ingest pipelines | Server-side processing chains (grok, dissect, date, geoip, enrich) for parsing and enriching before indexing |
| APM integration | Elastic APM agents collect distributed traces, transaction metrics, and errors viewable in Kibana APM UI |
| Vector search | ELSER sparse encoder and dense-vector kNN search enabling hybrid scoring for RAG and relevance tuning |
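To make the ingest pipeline row more concrete, here is a minimal sketch of a pipeline definition that chains grok, date, and geoip processors. The log line format, the field names (`timestamp`, `client.ip`, `client.geo`), and the helper name are assumptions for illustration; the resulting dict would be the body of a `PUT _ingest/pipeline/<id>` request.

```python
import json


def build_log_pipeline() -> dict:
    """Return an illustrative ingest pipeline body (assumed log format)."""
    return {
        "description": "Parse and enrich application logs (illustrative)",
        "processors": [
            # Split the raw line into timestamp, level, client IP, and message.
            {"grok": {
                "field": "message",
                "patterns": [
                    "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log.level} "
                    "%{IP:client.ip} %{GREEDYDATA:message}"
                ],
            }},
            # Parse the extracted timestamp into @timestamp (the default target).
            {"date": {"field": "timestamp", "formats": ["ISO8601"]}},
            # Add geo information for the client address.
            {"geoip": {
                "field": "client.ip",
                "target_field": "client.geo",
                "ignore_missing": True,
            }},
            # Drop the temporary field once @timestamp is set.
            {"remove": {"field": "timestamp", "ignore_missing": True}},
        ],
    }


pipeline = build_log_pipeline()
print(json.dumps(pipeline, indent=2))
```

Processors run in list order, so the grok step has to come before the date and geoip steps that depend on the fields it extracts.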
Elastic Data Flow Architecture
+----------------+ +------------------+ +----------------+
| Data Sources | --> | Ingest Pipeline | --> | Data Streams |
| (Beats, Agents | | (Grok, Enrich, | | (logs-*, metrics|
| APM, Fleet) | | Transform) | | -*, traces-*) |
+----------------+ +------------------+ +----------------+
| | |
v v v
+----------------+ +------------------+ +----------------+
| ILM Policies | --> | ES|QL / DSL | --> | Kibana Dash, |
| (Hot/Warm/Cold)| | Queries | | Alerts, APM UI |
+----------------+ +------------------+ +----------------+
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| elasticsearch_url | string | - | Cluster endpoint URL |
| api_key | string | - | API key with read access to data streams and indices |
| default_index_pattern | string | logs-* | Default data stream or index pattern for queries |
| ilm_hot_retention | string | 7d | Duration on hot-tier nodes before warm transition |
| esql_row_limit | int | 10000 | Maximum rows returned by ES\|QL queries |
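The parameters in this table could be modeled as a small validated settings object. The sketch below is one possible shape, not the agent's actual loader: the class name, the validation rules, and the assumption that `api_key` is already in the encoded form expected by the `ApiKey` authorization scheme are all illustrative.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the parameters listed in the table."""
    elasticsearch_url: str
    api_key: str
    default_index_pattern: str = "logs-*"
    ilm_hot_retention: str = "7d"
    esql_row_limit: int = 10000

    def __post_init__(self):
        parsed = urlparse(self.elasticsearch_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("elasticsearch_url must be an http(s) URL")
        if not self.api_key:
            raise ValueError("api_key is required")
        # Accept simple durations such as 7d, 12h, 30m, 45s.
        if not re.fullmatch(r"\d+[dhms]", self.ilm_hot_retention):
            raise ValueError("ilm_hot_retention must look like '7d'")
        if self.esql_row_limit <= 0:
            raise ValueError("esql_row_limit must be positive")

    def auth_headers(self) -> dict:
        # Assumes api_key is the encoded value used with the ApiKey scheme.
        return {
            "Authorization": f"ApiKey {self.api_key}",
            "Content-Type": "application/json",
        }


config = AgentConfig(
    elasticsearch_url="https://localhost:9200",  # placeholder endpoint
    api_key="example-encoded-key",               # placeholder credential
)
print(config.default_index_pattern, config.ilm_hot_retention, config.esql_row_limit)
```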
Best Practices
- Prefer ES|QL for exploration - Pipe syntax is more readable than nested JSON DSL for ad-hoc investigation. Start with FROM, pipe through WHERE, then STATS.
- Design mappings before ingestion - Define explicit field types in component templates before first documents land. Dynamic mapping on high-cardinality fields causes mapping explosions.
- Implement tiered ILM - Hot SSD for 0-7 days serves real-time alerting; frozen tier for 30-365 days satisfies compliance retention at minimal cost.
- Use runtime fields for schema evolution - Define runtime fields instead of reindexing historical data when new log formats arrive, enabling schema changes without downtime.
- Enrich at ingest time - Join lookup data (GeoIP, asset inventory) during indexing to enable faster aggregations and eliminate repeated query-time join overhead.
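The tiered ILM practice can be sketched as a policy body for `PUT _ilm/policy/<name>`. The hot (0-7 days) and frozen (30-365 days) windows come from the text above; the warm phase covering days 7-30, the rollover thresholds, and the repository name `my-snapshot-repo` are assumptions added to make the example complete.

```python
def build_tiered_ilm_policy(snapshot_repository: str = "my-snapshot-repo") -> dict:
    """Return an illustrative hot/warm/frozen/delete ILM policy body."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    "actions": {
                        # Assumed rollover thresholds; tune per workload.
                        "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                    },
                },
                "warm": {
                    "min_age": "7d",
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "frozen": {
                    "min_age": "30d",
                    "actions": {
                        # The repository must already be registered (assumed name).
                        "searchable_snapshot": {"snapshot_repository": snapshot_repository}
                    },
                },
                "delete": {"min_age": "365d", "actions": {"delete": {}}},
            }
        }
    }


def min_age_days(policy: dict, phase: str) -> int:
    """Read a phase's min_age as whole days ('0ms' counts as 0)."""
    value = policy["policy"]["phases"][phase]["min_age"]
    if value == "0ms":
        return 0
    if not value.endswith("d"):
        raise ValueError(f"unsupported min_age format: {value}")
    return int(value[:-1])


policy = build_tiered_ilm_policy()
for phase in ("hot", "warm", "frozen", "delete"):
    print(phase, min_age_days(policy, phase))
```

Note that `min_age` is measured from rollover once an index has rolled over, so actual time on the hot tier can be slightly longer than the nominal window.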
Common Issues
- Mapping explosion - Uncontrolled dynamic mapping exceeds the default 1,000-field limit (index.mapping.total_fields.limit). Set dynamic to "strict" or "runtime" and define explicit mappings.
- ES|QL memory limits - Aggregating millions of unique values hits circuit breakers. Add tighter time filters or reduce cardinality with EVAL bucketing.
- Stale ILM transitions - Misconfigured policies leave indices stuck. Check the ILM explain API and verify that target tiers have sufficient disk.
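For the mapping explosion case, a component template with `dynamic` set to `"strict"` might look like the sketch below, together with a small helper that estimates how many fields a mapping defines. The field list and helper names are assumptions for illustration, and the count is only an approximation of how Elasticsearch tallies fields against the limit.

```python
def build_logs_component_template() -> dict:
    """Illustrative body for PUT _component_template/<name> (assumed fields)."""
    return {
        "template": {
            "settings": {"index.mapping.total_fields.limit": 1000},
            "mappings": {
                "dynamic": "strict",  # reject documents with unmapped fields
                "properties": {
                    "@timestamp": {"type": "date"},
                    "message": {"type": "text"},
                    "log": {"properties": {"level": {"type": "keyword"}}},
                    "service": {"properties": {"name": {"type": "keyword"}}},
                    "error": {"properties": {"message": {"type": "text"}}},
                },
            },
        }
    }


def count_mapped_fields(properties: dict) -> int:
    """Approximate field count: objects, leaf fields, and multi-fields."""
    total = 0
    for definition in properties.values():
        total += 1
        total += count_mapped_fields(definition.get("properties", {}))
        total += len(definition.get("fields", {}))
    return total


template = build_logs_component_template()["template"]
limit = template["settings"]["index.mapping.total_fields.limit"]
used = count_mapped_fields(template["mappings"]["properties"])
print(f"{used} of {limit} fields defined")
```

Running a check like this against templates before deployment gives an early warning when a mapping drifts toward the limit.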