Expert Elasticsearch Observability

Enterprise-grade expert assistant agent for debugging and code tasks. Includes structured workflows, validation checks, and reusable patterns for security.

Agent · Cliptics · security · v1.0.0 · MIT

Leverage Elasticsearch, ES|QL, and the Elastic Stack for logs, metrics, APM traces, and security event analysis at scale.

When to Use This Agent

Choose this agent when you need to:

  • Write or optimize ES|QL and DSL queries for log analysis, APM trace investigation, or SIEM alert triage
  • Design ILM policies, data streams, and ingest pipelines balancing query performance with storage cost
  • Implement semantic vector search or hybrid RAG patterns using Elasticsearch with ELSER or embedding models

Consider alternatives when:

  • Your stack is Dynatrace, Datadog, or Splunk and you need vendor-specific guidance
  • You need full ML training pipelines rather than inference-time search and retrieval

Quick Start

Configuration

name: expert-elasticsearch-observability
type: agent
category: observability

Example Invocation

claude agent:invoke expert-elasticsearch-observability "Find top 10 error-producing services in the last 24 hours"

Example Output

ES|QL Query - Top Errors (24h)
FROM logs-*
| WHERE @timestamp > NOW() - 24 hours AND log.level == "ERROR"
| STATS error_count = COUNT(*), unique_errors = COUNT_DISTINCT(error.message)
    BY service.name
| SORT error_count DESC
| LIMIT 10

Results:
  payment-gateway      2,847    14
  user-auth-service    1,203     8
  inventory-sync         891    22
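A query like the one above can also be submitted programmatically. The following is a minimal sketch, assuming the cluster exposes the ES|QL endpoint at `POST /_query` and accepts API-key authentication; the helper names `build_esql_request` and `rows_as_dicts` are hypothetical and not part of this agent. It only builds the request and reshapes a response, so nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import json
import urllib.request

# The ES|QL query from the example output above, kept as a plain string.
TOP_ERRORS_ESQL = """
FROM logs-*
| WHERE @timestamp > NOW() - 24 hours AND log.level == "ERROR"
| STATS error_count = COUNT(*), unique_errors = COUNT_DISTINCT(error.message)
    BY service.name
| SORT error_count DESC
| LIMIT 10
""".strip()


def build_esql_request(base_url, api_key, query):
    """Build (but do not send) a POST request for the ES|QL query endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the key is already in the encoded form Elasticsearch issues.
            "Authorization": f"ApiKey {api_key}",
        },
    )


def rows_as_dicts(response):
    """Zip an ES|QL JSON response ("columns" plus "values") into one dict per row."""
    names = [col["name"] for col in response["columns"]]
    return [dict(zip(names, row)) for row in response["values"]]
```

To run it against a real cluster, pass the request to `urllib.request.urlopen`, decode the JSON body, and hand it to `rows_as_dicts`.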

Core Concepts

Elastic Observability Overview

Aspect            Details
ES|QL             Piped query language (FROM, WHERE, STATS) for ad-hoc exploration
Data streams      Time-series index abstraction with automatic rollover and append-only semantics for logs, metrics, traces
Ingest pipelines  Server-side processing chains (grok, dissect, date, geoip, enrich) for parsing and enriching before indexing
APM integration   Elastic APM agents collect distributed traces, transaction metrics, and errors viewable in Kibana APM UI
Vector search     ELSER sparse encoder and dense-vector kNN search enabling hybrid scoring for RAG and relevance tuning
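To make the ingest pipeline row concrete, here is a hedged sketch of a pipeline body that chains grok, date, and geoip processors. The log line layout, the field names, and the function name `build_log_pipeline` are assumptions chosen for illustration; the dict is the JSON body you would send with `PUT _ingest/pipeline/<pipeline-id>`.

```python
def build_log_pipeline():
    """Return an ingest pipeline body: parse the line, set @timestamp, add GeoIP data."""
    return {
        "description": "Parse access-style log lines and enrich them before indexing",
        "processors": [
            {
                # Assumed line layout: "<ISO timestamp> <client IP> <method> <path> <status>"
                "grok": {
                    "field": "message",
                    "patterns": [
                        "%{TIMESTAMP_ISO8601:timestamp} %{IP:client.ip} "
                        "%{WORD:http.request.method} %{URIPATH:url.path} "
                        "%{NUMBER:http.response.status_code:int}"
                    ],
                }
            },
            # The date processor writes to @timestamp by default.
            {"date": {"field": "timestamp", "formats": ["ISO8601"]}},
            # Look up location data for the client address; skip documents without it.
            {"geoip": {"field": "client.ip", "target_field": "client.geo", "ignore_missing": True}},
            # Drop the temporary field once @timestamp is set.
            {"remove": {"field": "timestamp"}},
        ],
    }
```

Processors run in list order, so the grok step has to come first: the date and geoip steps read fields that grok extracts.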

Elastic Data Flow Architecture

+----------------+     +------------------+     +----------------+
| Data Sources   | --> | Ingest Pipeline  | --> | Data Streams   |
| (Beats, Agents |     | (Grok, Enrich,   |     | (logs-*, metrics|
|  APM, Fleet)   |     |  Transform)      |     |  -*, traces-*) |
+----------------+     +------------------+     +----------------+
        |                       |                       |
        v                       v                       v
+----------------+     +------------------+     +----------------+
| ILM Policies   | --> | ES|QL / DSL      | --> | Kibana Dash,   |
| (Hot/Warm/Cold)|     | Queries          |     | Alerts, APM UI |
+----------------+     +------------------+     +----------------+

Configuration

Parameter              Type    Default  Description
elasticsearch_url      string  -        Cluster endpoint URL
api_key                string  -        API key with read access to data streams and indices
default_index_pattern  string  logs-*   Default data stream or index pattern for queries
ilm_hot_retention      string  7d       Duration on hot-tier nodes before warm transition
esql_row_limit         int     10000    Maximum rows returned by ES|QL queries
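One way to carry these parameters in code is a small validated settings object. This is a sketch only: the class name `AgentConfig` and the validation rules are assumptions, while the field names and defaults are taken from the table above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the parameters listed in the table above."""

    elasticsearch_url: str
    api_key: str
    default_index_pattern: str = "logs-*"
    ilm_hot_retention: str = "7d"
    esql_row_limit: int = 10000

    def __post_init__(self):
        if not self.elasticsearch_url.startswith(("http://", "https://")):
            raise ValueError("elasticsearch_url must start with http:// or https://")
        # Simplifying assumption: accept only whole numbers of days, hours, minutes, or seconds.
        if not re.fullmatch(r"\d+[dhms]", self.ilm_hot_retention):
            raise ValueError("ilm_hot_retention must look like '7d' or '12h'")
        if self.esql_row_limit <= 0:
            raise ValueError("esql_row_limit must be positive")
```

Because the class is frozen, a configuration is checked once at construction and cannot drift afterwards.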

Best Practices

  1. Prefer ES|QL for exploration - Pipe syntax is more readable than nested JSON DSL for ad-hoc investigation. Start with FROM, pipe through WHERE, then STATS.

  2. Design mappings before ingestion - Define explicit field types in component templates before first documents land. Dynamic mapping on high-cardinality fields causes mapping explosions.

  3. Implement tiered ILM - Keep data on hot SSD-backed nodes for days 0-7 to serve real-time alerting, move it through warm or cold tiers as it ages, and hold it on the frozen tier for days 30-365 to satisfy compliance retention at minimal cost.

  4. Use runtime fields for schema evolution - Define runtime fields instead of reindexing historical data when new log formats arrive, enabling schema changes without downtime.

  5. Enrich at ingest time - Join lookup data (GeoIP, asset inventory) during indexing to enable faster aggregations and eliminate repeated query-time join overhead.
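The tiered ILM practice above can be sketched as a policy body. This is an illustration under stated assumptions: the function name `build_ilm_policy`, the daily rollover, the 50 GB shard size, and the repository name `my-snapshots` are placeholders, and the frozen phase only works if a snapshot repository with that name is registered. The dict is the JSON body for `PUT _ilm/policy/<policy-name>`.

```python
def build_ilm_policy(hot_retention="7d", frozen_after="30d", delete_after="365d",
                     snapshot_repository="my-snapshots"):
    """Return an ILM policy body with hot, warm, frozen, and delete phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    # Roll over daily or when a primary shard reaches 50 GB (assumed values).
                    "actions": {"rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}}
                },
                "warm": {
                    # After rollover, min_age is measured from the rollover time.
                    "min_age": hot_retention,
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "frozen": {
                    "min_age": frozen_after,
                    "actions": {"searchable_snapshot": {"snapshot_repository": snapshot_repository}},
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }
```

The phase boundaries default to the 7, 30, and 365 day marks named in the list; adjust them to your own retention requirements.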

Common Issues

  1. Mapping explosion - Uncontrolled dynamic mapping pushes an index past the default limit of 1,000 fields (index.mapping.total_fields.limit). Set dynamic to "strict" or "runtime" and define explicit mappings.

  2. ES|QL memory limits - Aggregating millions of unique values hits circuit breakers. Add tighter time filters or reduce cardinality with EVAL bucketing.

  3. Stale ILM transitions - Misconfigured policies leave indices stuck in a phase. Check the ILM explain API (GET <index>/_ilm/explain) for the failed step and verify that the target tiers have sufficient disk.
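As an illustration of the mapping explosion fix, the sketch below builds a component template body with `dynamic` set to `"strict"` and a helper that estimates how many fields a mapping defines. The field selection and both function names are assumptions; the count is an approximation that tallies objects, leaf fields, and multi-fields. The dict is the JSON body for `PUT _component_template/<template-name>`.

```python
def build_logs_mappings_template():
    """Return a component template body that rejects unmapped fields."""
    return {
        "template": {
            "mappings": {
                "dynamic": "strict",  # documents with unknown fields are rejected
                "properties": {
                    "@timestamp": {"type": "date"},
                    "message": {"type": "text"},
                    "log": {"properties": {"level": {"type": "keyword"}}},
                    "service": {"properties": {"name": {"type": "keyword"}}},
                    "error": {
                        "properties": {
                            "message": {
                                "type": "text",
                                "fields": {"keyword": {"type": "keyword", "ignore_above": 1024}},
                            }
                        }
                    },
                },
            }
        }
    }


def count_mapped_fields(properties):
    """Approximate the field count: objects, leaf fields, and multi-fields each count once."""
    total = 0
    for spec in properties.values():
        total += 1  # the field or object itself
        total += count_mapped_fields(spec.get("properties", {}))  # fields nested in an object
        total += len(spec.get("fields", {}))  # multi-fields such as message.keyword
    return total
```

Running `count_mapped_fields` over a template's `properties` before deployment gives a rough sense of how much headroom remains under the field limit.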
