Dynatrace Expert Partner
Production-ready agent for Dynatrace observability, incident investigation, and security analysis. Includes structured workflows, validation checks, and reusable patterns for security.
Master Dynatrace observability, APM, and DQL analytics for incident response, capacity planning, and security posture monitoring.
When to Use This Agent
Choose this agent when you need to:
- Investigate production incidents using distributed traces, service-flow analysis, and Davis AI to pinpoint root cause
- Write or optimize DQL queries for custom dashboards, SLO definitions, and alerting rules across full-stack environments
- Assess application security through Dynatrace RASP, vulnerability detection, and attack-path analysis
Consider alternatives when:
- Your monitoring stack is Prometheus, Grafana, or Datadog and you need vendor-specific guidance
- You require code-level profiling beyond what OneAgent captures automatically
Quick Start
Configuration
name: dynatrace-expert-partner
type: agent
category: observability
Example Invocation
claude agent:invoke dynatrace-expert-partner "Investigate checkout service latency spike and build a DQL dashboard"
Example Output
Incident - Checkout Service Latency
Environment: prod-us-east | 2026-03-15 08:00-09:30 UTC
Root Cause: Davis AI anomaly at 08:12 UTC
P95 latency: 180ms -> 2,400ms
Cause: DB connection pool exhaustion (100/100 saturated)
Trigger: checkout-api:v3.8.2 deployed at 08:10 (missing connection release)
DQL Query:
timeseries avg_latency = avg(dt.service.request.response_time),
filter: dt.entity.service.name == "checkout-api", interval: 1m
| fieldsAdd threshold = 500
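In this example the trigger is a deployment. Davis can only weigh a change like this if a deployment event reaches Dynatrace. The sketch below builds that request but does not send it. It assumes the Events API v2 ingest endpoint (`/api/v2/events/ingest`) and a token with the `events.ingest` scope. The helper name `build_deployment_event_request` is hypothetical, and the `dt.event.deployment.*` property keys are assumptions you should check against your environment.

```python
import json
import urllib.request


def build_deployment_event_request(env_url: str, api_token: str,
                                   service_name: str, version: str) -> urllib.request.Request:
    """Build (but do not send) a CUSTOM_DEPLOYMENT event for one service.

    Hypothetical helper; the property keys below are assumptions.
    """
    payload = {
        # Deployment events become root-cause candidates Davis can correlate.
        "eventType": "CUSTOM_DEPLOYMENT",
        "title": f"Deploy {service_name}:{version}",
        # Attach the event to the service entity matched by exact name.
        "entitySelector": f'type(SERVICE),entityName.equals("{service_name}")',
        # Assumed property keys; adjust to your own naming conventions.
        "properties": {
            "dt.event.deployment.name": service_name,
            "dt.event.deployment.version": version,
        },
    }
    return urllib.request.Request(
        url=env_url.rstrip("/") + "/api/v2/events/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_deployment_event_request(
        "https://abc12345.live.dynatrace.com", "dt0c01.EXAMPLE", "checkout-api", "v3.8.2"
    )
    print(req.get_method(), req.full_url)
```

A CI/CD job could send the request with `urllib.request.urlopen(req)` right after the rollout. The service name would then match the one Davis later reports in the problem.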
Core Concepts
Dynatrace Observability Overview
| Aspect | Details |
|---|---|
| OneAgent | Automatic code-level instrumentation for Java, .NET, Node.js, Go, and Python, providing distributed traces, code hotspots, and Real User Monitoring (RUM) |
| Davis AI | Causal AI correlating topology, metrics, events, and logs to identify root cause and impact scope |
| DQL | Pipe-based query language for logs, metrics, events, and entities, built from commands such as fetch, filter, summarize, and timeseries |
| Smartscape | Real-time dependency map spanning hosts, processes, services, and applications with call relationships |
| Grail | Unified data lakehouse for all signals with schema-on-read and retention up to 10 years |
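In DQL, a query starts with a source command and each later command follows a `|`. The sketch below assembles a query string stage by stage, so each intermediate version can be pasted into a Notebook and checked. `build_dql` is a hypothetical helper. The `loglevel` and `dt.entity.service` fields are standard Grail log fields used only for illustration.

```python
from typing import Sequence


def build_dql(source: str, stages: Sequence[str] = ()) -> str:
    """Join a source command and pipeline stages into one DQL query string.

    Hypothetical helper: blank stages are skipped, and each remaining stage
    is placed on its own line, prefixed by the pipe character.
    """
    lines = [source.strip()]
    lines += [f"| {stage.strip()}" for stage in stages if stage.strip()]
    return "\n".join(lines)


# Grow the query one stage at a time: broad fetch, then filter, then aggregate.
SOURCE = "fetch logs, from: now() - 2h"
stage_1 = build_dql(SOURCE)
stage_2 = build_dql(SOURCE, ['filter loglevel == "ERROR"'])
stage_3 = build_dql(
    SOURCE,
    ['filter loglevel == "ERROR"', "summarize errors = count(), by: {dt.entity.service}"],
)

if __name__ == "__main__":
    print(stage_3)
```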
Dynatrace Investigation Architecture
+----------------+     +------------------+     +----------------+
| OneAgent       | --> | Grail Data       | --> | Davis AI       |
| Instrumentation|     | Lakehouse        |     | Correlation    |
+----------------+     +------------------+     +----------------+
        |                       |                       |
        v                       v                       v
+----------------+     +------------------+     +----------------+
| Smartscape     | --> | DQL Queries &    | --> | Dashboards &   |
| Topology       |     | Notebooks        |     | Alerts / SLOs  |
+----------------+     +------------------+     +----------------+
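The output of Davis correlation surfaces as problems, which an investigation can read before it touches raw metrics. The sketch below builds, but does not send, a request for open problems. It assumes the Problems API v2 (`/api/v2/problems`) with a `problems.read` token. The helper name is hypothetical, and the `from` and `problemSelector` values are example choices.

```python
import urllib.parse
import urllib.request


def build_problems_request(env_url: str, api_token: str,
                           timeframe: str = "now-2h",
                           status: str = "open") -> urllib.request.Request:
    """Build (but do not send) a GET request for Davis problems.

    Hypothetical helper; the timeframe defaults to the last 2 hours.
    """
    query = urllib.parse.urlencode({
        "from": timeframe,
        # Problem selector restricting results to one status, e.g. open problems.
        "problemSelector": f'status("{status}")',
    })
    return urllib.request.Request(
        url=f"{env_url.rstrip('/')}/api/v2/problems?{query}",
        headers={"Authorization": f"Api-Token {api_token}"},
        method="GET",
    )


if __name__ == "__main__":
    req = build_problems_request("https://abc12345.live.dynatrace.com", "dt0c01.EXAMPLE")
    print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` returns a JSON body. Pagination and error handling are left out of this sketch.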
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt_environment_url | string | - | Environment URL (e.g., https://abc12345.live.dynatrace.com) |
| dt_api_token | string | - | API token with read scopes for metrics, entities, logs, and problems (metrics.read, entities.read, logs.read, problems.read) |
| default_timeframe | string | last 2 hours | Default query timeframe for investigations |
| management_zone | string | - | Zone to scope analysis to a specific application or team |
| slo_target | float | 99.9 | Default SLO availability target percentage |
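The parameters above map onto a small typed config object. The sketch below applies the documented defaults and rejects obviously invalid values early. The environment variable names (`DT_ENVIRONMENT_URL`, `DT_API_TOKEN`, and so on) are hypothetical, not a convention the agent is known to read.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    """Parameters from the table above, with the same defaults."""
    dt_environment_url: str
    dt_api_token: str
    default_timeframe: str = "last 2 hours"
    management_zone: Optional[str] = None
    slo_target: float = 99.9

    def __post_init__(self) -> None:
        if not self.dt_environment_url.startswith("https://"):
            raise ValueError("dt_environment_url must be an https:// URL")
        if not self.dt_api_token:
            raise ValueError("dt_api_token is required")
        # Strictly below 100 so that a non-zero error budget remains.
        if not 0.0 < self.slo_target < 100.0:
            raise ValueError("slo_target must be a percentage between 0 and 100")


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Read the config from hypothetical DT_* environment variables."""
    return AgentConfig(
        dt_environment_url=env.get("DT_ENVIRONMENT_URL", ""),
        dt_api_token=env.get("DT_API_TOKEN", ""),
        default_timeframe=env.get("DT_DEFAULT_TIMEFRAME", "last 2 hours"),
        # Treat an empty string as "no zone".
        management_zone=env.get("DT_MANAGEMENT_ZONE") or None,
        slo_target=float(env.get("DT_SLO_TARGET", "99.9")),
    )
```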
Best Practices
- Start with Davis AI problems - Query the Problems API first instead of raw metrics. Davis performs topology-aware root-cause analysis that would take hours to do manually.
- Use management zones for scoping - Scope dashboards and alerts to specific zones so each team sees only its own services, which reduces noise and improves query performance.
- Build DQL queries iteratively - Start with a broad fetch, then add filters, then aggregations. Test each stage in a Notebook before embedding the query in a dashboard tile.
- Define SLOs before incidents - Establish latency, error-rate, and availability objectives during calm periods. The error budget burn rate then gives an objective measure of incident severity.
- Correlate deployments with anomalies - Ingest CI/CD deployment events so Davis considers them as root-cause candidates, which often reduces MTTR.
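Burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). A value of 1.0 spends the budget exactly over the SLO window, and larger values exhaust it early. The sketch below uses the general SRE formula, not Dynatrace's own SLO implementation. The function names are hypothetical.

```python
def error_budget(slo_target_pct: float) -> float:
    """Fraction of events allowed to fail, e.g. a 99.9 target gives about 0.001."""
    if not 0.0 < slo_target_pct < 100.0:
        raise ValueError("SLO target must be between 0 and 100 (exclusive)")
    return (100.0 - slo_target_pct) / 100.0


def burn_rate(bad_events: int, total_events: int, slo_target_pct: float) -> float:
    """Observed error ratio divided by the error budget.

    Returns 0.0 when there is no traffic, since no budget was consumed.
    """
    if total_events <= 0:
        return 0.0
    return (bad_events / total_events) / error_budget(slo_target_pct)


if __name__ == "__main__":
    # 50 failed out of 10,000 requests against a 99.9% target: roughly a 5x burn.
    print(round(burn_rate(50, 10_000, 99.9), 2))
```

Alerting on burn rate rather than on the raw error count keeps the severity threshold tied to the agreed objective, for example paging only when the burn stays well above 1.0.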
Common Issues
- DQL query timeouts - Queries over large datasets can exceed execution limits. Add entity filters, shorten the timeframe, or use larger summarize intervals.
- OneAgent version mismatch - Mixed OneAgent versions across hosts can produce inconsistent traces. Use the deployment API to automate rolling upgrades.
- Alert fatigue from defaults - Tune Davis sensitivity per service and define metric-based alert profiles tied to SLO error budgets.
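One way to apply the advice on larger intervals is to derive the interval from the timeframe so the number of buckets stays bounded. In the sketch below, the cap of 240 buckets and the candidate list are hypothetical tuning choices, not documented Dynatrace limits. The output is a DQL duration literal such as `15m` for an `interval:` parameter.

```python
from datetime import timedelta
from typing import Sequence

# Hypothetical candidate bucket sizes, smallest first.
CANDIDATE_INTERVALS = (
    timedelta(minutes=1), timedelta(minutes=5), timedelta(minutes=15),
    timedelta(hours=1), timedelta(hours=6), timedelta(days=1),
)


def pick_interval(timeframe: timedelta, max_buckets: int = 240,
                  candidates: Sequence[timedelta] = CANDIDATE_INTERVALS) -> timedelta:
    """Return the smallest candidate that keeps timeframe / interval <= max_buckets.

    Falls back to the largest candidate if none fits.
    """
    for interval in candidates:
        if timeframe / interval <= max_buckets:
            return interval
    return candidates[-1]


def dql_interval_literal(interval: timedelta) -> str:
    """Format a timedelta as a duration literal such as 15m, 1h, or 1d."""
    seconds = int(interval.total_seconds())
    for unit, size in (("d", 86400), ("h", 3600), ("m", 60)):
        if seconds % size == 0:
            return f"{seconds // size}{unit}"
    return f"{seconds}s"


if __name__ == "__main__":
    # A 24-hour investigation window with the default cap.
    print(dql_interval_literal(pick_interval(timedelta(hours=24))))
```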
Similar Templates
API Endpoint Builder
Agent that scaffolds complete REST API endpoints with controller, service, route, types, and tests. Supports Express, Fastify, and NestJS.
Documentation Auto-Generator
Agent that reads your codebase and generates comprehensive documentation including API docs, architecture guides, and setup instructions.
Ai Ethics Advisor Partner
All-in-one agent covering ethics, responsible, development, specialist. Includes structured workflows, validation checks, and reusable patterns for ai specialists.