Amplitude Experiment Mentor
An agent that guides feature experiment implementation from planning through analysis, helping teams design proper A/B tests, instrument tracking, and interpret results using Amplitude's experimentation platform.
When to Use This Agent
Choose Amplitude Experiment Mentor when:
- Setting up feature flags and A/B tests in Amplitude Experiment
- Designing experiment instrumentation and tracking plans
- Implementing feature flag checks in application code
- Analyzing experiment results and making ship/no-ship decisions
- Building a culture of data-driven feature development
Consider alternatives when:
- Using a different experimentation platform like LaunchDarkly or Optimizely
- Running infrastructure experiments without user-facing changes (use canary deployments)
- Doing qualitative user research without quantitative testing (use a UX research agent)
Quick Start
```yaml
# .claude/agents/amplitude-experiment-mentor.yml
name: Amplitude Experiment Mentor
model: claude-sonnet-4-20250514
tools:
  - Read
  - Write
  - Bash
  - Grep
prompt: |
  You are an experimentation expert using Amplitude Experiment. Guide teams
  through experiment design, implementation, instrumentation, and analysis.
  Ensure statistical rigor and actionable results.
```
Example invocation:
```bash
claude --agent amplitude-experiment-mentor "Help me set up an A/B test for our new checkout flow. We want to measure conversion rate impact with 95% confidence."
```
Core Concepts
Experiment Lifecycle
| Phase | Activities | Key Outputs |
|---|---|---|
| Design | Hypothesis, metrics, sample size | Experiment plan document |
| Setup | Feature flags, variant config | Amplitude Experiment config |
| Instrument | Event tracking, properties | Analytics implementation |
| QA | Flag verification, event validation | Test report |
| Run | Traffic allocation, monitoring | Live dashboard |
| Analyze | Statistical analysis, segmentation | Results and recommendation |
| Decision | Ship, iterate, or kill | Documented decision |
Feature Flag Implementation
```javascript
import { Experiment } from '@amplitude/experiment-js-client';

// Initialize the client
const experiment = Experiment.initialize('YOUR_API_KEY', {
  automaticExposureTracking: true,
});

// Fetch variants for the user
await experiment.fetch({ user_id: userId });

// Check the variant
const variant = experiment.variant('new-checkout-flow');
if (variant.value === 'treatment') {
  renderNewCheckout();
} else {
  renderCurrentCheckout();
}
```
Sample Size Calculator
Required sample per variant = (2 × (Z_α + Z_β)² × σ²) / δ²
Where:
- Z_α = 1.96 (two-sided, for 95% confidence)
- Z_β = 0.84 (for 80% power)
- σ² = baseline variance (for a conversion metric, σ² ≈ p(1 − p))
- δ = minimum detectable effect (absolute)
Example: baseline conversion = 5%, MDE = 0.5% (absolute)
- Sample per variant ≈ 31,234 users
- Total sample ≈ 62,468 users
- At 1,000 users/day → ~63 days to reach the required sample
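The calculation above can be sketched in a few lines. This is a minimal implementation of the same formula, assuming a binary conversion metric so that σ² is approximated by the pooled binomial variance; with these rounded z-values the result lands slightly below the example's 31,234 (more precise z-values close most of the gap).

```javascript
// Sample-size sketch for a two-sided test on a conversion metric.
// Assumes sigma^2 ≈ pBar * (1 - pBar), the binomial variance at the
// pooled conversion rate. Illustrative, not a substitute for
// Amplitude's own duration estimator.
function sampleSizePerVariant(baseline, mde, zAlpha = 1.96, zBeta = 0.84) {
  const pBar = baseline + mde / 2;      // pooled conversion rate
  const variance = pBar * (1 - pBar);   // sigma^2 for a binary metric
  const n = (2 * (zAlpha + zBeta) ** 2 * variance) / mde ** 2;
  return Math.ceil(n);
}

const perVariant = sampleSizePerVariant(0.05, 0.005);
console.log(perVariant);       // → 31200 per variant
console.log(perVariant * 2);   // total across both variants
```

Doubling the MDE roughly quarters the required sample, which is why small traffic volumes usually force larger, bolder design changes.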
Configuration
| Parameter | Description | Default |
|---|---|---|
| confidence_level | Statistical confidence threshold | 95% |
| power | Statistical power for sample sizing | 80% |
| min_runtime_days | Minimum days before concluding | 7 |
| traffic_allocation | Percentage of users in experiment | 100% |
| sticky_bucketing | Maintain consistent user assignment | true |
| exposure_tracking | Auto-track variant exposures | true |
| rollout_strategy | Gradual rollout after ship decision | 10% → 50% → 100% |
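As one way to keep these settings versioned alongside the agent, the table could be captured in a config file. The keys below simply mirror the table and are illustrative; this is not an official Amplitude Experiment schema.

```yaml
# Illustrative experiment settings (keys mirror the table above;
# not an official Amplitude Experiment schema)
experiment: new-checkout-flow
confidence_level: 0.95
power: 0.80
min_runtime_days: 7
traffic_allocation: 1.0        # 100% of users eligible
sticky_bucketing: true
exposure_tracking: true
rollout_strategy: [0.10, 0.50, 1.00]
```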
Best Practices
- Define your primary metric before writing any code. Every experiment needs exactly one primary metric that determines the ship/no-ship decision. Secondary metrics provide context but should not override the primary. If you cannot articulate what success looks like in one metric, your experiment scope is too broad.
- Run experiments for full business cycles. A checkout experiment that runs Monday through Thursday misses weekend shopping patterns. Always run for at least one full week, ideally two. Day-of-week effects, payroll cycles, and seasonal patterns can all bias results if your experiment window doesn't cover them.
- Instrument both the exposure and the outcome. Track when users see the variant (exposure event) and when they complete the target action (outcome event). Without exposure tracking, your analysis includes users who were assigned to a variant but never encountered the changed experience, diluting the measured effect.
- Guard against peeking at results prematurely. Checking results daily and stopping when you see significance inflates your false positive rate dramatically. Use sequential testing methods if you need to monitor continuously, or pre-commit to a fixed sample size and analysis date. Amplitude's statistics engine accounts for this, but only if configured correctly.
- Document the decision, not just the results. After analysis, record what you decided and why in a shared location. Include the metrics observed, the confidence level achieved, qualitative factors that influenced the decision, and any follow-up experiments planned. This institutional memory prevents re-running the same experiments and helps new team members understand product evolution.
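The dilution effect from missing exposure tracking can be shown with a toy calculation: comparing conversion over all assigned users versus only exposed users. All counts and rates below are hypothetical, chosen only to make the arithmetic visible.

```javascript
// Toy illustration of exposure dilution: users assigned to treatment who
// never saw the change convert at the baseline rate, shrinking the
// measured lift when they sit in the denominator.
function conversionRate(users) {
  const converted = users.filter((u) => u.converted).length;
  return converted / users.length;
}

// Hypothetical treatment group: 1,000 assigned, only 600 actually exposed.
const exposed = Array.from({ length: 600 }, (_, i) => ({
  exposed: true,
  converted: i < 66,   // 11% convert among exposed (the real effect)
}));
const unexposed = Array.from({ length: 400 }, (_, i) => ({
  exposed: false,
  converted: i < 40,   // 10% convert (baseline; they never saw the change)
}));
const assigned = exposed.concat(unexposed);

console.log(conversionRate(assigned));                          // diluted: 0.106
console.log(conversionRate(assigned.filter((u) => u.exposed))); // true effect: 0.11
```

Here a real 1-point lift shrinks to 0.6 points when unexposed users are included, which both understates the effect and inflates the sample you need to detect it.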
Common Issues
Experiment shows no statistical significance after the planned runtime. This usually means your minimum detectable effect was too small for your traffic volume. Before extending the experiment, check for implementation bugs (are users actually seeing different experiences?), verify event tracking fires correctly, and confirm the primary metric is sensitive to the change. If everything checks out, you likely need either more traffic or a larger design change to move the needle.
Users see different variants across sessions or devices. Sticky bucketing prevents this by storing variant assignments persistently. Configure Amplitude to use a stable user identifier rather than anonymous session IDs. For logged-out experiences, use a device ID cookie. Cross-device consistency requires account-level bucketing, which means the experiment can only include logged-in users.
Results are significant but the effect seems implausibly large. Check for novelty effects by segmenting results by week. If the treatment effect decreases over time, users may be reacting to newness rather than genuine improvement. Also verify that variant assignment is balanced—a skewed split can inflate effect sizes. Run a pre-experiment A/A test on your bucketing logic to confirm uniform distribution.
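A quick way to check the balanced-assignment point above is a sample-ratio-mismatch test: a one-degree-of-freedom chi-square test of the observed counts against the intended 50/50 split. This is a generic statistical sketch (the counts are illustrative), not an Amplitude API.

```javascript
// Sample-ratio-mismatch check for an intended 50/50 split.
// 3.841 is the 95th percentile of the chi-square distribution
// with 1 degree of freedom.
function sampleRatioMismatch(controlCount, treatmentCount) {
  const total = controlCount + treatmentCount;
  const expected = total / 2;
  const chiSq =
    (controlCount - expected) ** 2 / expected +
    (treatmentCount - expected) ** 2 / expected;
  return { chiSq, mismatch: chiSq > 3.841 };
}

console.log(sampleRatioMismatch(5000, 5050)); // small imbalance: mismatch false
console.log(sampleRatioMismatch(5000, 5600)); // skewed split: mismatch true
```

A flagged mismatch almost always indicates a bucketing or tracking bug rather than a real effect, so treat the experiment's results as invalid until the cause is found.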