
Pro Metrics Workspace

A Railway-focused skill for monitoring application metrics, resource usage, and performance data across Railway services. Pro Metrics Workspace helps you track CPU, memory, network throughput, and request latency to identify bottlenecks and optimize resource allocation.

When to Use This Skill

Choose Pro Metrics Workspace when:

  • Monitoring resource consumption across Railway services
  • Investigating performance issues or unexpected resource spikes
  • Right-sizing service instances based on actual usage data
  • Setting up alerting thresholds for resource limits

Consider alternatives when:

  • You need application-level performance monitoring (use Datadog, New Relic)
  • You're tracking business metrics (use analytics tools)
  • You need distributed tracing (use Jaeger, Zipkin, or OpenTelemetry)

Quick Start

```
claude "Show me resource usage for my Railway services"
```

```bash
# Check current service status and resource usage
railway status

# View recent logs for performance indicators
railway logs --tail

# Check service metrics via Railway dashboard API
# Navigate to: railway.app/project/<id>/service/<id>/metrics
```
```javascript
// Custom metrics endpoint for application monitoring
app.get('/metrics', (req, res) => {
  // Guard against division by zero before the first request arrives
  const count = globalMetrics.requestCount || 1;
  const metrics = {
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    cpu: process.cpuUsage(),
    requestCount: globalMetrics.requestCount,
    avgResponseTime: globalMetrics.totalResponseTime / count,
    activeConnections: globalMetrics.activeConnections,
    errorRate: globalMetrics.errorCount / count
  };
  res.json(metrics);
});
```

Core Concepts

Railway Resource Metrics

| Metric | Description | Healthy Range |
|--------|-------------|---------------|
| CPU Usage | Processing utilization | < 70% sustained |
| Memory (RSS) | Resident memory usage | < 80% of limit |
| Network In | Incoming traffic bytes | Varies by service |
| Network Out | Outgoing traffic bytes | Varies by service |
| Disk Usage | Persistent volume consumption | < 85% capacity |
| Request Latency | P50/P95/P99 response times | P95 < 500ms |
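As a starting point for tracking these from inside the application, here is a minimal sketch that approximates the memory metric using `process.memoryUsage()`. The `MEMORY_LIMIT_MB` environment variable and the 512 MB fallback are assumptions for illustration, not values Railway sets automatically:

```javascript
// Sketch: sample an app-side approximation of the RSS metric above.
// MEMORY_LIMIT_MB is an assumed env var, not a Railway-provided setting.
function sampleMetrics(limitMb = Number(process.env.MEMORY_LIMIT_MB) || 512) {
  const rssMb = process.memoryUsage().rss / (1024 * 1024);
  return {
    rssMb: Math.round(rssMb),
    memoryPercent: Math.round((rssMb / limitMb) * 100),
    uptimeSec: Math.round(process.uptime()),
  };
}
```

Compare `memoryPercent` against the 80% threshold from the table when deciding whether to alert.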

Application-Level Metrics

```javascript
// Middleware for tracking request metrics
const metricsMiddleware = (req, res, next) => {
  const start = Date.now();
  metrics.activeConnections++;

  res.on('finish', () => {
    const duration = Date.now() - start;
    metrics.requestCount++;
    metrics.totalResponseTime += duration;
    metrics.activeConnections--;

    if (res.statusCode >= 500) {
      metrics.errorCount++;
    }

    // Log slow requests
    if (duration > 1000) {
      console.warn(`Slow request: ${req.method} ${req.path} - ${duration}ms`);
    }
  });

  next();
};
```

Resource Scaling Guide

Scaling Decision Matrix

| Symptom | Metric | Action |
|---------|--------|--------|
| Slow responses | CPU > 80% | Increase CPU or add replicas |
| Out of memory crashes | Memory > 90% | Increase memory limit |
| Connection timeouts | Active conns > pool max | Increase pool or add replicas |
| Disk full errors | Disk > 90% | Increase volume or clean data |
| High error rate | 5xx > 1% | Check logs, scale, or fix code |
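The decision matrix can be encoded as a small advisor function. This is an illustrative sketch, and the field names (`cpuPercent`, `poolMax`, `errorRate5xx`, etc.) are assumptions about how you structure your own metrics sample:

```javascript
// Hypothetical advisor implementing the decision matrix above.
// Field names are illustrative; adapt them to your own metrics shape.
function scalingAdvice(m) {
  const advice = [];
  if (m.cpuPercent > 80) advice.push('Increase CPU or add replicas');
  if (m.memoryPercent > 90) advice.push('Increase memory limit');
  if (m.activeConnections > m.poolMax) advice.push('Increase pool or add replicas');
  if (m.diskPercent > 90) advice.push('Increase volume or clean data');
  if (m.errorRate5xx > 0.01) advice.push('Check logs, scale, or fix code');
  return advice;
}
```

Run it against each metrics sample and surface the returned recommendations in your alerting channel.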

Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| metrics_endpoint | Custom metrics HTTP path | /metrics |
| collection_interval | How often to sample metrics | 60s |
| retention_period | How long metrics data is kept | 7 days |
| alert_cpu_threshold | CPU usage warning level | 80% |
| alert_memory_threshold | Memory usage warning level | 85% |
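In application code, these parameters might be wired up as a plain config object. The object below is a sketch mirroring the table's defaults; the key names and units are assumptions, not a Railway-provided setting:

```javascript
// Illustrative defaults mirroring the configuration table above.
const metricsConfig = {
  metricsEndpoint: '/metrics',      // metrics_endpoint
  collectionIntervalMs: 60 * 1000,  // collection_interval: 60s
  retentionPeriodDays: 7,           // retention_period: 7 days
  alertCpuThreshold: 0.80,          // alert_cpu_threshold: 80%
  alertMemoryThreshold: 0.85,       // alert_memory_threshold: 85%
};
```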

Best Practices

  1. Monitor memory trends, not just snapshots. A service using 60% memory looks fine, but if it's growing 5% per hour, you have a memory leak. Track memory over time and investigate upward trends before they cause OOM crashes.

  2. Set up a custom /metrics endpoint. Railway's built-in metrics cover infrastructure, but application metrics (request count, error rate, queue depth) require instrumentation in your code. Expose them via a dedicated endpoint.

  3. Use P95/P99 latency instead of averages. Average response time hides outliers. If your average is 100ms but P99 is 5 seconds, 1% of users are having a terrible experience. Track percentiles to understand the full latency distribution.

  4. Right-size resources after collecting baseline data. Don't guess CPU and memory limits — deploy with generous limits, monitor actual usage for a week, then set limits at 1.5x the observed peak. This prevents waste without risking OOM kills.

  5. Correlate metrics with deployment events. When metrics change suddenly, check if a deployment happened around the same time. Track deployment timestamps alongside metrics to quickly identify whether code changes caused performance regressions.
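The percentile tracking from practice 3 can be sketched as a small helper using the nearest-rank method (illustrative, not part of any library; production systems typically use a histogram or streaming estimator instead of sorting every sample):

```javascript
// Compute a latency percentile from recorded durations (nearest-rank method).
function percentile(durationsMs, p) {
  if (durationsMs.length === 0) return 0;
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(rank, sorted.length) - 1];
}
```

With nine 100ms requests and one 5000ms outlier, the average is 590ms, P50 is 100ms, and P99 is 5000ms, which is exactly the distinction practice 3 is about.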

Common Issues

Memory usage grows until the service crashes. This indicates a memory leak — common causes include unclosed database connections, growing caches without eviction, and event listener accumulation. Use the --inspect flag with Node.js to capture heap snapshots and identify leaking objects.

CPU spikes during specific time periods. Check for cron jobs, scheduled tasks, or traffic patterns that coincide with the spikes. If a background job causes CPU contention with request handling, move it to a separate Railway service with its own resources.

Metrics show healthy resources but users report slowness. The bottleneck may be outside your Railway service — DNS resolution, external API calls, or database queries. Add timing instrumentation to external calls and check for slow queries in your database logs.
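The timing instrumentation mentioned above can be a thin async wrapper around any external call. This is a sketch; the `timed` name and the 500ms threshold are arbitrary choices, not an established API:

```javascript
// Wrap any async external call, measure its duration, and flag slow dependencies.
async function timed(label, fn, slowMs = 500) {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    const ms = Date.now() - start;
    if (ms > slowMs) {
      console.warn(`Slow external call: ${label} took ${ms}ms`);
    }
  }
}

// Usage (assumes Node 18+ for global fetch):
// const res = await timed('user-api', () => fetch('https://api.example.com/users/1'));
```

Because the measurement lives in a `finally` block, failures are timed too, which helps spot external calls that are slow *and* erroring.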
