The Contract That Opened My Eyes
In my first year of engineering, I landed my first monitoring contract. A startup founder reached out: his site was growing fast, and he had zero visibility into what was happening. He had no idea which pages were slow, how many users were active, or why his API crashed every Tuesday at 3 PM.
Six hours later, I had Prometheus and Grafana running, collecting metrics, and showing beautiful dashboards. The founder was ecstatic. I got paid $800. More importantly, I realized something: monitoring is a skill every backend developer should have, but most don't.
This post is everything you need to get started with Prometheus and Grafana. No fluff, just the essentials.
What Are Prometheus and Grafana?
Prometheus = A monitoring system and time-series database that periodically scrapes (pulls) metrics from your application over HTTP and stores them.
Grafana = A visualization tool that turns those metrics into beautiful, real-time dashboards.
Think of it this way:
• Prometheus is like a data logger that continuously records: "At 2:43 PM, the API took 234ms to respond"
• Grafana is the dashboard that shows you: "Response times spiked 400% at 3 PM every day this week"
Why not just use Datadog?
• Datadog can cost $5k/month or more at scale
• Prometheus + Grafana is free and open-source (you pay only for the servers you run it on)
• You actually learn how monitoring works (a valuable skill)
The 3 Core Metric Types You Need to Know
When instrumenting your application, you'll use three main metric types:
1. Counter (Always Goes Up)
Tracks things that only increase: total requests, errors, events processed.
```typescript
import client from "prom-client";

export const requestCounter = new client.Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route", "status_code"],
});

// Usage: Increment every time someone hits an endpoint
requestCounter.inc({
  method: "GET",
  route: "/api/user",
  status_code: "200"
});
```
Use Case: "How many total requests have we served?"
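On a dashboard you rarely graph a counter's raw total; you graph how fast it grows. PromQL's `rate()` does that, and it also copes with the counter dropping back to zero when your process restarts. Here is a simplified sketch of the idea; the `simpleRate` helper is hypothetical, and the real `rate()` additionally extrapolates to the edges of the query window, so its numbers will differ slightly.

```typescript
// Hypothetical helper: a simplified version of PromQL's rate() for ONE counter series.
// Unlike real Prometheus, it does not extrapolate to the edges of the [5m] window.
type Point = { t: number; v: number }; // t = scrape time in seconds, v = counter value

function simpleRate(points: Point[]): number {
  if (points.length < 2) return NaN; // need at least two samples to see growth
  let increase = 0;
  for (let i = 1; i < points.length; i++) {
    const prev = points[i - 1].v;
    const curr = points[i].v;
    // A drop means the counter reset (e.g. the process restarted and counted
    // again from 0), so everything counted since the reset is new increase.
    increase += curr >= prev ? curr - prev : curr;
  }
  const elapsedSeconds = points[points.length - 1].t - points[0].t;
  return increase / elapsedSeconds;
}

// Scrapes every 15s; the app restarts between t=30 and t=45.
const samples: Point[] = [
  { t: 0, v: 100 },
  { t: 15, v: 130 },
  { t: 30, v: 160 },
  { t: 45, v: 30 },
  { t: 60, v: 60 },
];

console.log(simpleRate(samples)); // 2 (120 new requests over 60 seconds)
```

Without the reset handling, the drop from 160 to 30 would show up as a large negative rate right after every deploy.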
2. Gauge (Goes Up and Down)
Tracks current values: active users, queue size, CPU usage.
```typescript
import client from "prom-client";

export const activeRequestGauge = new client.Gauge({
  name: "http_active_requests",
  help: "Number of active HTTP requests",
});

// Usage: Track concurrent requests
activeRequestGauge.inc(); // Request started
// ... handle request ...
activeRequestGauge.dec(); // Request finished
```
Use Case: "How many users are online right now?"
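One catch with the `inc()`/`dec()` pattern above: if the handler throws between the two calls, `dec()` never runs and the gauge drifts upward forever. A minimal sketch of a `try`/`finally` wrapper that avoids this; `trackInFlight`, `InFlightGauge`, and `MemoryGauge` are hypothetical names, and the in-memory gauge only stands in for prom-client's so the example runs on its own.

```typescript
// Hypothetical interface: just the two methods this sketch needs.
// prom-client's Gauge has inc() and dec(), so it should fit this shape structurally.
interface InFlightGauge {
  inc(): void;
  dec(): void;
}

// Hypothetical helper: increments the gauge, runs the work, and always
// decrements it again, whether the work resolves or throws.
async function trackInFlight<T>(
  gauge: InFlightGauge,
  work: () => Promise<T>
): Promise<T> {
  gauge.inc();
  try {
    return await work();
  } finally {
    gauge.dec();
  }
}

// Tiny in-memory gauge so the sketch runs without prom-client.
class MemoryGauge implements InFlightGauge {
  value = 0;
  inc(): void {
    this.value += 1;
  }
  dec(): void {
    this.value -= 1;
  }
}

async function demo(): Promise<void> {
  const gauge = new MemoryGauge();
  await trackInFlight(gauge, async () => "ok");
  try {
    await trackInFlight(gauge, async () => {
      throw new Error("handler failed");
    });
  } catch {
    // The error still reaches the caller...
  }
  console.log(gauge.value); // 0: ...but the gauge was decremented both times
}

demo();
```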
3. Histogram (Tracks Distributions)
Tracks how values are distributed: response times, request sizes, query durations.
```typescript
import client from "prom-client";

export const requestDurationHistogram = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.1, 0.5, 1, 2, 5, 10], // Response time buckets
});

// Usage: Record how long each request takes
const startTime = Date.now();
// ... handle request ...
const duration = (Date.now() - startTime) / 1000;
requestDurationHistogram.observe({
  method: "GET",
  route: "/api/user",
  status_code: "200"
}, duration);
```
Use Case: "What's our p95 response time (estimated from the buckets)? How many requests take over 2 seconds?"
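A rough sketch of what a histogram does with each `observe()` call, using a hypothetical in-memory `SimpleHistogram` class (not prom-client's internals): every bucket whose upper bound `le` is at least the observed value gets incremented, so bucket counts are cumulative, and a running sum and count are kept alongside. That cumulative layout is what makes "how many requests took over 2 seconds?" a single subtraction.

```typescript
// Hypothetical sketch of how a Prometheus-style histogram stores observations.
// Buckets are cumulative: each one counts observations <= its upper bound (le).
class SimpleHistogram {
  readonly bounds: number[];
  readonly bucketCounts: number[];
  sum = 0;
  count = 0;

  constructor(buckets: number[]) {
    // Append the implicit +Inf bucket, which always equals the total count.
    this.bounds = [...buckets].sort((a, b) => a - b).concat(Infinity);
    this.bucketCounts = this.bounds.map(() => 0);
  }

  observe(value: number): void {
    this.sum += value;
    this.count += 1;
    this.bounds.forEach((le, i) => {
      if (value <= le) this.bucketCounts[i] += 1;
    });
  }
}

const h = new SimpleHistogram([0.1, 0.5, 1, 2, 5, 10]);
[0.05, 0.3, 0.3, 1.5, 3].forEach((v) => h.observe(v));

// Cumulative counts per le: 0.1 -> 1, 0.5 -> 3, 1 -> 3, 2 -> 4, 5 -> 5, 10 -> 5, +Inf -> 5
console.log(h.bounds.map((le, i) => `le=${le}: ${h.bucketCounts[i]}`).join("\n"));

// "How many requests took over 2 seconds?" = total minus the le=2 bucket.
const over2s = h.count - h.bucketCounts[h.bounds.indexOf(2)];
console.log(over2s); // 1
```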
Building a Simple Monitored API
Here's a complete Express API with Prometheus metrics:
Step 1: Setup Metrics (`metrics.ts`)
```typescript
import client from "prom-client";

export const register = new client.Registry();

// Collect default metrics (CPU, memory, etc.)
client.collectDefaultMetrics({ register });

// Define your custom metrics
export const requestCounter = new client.Counter({
  name: "http_requests_total",
  help: "Total HTTP requests",
  labelNames: ["method", "route", "status_code"],
  registers: [register],
});

export const activeRequestGauge = new client.Gauge({
  name: "http_active_requests",
  help: "Active HTTP requests",
  registers: [register],
});

export const requestDurationHistogram = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP request duration",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.1, 0.5, 1, 2, 5, 10],
  registers: [register],
});
```
Step 2: Create Middleware (`middleware.ts`)
```typescript
import type { Request, Response, NextFunction } from "express";
import {
  requestCounter,
  activeRequestGauge,
  requestDurationHistogram,
} from "./metrics";

export function metricsMiddleware(
  req: Request,
  res: Response,
  next: NextFunction
): void {
  activeRequestGauge.inc();
  const startTime = Date.now();
  let done = false;

  // Runs once per request: on "finish" (response fully sent) or on
  // "close" (client disconnected early), whichever fires first.
  const onDone = () => {
    if (done) return;
    done = true;
    activeRequestGauge.dec();

    const duration = (Date.now() - startTime) / 1000;
    const statusCode = res.statusCode.toString();
    // req.route is only set AFTER Express matches a route, so read it here
    // rather than at the top of the middleware. Unmatched requests (404s)
    // get a fixed label instead of the raw path to keep cardinality low.
    const route = req.route ? req.baseUrl + req.route.path : "unmatched";
    const labels = { method: req.method, route, status_code: statusCode };

    requestCounter.inc(labels);
    requestDurationHistogram.observe(labels, duration);
  };

  res.on("finish", onDone);
  res.on("close", onDone);

  next();
}
```
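Express only sets `req.route` once it has matched a route, so requests that hit no route have no route template to use as a label. If you would rather keep a path-based label for those than a single fixed value, collapse ID-like segments first so `/api/user/42` and `/api/user/43` share one label value. A minimal sketch; `normalizePath` and its two patterns (numeric IDs and UUIDs) are assumptions to adapt to your own URL scheme.

```typescript
// Hypothetical fallback for requests that matched no Express route:
// replace ID-like path segments with ":id" to keep label cardinality low.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_RE = /^\d+$/;

function normalizePath(path: string): string {
  return path
    .split("/")
    .map((segment) =>
      NUMERIC_RE.test(segment) || UUID_RE.test(segment) ? ":id" : segment
    )
    .join("/");
}

console.log(normalizePath("/api/user/42")); // "/api/user/:id"
console.log(normalizePath("/api/order/3f2b1c9e-8d4a-4e2b-9c1d-0a1b2c3d4e5f/items")); // "/api/order/:id/items"
console.log(normalizePath("/api/user")); // "/api/user"
```

Random URLs from bots and scanners will still produce new label values, which is why a fixed label such as `"unmatched"` remains the safer default.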
Step 3: Expose `/metrics` Endpoint
```typescript
import express from "express";
import { metricsMiddleware } from "./middleware";
import { register } from "./metrics";

const app = express();

app.use(metricsMiddleware);

// Your API routes
app.get("/api/user", (req, res) => {
  res.json({ message: "User data" });
});

// Metrics endpoint for Prometheus to scrape
app.get("/metrics", async (req, res) => {
  const metrics = await register.metrics();
  res.set("Content-Type", register.contentType);
  res.end(metrics);
});

app.listen(3000, () => {
  console.log("Server running on http://localhost:3000");
  console.log("Metrics at http://localhost:3000/metrics");
});
```
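When Prometheus scrapes `/metrics`, it receives plain text in the Prometheus exposition format. To demystify it, here is a hand-rolled sketch for a single counter; `renderCounter` is a hypothetical helper for illustration only, since in the app above `register.metrics()` generates this text for every registered metric.

```typescript
// Hypothetical sketch of the Prometheus text exposition format for one counter.
type Sample = { labels: Record<string, string>; value: number };

// Label values must escape backslashes, double quotes and newlines.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

function renderCounter(name: string, help: string, samples: Sample[]): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const labelStr = Object.entries(labels)
      .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
      .join(",");
    // One line per label combination: name{labels} value
    lines.push(labelStr ? `${name}{${labelStr}} ${value}` : `${name} ${value}`);
  }
  return lines.join("\n") + "\n"; // the format expects a trailing newline
}

const text = renderCounter("http_requests_total", "Total HTTP requests", [
  { labels: { method: "GET", route: "/api/user", status_code: "200" }, value: 2 },
]);
console.log(text);
// # HELP http_requests_total Total HTTP requests
// # TYPE http_requests_total counter
// http_requests_total{method="GET",route="/api/user",status_code="200"} 2
```

Each distinct label combination gets its own line, which is exactly why label cardinality matters.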
Setting Up Prometheus & Grafana Locally
Here's how to get everything running on your machine:
Docker Compose Setup
Create docker-compose.yml:
```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    # On Linux, map host.docker.internal to the host so Prometheus can reach
    # the API on port 3000 (Docker Desktop on macOS/Windows already provides it)
    extra_hosts:
      - "host.docker.internal:host-gateway"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
```
Create prometheus.yml:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'my-api'
    static_configs:
      - targets: ['host.docker.internal:3000']
```
Start Everything
```bash
# Terminal 1: start your API (keeps running in the foreground)
bun run index.ts

# Terminal 2: start Prometheus + Grafana
docker-compose up -d

# View metrics (`open` is macOS; use xdg-open on Linux)
open http://localhost:3000/metrics   # Raw metrics
open http://localhost:9090           # Prometheus UI
open http://localhost:3001           # Grafana (admin/admin)
```
Building Your First Dashboard in Grafana
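1. Add a data source:
   • Go to Connections → Data sources (Configuration → Data Sources in older Grafana versions)
   • Add Prometheus with the URL http://prometheus:9090 (inside the Compose network, Grafana reaches Prometheus by its service name, not localhost)
2. Create a dashboard with these key panels: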
Request Rate:
```promql
rate(http_requests_total[5m])
```
Active Requests:
```promql
http_active_requests
```
95th Percentile Response Time:
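```promql
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```
Aggregating with `sum by (le)` gives one p95 across all routes; use `sum by (le, route)` to get one line per route.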
Error Rate:
```promql
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100
```
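The p95 panel shows an estimate, not an exact percentile: `histogram_quantile()` finds the bucket that contains the 95th-percentile rank and interpolates linearly inside it, assuming observations are spread evenly across the bucket. A simplified sketch of that calculation; `estimateQuantile` is a hypothetical helper that assumes cumulative counts ending with a `+Inf` bucket, at least one finite bucket, positive bucket bounds, and 0 < q ≤ 1.

```typescript
// Hypothetical, simplified version of PromQL's histogram_quantile() for one histogram.
type Bucket = { le: number; count: number }; // count = observations <= le (cumulative)

function estimateQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count; // +Inf bucket = all observations
  if (total === 0) return NaN;
  const rank = q * total;

  // First bucket whose cumulative count reaches the target rank.
  const i = buckets.findIndex((b) => b.count >= rank);

  // Landing in +Inf means "slower than every bound": return the highest finite bound.
  if (buckets[i].le === Infinity) return buckets[buckets.length - 2].le;

  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - countBelow;

  // Interpolate linearly between the bucket's lower and upper bound.
  return lower + (buckets[i].le - lower) * ((rank - countBelow) / inBucket);
}

// Made-up data: 100 requests spread over the buckets used in metrics.ts
const latency: Bucket[] = [
  { le: 0.1, count: 60 },
  { le: 0.5, count: 90 },
  { le: 1, count: 98 },
  { le: 2, count: 100 },
  { le: 5, count: 100 },
  { le: 10, count: 100 },
  { le: Infinity, count: 100 },
];

console.log(estimateQuantile(0.95, latency)); // 0.8125 (inside the 0.5s to 1s bucket)
```

The practical takeaway: the estimate can only be as precise as your buckets, so place bucket boundaries near the latencies you actually care about.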
Real-World Tips
Here's what I learned from implementing monitoring in production:
✅ Do This:
- Use low-cardinality labels (route, status code, method)
- Start with default metrics (CPU, memory)
- Keep metric names consistent: `http_requests_total`, `db_queries_total` (snake_case, base units like `_seconds` in the name, `_total` on counters)
❌ Don't Do This:
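High-cardinality labels kill performance: every unique combination of label values becomes its own time series, and each one costs Prometheus memory and slows down queries.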
```typescript
// ❌ BAD - Creates millions of time series
counter.inc({ user_id: req.user.id });

// ✅ GOOD - Low cardinality
counter.inc({ user_plan: req.user.plan });
```
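To see why, multiply it out: the worst-case number of series one metric can produce is the product of the distinct values of each label, and a histogram multiplies that again by its bucket count plus three (the `+Inf` bucket, `_sum`, and `_count`). A back-of-the-envelope sketch with made-up label counts; the helper names are hypothetical, and in practice only combinations that actually occur become series.

```typescript
// Hypothetical calculator: worst-case series for one metric is the product
// of the number of distinct values each label can take.
function worstCaseSeries(distinctValuesPerLabel: Record<string, number>): number {
  return Object.values(distinctValuesPerLabel).reduce((acc, n) => acc * n, 1);
}

// A histogram exposes, per label combination: one series per configured
// bucket, plus the +Inf bucket, plus _sum and _count.
function histogramSeries(labelCombos: number, bucketCount: number): number {
  return labelCombos * (bucketCount + 1 + 2);
}

const good = worstCaseSeries({ method: 4, route: 20, status_code: 8 });
console.log(good); // 640
console.log(histogramSeries(good, 6)); // 5760 with the 6 buckets from metrics.ts

const bad = worstCaseSeries({ method: 4, route: 20, status_code: 8, user_id: 100000 });
console.log(bad); // 64000000
```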
Why This Skill Matters
That $800 contract wasn't just about the money. It taught me that:
- Most companies need monitoring but don't know how to set it up
- Open-source skills (Prometheus/Grafana) are more valuable than vendor-specific ones (Datadog)
- You can charge well for infrastructure work if you actually know what you're doing
When your startup hits scale and is paying $10k/month to Datadog, guess who becomes valuable? The developer who can migrate to self-hosted Prometheus and save $100k/year.
Next Steps
Resources:
- Clone the code and run it locally
- Add custom metrics for your business logic (signups, payments, etc.)
- Build dashboards in Grafana
- Learn PromQL by experimenting with queries
Final Thought
Monitoring isn't sexy, but it's essential. Learn it once, use it forever. And maybe make some side money implementing it for startups who need it.
Now go instrument something and watch it in real-time. It's oddly satisfying. 🚀