How I Got My First Internship Implementing Prometheus & Grafana (And Why You Should Learn It)

The Contract That Opened My Eyes

In my first year of engineering, I got my first monitoring contract. A startup founder reached out: his site was growing fast, and he had zero visibility into what was happening. He had no idea which pages were slow, how many users were active, or why the API crashed every Tuesday at 3 PM.

Six hours later, I had Prometheus and Grafana running, collecting metrics, and showing beautiful dashboards. The founder was ecstatic. I got paid $800. More importantly, I realized something: monitoring is a skill every backend developer should have, but most don't.

This post is everything you need to get started with Prometheus and Grafana. No fluff, just the essentials.

What Are Prometheus and Grafana?

Prometheus = A time-series database that collects and stores metrics from your application.

Grafana = A visualization tool that turns those metrics into beautiful, real-time dashboards.

Think of it this way:

• Prometheus is like a data logger that continuously records: "At 2:43 PM, the API took 234ms to respond"
• Grafana is the dashboard that shows you: "Response times spiked 400% at 3 PM every day this week"

Why not just use Datadog?

• Datadog can cost around $5k/month at scale
• Prometheus + Grafana is free and open-source (you pay only for the servers you run it on)
• You actually learn how monitoring works (valuable skill)

The 3 Core Metric Types You Need to Know

When instrumenting your application, you'll use three main metric types:

1. Counter (Always Goes Up)

Tracks things that only increase: total requests, errors, events processed.

```typescript
import client from "prom-client";

export const requestCounter = new client.Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route", "status_code"],
});

// Usage: Increment every time someone hits an endpoint
requestCounter.inc({
  method: "GET",
  route: "/api/user",
  status_code: "200",
});
```

Use Case: "How many total requests have we served?"
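
A counter's raw value is rarely what you graph; what you usually want is how fast it is growing. Here's a minimal sketch of that idea in plain TypeScript, with no Prometheus involved. The `Sample` type and `perSecondRate` helper are hypothetical names of mine, and the logic is a simplification: Prometheus's `rate()` also extrapolates to the edges of the time window, which I skip here.

```typescript
// A scraped counter sample: timestamp in seconds and the counter's value.
type Sample = { t: number; value: number };

// Simplified per-second rate over a window of samples (oldest first).
// A drop in value is treated as a counter reset (e.g. a process restart),
// so the post-reset value counts as the increase since the reset.
function perSecondRate(samples: Sample[]): number {
  if (samples.length < 2) return 0;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1].value;
    const curr = samples[i].value;
    increase += curr >= prev ? curr - prev : curr;
  }
  const elapsed = samples[samples.length - 1].t - samples[0].t;
  return elapsed > 0 ? increase / elapsed : 0;
}

// Made-up samples, scraped every 15 seconds.
const samples: Sample[] = [
  { t: 0, value: 100 },
  { t: 15, value: 130 },
  { t: 30, value: 10 }, // process restarted, counter started over
  { t: 45, value: 40 },
];

console.log(perSecondRate(samples)); // ≈ 1.56 requests per second
```

The drop from 130 to 10 is treated as a restart rather than a negative increase, which is why a counter can survive a deploy without wrecking your graphs.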

2. Gauge (Goes Up and Down)

Tracks current values: active users, queue size, CPU usage.

```typescript
import client from "prom-client";

export const activeRequestGauge = new client.Gauge({
  name: "http_active_requests",
  help: "Number of active HTTP requests",
});

// Usage: Track concurrent requests
activeRequestGauge.inc();  // Request started
// ... handle request ...
activeRequestGauge.dec();  // Request finished
```

Use Case: "How many requests are being handled right now?"
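
One gotcha with gauges: every `inc()` needs a matching `dec()`, even when the handler throws, or the gauge drifts upward forever. Here's a rough sketch of the pattern, using a tiny stand-in class so it runs on its own. `InFlightGauge` and `trackInFlight` are hypothetical names, not part of prom-client.

```typescript
// Minimal stand-in for a gauge, so the sketch runs without prom-client.
class InFlightGauge {
  private value = 0;
  inc(): void {
    this.value += 1;
  }
  dec(): void {
    this.value -= 1;
  }
  get(): number {
    return this.value;
  }
}

// Runs `work` while counting it as in flight. The `finally` block
// guarantees the decrement even when `work` throws.
function trackInFlight<T>(gauge: InFlightGauge, work: () => T): T {
  gauge.inc();
  try {
    return work();
  } finally {
    gauge.dec();
  }
}

const inFlight = new InFlightGauge();

// Read the gauge from inside the tracked work: it is 1 while running.
const during = trackInFlight(inFlight, () => inFlight.get());

try {
  trackInFlight(inFlight, () => {
    throw new Error("handler failed");
  });
} catch (_err) {
  // The error still propagates; the gauge is what we care about here.
}

console.log(during, inFlight.get()); // 1 0
```

Real handlers are usually async; the same idea applies with `await` inside the `try`.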

3. Histogram (Tracks Distributions)

Tracks how values are distributed: response times, request sizes, query durations.

```typescript
import client from "prom-client";

export const requestDurationHistogram = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.1, 0.5, 1, 2, 5, 10], // Response time buckets, in seconds
});

// Usage: Record how long each request takes
const startTime = Date.now();
// ... handle request ...
const duration = (Date.now() - startTime) / 1000; // ms -> seconds
requestDurationHistogram.observe({
  method: "GET",
  route: "/api/user",
  status_code: "200",
}, duration);
```

Use Case: "What's our p95 response time? How many requests take over 2 seconds?"
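
It helps to see what a histogram actually stores: not the raw durations, just a cumulative count per bucket. The sketch below rebuilds those counts from made-up durations and then estimates a percentile by linear interpolation inside the matching bucket, which is similar in spirit to what `histogram_quantile` does. `cumulativeCounts` and `estimateQuantile` are hypothetical helpers, and I'm assuming 0 < q ≤ 1.

```typescript
// Upper bounds in seconds, matching the buckets used in this post.
const bounds = [0.1, 0.5, 1, 2, 5, 10];

// Cumulative counts: counts[i] is the number of observations <= bounds[i].
// The extra final slot is the +Inf bucket, which counts everything.
function cumulativeCounts(observations: number[], upperBounds: number[]): number[] {
  const counts = upperBounds.map(
    (le) => observations.filter((v) => v <= le).length
  );
  counts.push(observations.length);
  return counts;
}

// Estimate a quantile by interpolating inside the bucket that contains
// the target rank.
function estimateQuantile(q: number, upperBounds: number[], counts: number[]): number {
  const total = counts.length > 0 ? counts[counts.length - 1] : 0;
  if (total === 0) return NaN;
  const rank = q * total;
  // Find the first bucket whose cumulative count reaches the target rank.
  let i = 0;
  while (i < counts.length && counts[i] < rank) i++;
  // Rank lands in the +Inf bucket: the highest finite bound is all we know.
  if (i >= upperBounds.length) return upperBounds[upperBounds.length - 1];
  const lower = i === 0 ? 0 : upperBounds[i - 1];
  const below = i === 0 ? 0 : counts[i - 1];
  const inBucket = counts[i] - below;
  if (inBucket === 0) return lower;
  return lower + (upperBounds[i] - lower) * ((rank - below) / inBucket);
}

// Made-up request durations in seconds.
const durations = [0.05, 0.08, 0.2, 0.3, 0.3, 0.4, 0.6, 0.7, 0.9, 1.5];
const counts = cumulativeCounts(durations, bounds);
const p95 = estimateQuantile(0.95, bounds, counts);

console.log(counts.join(", ")); // 2, 6, 9, 10, 10, 10, 10
console.log(p95); // 1.5
```

The estimate only knows the slowest request landed somewhere between 1 and 2 seconds, so the answer is only as precise as your buckets. Pick bucket boundaries near the latencies you actually care about.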

Building a Simple Monitored API

Here's a complete Express API with Prometheus metrics:

Step 1: Setup Metrics (`metrics.ts`)

```typescript
import client from "prom-client";

export const register = new client.Registry();

// Collect default metrics (CPU, memory, etc.)
client.collectDefaultMetrics({ register });

// Define your custom metrics
export const requestCounter = new client.Counter({
  name: "http_requests_total",
  help: "Total HTTP requests",
  labelNames: ["method", "route", "status_code"],
  registers: [register],
});

export const activeRequestGauge = new client.Gauge({
  name: "http_active_requests",
  help: "Active HTTP requests",
  registers: [register],
});

export const requestDurationHistogram = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP request duration",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.1, 0.5, 1, 2, 5, 10],
  registers: [register],
});
```

Step 2: Create Middleware (`middleware.ts`)

```typescript
import type { Request, Response, NextFunction } from "express";
import {
  requestCounter,
  activeRequestGauge,
  requestDurationHistogram,
} from "./metrics";

export function metricsMiddleware(
  req: Request,
  res: Response,
  next: NextFunction
): void {
  activeRequestGauge.inc();
  const startTime = Date.now();

  res.on("finish", () => {
    const duration = (Date.now() - startTime) / 1000;
    const statusCode = res.statusCode.toString();
    // req.route is only set once Express has matched a route, so read it
    // here rather than when the middleware first runs. Using the route
    // pattern (e.g. "/api/user/:id") instead of the raw path keeps label
    // cardinality low; unmatched paths share a single label value.
    const route = req.route ? `${req.baseUrl}${req.route.path}` : "unmatched";

    requestCounter.inc({
      method: req.method,
      route,
      status_code: statusCode,
    });

    requestDurationHistogram.observe(
      { method: req.method, route, status_code: statusCode },
      duration
    );

    activeRequestGauge.dec();
  });

  next();
}
```

Step 3: Expose the `/metrics` Endpoint (`index.ts`)

```typescript
import express from "express";
import { metricsMiddleware } from "./middleware";
import { register } from "./metrics";

const app = express();

app.use(metricsMiddleware);

// Your API routes
app.get("/api/user", (req, res) => {
  res.json({ message: "User data" });
});

// Metrics endpoint for Prometheus to scrape
app.get("/metrics", async (req, res) => {
  const metrics = await register.metrics();
  res.set("Content-Type", register.contentType);
  res.end(metrics);
});

app.listen(3000, () => {
  console.log("Server running on http://localhost:3000");
  console.log("Metrics at http://localhost:3000/metrics");
});
```
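
To make the scrape less magical, here's roughly what Prometheus reads when it hits `/metrics`. This sketch hand-renders a counter in the Prometheus text exposition format. `renderCounter` and `escapeLabelValue` are illustrative helpers I made up, and the sample values are invented; in the real app, `register.metrics()` produces this text for you.

```typescript
type Labels = Record<string, string>;

// Escape a label value for the text format: backslash, double quote, newline.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Render one counter: HELP and TYPE lines, then one sample line per label set.
function renderCounter(
  name: string,
  help: string,
  samples: Array<{ labels: Labels; value: number }>
): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const sample of samples) {
    const pairs = Object.keys(sample.labels)
      .map((key) => `${key}="${escapeLabelValue(sample.labels[key])}"`)
      .join(",");
    lines.push(
      pairs ? `${name}{${pairs}} ${sample.value}` : `${name} ${sample.value}`
    );
  }
  return lines.join("\n") + "\n";
}

const output = renderCounter("http_requests_total", "Total HTTP requests", [
  { labels: { method: "GET", route: "/api/user", status_code: "200" }, value: 3 },
  { labels: { method: "GET", route: "/api/user", status_code: "500" }, value: 1 },
]);

console.log(output);
// # HELP http_requests_total Total HTTP requests
// # TYPE http_requests_total counter
// http_requests_total{method="GET",route="/api/user",status_code="200"} 3
// http_requests_total{method="GET",route="/api/user",status_code="500"} 1
```

Each distinct combination of label values gets its own line, and each line becomes its own time series once Prometheus stores it.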

Setting Up Prometheus & Grafana Locally

Here's how to get everything running on your machine:

Docker Compose Setup

Create docker-compose.yml:

```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    # Lets the container reach the API running on the host. Needed on Linux,
    # where host.docker.internal is not defined by default.
    extra_hosts:
      - "host.docker.internal:host-gateway"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin  # Local use only
    depends_on:
      - prometheus
```

Create prometheus.yml:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'my-api'
    static_configs:
      - targets: ['host.docker.internal:3000']
```

Start Everything

```bash
# Start your API
bun run index.ts

# Start Prometheus + Grafana (with Compose v2, use `docker compose up -d`)
docker-compose up -d

# View metrics (`open` is macOS-only; use `xdg-open` on Linux)
open http://localhost:3000/metrics  # Raw metrics
open http://localhost:9090          # Prometheus UI
open http://localhost:3001          # Grafana (admin/admin)
```

Building Your First Dashboard in Grafana

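Add Data Source:
• Go to Configuration → Data Sources (Connections → Data sources in newer Grafana versions)
• Add Prometheus: http://prometheus:9090 (the Compose service name works because Grafana runs on the same Docker network)

Create Dashboard with Key Panels: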

Request Rate:

```promql
rate(http_requests_total[5m])
```

Active Requests:

```promql
http_active_requests
```

95th Percentile Response Time:

```promql
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```

Error Rate:

```promql
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100
```
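
If the error-rate query looks dense, here is the same arithmetic spelled out in TypeScript. The per-status-code rates are made-up numbers, and `errorPercentage` is a hypothetical helper; it just mirrors the shape of the query: 5xx traffic divided by all traffic, times 100.

```typescript
// Per-second request rates keyed by status code. Invented values, roughly
// what a per-status-code rate query might give you.
const ratesByStatus: Record<string, number> = {
  "200": 47.5,
  "404": 1.5,
  "500": 0.75,
  "503": 0.25,
};

function errorPercentage(rates: Record<string, number>): number {
  let total = 0;
  let errors = 0;
  for (const code of Object.keys(rates)) {
    total += rates[code];
    // Same pattern as status_code=~"5.." in the query.
    if (/^5..$/.test(code)) errors += rates[code];
  }
  // Returning 0 with no traffic is a choice made for this sketch.
  return total === 0 ? 0 : (errors / total) * 100;
}

console.log(errorPercentage(ratesByStatus)); // 2
```

Note that the 404s count toward total traffic but not toward errors, since only 5xx codes match the pattern.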

Real-World Tips

Here's what I learned from implementing monitoring in production:

✅ Do This:

• Use low-cardinality labels (route, status code, method)
• Start with default metrics (CPU, memory)
• Keep metric names consistent: http_requests_total, db_queries_total

❌ Don't Do This:

High cardinality kills performance:

```typescript
// ❌ BAD - Creates millions of time series
counter.inc({ user_id: req.user.id });

// ✅ GOOD - Low cardinality
counter.inc({ user_plan: req.user.plan });
```
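
To put numbers on "high cardinality": the worst-case number of time series for a metric is the product of the distinct values each label can take. A quick back-of-the-envelope sketch follows; `maxSeries` is a hypothetical helper and the counts are invented for illustration.

```typescript
// Worst-case series count: the product of distinct values per label.
function maxSeries(distinctValuesPerLabel: Record<string, number>): number {
  let product = 1;
  for (const label of Object.keys(distinctValuesPerLabel)) {
    product *= distinctValuesPerLabel[label];
  }
  return product;
}

// Assumed label sizes: 5 methods, 20 routes, 10 status codes.
const lowCardinality = maxSeries({ method: 5, route: 20, status_code: 10 });

// Same metric with a user_id label, assuming 50,000 users.
const withUserId = maxSeries({
  method: 5,
  route: 20,
  status_code: 10,
  user_id: 50000,
});

console.log(lowCardinality); // 1000
console.log(withUserId); // 50000000
```

For a histogram it gets worse, because every label combination also fans out into one series per bucket plus the `_sum` and `_count` series.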

Why This Skill Matters

That $800 contract wasn't just about the money. It taught me that:

• Most companies need monitoring but don't know how to set it up
• Open-source skills (Prometheus/Grafana) are more valuable than vendor-specific ones (Datadog)
• You can charge well for infrastructure work if you actually know what you're doing

When your startup hits scale and is paying $10k/month to Datadog, guess who becomes valuable? The developer who can migrate to self-hosted Prometheus and save $100k/year.

Next Steps

Resources:

• Prometheus Docs
• Grafana Tutorials
• PromQL Basics

• Clone the code and run it locally
• Add custom metrics for your business logic (signups, payments, etc.)
• Build dashboards in Grafana
• Learn PromQL by experimenting with queries

Final Thought

Monitoring isn't sexy, but it's essential. Learn it once, use it forever. And maybe make some side money implementing it for startups who need it.

Now go instrument something and watch it in real-time. It's oddly satisfying. 🚀