How do you serve 15 million concurrent viewers when Virat Kohli walks to the crease and not drop a single frame?
This article is about the control plane behind that experience, not the video encoding pipeline itself. The control plane is the invisible backend that decides who gets to watch, how many can watch, how sessions stay alive, and what happens when demand outruns safe capacity.
The Problem Nobody Talks About
Every IPL evening, around toss time, a streaming platform faces its most dangerous minute.
Within roughly 90 seconds, concurrent viewers can jump from a few million to well above ten million. CDN nodes heat up. Session stores get hammered. License renewals spike. One bad config, one overloaded Redis shard, or one missing index, and you become the platform that went down during the final.
That is the problem this architecture was built to solve.
Architecture Overview
At a high level, the system sits behind a CDN and an ALB, then exposes a set of application services for match state, playback, licensing, capacity, entitlement, degradation, metrics, and simulation.
```text
INTERNET / CDN
      |
      v
Application Load Balancer
      |
      v
Match Service
Playback Service
License Service
Capacity Service
Entitlement Service
Degrade Service
Metrics Service
Simulate Service
      |
      v
Aurora PostgreSQL + ElastiCache Redis
```
The application is a Go monolith deployed on AWS ECS Fargate, with Aurora PostgreSQL for durable state and Redis for hot session data. Infrastructure is managed with Terraform.
Layer 1: The Domain
Before infrastructure, the system needs a clear domain model.
A match is not just scheduled, active, or completed. It also carries a capacity ladder that defines how the system should scale before and during the event.
A simplified Go model looks like this:
```go
type Match struct {
	ID        string
	Status    MatchStatus
	StartTime time.Time
	Rungs     []LadderRung
}

type LadderRung struct {
	StartTime            time.Time
	TargetFleetSize      int
	ActiveSessionCeiling int
	DBPoolTarget         int
	RedisPoolTarget      int
	DegradeThreshold     int
}
```
Each rung says: at this point in the match, expect this much demand, scale to this fleet size, allow this many active sessions, and enter degrade mode once a lower threshold is crossed.
- T-30 min: pre-match buildup, warm the fleet and raise the session ceiling.
- Toss: jump the floor aggressively before users flood in.
- First ball: scale further because this is where the real wave often arrives.
- Innings break: temporarily reduce the floor.
- Second innings / final overs: raise limits again for the next surge.
The key idea is simple: for live sports, reacting to CPU after the spike starts is already late. The platform should scale from the schedule, not from panic.
The Playback Session
When a user taps Watch Live, the backend creates a playback session:
```go
type PlaybackSession struct {
	ID        string
	UserID    string
	MatchID   string
	DeviceID  string
	ExpiresAt time.Time
}
```
This session is stored in Redis with a five-minute TTL. The player must renew its license before the TTL expires. If it stops renewing, the session disappears and the capacity slot opens up for someone else.
That is not an accident. It is the self-healing mechanism that keeps dead sessions from consuming scarce concurrency forever.
Layer 2: The Admission Pipeline
A playback start request goes through five gates, in order:
```text
Degradation Check
Match Validation
Entitlement Check
Capacity Admission
Session Creation
```
If any gate fails, the user is rejected cleanly before a session is created.
Gate 1: Degradation Check
The first question is whether the system is already in core-protect mode.
```go
if s.degSvc.IsCoreProtectMode() {
	// Allow core playback, shed non-essential features.
}
```
This check is intentionally lock-free on the hot path. During load spikes, even tiny avoidable costs matter.
Gate 2: Match Active Validation
The backend verifies that the match is actually live before admitting a session. This avoids wasting slots on stale clients, scheduled matches that have not started, or invalid playback attempts.
Gate 3: Entitlement Check
The entitlement service decides whether the user is allowed to watch the match at all.
```go
res, err := s.entSvc.CheckAccess(ctx, userID, matchID)
if err != nil {
	return nil, err
}
if !res.HasAccess {
	return nil, errors.New("entitlement denied")
}
```
In production this usually means a billing or subscription dependency. In a prototype it can be mocked, but the interface boundary should still exist from day one.
Gate 4: Capacity Admission
Capacity admission is the critical section. It must be serialized so two requests cannot both see the same final slot as available.
```go
func (s *Service) AdmitSession(ctx context.Context) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.sessions >= s.activeRung.ActiveSessionCeiling {
		return errors.New("capacity exhausted")
	}

	s.sessions++

	if s.sessions > s.activeRung.DegradeThreshold {
		_ = s.degradeSvc.SetDegrade(ctx, domain.DegradeState{
			Enabled: true,
			Reason:  "auto-protect: capacity threshold crossed",
		})
	}

	return nil
}
```
This does three jobs:
- enforce the hard ceiling
- increment active session state safely
- trigger graceful degradation before the absolute limit is reached
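The ceiling only holds if every admitted slot is eventually given back, on playback stop, on TTL expiry, and on failed session writes. A self-contained sketch of the admit/release pair, with the `Service` fields simplified to plain values rather than the ladder lookup used in the real code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Service is a trimmed-down capacity service: a guarded counter plus the
// two thresholds taken from the active ladder rung.
type Service struct {
	mu               sync.Mutex
	sessions         int
	ceiling          int
	degradeThreshold int
	degraded         bool
}

// AdmitSession enforces the hard ceiling, then the soft degrade threshold.
func (s *Service) AdmitSession() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.sessions >= s.ceiling {
		return errors.New("capacity exhausted")
	}
	s.sessions++
	if s.sessions > s.degradeThreshold {
		s.degraded = true
	}
	return nil
}

// ReleaseSession gives the slot back on stop, expiry, or a failed write.
func (s *Service) ReleaseSession() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.sessions > 0 {
		s.sessions--
	}
}

func main() {
	svc := &Service{ceiling: 2, degradeThreshold: 1}
	fmt.Println(svc.AdmitSession()) // <nil>
	fmt.Println(svc.AdmitSession()) // <nil>, degrade mode now on
	fmt.Println(svc.AdmitSession()) // capacity exhausted
	svc.ReleaseSession()
	fmt.Println(svc.AdmitSession()) // <nil> again
}
```

Note that the sketch does not clear `degraded` on release; in practice exiting degrade mode wants hysteresis so the system does not flap around the threshold.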
Gate 5: Session Creation
If the user clears all previous gates, the service creates the session in Redis.
```go
session := &domain.PlaybackSession{
	ID:        uuid.NewString(),
	UserID:    userID,
	MatchID:   matchID,
	DeviceID:  deviceID,
	ExpiresAt: time.Now().Add(5 * time.Minute),
}

if err := s.redisRepo.SaveSession(ctx, session); err != nil {
	s.capSvc.ReleaseSession(ctx)
	return nil, err
}
```
The compensating action matters. If Redis fails after capacity has already been reserved, the slot must be released immediately or the system will slowly leak capacity under failure.
Layer 3: The License Renewal Loop
A five-minute TTL only works if the player proves it is still alive.
Every few minutes, with jitter to avoid a thundering herd, the client calls:
```text
POST /v1/license/renew
{ "session_id": "abc-123" }
```
The renewal flow checks:
- the session still exists in Redis
- the user's plan still permits access
- the user has not exceeded their allowed concurrent devices
The device limit is enforced through device leases with a LastRenewed timestamp. Stale leases are cleaned up lazily during reads:
```go
count := 0
for id, lease := range devices {
	if now.Sub(lease.LastRenewed) <= 2*time.Minute {
		count++
	} else {
		delete(devices, id)
	}
}
```
This avoids a separate cleanup daemon and still converges the system toward truth. If a user moves from phone to TV, the old device naturally expires after it stops renewing.
Layer 4: Graceful Degradation
The system protects core playback by shedding everything that is optional.
```go
type Service struct {
	mu     sync.RWMutex
	state  domain.DegradeState
	isCore atomic.Bool
}
```
Reads are lock-free because degradation checks happen on nearly every request. Writes are rare and go through normal synchronization.
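A minimal sketch of how the two paths fit around that struct, assuming method names `IsCoreProtectMode` and `SetDegrade` as used elsewhere in the article (the `DegradeState` fields here are simplified):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// DegradeState records whether load shedding is on and why.
type DegradeState struct {
	Enabled bool
	Reason  string
}

// Service keeps a mutex-guarded state record for rare writes and an
// atomic flag mirrored from it for lock-free hot-path reads.
type Service struct {
	mu     sync.RWMutex
	state  DegradeState
	isCore atomic.Bool
}

// IsCoreProtectMode runs on nearly every request: one atomic load, no lock.
func (s *Service) IsCoreProtectMode() bool {
	return s.isCore.Load()
}

// SetDegrade is rare (threshold crossings, admin toggles, alarms) and
// pays for full synchronization, then mirrors the flag into the atomic.
func (s *Service) SetDegrade(state DegradeState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.state = state
	s.isCore.Store(state.Enabled)
}

func main() {
	var svc Service
	fmt.Println(svc.IsCoreProtectMode()) // false
	svc.SetDegrade(DegradeState{Enabled: true, Reason: "redis cpu alarm"})
	fmt.Println(svc.IsCoreProtectMode()) // true
}
```

Keeping the atomic as a mirror of the locked state means readers can be slightly stale for a few nanoseconds, which is an acceptable trade for removing lock contention from the hottest check in the system.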
Core-protect mode can be triggered in three ways:
- Automatically when active sessions cross the degrade threshold.
- Manually through an admin API when operators see trouble.
- From infrastructure through CloudWatch alarms that detect stressed dependencies such as Redis CPU saturation.
When this mode is active, the platform should drop non-essential work such as overlays, recommendation payloads, thumbnails, and extra analytics while keeping playback and license renewals sacred.
Layer 5: Infrastructure Choices
The platform uses Aurora PostgreSQL, Redis Cluster, and ECS Fargate for very deliberate reasons.
Aurora PostgreSQL holds durable state like matches, user plans, and ladder configuration. Redis holds the hottest volatile state: playback sessions, device leases, and fast counters.
- Aurora PostgreSQL: read replicas, fast failover, auto-scaling storage, Multi-AZ durability.
- Redis Cluster: sharded hot keys, read replicas, automatic failover, low-latency session access.
- ECS Fargate: no node management, fast scale-out, and good economics for event-driven traffic patterns.
The fleet is pre-scaled using scheduled actions tied to the ladder instead of waiting for CPU-based reactive scaling.
```hcl
resource "aws_appautoscaling_scheduled_action" "ladder_rung_1" {
  name     = "ladder-rung-1-toss"
  schedule = "cron(0 14 * * ? *)"

  scalable_target_action {
    min_capacity = 20
    max_capacity = 100
  }
}
```
The important idea is not the exact numbers. It is that the system raises the floor before the toss, not after containers are already overloaded.
Network Topology
The network layout is straightforward and strict:
```text
Internet -> ALB:443 -> ECS Tasks:8080 -> Aurora:5432 / Redis:6379
```
The ALB lives in public subnets. Compute and data stores live in private subnets. Security groups enforce the chain so nothing talks directly to databases from the public internet.
Layer 6: The Spike Simulator
Any architecture that has not been load-tested under synthetic chaos is still mostly theory.
The simulator exposes an admin endpoint such as:
```text
POST /v1/admin/simulate/spike
{
  "match_id": "ipl-final-2024",
  "total_users": 50000,
  "concurrency": 500
}
```
It fans out goroutines behind a buffered-channel semaphore to mimic many real users hitting the playback pipeline at once.
A representative implementation looks like this:
```go
func (s *Service) SimulateSpike(ctx context.Context, params SpikeParams) {
	go func() {
		throttle := make(chan struct{}, params.Concurrency)
		var wg sync.WaitGroup

		for i := 0; i < params.TotalUsers; i++ {
			throttle <- struct{}{}
			wg.Add(1)

			go func(idx int) {
				defer wg.Done()
				defer func() { <-throttle }()

				userID := fmt.Sprintf("user-sim-%d", idx)
				deviceID := uuid.NewString()

				_, _ = s.playbackSvc.Start(context.Background(), userID, params.MatchID, deviceID)
				time.Sleep(50 * time.Millisecond)
			}(i)
		}

		wg.Wait()
	}()
}
```
This validates the admission pipeline, the degrade threshold, and the accuracy of concurrent metrics under pressure.
Layer 7: Metrics
The prototype tracks only a few metrics, but they are the ones that matter most during a live event:
- active sessions
- renewals succeeding
- renewals denied
Those three numbers tell operators whether the system is near the ceiling, whether heartbeat health is intact, and whether users are being turned away unexpectedly.
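Atomics keep those counters cheap enough to update on every request. A sketch of one way to hold them, with the `Metrics` type and `Snapshot` formatting as assumptions rather than the system's actual metrics code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Metrics keeps the three high-signal counters as typed atomics so the
// hot path never blocks on observability.
type Metrics struct {
	ActiveSessions atomic.Int64
	RenewalsOK     atomic.Int64
	RenewalsDenied atomic.Int64
}

// Snapshot renders the counters roughly the way /metrics might expose them.
func (m *Metrics) Snapshot() string {
	return fmt.Sprintf("active=%d renew_ok=%d renew_denied=%d",
		m.ActiveSessions.Load(), m.RenewalsOK.Load(), m.RenewalsDenied.Load())
}

func main() {
	var m Metrics
	m.ActiveSessions.Add(2)
	m.RenewalsOK.Add(1)
	fmt.Println(m.Snapshot()) // active=2 renew_ok=1 renew_denied=0
}
```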
Why a Monolith
This control plane is intentionally a monolith.
For this workload, an in-process call is better than a network hop. Capacity tracking, degradation state, and session creation are tightly coordinated. Splitting them into separate services too early would trade simple correctness for distributed failure modes and eventual consistency at exactly the wrong layer.
The code can still be organized cleanly into packages and interfaces. It is a monolith in deployment topology, not in design discipline.
API Surface
- POST /v1/matches to create a match and its ladder rungs
- POST /v1/matches/{id}/start to activate a match
- GET /v1/matches/{id}/status to fetch match state and capacity snapshot
- POST /v1/playback/start to run the five-gate admission flow
- POST /v1/playback/stop to release a session
- POST /v1/license/renew to heartbeat and extend a session
- GET /v1/users/{id}/streams to inspect active user streams
- POST /v1/admin/degrade to toggle core-protect mode
- POST /v1/admin/simulate/spike to fire a synthetic surge
- GET /metrics for health and operator visibility
Graceful Shutdown and Containerization
Because the platform runs on Fargate, tasks need clean shutdown behavior during deploys and scale-downs.
The API traps SIGTERM, stops accepting new requests, and drains in-flight work for a short window so playback starts and license renewals do not get cut off mid-flight.
The container build is a simple multi-stage Dockerfile: compile in a Go builder image, copy only the static binary into a tiny runtime image, and keep the final image small so fresh tasks start quickly during scale-out.
What This Architecture Gets Right
- Predictable capacity through ladder-based pre-provisioning instead of reactive scrambling.
- Self-healing sessions through TTL-backed licenses and regular renewals.
- Graceful degradation that preserves core playback when dependencies run hot.
- Device concurrency enforcement through lease tracking instead of trust.
- Useful observability from a small set of high-signal metrics.
- Chaos readiness because simulation is built in, not bolted on later.
What Production Evolution Would Add
- CDN token auth so segment delivery is enforced at the edge.
- Distributed counters using Redis atomic primitives or Lua when capacity is shared across many instances.
- Real Redis-backed session commands instead of in-memory shortcuts.
- Kafka event streams for audit, analytics, and online feedback loops.
- Multi-region failover for global resilience.
Closing Thought
The best streaming infrastructure is the kind nobody notices.
If 15 million people can watch a cricket match without buffering, without admission bugs, without zombie sessions, and without a toss-time outage, that is not luck. It is what happens when capacity, failure handling, and operational realism are treated as product features, not afterthoughts.
