How do you serve 15 million concurrent viewers when Virat Kohli walks to the crease and not drop a single frame?
This article is about the control plane behind that experience, not the video encoding pipeline itself. The control plane is the invisible backend that decides who gets to watch, how many can watch, how sessions stay alive, and what happens when demand outruns safe capacity.
The Problem Nobody Talks About
Every IPL evening, around toss time, a streaming platform faces its most dangerous minute.
Within roughly 90 seconds, concurrent viewers can jump from a few million to well above ten million. CDN nodes heat up. Session stores get hammered. License renewals spike. One bad config, one overloaded Redis shard, or one missing index, and you become the platform that went down during the final.
That is the problem this architecture was built to solve.
Architecture Overview
At a high level, the system sits behind a CDN and an ALB, then exposes a set of application services for match state, playback, licensing, capacity, entitlement, degradation, metrics, and simulation.
```text
INTERNET / CDN
      |
      v
Application Load Balancer
      |
      v
Match Service
Playback Service
License Service
Capacity Service
Entitlement Service
Degrade Service
Metrics Service
Simulate Service
      |
      v
Aurora PostgreSQL + ElastiCache Redis
```
The application is a Go monolith deployed on AWS ECS Fargate, with Aurora PostgreSQL for durable state and Redis for hot session data. Infrastructure is managed with Terraform.
Layer 1: The Domain
Before infrastructure, the system needs a clear domain model.
A match is not just scheduled, active, or completed. It also carries a capacity ladder that defines how the system should scale before and during the event.
A simplified Go model looks like this:
```go
type Match struct {
	ID        string
	Status    MatchStatus
	StartTime time.Time
	Rungs     []LadderRung
}

type LadderRung struct {
	StartTime            time.Time
	TargetFleetSize      int
	ActiveSessionCeiling int
	DBPoolTarget         int
	RedisPoolTarget      int
	DegradeThreshold     int
}
```
Each rung says: at this point in the match, expect this much demand, scale to this fleet size, allow this many active sessions, and enter degrade mode once a lower threshold is crossed.
- T-30 min: pre-match buildup, warm the fleet and raise the session ceiling.
- Toss: jump the floor aggressively before users flood in.
- First ball: scale further because this is where the real wave often arrives.
- Innings break: temporarily reduce the floor.
- Second innings / final overs: raise limits again for the next surge.
The key idea is simple: for live sports, reacting to CPU after the spike starts is already late. The platform should scale from the schedule, not from panic.
The Playback Session
When a user taps Watch Live, the backend creates a playback session:
```go
type PlaybackSession struct {
	ID        string
	UserID    string
	MatchID   string
	DeviceID  string
	ExpiresAt time.Time
}
```
This session is stored in Redis with a five-minute TTL. The player must renew its license before the TTL expires. If it stops renewing, the session disappears and the capacity slot opens up for someone else.
That is not an accident. It is the self-healing mechanism that keeps dead sessions from consuming scarce concurrency forever.
Layer 2: The Admission Pipeline
A playback start request goes through five gates, in order:
```text
Degradation Check
Match Validation
Entitlement Check
Capacity Admission
Session Creation
```
If any gate fails, the user is rejected cleanly before a session is created.
Gate 1: Degradation Check
The first question is whether the system is already in core-protect mode.
```go
if s.degSvc.IsCoreProtectMode() {
	// Allow core playback, shed non-essential features.
}
```
This check is intentionally lock-free on the hot path. During load spikes, even tiny avoidable costs matter.
Gate 2: Match Active Validation
The backend verifies that the match is actually live before admitting a session. This avoids wasting slots on stale clients, scheduled matches that have not started, or invalid playback attempts.
Gate 3: Entitlement Check
The entitlement service decides whether the user is allowed to watch the match at all.
```go
res, err := s.entSvc.CheckAccess(ctx, userID, matchID)
if err != nil {
	return nil, err
}
if !res.HasAccess {
	return nil, errors.New("entitlement denied")
}
```
In production this usually means a billing or subscription dependency. In a prototype it can be mocked, but the interface boundary should still exist from day one.
Gate 4: Capacity Admission
Capacity admission is the critical section. It must be serialized so two requests cannot both see the same final slot as available.
```go
func (s *Service) AdmitSession(ctx context.Context) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.sessions >= s.activeRung.ActiveSessionCeiling {
		return errors.New("capacity exhausted")
	}

	s.sessions++

	if s.sessions > s.activeRung.DegradeThreshold {
		_ = s.degradeSvc.SetDegrade(ctx, domain.DegradeState{
			Enabled: true,
			Reason:  "auto-protect: capacity threshold crossed",
		})
	}

	return nil
}
```
This does three jobs:
- enforce the hard ceiling
- increment active session state safely
- trigger graceful degradation before the absolute limit is reached
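The ceiling only holds if every admitted slot is eventually given back, on playback stop, on TTL expiry, and on failed session writes. A self-contained sketch of the admit/release pair, with the `Service` fields simplified to plain values rather than the ladder lookup used in the real code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Service is a trimmed-down capacity service: a guarded counter plus the
// two thresholds taken from the active ladder rung.
type Service struct {
	mu               sync.Mutex
	sessions         int
	ceiling          int
	degradeThreshold int
	degraded         bool
}

// AdmitSession enforces the hard ceiling, then the soft degrade threshold.
func (s *Service) AdmitSession() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.sessions >= s.ceiling {
		return errors.New("capacity exhausted")
	}
	s.sessions++
	if s.sessions > s.degradeThreshold {
		s.degraded = true
	}
	return nil
}

// ReleaseSession gives the slot back on stop, expiry, or a failed write.
func (s *Service) ReleaseSession() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.sessions > 0 {
		s.sessions--
	}
}

func main() {
	svc := &Service{ceiling: 2, degradeThreshold: 1}
	fmt.Println(svc.AdmitSession()) // <nil>
	fmt.Println(svc.AdmitSession()) // <nil>, degrade mode now on
	fmt.Println(svc.AdmitSession()) // capacity exhausted
	svc.ReleaseSession()
	fmt.Println(svc.AdmitSession()) // <nil> again
}
```

Note that the sketch does not clear `degraded` on release; in practice exiting degrade mode wants hysteresis so the system does not flap around the threshold.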
Gate 5: Session Creation
If the user clears all previous gates, the service creates the session in Redis.
```go
session := &domain.PlaybackSession{
	ID:        uuid.NewString(),
	UserID:    userID,
	MatchID:   matchID,
	DeviceID:  deviceID,
	ExpiresAt: time.Now().Add(5 * time.Minute),
}

if err := s.redisRepo.SaveSession(ctx, session); err != nil {
	s.capSvc.ReleaseSession(ctx)
	return nil, err
}
```
The compensating action matters. If Redis fails after capacity has already been reserved, the slot must be released immediately or the system will slowly leak capacity under failure.
Layer 3: The License Renewal Loop
A five-minute TTL only works if the player proves it is still alive.
Every few minutes, with jitter to avoid a thundering herd, the client calls:
```text
POST /v1/license/renew
{ "session_id": "abc-123" }
```
The renewal flow checks:
- the session still exists in Redis
- the user's plan still permits access
- the user has not exceeded their allowed concurrent devices
The device limit is enforced through device leases with a LastRenewed timestamp. Stale leases are cleaned up lazily during reads:
```go
count := 0
for id, lease := range devices {
	if now.Sub(lease.LastRenewed) <= 2*time.Minute {
		count++
	} else {
		delete(devices, id)
	}
}
```
This avoids a separate cleanup daemon and still converges the system toward truth. If a user moves from phone to TV, the old device naturally expires after it stops renewing.
Layer 4: Graceful Degradation
The system protects core playback by shedding everything that is optional.
```go
type Service struct {
	mu     sync.RWMutex
	state  domain.DegradeState
	isCore atomic.Bool
}
```
Reads are lock-free because degradation checks happen on nearly every request. Writes are rare and go through normal synchronization.
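A minimal sketch of how the two paths fit around that struct, assuming method names `IsCoreProtectMode` and `SetDegrade` as used elsewhere in the article (the `DegradeState` fields here are simplified):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// DegradeState records whether load shedding is on and why.
type DegradeState struct {
	Enabled bool
	Reason  string
}

// Service keeps a mutex-guarded state record for rare writes and an
// atomic flag mirrored from it for lock-free hot-path reads.
type Service struct {
	mu     sync.RWMutex
	state  DegradeState
	isCore atomic.Bool
}

// IsCoreProtectMode runs on nearly every request: one atomic load, no lock.
func (s *Service) IsCoreProtectMode() bool {
	return s.isCore.Load()
}

// SetDegrade is rare (threshold crossings, admin toggles, alarms) and
// pays for full synchronization, then mirrors the flag into the atomic.
func (s *Service) SetDegrade(state DegradeState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.state = state
	s.isCore.Store(state.Enabled)
}

func main() {
	var svc Service
	fmt.Println(svc.IsCoreProtectMode()) // false
	svc.SetDegrade(DegradeState{Enabled: true, Reason: "redis cpu alarm"})
	fmt.Println(svc.IsCoreProtectMode()) // true
}
```

Keeping the atomic as a mirror of the locked state means readers can be slightly stale for a few nanoseconds, which is an acceptable trade for removing lock contention from the hottest check in the system.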
Core-protect mode can be triggered in three ways:
- Automatically when active sessions cross the degrade threshold.
- Manually through an admin API when operators see trouble.
- From infrastructure through CloudWatch alarms that detect stressed dependencies such as Redis CPU saturation.
When this mode is active, the platform should drop non-essential work such as overlays, recommendation payloads, thumbnails, and extra analytics while keeping playback and license renewals sacred.
Layer 5: Infrastructure Choices
The platform uses Aurora PostgreSQL, Redis Cluster, and ECS Fargate for very deliberate reasons.
Aurora PostgreSQL holds durable state like matches, user plans, and ladder configuration. Redis holds the hottest volatile state: playback sessions, device leases, and fast counters.
- Aurora PostgreSQL: read replicas, fast failover, auto-scaling storage, Multi-AZ durability.
- Redis Cluster: sharded hot keys, read replicas, automatic failover, low-latency session access.
- ECS Fargate: no node management, fast scale-out, and good economics for event-driven traffic patterns.
The fleet is pre-scaled using scheduled actions tied to the ladder instead of waiting for CPU-based reactive scaling.
```hcl
resource "aws_appautoscaling_scheduled_action" "ladder_rung_1" {
  name     = "ladder-rung-1-toss"
  schedule = "cron(0 14 * * ? *)"

  scalable_target_action {
    min_capacity = 20
    max_capacity = 100
  }
}
```
The important idea is not the exact numbers. It is that the system raises the floor before the toss, not after containers are already overloaded.
Network Topology
The network layout is straightforward and strict:
```text
Internet -> ALB:443 -> ECS Tasks:8080 -> Aurora:5432 / Redis:6379
```
The ALB lives in public subnets. Compute and data stores live in private subnets. Security groups enforce the chain so nothing talks directly to databases from the public internet.
Layer 6: The Spike Simulator
Any architecture that has not been load-tested under synthetic chaos is still mostly theory.
The simulator exposes an admin endpoint such as:
```text
POST /v1/admin/simulate/spike
{
  "match_id": "ipl-final-2024",
  "total_users": 50000,
  "concurrency": 500
}
```
It fans out goroutines behind a buffered-channel semaphore to mimic many real users hitting the playback pipeline at once.
A representative implementation looks like this:
```go
func (s *Service) SimulateSpike(ctx context.Context, params SpikeParams) {
	go func() {
		throttle := make(chan struct{}, params.Concurrency)
		var wg sync.WaitGroup

		for i := 0; i < params.TotalUsers; i++ {
			throttle <- struct{}{}
			wg.Add(1)

			go func(idx int) {
				defer wg.Done()
				defer func() { <-throttle }()

				userID := fmt.Sprintf("user-sim-%d", idx)
				deviceID := uuid.NewString()

				_, _ = s.playbackSvc.Start(context.Background(), userID, params.MatchID, deviceID)
				time.Sleep(50 * time.Millisecond)
			}(i)
		}

		wg.Wait()
	}()
}
```
This validates the admission pipeline, the degrade threshold, and the accuracy of concurrent metrics under pressure.
Layer 7: Metrics
The prototype tracks only a few metrics, but they are the ones that matter most during a live event:
- active sessions
- renewals succeeding
- renewals denied
Those three numbers tell operators whether the system is near the ceiling, whether heartbeat health is intact, and whether users are being turned away unexpectedly.
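Atomics keep those counters cheap enough to update on every request. A sketch of one way to hold them, with the `Metrics` type and `Snapshot` formatting as assumptions rather than the system's actual metrics code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Metrics keeps the three high-signal counters as typed atomics so the
// hot path never blocks on observability.
type Metrics struct {
	ActiveSessions atomic.Int64
	RenewalsOK     atomic.Int64
	RenewalsDenied atomic.Int64
}

// Snapshot renders the counters roughly the way /metrics might expose them.
func (m *Metrics) Snapshot() string {
	return fmt.Sprintf("active=%d renew_ok=%d renew_denied=%d",
		m.ActiveSessions.Load(), m.RenewalsOK.Load(), m.RenewalsDenied.Load())
}

func main() {
	var m Metrics
	m.ActiveSessions.Add(2)
	m.RenewalsOK.Add(1)
	fmt.Println(m.Snapshot()) // active=2 renew_ok=1 renew_denied=0
}
```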
Why a Monolith
This control plane is intentionally a monolith.
For this workload, an in-process call is better than a network hop. Capacity tracking, degradation state, and session creation are tightly coordinated. Splitting them into separate services too early would trade simple correctness for distributed failure modes and eventual consistency at exactly the wrong layer.
The code can still be organized cleanly into packages and interfaces. It is a monolith in deployment topology, not in design discipline.
API Surface
- POST /v1/matches to create a match and its ladder rungs
- POST /v1/matches/{id}/start to activate a match
- GET /v1/matches/{id}/status to fetch match state and capacity snapshot
- POST /v1/playback/start to run the five-gate admission flow
- POST /v1/playback/stop to release a session
- POST /v1/license/renew to heartbeat and extend a session
- GET /v1/users/{id}/streams to inspect active user streams
- POST /v1/admin/degrade to toggle core-protect mode
- POST /v1/admin/simulate/spike to fire a synthetic surge
- GET /metrics for health and operator visibility
Graceful Shutdown and Containerization
Because the platform runs on Fargate, tasks need clean shutdown behavior during deploys and scale-downs.
The API traps SIGTERM, stops accepting new requests, and drains in-flight work for a short window so playback starts and license renewals do not get cut off mid-flight.
The container build is a simple multi-stage Dockerfile: compile in a Go builder image, copy only the static binary into a tiny runtime image, and keep the final image small so fresh tasks start quickly during scale-out.
What This Architecture Gets Right
- Predictable capacity through ladder-based pre-provisioning instead of reactive scrambling.
- Self-healing sessions through TTL-backed licenses and regular renewals.
- Graceful degradation that preserves core playback when dependencies run hot.
- Device concurrency enforcement through lease tracking instead of trust.
- Useful observability from a small set of high-signal metrics.
- Chaos readiness because simulation is built in, not bolted on later.
What Production Evolution Would Add
- CDN token auth so segment delivery is enforced at the edge.
- Distributed counters using Redis atomic primitives or Lua when capacity is shared across many instances.
- Real Redis-backed session commands instead of in-memory shortcuts.
- Kafka event streams for audit, analytics, and online feedback loops.
- Multi-region failover for global resilience.
Closing Thought
The best streaming infrastructure is the kind nobody notices.
If 15 million people can watch a cricket match without buffering, without admission bugs, without zombie sessions, and without a toss-time outage, that is not luck. It is what happens when capacity, failure handling, and operational realism are treated as product features, not afterthoughts.
