Why do bank transfer systems use idempotency keys?

Idempotency keys prevent duplicate transfers when the same payment request is sent more than once because of client retries, network timeouts, or repeated user taps. The backend stores the key together with the outcome of the first request and returns that stored result instead of processing a second transfer.

Why reserve money before settling a transfer?

Reservation protects the system from losing money during partial failures. The amount is locked first, and only converted into a final settled transfer after the receiving side confirms success. If the transfer fails, the reservation can be released safely.

What does reconciliation mean in a payment system?

Reconciliation is the process of checking pending transactions against external payment rails or banks to determine their final status. Background workers use it to resolve transfers that were left in pending or uncertain states due to timeouts or delayed responses.

How Bank Transfers Work

Ever wondered what actually happens when you hit Pay on your banking app and your friend receives the money almost instantly? To the user, it feels like magic. But as engineers, we know there is no magic, only careful state management, reliable infrastructure, and good system design.

If you're a fresher or junior engineer trying to understand how real-world fintech systems are built, this is one of the best problems to study. Let's look under the hood of a modern bank payment pipeline.

The Problem: It's Not Just `balance = balance - amount`

When you first learn to code, a bank transfer looks deceptively simple:

UPDATE accounts SET balance = balance - 500 WHERE id = sender_id;
UPDATE accounts SET balance = balance + 500 WHERE id = receiver_id;

But what happens if the database crashes exactly between step 1 and step 2? The sender loses money, the receiver gets nothing, and your system has just created a financial disaster.
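Inside a single database, a transaction already closes this particular gap: either both updates commit or neither does. Here is a minimal sketch of that idea using Python's built-in sqlite3 module with an in-memory database. The `transfer` helper and its `crash_midway` flag are hypothetical and exist only to simulate a failure between the two statements. Amounts are stored as integers to avoid floating-point rounding.

```python
import sqlite3

def transfer(conn, sender_id, receiver_id, amount, crash_midway=False):
    """Move `amount` between two accounts inside one database transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, sender_id),
            )
            if crash_midway:
                # Hypothetical failure injected between the two updates.
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, receiver_id),
            )
    except RuntimeError:
        return False
    return True

def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 1000), (2, 200)])
conn.commit()

failed = transfer(conn, 1, 2, 500, crash_midway=True)
after_crash = balances(conn)      # the debit was rolled back: {1: 1000, 2: 200}

ok = transfer(conn, 1, 2, 500)
after_success = balances(conn)    # both updates committed: {1: 500, 2: 700}
```

A transaction like this only protects updates that live in the same database. Once the receiver's account sits at another bank, no single transaction can cover both sides, which is the harder problem the rest of this article deals with.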

In the real world, distributed systems fail all the time. Networks drop, APIs time out, and database locks stall requests. So banks do not model transfers as two casual updates. They build an event-driven, state-machine-backed pipeline that treats money movement as a controlled process, not a single query.

1. The API Gateway

Every transfer starts at the front door: the API Gateway, usually something like Nginx, Kong, or a managed gateway sitting in front of the core backend services.

Its first major job is idempotency.

Imagine a user tapping the Pay button three times because the network feels slow. We cannot process the transfer three times. To prevent that, the mobile app generates a unique idempotency-key once per payment attempt and sends the same key in the request headers on every retry. The API layer checks for that key in a fast datastore like Redis and stores it alongside the result of the first request.

If the same key shows up again within a safe window, the gateway returns the original result instead of starting a new transfer.
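To make the duplicate check concrete, here is a minimal sketch of the idea. It uses a plain in-memory dictionary as a stand-in for Redis, so `IdempotencyStore`, `run_once`, and `do_transfer` are all hypothetical names, and key expiry (the "safe window") is left out for brevity.

```python
import threading

class IdempotencyStore:
    """In-memory stand-in for a Redis-like key store (hypothetical helper)."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def run_once(self, key, operation):
        """Run `operation` only for the first request that carries `key`.

        Returns (result, replayed), where `replayed` is True when the
        stored result of an earlier request was returned instead.
        """
        with self._lock:
            if key in self._results:
                return self._results[key], True
            result = operation()
            self._results[key] = result
            return result, False

processed = []

def do_transfer():
    # Hypothetical side effect: this is what must not happen twice.
    processed.append("transfer-1")
    return {"transfer_id": "transfer-1", "status": "PENDING"}

store = IdempotencyStore()
first, first_replayed = store.run_once("key-abc", do_transfer)
second, second_replayed = store.run_once("key-abc", do_transfer)

# The transfer ran once, and the second call got the stored result back.
print(len(processed), first == second, first_replayed, second_replayed)
# prints: 1 True False True
```

This sketch holds one lock while the operation runs, which is only reasonable for a demo. With a real Redis instance, the usual building block is an atomic `SET key value NX EX seconds`, which claims the key only if it does not exist yet and expires it after the chosen window.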

2. The Core Ledger

Strong financial systems do not just mutate a balance column. They use double-entry accounting, where every transaction has a debit side and a credit side.
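A small sketch can show why double-entry helps. It assumes a simplified sign convention (debits negative, credits positive) and an append-only list as the ledger. The `Entry` type and the `post_transfer` and `balance` helpers are illustrative names, not part of any real ledger product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    transfer_id: str
    account_id: str
    amount: int  # in paise; negative = debit, positive = credit (assumed convention)

def post_transfer(ledger, transfer_id, sender, receiver, amount):
    """Append one debit and one matching credit for the same transfer."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    ledger.append(Entry(transfer_id, sender, -amount))
    ledger.append(Entry(transfer_id, receiver, amount))

def balance(ledger, account_id):
    """Derive a balance from the entries instead of storing it in a column."""
    return sum(e.amount for e in ledger if e.account_id == account_id)

ledger = []
post_transfer(ledger, "t1", "alice", "bob", 50_000)  # ₹500 expressed in paise

# Every transfer nets to zero across the ledger, which makes errors detectable.
total = sum(e.amount for e in ledger)
print(total, balance(ledger, "alice"), balance(ledger, "bob"))
# prints: 0 -50000 50000
```

Because entries are only ever appended, the ledger doubles as an audit trail, and any state where the total is not zero signals a bug.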

They also avoid moving money immediately. Instead, they often use a reservation pattern:

The system reserves the amount first
The available balance drops
The money is not fully settled yet
Final settlement happens only after the receiving side confirms success

This ledger usually lives in a highly consistent relational database like PostgreSQL or CockroachDB. If the downstream transfer fails, the reservation is released and the money becomes available again.
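The reservation steps above can be sketched in a few lines. This is an in-memory illustration, not a database-backed ledger, and the `Account` class with its `reserve`, `settle`, and `release` methods is a hypothetical design chosen for clarity.

```python
class InsufficientFunds(Exception):
    pass

class Account:
    def __init__(self, balance):
        self.balance = balance   # settled money, in paise
        self.reserved = {}       # transfer_id -> amount currently on hold

    @property
    def available(self):
        """What the user can still spend: settled balance minus holds."""
        return self.balance - sum(self.reserved.values())

    def reserve(self, transfer_id, amount):
        """Place a hold; the settled balance does not change yet."""
        if amount > self.available:
            raise InsufficientFunds(transfer_id)
        self.reserved[transfer_id] = amount

    def settle(self, transfer_id):
        """The receiving side confirmed: turn the hold into a real debit."""
        self.balance -= self.reserved.pop(transfer_id)

    def release(self, transfer_id):
        """The transfer failed: drop the hold so the money is available again."""
        self.reserved.pop(transfer_id)

account = Account(100_000)
account.reserve("t1", 50_000)
after_reserve = (account.balance, account.available)   # (100000, 50000)

account.settle("t1")
after_settle = (account.balance, account.available)    # (50000, 50000)

account.reserve("t2", 30_000)
account.release("t2")
after_release = (account.balance, account.available)   # (50000, 50000)
```

The useful property is that the settled balance only changes in `settle`, so a failure anywhere before that point can be undone by releasing the hold.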

3. The Smart Router

Banks do not send every payment through the same rail. The payment core works like a strategy engine and selects the right transfer path based on the payload.

UPI / IMPS for fast, low-to-medium value transfers
NEFT for scheduled or batch-friendly transfers
RTGS for high-value transfers with stricter rules

The router evaluates factors like amount, transfer type, timing, and regulatory rules, then picks the correct rail adapter. That adapter knows how to speak to the external payment network.
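Here is a rough sketch of that routing step as a strategy lookup. The rules and the `HIGH_VALUE_THRESHOLD` value are illustrative assumptions, not actual regulatory limits, and the adapters are placeholder functions that only return a description instead of calling a real payment network.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransferRequest:
    amount_rupees: int
    needs_instant: bool

# Illustrative cut-off only; real limits come from the regulator and the bank.
HIGH_VALUE_THRESHOLD = 200_000

def choose_rail(request):
    """Pick a rail name from the request's properties (assumed rules)."""
    if request.amount_rupees >= HIGH_VALUE_THRESHOLD:
        return "RTGS"
    if request.needs_instant:
        return "UPI_IMPS"
    return "NEFT"

# Placeholder adapters: each one would wrap a real network client in practice.
ADAPTERS = {
    "UPI_IMPS": lambda r: f"sent ₹{r.amount_rupees} via the UPI/IMPS adapter",
    "NEFT": lambda r: f"queued ₹{r.amount_rupees} for the next NEFT batch",
    "RTGS": lambda r: f"sent ₹{r.amount_rupees} via the RTGS adapter",
}

def route(request):
    """Select the rail, then hand the request to that rail's adapter."""
    rail = choose_rail(request)
    return rail, ADAPTERS[rail](request)

small = route(TransferRequest(amount_rupees=500, needs_instant=True))
batch = route(TransferRequest(amount_rupees=5_000, needs_instant=False))
large = route(TransferRequest(amount_rupees=500_000, needs_instant=True))
print(small[0], batch[0], large[0])
# prints: UPI_IMPS NEFT RTGS
```

Keeping the decision (`choose_rail`) separate from the adapters means a new rail can be added without touching the code that talks to existing networks.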

4. Handling Network Chaos

When the system calls an external payment rail, it usually expects one of three states: SETTLED, FAILED, or PENDING.

PENDING is the interesting one.

If an external bank or payment network is slow, we cannot keep the user's HTTP request open forever. That would waste resources and eventually crush the service under high load. So the backend returns 202 Accepted, stores the transfer as PENDING, and lets asynchronous systems take over.
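One way to keep these states honest is an explicit transition table. The sketch below is an assumption-heavy illustration: the internal `RESERVED` state, the `ALLOWED` table, and the choice to answer terminal outcomes with HTTP 200 are all choices made for this example. Only the 202 response for PENDING comes from the flow described above.

```python
from enum import Enum

class State(Enum):
    RESERVED = "RESERVED"   # assumed internal state before the rail is called
    PENDING = "PENDING"
    SETTLED = "SETTLED"
    FAILED = "FAILED"

# Which moves are legal. Terminal states have no outgoing transitions.
ALLOWED = {
    State.RESERVED: {State.PENDING, State.SETTLED, State.FAILED},
    State.PENDING: {State.SETTLED, State.FAILED},
    State.SETTLED: set(),
    State.FAILED: set(),
}

def transition(current, new):
    """Return `new` if the move is legal, otherwise raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new

def handle_rail_response(current, rail_status):
    """Map the rail's answer to a new state and an HTTP status code."""
    new_state = transition(current, State(rail_status))
    # 202 Accepted: the request was taken, but the outcome is not known yet.
    http_status = 202 if new_state is State.PENDING else 200
    return new_state, http_status

state, http_status = handle_rail_response(State.RESERVED, "PENDING")
final_state = transition(state, State.SETTLED)   # done later by a worker
print(state.name, http_status, final_state.name)
# prints: PENDING 202 SETTLED
```

With a table like this, a bug that tries to move a SETTLED transfer back to FAILED raises an error instead of silently corrupting the ledger.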

5. Background Workers and Reconciliation

This is where background workers earn their keep.

Workers continuously scan or consume pending transfers and ask the external system for the final result: did this transfer settle or not? This process is called reconciliation.

Once the worker gets a definitive outcome, it updates the ledger:

SETTLED if the transfer succeeded, so the reservation becomes a final debit
FAILED if the transfer was rejected, so the reservation is released and the money returns to the sender's available balance

These workers are often driven by queues like RabbitMQ, Kafka, SQS, or even scheduled jobs depending on the system's scale and reliability needs.
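A single reconciliation pass might look like the sketch below. Transfers are plain dictionaries here and `fetch_status` is a hypothetical callable standing in for the external rail's status API. A real worker would read from a database or queue and add retries, backoff, and alerting for transfers that stay pending too long.

```python
def reconcile(transfers, fetch_status):
    """Run one pass over the transfers and resolve the pending ones.

    Returns the ids of transfers that reached a final state in this pass.
    """
    resolved = []
    for transfer in transfers:
        if transfer["state"] != "PENDING":
            continue  # already final, nothing to do
        status = fetch_status(transfer["id"])
        if status == "SETTLED":
            transfer["state"] = "SETTLED"
        elif status == "FAILED":
            transfer["state"] = "FAILED"
            transfer["reservation_released"] = True
        else:
            continue  # still uncertain, try again on the next pass
        resolved.append(transfer["id"])
    return resolved

# Fake rail responses for the demo (hypothetical data).
rail_answers = {"t1": "SETTLED", "t2": "FAILED", "t3": "PENDING"}

transfers = [
    {"id": "t1", "state": "PENDING"},
    {"id": "t2", "state": "PENDING"},
    {"id": "t3", "state": "PENDING"},
    {"id": "t4", "state": "SETTLED"},
]

resolved = reconcile(transfers, rail_answers.get)
print(resolved, [t["state"] for t in transfers])
# prints: ['t1', 't2'] ['SETTLED', 'FAILED', 'PENDING', 'SETTLED']
```

Because the pass skips anything that is no longer PENDING, running it twice is harmless, which matters when workers restart or overlap.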

6. Real-Time Notifications

After reconciliation marks the transfer as settled, the user still needs to know.

Polling the API every two seconds would be terrible. It would drain mobile batteries and create unnecessary load on the backend. So modern systems use an event-driven notification layer.

The payment core or worker publishes a TransferSettled event to Kafka or SQS
A separate Notification Service consumes that event
That service pushes a real-time update through WebSockets, FCM, or APNs
The user's screen flips to Payment Successful almost instantly
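The publish-and-consume flow above can be sketched with a tiny in-process event bus. This is only a stand-in for a broker like Kafka or SQS: `EventBus`, `notify_user`, and the event fields are hypothetical, and the "push" is simulated by appending to a list rather than calling WebSockets, FCM, or APNs.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process publish/subscribe (stand-in for a real broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver to every handler registered for this event type.
        for handler in self._subscribers.get(event_type, []):
            handler(payload)

pushed = []  # simulated outbox of push notifications

def notify_user(event):
    """Notification Service handler: turn an event into a user-facing message."""
    message = f"Payment of ₹{event['amount']} successful"
    pushed.append((event["user_id"], message))

bus = EventBus()
bus.subscribe("TransferSettled", notify_user)

# The payment core or worker publishes; it knows nothing about notifications.
bus.publish("TransferSettled", {"user_id": "u1", "transfer_id": "t1", "amount": 500})
bus.publish("TransferFailed", {"user_id": "u2", "transfer_id": "t2", "amount": 900})

print(pushed)
# prints: [('u1', 'Payment of ₹500 successful')]
```

The publisher and the notification handler never reference each other directly, which is what lets the two services be deployed and scaled separately.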

The Big Picture: The Journey of ₹500

Initiate: Mobile app sends POST /transfer with an idempotency-key.
Check: Redis verifies the request is not a duplicate.
Reserve: PostgreSQL creates ledger entries and moves ₹500 into a reserved state.
Route: The payment core selects the correct rail, maybe UPI for a small transfer.
Execute: The external rail responds with PENDING because the network is slow.
Return Early: The HTTP request ends with 202 Accepted and the user sees a Processing... screen.
Reconcile: A worker later checks the external network and receives SETTLED.
Commit: The ledger is finalized and the reservation becomes a settled transfer.
Publish: A success event is dropped into Kafka or another broker.
Notify: The notification service pushes the result to the user's device.

Why This Design Works

It is fault-tolerant. Failures are expected, so the system has explicit pending and recovery paths.
It is scalable. The API layer, ledger, workers, and notification systems can all scale independently.
It protects money. Reservations and reconciliation reduce the risk of money disappearing mid-transfer.
It is understandable. The transfer moves through clear states instead of hiding complexity in a single function.

Conclusion

Building a payment system is not about clever algorithms. It is about managing distributed state safely.

Once you separate the API layer, the ledger, the async reconciliation workers, and the notification system, the design becomes far more reliable. Each part has one job. Together, they ensure money is either moved correctly or not moved at all.

Next time you tap Pay and see money move in seconds, remember: behind that smooth experience is a carefully engineered pipeline working very hard to make the transfer feel simple.