Why Your PostgreSQL Read Replicas Serve Stale Bonus Balances
The friend who texted you at 2 AM with a “small issue” — the one whose real-money poker app showed a user a $2,500 bonus balance that let them play 47 hands before vanishing — is not having a bad night. He is having a catastrophic one. That phantom balance sat in a PostgreSQL read replica that was 14 seconds behind the primary, and those 14 seconds just cost him his user’s trust, a potential fraud investigation by his payment processor, and roughly three hours of sleep explaining to a lawyer why “eventual consistency” isn’t a defense.
If you run any kind of platform where real balances — bonuses, chips, credits, or cash — get served to users, you have already felt this pain. You just might not have traced it back to your read replicas yet. Let’s fix that.
The Physics of Stale Reads
PostgreSQL read replicas are not magic mirrors. They are asynchronous followers that stream write-ahead logs (WAL) from the primary and replay them locally. Every replica lives in a state of deliberate, measured lag. The database architecture community calls this “replication lag,” but in production, it feels more like a time machine that sometimes works.
The core mechanism is simple: the primary commits a transaction. That transaction gets written to the WAL. The WAL segment gets shipped to the replica. The replica replays it. Between the commit on the primary and the replay on the replica, any query hitting the replica sees the database as it was before that transaction happened.
This is not a bug. It is a feature designed for read scaling. But it becomes a landmine the moment you treat a replica as authoritative for anything that looks like a balance.
Typical replication lag in a well-tuned PostgreSQL setup ranges from a few milliseconds to several seconds. Under write-heavy loads — exactly the kind of load a live gaming platform generates during a tournament or bonus drop — that lag can spike to 30 seconds or more. I have seen a production replica lag 47 seconds during a midnight leaderboard flush. The primary was fine. The replica was a museum of old data.
Why Bonus Balances Are the Canary
Bonuses in gaming platforms have a specific property that makes them uniquely vulnerable to stale reads: they are ephemeral, high-velocity state changes that users interact with immediately. A user sees a bonus, plays with it, and expects the balance to reflect their actions in real time.
Consider the lifecycle of a typical welcome bonus:
- User deposits $100
- System grants a $50 bonus
- User sees total balance of $150
- User plays 10 hands at $5 each
- System deducts $50 from the cash balance
- Bonus balance should now be $50
If step 2 writes to the primary and step 3 reads from a replica that hasn’t replayed that write yet, the user sees a $100 balance. They play five hands. The replica catches up. Now the balance shows $150 again — but the user has already lost $25 of real money they shouldn’t have been able to spend. The bonus balance they thought they had is gone, and the cash balance is lower than it should be.
This is not a theoretical edge case. This is a Tuesday.
The Anecdote That Keeps Me Up at Night
A studio I consulted for in 2022 ran a “Double Deposit Bonus” promotion on their casino platform. Their architecture was standard: a primary PostgreSQL instance for writes, three read replicas behind a load balancer for the web frontend. The promotion was a hit. Traffic tripled. Replication lag hit 22 seconds.
One user deposited $500. The primary wrote the deposit and the $500 bonus. The replica serving their next page load had not replayed either. The user saw $500. They played $200 in slots. The replica caught up. The user’s balance jumped to $1,000 — the original $500 plus the $500 bonus — but they had already lost $200 of real money they thought was the total balance.
The user filed a chargeback. The payment processor froze the account. The studio spent three weeks reconciling logs and ended up eating the $200 loss to keep the user from posting on Reddit. The root cause was never a code bug. It was a read path that treated a replica as a source of truth for a balance that existed only on the primary.
The Architecture of Truth
The fundamental problem is that most web applications have a single abstraction for reading data: “get the current balance.” That abstraction hides whether the data comes from the primary or a replica. When you have a read-write split, you need a second abstraction: “get the current balance, but only if you are certain it is the most recent version.”
PostgreSQL gives you a tool for this, but it is awkward. The pg_stat_replication view exposes the replay_lag column for each replica. You can query it and compare it to a threshold. If the lag exceeds your tolerance, you fall back to the primary. This works, but it introduces a network round trip and a query against a system view on every read that needs consistency.
A better pattern is application-level tagging of requests that require fresh data. Any request that touches a balance — viewing it, spending it, transferring it — should be routed to the primary. Period. The replica should serve only read-only data that is intrinsically stale-tolerant: leaderboards, game history, user profiles that haven’t changed in hours, static content.
Routing by Request Type
The cleanest implementation I have seen uses middleware that inspects the request path or a custom header. In a Node.js Express application, you might have something like:
const dbRouter = (req, res, next) => {
if (req.path.startsWith('/api/balance') || req.path.startsWith('/api/wallet')) {
req.db = primaryPool;
} else {
req.db = replicaPool;
}
next();
};
This is simple. It is also fragile. A developer adds a new endpoint under /api/user/profile that returns the balance, and suddenly stale data leaks through. The better approach is to make the routing explicit at the query level, not the endpoint level.
Session-Level Consistency Hints
PostgreSQL supports SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY and READ WRITE at the session level. You can bind a connection to the primary for the duration of a request that performs any write or any read that must see those writes. This is how Rails’ ActiveRecord::Base.connected_to works under the hood, and it is a proven pattern.
In a Node.js environment using pg directly, you can wrap your query functions:
const withFreshData = async (callback) => {
const client = await primaryPool.connect();
try {
return await callback(client);
} finally {
client.release();
}
};
// Usage
const balance = await withFreshData(async (client) => {
const result = await client.query('SELECT balance FROM wallets WHERE user_id = $1', [userId]);
return result.rows[0].balance;
});
This forces the developer to explicitly opt into the primary for any read that needs fresh data. It is verbose. It is also the only pattern I have seen survive multiple developer turnovers without a stale-read incident.
The Logical Replication Gotcha
Streaming replication is not the only way to replicate data in PostgreSQL. Logical replication, introduced in PostgreSQL 10, publishes changes as a stream of logical decoding events. It is more flexible — you can replicate only specific tables, or even specific rows — but it introduces a different class of staleness.
Logical replication does not replicate DDL changes by default. If you add a column to a table on the primary, the replica will not see it until you manually apply the schema change. This means your application code might query a column that does not exist on the replica, or worse, query a column that exists but contains NULL because the replica’s schema is ahead of its data.
I have debugged a production incident where a CHECK constraint on a bonus expiry column existed on the primary but not on the replica. The primary rejected a bonus grant because the expiry date was in the past. The replica accepted it because it had no constraint. The user saw a bonus on the replica that never existed on the primary. The lag was 3 seconds. The confusion lasted weeks.
Monitoring That Actually Helps
Standard PostgreSQL monitoring tools like pg_stat_replication show you the current lag, but they do not tell you which specific transactions are still pending on the replica. For bonus balances, you care about the last transaction that touched a specific user’s wallet, not the global lag.
You can compute this with a heartbeat table. Insert a row into a heartbeats table on the primary every second with the current timestamp and a unique ID. On the replica, query the most recent heartbeat. The difference between the primary’s write time and the replica’s read time is the effective lag for your application. This gives you a per-request staleness measurement that accounts for the actual data path, not just the WAL replay position.
Set an alert when this application-level lag exceeds 1 second during business hours. Set a critical alert when it exceeds 5 seconds. Do not wait for a user to tell you their balance is wrong.
The Caching Layer Trap
Many developers try to solve the stale replica problem by adding Redis or Memcached in front of the database. The thinking is: write to the primary, invalidate the cache, and subsequent reads from the replica will hit the cache instead of the stale database.
This works until it doesn’t. Cache invalidation is itself an asynchronous operation. If the cache invalidation message arrives at the cache before the replica has replayed the write, the cache will be empty, and the read will fall through to the replica — which is still stale. You have just added a layer of complexity without solving the root problem.
A more reliable pattern is to write-through to the cache from the primary. After the primary commits the transaction, write the new balance directly to Redis with a short TTL, like 5 seconds. Reads from the replica check Redis first. If Redis has the value, use it. If not, read from the replica and accept the staleness. This gives you a 5-second window of consistency for the most critical data, and it degrades gracefully when Redis is down.
The Write-Through Example
def grant_bonus(user_id, amount):
with primary_conn.cursor() as cur:
cur.execute("UPDATE wallets SET bonus_balance = bonus_balance + %s WHERE user_id = %s", (amount, user_id))
cur.execute("SELECT bonus_balance FROM wallets WHERE user_id = %s", (user_id,))
new_balance = cur.fetchone()[0]
redis.setex(f"balance:{user_id}", 5, new_balance)
The read path:
def get_balance(user_id):
cached = redis.get(f"balance:{user_id}")
if cached is not None:
return int(cached)
with replica_conn.cursor() as cur:
cur.execute("SELECT bonus_balance FROM wallets WHERE user_id = %s", (user_id,))
return cur.fetchone()[0]
This is not perfect. If the primary crashes after the database write but before the Redis write, you have a stale cache for up to 5 seconds. But it is dramatically better than serving a 22-second-old balance from a replica.
The Real Solution: Primary Reads for Balance Operations
After years of watching studios burn cycles on this problem, I have come to a simple conclusion: do not read bonus balances from replicas. Ever. Not with lag monitoring. Not with caching. Not with fancy routing. Just do not do it.
The performance argument for reading balances from replicas is weak. Balance queries are single-row lookups by primary key. They are fast. They are not the queries that are saturating your primary. The queries that saturate your primary are the aggregation queries — leaderboard calculations, game history summaries, analytics rollups. Those are the queries that belong on replicas.
A single SELECT bonus_balance FROM wallets WHERE user_id = $1 on a primary with proper indexing takes under a millisecond. You are not going to save enough query throughput to justify the risk of serving a stale balance. The cost of one stale-balance incident — in chargebacks, support time, and user churn — exceeds the cost of running your primary at 5% higher CPU utilization for a year.
Forward-Looking Architecture
The next generation of PostgreSQL tooling is making this easier. Patroni and other high-availability managers can expose a health endpoint that tells your application whether a given node is the current primary. Connection poolers like PgBouncer can route transactions to specific nodes based on session parameters. But these are tools, not solutions.
The real solution is a cultural shift in how your team thinks about data freshness. Every developer on your team should be able to answer the question: “Is this read tolerant of staleness?” If the answer is no, the query goes to the primary. No exceptions. No clever workarounds. No “it’s probably fine because the lag is usually under 100 milliseconds.”
The user who saw that $2,500 bonus balance at 2 AM did not care about your replication lag metrics. They cared about what the screen showed them. Make sure the screen shows them the truth, even if it means reading from the primary for a query that takes 0.3 milliseconds instead of 0.2.
Your replicas are for scaling reads that can afford to be slightly behind. Your users’ money is not one of those reads.