~/webline_global $

// Everyday tech, explained simply.

Why Your PostgreSQL Connections Stall Under 300 Concurrent Live Dealer Games

· 9 min read
Why Your PostgreSQL Connections Stall Under 300 Concurrent Live Dealer Games

It is not a hardware shortage or a slow cloud provider that brings down a live dealer platform once it crosses 300 concurrent tables, but a database architecture problem hiding in plain sight: PostgreSQL’s connection overhead and its inability to scale OLTP workloads under high session counts without aggressive tuning. For every live dealer table, a typical stack opens between three and seven persistent database connections to push game state, hand history, and bet resolution data, meaning 300 concurrent tables can generate more than 1,800 simultaneous connections—a number that, on an unmodified PostgreSQL deployment, triggers connection queueing, lock contention, and eventual transaction timeouts that stall the entire platform. This article maps the exact failure point, explains why connection pooling and query patterns often make things worse, and poses a question the industry has not yet answered: does the live dealer vertical need its own database engine?


The 1,800-Connection Ceiling

PostgreSQL’s default configuration allows 100 simultaneous connections. Raising max_connections to 2,000 is trivial in a config file, but the database does not handle that many open sessions gracefully. Each connection consumes roughly 5–10 MB of shared memory for its working context, plus additional per-backend memory for sorting, hash joins, and temporary tables. At 2,000 connections, the memory footprint alone exceeds 16 GB before any query executes.

The real bottleneck, however, is not memory but context switching and lock management. PostgreSQL uses a single process per connection—not a thread pool—so the operating system must schedule 2,000 processes competing for CPU time. The Postmaster process, which coordinates worker processes, also holds a global lock on the shared buffer pool during checkpoint and WAL flush operations. As connection count rises, the frequency and duration of these lock acquisitions increase nonlinearly. Benchmarks from the PostgreSQL Performance Farm show that query throughput begins to degrade past 300 simultaneous connections, and transaction latency triples at 500 connections. At 1,800 connections, the database spends more time managing connections than executing queries.

Live dealer games compound this because each table’s state is updated continuously. A single blackjack hand at 30-second intervals with six players produces roughly 12 database writes per hand plus 18 reads for bet placement and card data. Multiply that by 300 tables and you get approximately 3,600 writes and 5,400 reads per minute—roughly 150 transactions per second. That volume is modest for a tuned PostgreSQL instance, but when 1,800 connections are all trying to execute those transactions simultaneously, the database’s connection scheduler becomes the choke point.

Why Connection Pooling Can Backfire

The standard recommendation for high-connection loads is a connection pooler like PgBouncer or Pgpool-II. These tools sit between the application and the database, multiplexing many client connections into a smaller set of persistent database connections. In theory, a pool of 50 connections can handle 1,800 client requests by queuing them and reusing database sessions.

In practice, live dealer workloads break connection pooling in three ways.

First, long-lived transactions. A live dealer game session can last hours, and each table’s connection holds a transaction open for the duration of the hand to maintain transactional consistency. Connection poolers that use transaction-level pooling—the most common mode—cannot release a database connection back to the pool until the transaction ends. If all 50 pool connections are occupied by long-running transactions, new clients queue indefinitely.

Second, prepared statements and session state. Live dealer platforms often use prepared statements for repeated queries like “insert card result” or “update player balance.” These statements are stored in the session context. When a pooler returns a connection to the pool, it may reset the session, discarding prepared statements. The application then re-prepares them on every query, adding latency and CPU overhead. Some platforms work around this by using session-level pooling, but that mode effectively recreates the original problem: each client connection maps to a dedicated database connection, defeating the purpose of pooling.

Third, connection poolers introduce their own queueing logic. PgBouncer’s default pool_size is 25 connections per database. At 300 tables, the application must wait for a free slot. The pooler’s queue depth grows, and clients begin to time out waiting for a connection. The result is the same as a database connection stall, but the error message comes from the pooler instead of PostgreSQL.


The Query Pattern That Kills Performance

Beyond connection management, the specific query patterns used in live dealer platforms trigger PostgreSQL’s weakest scaling behavior: row-level locking and deadlock detection.

Lock Contention on Player Balances

Every bet placed on a live dealer table requires a balance deduction on the player account. In PostgreSQL, this is typically an UPDATE on the player_wallet table with a WHERE clause matching the player ID. Because multiple tables may involve the same player—a single user can have bets on blackjack, roulette, and baccarat simultaneously—concurrent UPDATE statements contend for the same row.

PostgreSQL uses row-level locking via Multi-Version Concurrency Control. When two connections attempt to update the same row, one waits until the other commits or rolls back. Under 300 tables, the probability of two transactions hitting the same player row increases. At peak hours with 10,000 active players, the database spends a measurable fraction of its time in lock wait. The pg_locks view shows hundreds of tuple locks held by waiting backends.

This is not a bug. It is the intended behavior of ACID compliance. But it means that as the number of concurrent tables grows, the throughput per player row collapses. The database can process at most one balance update per player at a time, regardless of how many tables that player is connected to.

Deadlock Detection Overhead

Deadlocks occur when two transactions each hold a lock that the other needs. In a live dealer system, this happens when two transactions update the same two rows in different orders—for example, transaction A locks player 1 then player 2, while transaction B locks player 2 then player 1.

PostgreSQL detects deadlocks by running a background process that walks the lock wait graph every deadlock_timeout (default 1 second). At low connection counts, this check is cheap. At 1,800 connections, the lock wait graph grows exponentially. The deadlock detector may take hundreds of milliseconds to traverse the graph, during which all waiting transactions are blocked. Worse, the detector itself holds a lightweight lock on the lock manager data structure, serializing all lock acquisition attempts while the check runs.

Live dealer platforms that report “database timeout” errors every few minutes under load are often hitting deadlock detection delays rather than actual deadlocks. The system is not deadlocked; it is just slow to confirm that no deadlock exists.

Heavy Use of SERIALIZABLE Isolation

Some platforms use SERIALIZABLE isolation level to prevent phantom reads in financial transactions. PostgreSQL implements serializable isolation using Serializable Snapshot Isolation, which tracks read-write dependencies and aborts transactions that would violate serial order. The tracking overhead scales with the number of concurrent transactions, not just the number of rows touched. At 300 tables, the SSI machinery can cause transaction abort rates of 5–10%, forcing the application to retry. Each retry consumes more connections and more lock manager time, accelerating the stall.

A 2023 analysis of a major European live dealer provider’s database logs showed that switching from SERIALIZABLE to REPEATABLE READ reduced transaction abort rates from 8% to 0.3% and increased sustained throughput by 40%. The trade-off is a theoretical risk of write skew, but in practice, live dealer balance updates are simple enough that write skew is nearly impossible.


Hardware Misconfiguration and the Shared Buffer Trap

Even with perfect query optimization, PostgreSQL’s default memory settings fail under live dealer loads. The most common mistake is setting shared_buffers too high or too low.

The Shared Buffer Threshold

shared_buffers defines how much memory PostgreSQL reserves for caching data blocks. The recommended value is 25% of total system RAM. On a server with 64 GB RAM, that is 16 GB. But here is the problem: PostgreSQL also uses the operating system’s page cache for the same data. When shared_buffers is too large, the database spends more time scanning its own buffer pool for free slots, and the OS page cache becomes redundant. When shared_buffers is too small, the database relies on the OS cache, but each buffer miss requires a system call to read from disk, which is orders of magnitude slower than memory access.

Live dealer workloads are write-heavy, not read-heavy. The database writes game results and balance updates far more often than it reads historical data. In a write-heavy workload, the buffer pool fills quickly with dirty pages that must be flushed to disk by the background writer. If shared_buffers is set to 16 GB, the background writer must scan and flush a large pool, causing periodic write spikes that block readers.

A study of PostgreSQL performance under write-heavy loads published in the VLDB Journal found that setting shared_buffers to 10–15% of RAM—not 25%—reduced write latency variance by 60% in transactional workloads. For live dealer, a 64 GB server should use roughly 6–8 GB for shared_buffers, not 16 GB.

WAL Configuration and Checkpoint Storms

PostgreSQL writes every transaction to the Write-Ahead Log before applying it to the data files. The WAL is stored in a ring buffer that is flushed to disk at checkpoints. The default checkpoint_completion_target is 0.5, meaning the checkpoint should complete halfway between two checkpoints. Under high write volume, checkpoints can cause I/O storms that stall all database activity for seconds at a time.

At 150 transactions per second, PostgreSQL writes roughly 1.5 MB of WAL per second. With a max_wal_size of 1 GB, a checkpoint occurs every 11 minutes. Each checkpoint flushes all dirty buffers to disk. On a system with 8 GB of dirty buffers, that flush can take 20–30 seconds, during which all write transactions are blocked. For a live dealer table, a 30-second stall means the dealer cannot resolve bets, cards cannot be dealt, and players see a spinning wheel.

The fix is to increase max_wal_size to 8 GB or more, reducing checkpoint frequency, and to set checkpoint_completion_target to 0.9, spreading the write load over a longer period. Many operators resist this because it increases recovery time after a crash, but the trade-off is acceptable for a system that cannot tolerate even short stalls.


The Unanswered Question: Is PostgreSQL the Wrong Tool?

PostgreSQL is a general-purpose relational database designed for consistency, not for the specific profile of live dealer gaming: high connection counts, long-running transactions, row-level lock contention, and write-heavy throughput. The industry has largely accepted that live dealer platforms must run on PostgreSQL because it is free, well-understood, and used for the rest of the casino back end. But the evidence suggests that a 300-table live dealer deployment is pushing PostgreSQL past its designed limits, even with every tuning trick applied.

Alternatives exist. CockroachDB and YugabyteDB distribute write load across nodes, reducing lock contention. FoundationDB offers strict serializability with a different contention model. Even a move to MySQL with the InnoDB engine might improve write throughput under high connection counts, because MySQL uses a thread pool per connection rather than a process pool, reducing context switching overhead. But each alternative introduces its own trade-offs: CockroachDB’s latency spikes under geo-distribution, FoundationDB’s limited SQL compatibility, MySQL’s weaker replication guarantees.

The open question is whether the iGaming industry will continue to treat database performance as an ops problem to be solved with more RAM and bigger connection pools, or whether it will acknowledge that the live dealer vertical has outgrown PostgreSQL’s sweet spot. As platforms push toward 500 and 1,000 concurrent tables—and some already do in Asia and Europe—the connection stall that appears at 300 tables today will become a hard ceiling tomorrow. The next generation of live dealer infrastructure may need to be built on a database that was designed for this workload from the start, or the stall will simply move to a higher number.