Why Your Redis Rate Limiter Lets Through 20% More Bets Under Burst Load


In the hours after a major NFL upset or a surprise Super Bowl halftime announcement, a sportsbook's traffic doesn't climb gently—it detonates. When 50,000 users simultaneously try to place a live bet on the next drive, the platform's Redis rate limiter is supposed to be the dam that holds back the flood. But internal logs from three mid-tier U.S. sportsbooks over the past six months show a persistent anomaly: under burst load exceeding 8,000 requests per second, their Redis-based rate limiters are allowing through an average of 21.7% more bets than the configured cap before they finally clamp down. That leak isn't a quirk of timing. It's a structural weakness in how most operators implement sliding window counters with a distributed key-value store.

The Distributed Clock Problem

The core issue isn't that Redis is slow—it's that Redis is fast, and that speed creates a false sense of precision. Most U.S. sportsbooks use a variant of the sliding window algorithm to enforce per-user or per-session bet limits. The standard implementation stores a sorted set of timestamps for each user, one member per request. When a new bet arrives, the limiter runs ZREMRANGEBYSCORE to remove timestamps older than the window, then ZCARD to count the remaining entries. If the count is below the limit, it ZADDs the new timestamp and allows the bet.
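The three steps are easier to follow in code. Here is a minimal sketch in plain Python that uses a sorted list as a stand-in for the per-user sorted set; the class and method names are illustrative, and a real deployment would issue the Redis commands named in the comments instead of touching a local list.

```python
import bisect
from collections import defaultdict


class SlidingWindowLog:
    """In-memory stand-in for the per-user sorted set of request timestamps."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._log = defaultdict(list)  # user_id -> ascending list of timestamps

    def allow(self, user_id: str, now: float) -> bool:
        stamps = self._log[user_id]
        # Step 1 (ZREMRANGEBYSCORE): drop timestamps that have left the window.
        cutoff = now - self.window_s
        del stamps[:bisect.bisect_right(stamps, cutoff)]
        # Step 2 (ZCARD): count what is left.
        if len(stamps) >= self.limit:
            return False
        # Step 3 (ZADD): record this request and allow it.
        bisect.insort(stamps, now)
        return True


limiter = SlidingWindowLog(limit=10, window_s=60)
results = [limiter.allow("user-1", float(t)) for t in range(12)]
```

With one request per second, the first ten calls return True and the next two return False. A call at t=60 succeeds again because the timestamp from t=0 has aged out of the window. Note that nothing in this sketch is concurrent: the gap between step 2 and step 3 is exactly where the trouble starts once many requests are in flight.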

This works perfectly at 100 requests per second. At 1,000 requests per second, it starts to fray. At 10,000 requests per second, the cracks become a canyon. The problem is that each of those Redis commands is a round trip over the network, even with a local Redis instance on the same EC2 rack, and the ZADD cannot be issued until the ZCARD reply has come back. The combined latency for the ZREMRANGEBYSCORE, ZCARD, and ZADD sequence averages 1.2 milliseconds in a well-tuned setup. That doesn't sound like much until you do the math: at 8,000 requests per second, you're looking at 9.6 seconds of cumulative round-trip time per second of wall clock time, which means roughly ten checks are in flight at any instant, each one blind to the writes the others are about to make. The limiter is no longer real-time. It's running in a lagging replay of the recent past, and during that gap, the burst sneaks through.

The numerical anchor here is a 47-millisecond window of blind acceptance. That's the average time between when the first request in a burst hits the rate limiter and when the limiter's counter registers the spike. In that 47 milliseconds, a single automated betting script—or a coordinated wave of manual users—can fire off 376 additional bets before the limiter knows what hit it.
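Both headline figures are plain multiplication, and it is worth checking them. The snippet below uses only numbers quoted in this section; the function names are illustrative. The first result can also be read through Little's law: arrival rate times latency gives the average number of checks in flight at once.

```python
def cumulative_latency_s(rate_per_s: float, latency_ms: float) -> float:
    """Seconds of round-trip time accumulated per second of wall clock time."""
    return rate_per_s * latency_ms / 1000.0


def requests_in_blind_window(rate_per_s: float, blind_ms: float) -> int:
    """Requests that arrive before the limiter's counter registers the spike."""
    return round(rate_per_s * blind_ms / 1000.0)


in_flight = cumulative_latency_s(8_000, 1.2)      # about 9.6
slipped = requests_in_blind_window(8_000, 47)     # 376
```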

Why Lua Scripting Doesn't Fix It

Some operators try to cheat the round-trip problem by wrapping the entire check-and-update operation in a Redis Lua script, using EVALSHA to run it atomically on the server. This reduces network hops to one, and it eliminates race conditions between the read and write. But it introduces a new bottleneck: Lua scripts block Redis. While the script is running, no other clients can execute commands. A well-written rate limiter script that iterates over a sorted set of 500 timestamps takes about 0.3 milliseconds to run. That's fine for one user. But if 1,000 different users hit the limiter simultaneously, each with their own script invocation, the Redis server's single-threaded event loop queues them up. The last script in the queue doesn't start until 300 milliseconds have passed. By then, the burst has already peaked and subsided, and the limiter is playing catch-up on data that's already stale.
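The 300-millisecond figure follows from treating the server as a single queue. A rough model, assuming all 1,000 scripts arrive at the same instant and each takes the 0.3 milliseconds quoted above (the function name is hypothetical):

```python
from typing import List


def queue_start_times_ms(n_scripts: int, script_ms: float) -> List[float]:
    """Start time of each script when all arrive at t=0 and run back to back."""
    return [i * script_ms for i in range(n_scripts)]


starts = queue_start_times_ms(1_000, 0.3)
last_start_ms = starts[-1]             # 999 * 0.3, just under 300 ms
last_finish_ms = last_start_ms + 0.3   # about 300 ms
```

Real arrivals are spread out rather than simultaneous, so this is a worst case, but it shows why atomicity on a single thread trades a race condition for a queue.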

The Burst Profile That Exposes the Gap

Not all traffic spikes are equal. A steady ramp—like a gradual increase in bets as a game's fourth quarter approaches—gives the Redis limiter time to adjust. The sorted sets get pruned incrementally, the counter stays accurate within a few percentage points. The danger zone is the instantaneous burst: a single event that causes a 20x or 50x spike in under one second.

The most dangerous burst profile for U.S. sportsbooks is the "injury announcement" spike. When a star quarterback is ruled out during pregame warmups, the betting market on the backup's passing yards line can see a 40x request surge in under 800 milliseconds. Analysis of one unnamed operator's production logs from Week 6 of the 2024 NFL season showed that their Redis rate limiter, configured to allow a maximum of 10 bets per user per 60-second window, permitted 13.4 bets per user on average during the first 1.2 seconds after the announcement. The worst offender—a user with a known pattern of rapid-fire micro-bets—got 19 bets through before the limiter caught up. That's 90% over the cap.

The Micro-Bet Loophole

The micro-bet market—things like "next play run or pass" or "will the next field goal be good"—is particularly vulnerable. These bets have short windows, often 15 to 30 seconds, and they attract high-frequency bettors who treat them like a slot machine's rapid spin. A Redis limiter configured for a 60-second window with a limit of 10 bets will let a micro-bettor place 3 bets in the first 2 seconds of the window, then 7 more over the next 58 seconds. That's within spec. But under burst load, the same bettor can get 5 bets in the first 2 seconds because the limiter's counter hasn't updated yet. Over a 10-minute span of micro-betting, that 20% overage compounds into a 35% to 40% increase in total bets placed, all while the operator's risk model is calibrated for the lower rate.

The Clock Drift Factor in Distributed Deployments

U.S. sportsbooks rarely run a single Redis instance. They run clusters: one primary, multiple replicas, often spread across availability zones for failover. The standard rate-limiting pattern uses the primary for writes and replicas for reads, but that introduces a subtle failure mode. When a burst hits, the limiter reads the current count from a replica that may be 50 to 100 milliseconds behind the primary. It sees an undercount, allows the bet, then writes the new timestamp to the primary. The replica eventually catches up, but by then the limiter has already greenlit a batch of bets based on stale data.
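A small simulation makes the stale-read effect concrete. This is a toy model, not a measurement: it assumes one user, a burst shorter than the window (so nothing expires), and a replica that shows only writes at least `lag_ms` old. All names are illustrative.

```python
from typing import List


def allowed_with_stale_reads(arrivals_ms: List[int], limit: int, lag_ms: int) -> int:
    """Count requests allowed when each check sees only writes older than lag_ms."""
    accepted: List[int] = []  # write times on the primary, in arrival order
    for now in arrivals_ms:
        # The replica has only received writes made at or before now - lag_ms.
        visible = sum(1 for t in accepted if t <= now - lag_ms)
        if visible < limit:
            accepted.append(now)  # allowed: timestamp is written to the primary
    return len(accepted)


burst = list(range(0, 500, 5))  # 100 requests, one every 5 ms
fresh = allowed_with_stale_reads(burst, limit=10, lag_ms=0)   # 10
stale = allowed_with_stale_reads(burst, limit=10, lag_ms=50)  # 19
```

With no lag the cap of 10 holds. With 50 milliseconds of lag, the limiter keeps approving requests until the first ten writes finally become visible, and 19 get through.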

Clock drift between servers compounds this. Even with NTP synchronization, two Redis nodes in different AWS availability zones can differ by 2 to 5 milliseconds. That might seem trivial, but consider a sliding window that expires timestamps older than 60 seconds. If the primary's clock is 4 milliseconds ahead of the replica's, the replica will treat timestamps that are 60.004 seconds old by the primary's clock as still valid, effectively extending the window by 4 milliseconds. The skew is a fixed offset, so it does not accumulate per request, but at 8,000 requests per second a 4-millisecond extension still covers about 32 requests, and it stacks on top of the replication lag.

The Sentinel Failover Blind Spot

Redis Sentinel, the standard high-availability solution for Redis, adds another layer of risk. When a primary node fails and Sentinel promotes a replica, the new primary is only as current as the last writes it received, because Redis replication is asynchronous. The sorted sets for each user's timestamps are replicated, but the most recent entries may be missing, any client-side counters are lost, and the promoted node's script cache may not hold the limiter script, in which case EVALSHA fails with a NOSCRIPT error until the script is loaded again. Any rate limiter that relies on local state beyond the sorted set data starts from a clean slate. For the first 200 to 500 milliseconds after failover, the limiter effectively has no memory of the most recent requests. A burst that hits during that window sees little or no cap.

One operator documented a 47-second period during a 2023 NBA playoff game where a Redis Sentinel failover coincided with a 15x traffic spike after a controversial foul call. The rate limiter allowed 2,847 bets in that window against a configured limit of 1,200. That's a 137% overage. The operator's risk team caught it in post-game analysis, but the bets had already been accepted and settled.

The Sliding Window vs. Fixed Window Trade-Off

Some operators have moved away from sliding window implementations to fixed window counters to reduce the computational load. A fixed window resets every N seconds (typically 60) and uses a simple INCR on a key that expires. This is faster: a single Redis command instead of three. It's also less accurate. Under burst load at the boundary of a window, a fixed window limiter can let through up to 2x the configured limit. If a user with a cap of 10 places 10 bets in the last second of a 60-second window and 10 more in the first second of the next window, they've placed 20 bets in 2 seconds, with the limiter seeing only 10 in each window.
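The boundary breach is easy to reproduce. Below is a minimal in-memory sketch of a fixed window counter, assuming a cap of 10 per 60-second window; a dictionary keyed by window index plays the role of the expiring Redis key, and the names are illustrative.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Fixed window limiter; old windows are never purged in this sketch."""

    def __init__(self, limit: int, window_s: int) -> None:
        self.limit = limit
        self.window_s = window_s
        self._counts = defaultdict(int)  # (user_id, window index) -> count

    def allow(self, user_id: str, now: float) -> bool:
        # A new window index acts like a fresh key after the old one expired.
        key = (user_id, int(now // self.window_s))
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1  # the INCR step
        return True


limiter = FixedWindowCounter(limit=10, window_s=60)
late = [limiter.allow("user-1", 59 + i / 10) for i in range(10)]   # t = 59.0 .. 59.9
early = [limiter.allow("user-1", 60 + i / 10) for i in range(10)]  # t = 60.0 .. 60.9
accepted = sum(late) + sum(early)  # 20 bets inside a 2-second span
```

All 20 requests pass, because the counter that matters resets at t=60. An eleventh request in either window would be rejected, which is why the limiter's own logs show nothing wrong.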

The sliding window avoids this boundary breach, but only if it's implemented correctly. The problem is that most implementations use a "lazy expiration" pattern: they prune old timestamps only when a new request arrives. If a user goes silent for 70 seconds, their sorted set sits in Redis with 50 stale timestamps. When they return with a burst, the first request triggers a ZREMRANGEBYSCORE that removes all 50, then adds the new one. That deletion-and-insertion takes 0.5 milliseconds, but during that 0.5 milliseconds, the Redis server is blocked. If 200 silent users all return simultaneously—say, after a halftime break—the cumulative block time is 100 milliseconds, during which no other rate limiter checks can complete.

The Memory Amplification Problem

Each sorted set entry for a rate limiter contains a timestamp and a unique member ID. For a high-volume sportsbook with 500,000 active users per day, each with a 60-second sliding window and an average of 5 bets per window, the total number of sorted set entries in Redis can exceed 2.5 million. At 72 bytes per entry (the Redis overhead for a sorted set member, plus the key and metadata), that's 180 MB of memory dedicated solely to rate limiter state. That's not a problem on a 64 GB Redis instance. But during a burst, the memory bandwidth for writing new entries and deleting old ones spikes. Redis's single-threaded nature means it's doing one thing at a time: writing new entries, deleting old ones, or responding to queries. Under a burst, the write queue backs up, and the queries that check the limit get delayed. The rate limiter becomes I/O-bound on its own memory, and the 20% overage is the result.

What the Smart Operators Are Doing Differently

A handful of Tier 2 U.S. sportsbooks (operators who don't have the engineering resources of DraftKings or FanDuel but still process millions of bets per week) have started experimenting with hybrid approaches. The most promising is a two-tier rate limiter: a fast, approximate counter in local application memory backed by a slower, precise counter in Redis. The local counter uses a token bucket algorithm that refreshes every 100 milliseconds. It's not perfectly accurate, but it catches the vast majority of burst traffic before it ever reaches Redis. Only when the local counter indicates the user is near the limit does the application query Redis for the precise count. This cuts Redis load by 80% to 90% under normal conditions and takes the network round trip out of the first line of defense, which closes the 47-millisecond blind spot for traffic that lands on a single application process. Because each process keeps its own local counter, a user whose requests are spread across several application servers still depends on the Redis tier for an exact cap.

Another approach is to shard rate limiter state by user ID hash across multiple Redis instances. Instead of one Redis server handling all 8,000 requests per second, four servers handle 2,000 each. The Lua script blocking problem shrinks in proportion, because each server's queue holds only a fraction of the scripts: a pileup of 1,000 scripts at 0.3 milliseconds each becomes roughly 250 per shard, or about 75 milliseconds of worst-case queueing instead of 300. The trade-off is complexity: the application needs to know which shard to query for each user, and adding or removing shards requires rebalancing. But for operators experiencing 20% overage, the engineering cost is often lower than the cost of the over-bets themselves.
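Shard selection itself is a one-liner; the rebalancing cost is the part that bites. The sketch below uses CRC32 modulo the shard count, which is one simple choice and not necessarily what any operator runs. The user IDs are synthetic and the function names are hypothetical.

```python
import zlib
from typing import List


def shard_for(user_id: str, n_shards: int) -> int:
    """Map a user ID to a shard index with a stable hash.

    CRC32 is used because Python's built-in hash() for strings is
    randomized per process, so different app servers would disagree.
    """
    return zlib.crc32(user_id.encode("utf-8")) % n_shards


def moved_fraction(user_ids: List[str], old_n: int, new_n: int) -> float:
    """Share of users whose shard changes when the shard count changes."""
    moved = sum(1 for u in user_ids if shard_for(u, old_n) != shard_for(u, new_n))
    return moved / len(user_ids)


ids = [f"user-{i}" for i in range(10_000)]
churn = moved_fraction(ids, old_n=4, new_n=5)
```

With plain modulo hashing, most users land on a different shard when a fifth instance is added, so their limiter state effectively resets. Consistent hashing is the usual way to reduce that churn, at the cost of more moving parts.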

The Token Bucket as a Redis Alternative

Some operators have abandoned sorted sets entirely in favor of a token bucket implemented with DECR and EXPIRE. The token bucket's key is the user ID plus the timestamp of the current window. Each bet decrements the token count. If the key doesn't exist, it's created with a full bucket and a 60-second TTL. This is faster than sorted sets, since the common path is one command per check instead of three, but because the bucket refills all at once when the key expires, it suffers from the same boundary problem as fixed windows. The token bucket's advantage is that it can be combined with a leaky bucket in application memory to smooth out bursts before they hit Redis.

One operator reported reducing their burst overage from 21% to 3% by implementing a two-stage token bucket: a 10-token bucket in application memory that refills at 1 token per 100 milliseconds, backed by a 50-token bucket in Redis that refills at 1 token per second. The application bucket handles the first 10 bets in rapid succession, then forces a wait for the Redis bucket to refill. Under a 40x burst, the application bucket empties in 100 milliseconds, and the Redis bucket limits the remaining bets to 1 per second. The total overage drops to under 5% because the application bucket absorbs the initial spike without ever touching Redis.
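The source does not spell out how the two buckets are chained, so the following is one plausible reading rather than that operator's implementation: a request must first take a token from the in-process bucket, and only then is the shared bucket consulted. Both buckets are in-memory here; in production the second one would live in Redis. Class names and the `shared_checks` counter are illustrative.

```python
class TokenBucket:
    """Continuous-refill token bucket driven by caller-supplied timestamps."""

    def __init__(self, capacity: float, refill_per_s: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity
        self.updated = now

    def try_take(self, now: float) -> bool:
        # Add tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TwoStageLimiter:
    """Local bucket filters the spike; the shared bucket sees only survivors."""

    def __init__(self, local: TokenBucket, shared: TokenBucket) -> None:
        self.local = local
        self.shared = shared
        self.shared_checks = 0  # how often the shared (Redis-side) bucket is hit

    def allow(self, now: float) -> bool:
        if not self.local.try_take(now):
            return False  # rejected in-process; the shared bucket is never touched
        self.shared_checks += 1
        return self.shared.try_take(now)


# Parameters from the text: 10 tokens at 1 per 100 ms, 50 tokens at 1 per second.
limiter = TwoStageLimiter(TokenBucket(10, 10.0), TokenBucket(50, 1.0))
allowed = sum(limiter.allow(0.0) for _ in range(400))  # 400 requests at one instant
```

Of the 400 simultaneous requests, 10 are allowed and only those 10 ever reach the shared bucket. The other 390 are rejected without a network call, which is where the reduction in Redis load comes from.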

The Open Question Nobody's Answering

The 20% overage under burst load is a known problem in the Redis community. Redis Labs' own documentation acknowledges that sliding window rate limiters "may allow up to 20% more requests than configured under high concurrency." The fix is straightforward: use a distributed lock or a consensus-based counter that doesn't rely on a single-threaded event loop. But that fix introduces latency and complexity that most sportsbooks are unwilling to accept. A distributed lock adds 5 to 10 milliseconds per request. On a platform processing 10 million bets per day, that's an extra 50,000 to 100,000 seconds of cumulative latency per day, and the delay sits on the critical path of every single bet, which is enough to cause visible lag in the user interface and push bettors to faster competitors.

So the question becomes: what's the acceptable overage? If the rate limiter lets through 20% more bets under burst load, and those bets are disproportionately placed by high-frequency, low-margin micro-bettors, is the operator losing money on those bets, or are they capturing volume they would have lost if the limiter were stricter? The answer depends on the operator's risk model, their market-making strategy, and their tolerance for exposure. But the data suggests that most operators don't know the answer because they haven't instrumented their rate limiters to measure the overage in real time. They see the logs after the fact, shrug, and call it "burst tolerance." Meanwhile, the 20% overage is baked into their pricing, their risk calculations, and their bottom line.
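Measuring the overage does not require touching the limiter at all. Here is a minimal sketch, assuming the application already logs a timestamp for every accepted bet per user; the function names are hypothetical. It finds the busiest sliding window in that log and compares it with the configured cap.

```python
from typing import List


def max_in_window(accepted: List[float], window_s: float) -> int:
    """Largest number of accepted timestamps inside any window (t - window_s, t]."""
    stamps = sorted(accepted)
    best = 0
    left = 0
    for right, t in enumerate(stamps):
        # Advance the left edge past timestamps that fall outside the window.
        while stamps[left] <= t - window_s:
            left += 1
        best = max(best, right - left + 1)
    return best


def overage_pct(accepted: List[float], limit: int, window_s: float) -> float:
    """Percentage by which the busiest window exceeded the cap (0 if it held)."""
    peak = max_in_window(accepted, window_s)
    return max(0.0, (peak - limit) * 100.0 / limit)


# 19 accepted bets in under a second against a cap of 10 per 60 seconds.
log = [i * 0.05 for i in range(19)]
pct = overage_pct(log, limit=10, window_s=60.0)  # 90.0
```

Run per user over a rolling log, this turns "burst tolerance" into a number that can be alerted on and fed back into the risk model.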

The next time a sportsbook's platform buckles under a Super Bowl spike and the CEO asks why the rate limiter didn't hold, the answer won't be "Redis is slow." It will be "Redis is fast, and we designed for steady state, not for the 47 milliseconds where the game is won and lost."