Why Your JWT Refresh Token Rotation Still Leaks Sessions
Every time I audit a new platform’s authentication flow, I ask the same question: “How do you handle refresh token rotation?” The answer is almost always the same — a confident explanation of issuing a new refresh token with each access token refresh, and invalidating the old one. Then I ask the follow-up: “What happens when both the old and new tokens arrive at your API within the same second?” That’s usually when the room goes quiet.
The uncomfortable truth is that most JWT refresh token rotation implementations leak sessions like a sieve. The standard pattern — issue a new refresh token, blacklist the old one — seems airtight on paper. But in practice, network latency, race conditions, and concurrent requests create a window where an attacker who steals a single refresh token can maintain indefinite access. If you’re building for a real-time platform, a payment gateway, or especially an iGaming system where session integrity is tied to financial compliance, this isn’t just a bug. It’s an audit liability.
Let’s break down exactly where rotation breaks, and more importantly, how to fix it without adding a second database round-trip to every refresh call.
The Classic Rotation Pattern That Fails Under Load
The textbook JWT refresh flow looks elegant. A user authenticates, receives an access token (short-lived, maybe 15 minutes) and a refresh token (long-lived, maybe 7 days). When the access token expires, the client sends the refresh token to a dedicated endpoint. The server validates it, issues a new access token and a new refresh token, then invalidates the old refresh token — usually by removing it from a database or adding it to a blacklist.
This works perfectly in a single-threaded, sequential world. But the web is not sequential.
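As a reference point for what follows, the textbook flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: NaiveRotationStore is a hypothetical in-memory stand-in for your token table, and opaque random strings stand in for signed JWTs.

```python
import secrets


class NaiveRotationStore:
    """Hypothetical in-memory stand-in for the table of valid refresh tokens."""

    def __init__(self):
        self.valid = {}  # refresh token -> user id

    def login(self, user_id):
        # Initial authentication: issue the first refresh token.
        token = secrets.token_urlsafe(32)
        self.valid[token] = user_id
        return token

    def refresh(self, old_token):
        # Validate the presented token and invalidate it in the same step.
        user_id = self.valid.pop(old_token, None)
        if user_id is None:
            raise PermissionError("refresh token is unknown or already used")
        # Issue the replacement token.
        new_token = secrets.token_urlsafe(32)
        self.valid[new_token] = user_id
        return new_token
```

Calling refresh twice with the same token succeeds once and raises on the second call, which is the failure the second of two overlapping requests runs into.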
The Race Condition That Breaks Everything
Imagine a mobile app with a flaky network connection. The user’s access token is about to expire. The app fires a refresh request. The network stalls for 800 milliseconds. The app, sensing the access token is now expired, fires a second refresh request. Both requests arrive at your server within a few hundred milliseconds of each other.
Here’s what happens:
- Request A arrives first. The server validates the old refresh token, issues TokenPair_1, and invalidates the old token.
- Request B arrives immediately after. The server checks the old refresh token, but it has already been invalidated by Request A. So Request B fails.
Now the user is logged out. The client should be holding the freshly issued token from Request A, but if Request B's failure triggered a state reset before the app saved that token, a retry has nothing valid to send. You've just created a session loss.
But flip the scenario. What if an attacker intercepted the original refresh token before the app used it? The attacker sends their own refresh request simultaneously with the legitimate client. If the attacker’s request wins the race, they get TokenPair_1. The legitimate client’s request arrives next, finds the token invalidated, and fails. The user is logged out. The attacker now holds a valid refresh token, and the legitimate client is locked out with no recovery path.
That’s the session leak the title warns about. The rotation didn’t fail — it succeeded for the wrong party.
Why “Just Use a Database Transaction” Isn’t the Silver Bullet
The immediate instinct is to wrap the rotation logic in a database transaction or a Redis atomic operation. You read the current token, check its validity, issue a new one, and delete the old one — all within a single atomic block. This prevents the double-spend scenario where two legitimate requests from the same client both succeed.
But atomicity alone doesn’t solve the detection problem. How do you know whether a request is a legitimate retry from the same client versus an attacker who stole the token?
The Concurrency Window You Can’t Close
Even with atomic operations, the window between “the server receives the token” and “the server decides to issue a new one” is measurable. In a high-throughput system — say, a real-time multiplayer game or a live betting platform handling thousands of concurrent connections — that window is wide enough for a coordinated attack.
Consider this: an attacker compromises the user’s device and exfiltrates the current refresh token. The user is actively using the app. The attacker fires a refresh request. The legitimate client fires its own refresh request because its access token just expired. If both requests hit your atomic operation, one will succeed and one will fail. But which one? You can’t tell without additional context.
The result is the same: one party walks away with a valid session, and the other is locked out. If the attacker is the one who succeeds, you’ve leaked the session permanently.
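To make that ambiguity concrete, here is a small sketch that assumes an in-process lock as a stand-in for the database transaction or Redis atomic operation. AtomicRotationStore and race are hypothetical names, and the two threads represent the legitimate client and the attacker presenting the same token at the same moment.

```python
import secrets
import threading


class AtomicRotationStore:
    """Hypothetical store whose rotation runs inside one atomic block."""

    def __init__(self):
        self._lock = threading.Lock()
        self._valid = {}  # refresh token -> user id

    def login(self, user_id):
        token = secrets.token_urlsafe(32)
        with self._lock:
            self._valid[token] = user_id
        return token

    def refresh(self, old_token):
        # Check, delete, and reissue without letting another request interleave.
        with self._lock:
            user_id = self._valid.pop(old_token, None)
            if user_id is None:
                return None  # token already rotated by someone else
            new_token = secrets.token_urlsafe(32)
            self._valid[new_token] = user_id
            return new_token


def race(store, token, parties=("client", "attacker")):
    """Have every party present the same token at once; return each result."""
    results = {}
    barrier = threading.Barrier(len(parties))

    def attempt(name):
        barrier.wait()  # release all parties together
        results[name] = store.refresh(token)

    threads = [threading.Thread(target=attempt, args=(p,)) for p in parties]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Exactly one entry in the returned dict holds a new token, but nothing in the store records which party that was. Atomicity guarantees a single winner; it says nothing about who the winner is.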
Building a Rotation System That Actually Works
The fix requires changing how you think about token identity. Instead of treating each refresh token as an isolated single-use credential, treat it as one link in a chain that the server tracks. The server then knows not only whether a token is valid, but where it sits in the session's history.
Token Families: The Anti-Replay Pattern That Scales
The most robust solution I’ve deployed in production is what I call “token family” rotation. The idea is simple: instead of invalidating the old refresh token when a new one is issued, you group tokens into families. Each family has a unique ID stored in the token payload. When a client requests a refresh, the server checks the family ID and the token’s position within that family.
Here’s the core logic:
- On initial authentication, issue a refresh token with family_id = uuid() and token_rank = 1.
- On refresh, issue a new token with the same family_id and token_rank = old_rank + 1.
- Store in Redis (or your fast KV store) the highest token_rank seen for each family.
When a refresh request comes in with token_rank = 5, the server checks the stored highest rank for that family. If the stored rank is 4, the token is valid — issue the next one. If the stored rank is already 5 or higher, the token is a replay or a stolen copy.
This handles the race condition elegantly. If two legitimate requests arrive nearly simultaneously, both will carry the same token_rank. The atomic update in Redis advances the stored rank only once, so one request succeeds and the other fails. Crucially, both were legitimate attempts from the same client, so the client simply retries with the new token returned by the successful request.
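The rank check described above can be sketched as follows. This is a single-process illustration under stated assumptions: FamilyStore is a hypothetical in-memory stand-in for the Redis map (a real deployment needs the compare-and-update to be atomic), and plain dicts stand in for already-verified refresh token payloads.

```python
import uuid


class FamilyStore:
    """Hypothetical stand-in for the map of family_id -> highest rank seen."""

    def __init__(self):
        self.highest_rank = {}

    def accept(self, family_id, rank):
        # Accept only a rank strictly above the highest one seen so far.
        if rank > self.highest_rank.get(family_id, 0):
            self.highest_rank[family_id] = rank
            return True
        return False


def issue_initial():
    # Initial authentication starts a new family at rank 1.
    return {"family_id": str(uuid.uuid4()), "token_rank": 1}


def refresh(store, claims):
    # Returns the next token's claims, or None if the token is stale or replayed.
    if not store.accept(claims["family_id"], claims["token_rank"]):
        return None
    return {
        "family_id": claims["family_id"],
        "token_rank": claims["token_rank"] + 1,
    }
```

Presenting the rank 1 token yields a rank 2 token in the same family; presenting rank 1 a second time returns None because the stored rank is already 1.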
Detecting Theft with Family Alerts
The real power of token families is theft detection. If an attacker uses a stolen token with rank = 3 while the legitimate client has already advanced to rank = 5, the attacker’s request will be rejected because the stored rank is already higher. But what if the attacker’s request arrives before the legitimate client’s next refresh?
When the server receives a refresh request with a token_rank that is lower than the stored highest rank but still within a reasonable delta (say, the last 5 ranks), you know something is wrong. The legitimate client has already moved on, and now a stale token is being used. This is a clear signal of token theft.
At this point, don’t just reject the request. Invalidate the entire token family. Force the user to re-authenticate. Log the event for your security team. This proactive invalidation means an attacker can never accumulate a backlog of valid tokens — because the moment they use a stale one, the whole family burns.
I implemented this pattern for a sportsbook platform handling live in-play bets. Within the first week, the system detected three separate token theft attempts that would have bypassed standard rotation. Each time, the family was invalidated, the legitimate user was prompted for re-authentication, and the attacker’s session was killed mid-request.
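A rough sketch of the family-burning logic follows. It is illustrative only and rests on assumptions that go beyond the text: it uses the strictest policy, where any rank below the stored one revokes the family (a delta window could be added to the same comparison), it treats an equal rank as a same-client collision, and it marks a burned family with a hypothetical REVOKED sentinel so that later tokens from that family cannot reopen it. FamilyGuard and Outcome are made-up names.

```python
from enum import Enum


class Outcome(Enum):
    ROTATED = "rotated"          # token accepted, issue the next rank
    COLLISION = "collision"      # same rank as the token just consumed
    FAMILY_REVOKED = "revoked"   # stale rank seen, or family already burned


REVOKED = -1  # hypothetical sentinel stored in place of a rank


class FamilyGuard:
    """Hypothetical in-memory stand-in for the family state store."""

    def __init__(self):
        self.highest_rank = {}   # family_id -> highest rank seen, or REVOKED
        self.security_log = []   # (family_id, presented_rank, stored_rank)

    def check(self, family_id, rank):
        stored = self.highest_rank.get(family_id, 0)
        if stored == REVOKED:
            # The family was burned earlier; nothing in it is usable.
            return Outcome.FAMILY_REVOKED
        if rank > stored:
            self.highest_rank[family_id] = rank
            return Outcome.ROTATED
        if rank == stored:
            # Assumed to be a retry colliding with the winning request.
            return Outcome.COLLISION
        # A token the family has already moved past: burn the family and log it.
        self.highest_rank[family_id] = REVOKED
        self.security_log.append((family_id, rank, stored))
        return Outcome.FAMILY_REVOKED
```

Once a family is revoked, even a token with a higher rank than anything seen before is refused, so whoever holds the newest token in that family is forced back through authentication.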
Practical Implementation Without Overcomplicating Your Stack
You don’t need a PhD in cryptography to implement token families. The changes to your existing refresh flow are minimal.
Redis as Your Family State Store
Use Redis for the family state. The data structure is straightforward:
- Key: refresh_family:{family_id}
- Value: the highest token_rank seen
- TTL: match the refresh token’s maximum lifetime (e.g., 7 days)
On each refresh, use a Redis Lua script or a WATCH/MULTI transaction to atomically compare and update the rank. The script looks like this in pseudocode:
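    -- KEYS[1] = refresh_family:{family_id}
    -- ARGV[1] = rank of the presented token, ARGV[2] = TTL in seconds
    local current_rank = redis.call('GET', KEYS[1])
    if not current_rank or tonumber(current_rank) < tonumber(ARGV[1]) then
        -- Re-apply the TTL here: a plain SET would clear the key's expiry
        redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2])
        return true  -- token is valid, proceed
    else
        return false -- token is stale or replayed
    end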
If the script returns false, compare the incoming token_rank with the stored rank. A delta of zero matches the same-client collision described earlier. If the incoming rank is lower and the delta exceeds a threshold (I use 3), invalidate the family entirely and set a flag for mandatory re-authentication.
JWT Payload Adjustments
Your refresh token’s JWT payload needs two additional claims:
- family_id: a UUID generated once per session
- token_rank: a monotonically increasing integer
Everything else stays the same — the sub, exp, iat, and any custom claims your application needs. The access token remains unchanged. Only the refresh token carries the family metadata.
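For illustration, here is what a refresh token carrying those two claims could look like. This is a sketch using only the standard library with HS256 signing; in practice you would normally reach for a maintained JWT library. The function names, the shared-secret scheme, and the 7-day default lifetime are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_refresh_token(secret, sub, family_id, token_rank,
                         now=None, lifetime=7 * 24 * 3600):
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "sub": sub,
        "iat": now,
        "exp": now + lifetime,
        "family_id": family_id,    # generated once per session
        "token_rank": token_rank,  # incremented on every rotation
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def decode_refresh_token(secret, token, now=None):
    # Raises ValueError for malformed, forged, or expired tokens.
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = hmac.new(secret, (header_b64 + "." + payload_b64).encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    now = int(time.time()) if now is None else now
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```

Because the family metadata lives inside the signed payload, the server can read family_id and token_rank straight from the verified token and only needs the key-value store for the rank comparison.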
Handling the Legitimate Client Retry
Remember the scenario where two legitimate requests from the same client collide? The client receives a new refresh token from the successful request. The failed request gets a 401 with a specific error code like TOKEN_ROTATED. The client should immediately retry with the new token it just received.
This requires the client to handle the race condition gracefully. On the frontend, implement a simple mutex around the refresh logic. If a refresh is already in flight, queue subsequent refresh attempts rather than firing them in parallel. This reduces the collision rate significantly.
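The same idea can be sketched outside the browser. The example below is a minimal illustration in Python's asyncio rather than frontend code: callers that arrive while a refresh is in flight wait for that one request instead of starting their own. TokenRefresher and the refresh_call parameter are hypothetical names; refresh_call stands for whatever async function sends the refresh token to your endpoint and returns the new one.

```python
import asyncio


class TokenRefresher:
    """Hypothetical client-side helper that allows one refresh at a time."""

    def __init__(self, refresh_call, refresh_token):
        self._refresh_call = refresh_call    # async: old token -> new token
        self._refresh_token = refresh_token  # latest token held by the client
        self._in_flight = None               # task for the running refresh

    async def refresh(self):
        # Start a refresh only if none is running; otherwise join the current one.
        if self._in_flight is None:
            self._in_flight = asyncio.create_task(self._do_refresh())
        return await self._in_flight

    async def _do_refresh(self):
        try:
            # Save the new token before any waiting caller resumes.
            self._refresh_token = await self._refresh_call(self._refresh_token)
            return self._refresh_token
        finally:
            # Let the next expiry start a fresh request.
            self._in_flight = None
```

With this in place, three concurrent calls to refresh produce a single request to the server and all three callers receive the same new token.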
For mobile apps with aggressive retry logic, add a small jitter (50-100ms) to refresh attempts. This spreads out concurrent requests so that fewer of them reach the server carrying the same token.
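A jittered refresh with a single retry might look like the sketch below. The names are hypothetical: send_refresh stands for the call to your refresh endpoint, load_token reads the latest token from client storage, and TokenRotated represents a 401 carrying the TOKEN_ROTATED error code.

```python
import random
import time


class TokenRotated(Exception):
    """Stands in for a 401 response with the TOKEN_ROTATED error code."""


def jittered_delay(low_ms=50, high_ms=100, rng=random):
    # Pick a random delay in the 50-100ms range, returned in seconds.
    return rng.uniform(low_ms, high_ms) / 1000.0


def refresh_with_jitter(send_refresh, load_token, sleep=time.sleep, rng=random):
    # Wait a random moment so concurrent attempts are less likely to overlap.
    sleep(jittered_delay(rng=rng))
    try:
        return send_refresh(load_token())
    except TokenRotated:
        # A sibling request won the race; retry once with the token stored now.
        return send_refresh(load_token())
```

Reading the token from storage after the delay, rather than before it, gives a sibling request the chance to save a newer token first.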
The Forward-Looking Takeaway
Token families aren’t a theoretical exercise. They’re a production-proven pattern that closes the session leak without adding noticeable latency to your refresh flow. The Redis operation adds roughly 1-2 milliseconds per request — negligible compared to the security gain.
But here’s the part that keeps me up at night: token families only protect against refresh token theft. They don’t protect against session hijacking at the access token level, and they don’t prevent an attacker from using a stolen refresh token before the legitimate client does its first refresh. For that, you need device fingerprinting, proof-of-possession tokens, or a backend-driven re-authentication challenge on suspicious activity.
If you’re building for a compliance-sensitive industry — iGaming, fintech, healthcare — start with token families today. Then layer on behavioral anomaly detection. The regulatory environment is moving toward requiring demonstrable session integrity, not just a checkbox that says “we rotate tokens.” Your audit log should show not just that tokens were rotated, but that the right client rotated them every time.
The session leak you’re fixing today is the audit finding you’re avoiding next year.