~/webline_global $

// Everyday tech, explained simply.

Why Your Redis Pub/Sub Messages Arrive Out of Order at Scale

You’ve built a real-time feature, and it works beautifully in staging. Then you push to production, traffic spikes, and suddenly messages start arriving in the wrong order. Users see stale data. A leaderboard shows the wrong winner. A chat thread renders replies before the original message. You check your Redis logs and everything looks fine—Pub/Sub channels are active, subscribers are connected. So why is your carefully ordered stream turning into chaos?

The answer is not a bug in Redis. It’s a fundamental misunderstanding of what Redis Pub/Sub guarantees—and what it doesn’t. At scale, the failure isn't in the message broker; it’s in the architecture surrounding it. Let’s break down exactly why this happens and how to fix it before your next deployment.

The False Promise of FIFO in Redis Pub/Sub

Redis is often treated as a magic bullet for real-time messaging because it’s fast. Developers see PUBLISH and SUBSCRIBE commands and assume they work like a queue—first in, first out, every time. That assumption is wrong, and it’s the root cause of most ordering problems.

Pub/Sub Is Not a Queue

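Redis Pub/Sub is a fire-and-forget broadcast system. When you publish a message, Redis pushes it to every client subscribed to that channel at that moment, and then forgets it. A subscriber that is disconnected simply misses the message. A subscriber that falls behind has its pending output buffered only up to the configured client-output-buffer-limit, after which Redis closes the connection and the backlog is lost. There is no persistence, no acknowledgment, and no replay. The one ordering property you do get is that, on a single Redis server, each subscriber receives a channel's messages in the order the server executed the PUBLISH commands.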

This is the first major distinction: a queue (like Redis Lists or SQS) holds messages until they are consumed. Pub/Sub does not. It’s designed for low-latency broadcasting, not reliable ordered delivery. If you need strict ordering, you need to design for it explicitly.

Where Ordering Actually Breaks

The ordering breakdown happens at three levels: the publisher, the network, and the subscriber. On the publisher side, if multiple processes or threads write to the same channel, their messages interleave. Redis executes commands one at a time, so it always settles on some order, but when two connections publish at nearly the same instant, that order is simply whichever command reaches the server first, which is not deterministic under load.

On the network side, TCP guarantees in-order delivery within a single connection, but it says nothing about the relative timing of two separate connections. If your application publishes over multiple Redis connections (common with connection pools, multiple app servers, or clustered setups), a message sent first on one connection can reach the server after a message sent later on another, and every subscriber then sees the later message first.

On the subscriber side, if your client processes messages asynchronously (e.g., using callbacks or worker threads), a slow handler can cause later messages to be processed before earlier ones. Redis delivers messages to the subscriber in order, but processing order is your responsibility.
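Keeping processing in order is up to your code. One minimal way to do that in Node.js, sketched here under simple assumptions, is to chain each handler onto the previous one's promise, so a slow handler delays later messages instead of being overtaken by them. `createSerialProcessor` is a hypothetical helper, not part of any Redis client, and the `setTimeout` delays stand in for slow work such as database writes.

```javascript
// Hypothetical helper: runs an async handler for one message at a time,
// strictly in the order messages were received.
function createSerialProcessor(handler) {
  let tail = Promise.resolve();
  return function onMessage(message) {
    // Each message waits for the previous handler to settle before starting.
    tail = tail
      .then(() => handler(message))
      .catch((err) => {
        // Log and keep going so one failed message does not stall the chain.
        console.error('handler failed:', err);
      });
    return tail;
  };
}

// Demo: without the chain, these handlers would finish in the order 2, 3, 1.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const processed = [];
const onMessage = createSerialProcessor(async (msg) => {
  await sleep(msg.delayMs); // stand-in for a slow database write
  processed.push(msg.seq);
});

onMessage({ seq: 1, delayMs: 30 });
onMessage({ seq: 2, delayMs: 0 });
const done = onMessage({ seq: 3, delayMs: 10 });
done.then(() => console.log('processed in order:', processed.join(', ')));
```

With ioredis you could call `onMessage(JSON.parse(message))` from a `subscriber.on('message', (channel, message) => ...)` listener. The cost is throughput: every message now waits for the one before it, so keep handlers short.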

The Network and Concurrency Trap

Even if you restrict publishing to a single thread, the moment you scale horizontally (multiple app servers, multiple Redis nodes, or even multiple connections), ordering guarantees vanish.

The Single-Connection Illusion

I once consulted for a startup building a live auction platform. They used Redis Pub/Sub to broadcast bid updates to all connected clients. In development, everything was perfect. In production, bids would occasionally appear to jump backward—a $100 bid showing after a $120 bid. The team spent days blaming Redis.

The culprit was their Node.js backend. They had three server instances behind a load balancer, each maintaining its own Redis connection. When a user placed a bid, the request could hit any server. If two bids arrived within milliseconds of each other, server A might publish the first bid while server B published the second, and if server B’s path to Redis was slightly faster, its PUBLISH reached Redis first. Every subscriber then received the later bid first. Redis delivered each connection’s messages in order, but it couldn’t order messages across multiple publishers.

This is the single-connection illusion: you assume Redis is a single source of truth for ordering, but with multiple publishers, it’s not. Each publisher connection is an independent stream.
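The auction failure can be reproduced on paper. This toy model uses made-up timings and never talks to Redis; it only assumes what the section describes, namely that a single Redis server runs PUBLISH commands in the order they arrive and every subscriber sees that order.

```javascript
// Toy model with made-up timings: two app servers publish bids over their
// own Redis connections, with different network latency to Redis.
const publishes = [
  { bid: 100, server: 'A', sentAtMs: 0, latencyMs: 6 },
  { bid: 120, server: 'B', sentAtMs: 2, latencyMs: 1 },
];

// Redis runs PUBLISH commands in the order they reach it, and every
// subscriber sees that server order, not the order the bids were placed.
const arrivalOrder = [...publishes]
  .sort((a, b) => a.sentAtMs + a.latencyMs - (b.sentAtMs + b.latencyMs))
  .map((p) => p.bid);

console.log('placed:   ', publishes.map((p) => p.bid).join(' -> '));
console.log('delivered:', arrivalOrder.join(' -> '));
```

Server A's bid is sent first but lands at 6 ms, while server B's lands at 3 ms, so subscribers see the $120 bid before the $100 one.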

Redis Cluster and Sharded Pub/Sub

If you’re using Redis Cluster, the problem gets worse. With classic Pub/Sub, a PUBLISH can be sent to any node, and that node forwards the message to every other node in the cluster. Ordering across those forwarding paths is not guaranteed: a subscriber connected to node C can receive a message published through node A before an earlier message published to the same channel through node B.

Redis Cluster forwards these messages over its cluster bus, the same node-to-node link it uses for gossip and failure detection, so each forwarded message takes an extra network hop with its own variable latency. For most applications, this delay is negligible. But when you need strict ordering, even a few milliseconds of jitter can break your system.

How to Fix Ordering Without Losing Performance

The good news is that you don’t have to abandon Redis Pub/Sub. You just need to layer ordering on top of it. There are three battle-tested approaches, and the right one depends on your scale and tolerance for complexity.

Approach 1: Sequence Numbers on the Publisher

The simplest fix is to attach a monotonic sequence number to every message. The publisher increments a counter (stored in Redis as a key) before publishing. The subscriber then buffers messages and reorders them based on the sequence number before processing.

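Here’s the catch: with multiple publishers, the counter has to be centralized, usually a Redis key incremented with INCR, which is atomic. That adds a round-trip per message, but it’s fast enough for most use cases. Note that a unique number does not make messages arrive in order: publisher A can draw 41 and publisher B can draw 42, yet B’s PUBLISH can still reach Redis first. The sequence number only gives the subscriber what it needs to put them back in order.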

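// Publisher (assumes the ioredis client in an ES module, so top-level await works)
import Redis from 'ioredis';
const redis = new Redis(); // connects to localhost:6379 by default
const seq = await redis.incr('channel:seq:auction-bids'); // atomic, shared by all publishers
await redis.publish('auction-bids', JSON.stringify({ bid: 120, seq }));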

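On the subscriber side, you maintain a buffer (a min-heap or another sorted structure) and only process a message once you hold the next expected sequence number. This approach handles network jitter and multiple publishers, as long as the counter is atomic. It also needs a timeout: Pub/Sub can drop messages, and a publisher can crash between INCR and PUBLISH, so some numbers never arrive, and a subscriber that waits for them forever stalls the whole stream.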
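The subscriber side could look like the following minimal sketch. It assumes each payload carries the `seq` field from the publisher snippet, uses a `Map` keyed by sequence number instead of a min-heap for brevity, and skips a gap after a timeout. `ReorderBuffer` and its `gapTimeoutMs` parameter are hypothetical names.

```javascript
// Sketch of a subscriber-side reorder buffer. Assumes every payload carries
// the `seq` number produced by the INCR-based publisher.
class ReorderBuffer {
  constructor(deliver, gapTimeoutMs = 500) {
    this.deliver = deliver; // called with messages in sequence order
    this.gapTimeoutMs = gapTimeoutMs; // how long to wait for a missing number
    this.nextSeq = null; // unknown until the first message arrives
    this.pending = new Map(); // seq -> message, held until its turn
    this.timer = null;
  }

  push(message) {
    // A subscriber that joins mid-stream starts from the first number it sees.
    if (this.nextSeq === null) this.nextSeq = message.seq;
    // Older than what we expect: a duplicate or a gap we already skipped.
    if (message.seq < this.nextSeq) return;
    this.pending.set(message.seq, message);
    this.drain();
  }

  drain() {
    // Release every message that is now contiguous with what was delivered.
    while (this.pending.has(this.nextSeq)) {
      this.deliver(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
    }
    if (this.pending.size === 0) {
      clearTimeout(this.timer);
      this.timer = null;
    } else if (this.timer === null) {
      // A gap remains. Pub/Sub may have dropped that message, so give up
      // after gapTimeoutMs and jump to the lowest number still buffered.
      this.timer = setTimeout(() => {
        this.timer = null;
        this.nextSeq = Math.min(...this.pending.keys());
        this.drain();
      }, this.gapTimeoutMs);
    }
  }
}

// Demo with a 20 ms gap timeout.
const delivered = [];
const buffer = new ReorderBuffer((msg) => delivered.push(msg.seq), 20);
buffer.push({ seq: 1, bid: 100 });
buffer.push({ seq: 3, bid: 130 }); // early: held back until 2 shows up
buffer.push({ seq: 2, bid: 120 }); // fills the gap, releasing 2 and 3
buffer.push({ seq: 5, bid: 150 }); // 4 never arrives: 5 is released after ~20 ms
```

The timeout trades latency for completeness: a longer wait gives stragglers more time to arrive, but holds back everything behind the gap. A message that arrives after its gap was skipped is dropped, so the handler still cannot assume it has seen every update.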

Approach 2: Single Publisher, Fan-Out Architecture

If you can tolerate a single point of ingestion, route all messages through one publisher process. This process receives all events (from your app servers via an internal queue or HTTP endpoint) and publishes them to Redis in a single thread.

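This is a common pattern for real-time systems such as leaderboards and auctions. All updates flow through a central “orderer” process that serializes them and publishes them in strict sequence. That sequence is the order in which the orderer receives events, not necessarily the order in which users acted, but every subscriber sees the same order. The Redis Pub/Sub layer then broadcasts to all subscribers, which receive messages in the order they were published.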

The downside is a single point of failure and a potential bottleneck. But for many small to mid-scale applications, this is simpler than distributed sequence numbers. You can make it resilient by using a primary-replica pattern with a failover mechanism.

Approach 3: Use a Stream Instead of Pub/Sub

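If you absolutely need guaranteed ordering and message persistence, stop using Pub/Sub and switch to Redis Streams. Streams are append-only log structures that preserve message order within a stream, and consumers can read them sequentially. Consumer groups let you spread one stream across several consumers for horizontal scaling, but they hand each entry to whichever consumer asks next, so entries handled by different consumers can finish out of order. To scale while keeping order, split the traffic into several streams (for example, one per auction) and let a single consumer own each stream.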

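Redis Streams are not a drop-in replacement for Pub/Sub: every entry is stored until you trim it, consumers have to track their position or be managed through consumer groups, and each message carries more overhead than a plain broadcast. But if your use case demands strict ordering and at-least-once delivery, they are the correct tool.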

Here’s a quick comparison:

  • Pub/Sub: Lowest latency, no persistence, no ordering guarantees across publishers.
  • Streams: Moderate latency, persistence, strict ordering within a stream, consumer groups.

For an auction platform, I would use Streams for the bid channel and Pub/Sub for non-critical notifications (like “user joined” events). The trade-off is worth it when a single out-of-order bid can cause a $10,000 error.

The Hidden Cost of Out-of-Order Messages

Beyond user-facing bugs, out-of-order messages can corrupt your data layer. If your subscriber updates a database based on message content, processing messages in the wrong order can write stale values.

The Order-Dependent Update Problem

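Consider a leaderboard where a score can never drop below zero. User A has 0 points, and the publisher sends “user A: +10 points” followed by “user A: -5 points.” Processed in order, the subscriber computes 0 + 10 = 10, then 10 - 5 = 5. If the -5 message arrives first, the subscriber computes 0 - 5, clamps it to 0, and writes 0; then the +10 message arrives, reads 0, adds 10, and writes 10. The final score is wrong because the order of operations was reversed. Plain additions would commute; it is rules like the clamp, or any handler logic that inspects the current value, that make order matter.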

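The fix is to either process messages in strict order or send updates that don’t depend on current state. For example, instead of “add 10,” send “set score to 105.” A bare “set” is idempotent but still not order-safe, since a stale “set score to 95” arriving late would overwrite 105, so tag each value with a version number and have the subscriber ignore anything older than what it already applied. The catch is that the publisher must know the absolute value, which may not be possible in a distributed system.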
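One way to make absolute updates survive reordering is to pair each value with a version number and let the subscriber ignore anything older than what it already holds. This is a minimal in-memory sketch; the `version` field and the `applyScoreUpdate` helper are hypothetical, and the publisher would need its own way of producing increasing versions (for example, INCR on a per-user key).

```javascript
// Sketch of version-guarded absolute updates. `version` is a hypothetical
// payload field; the publisher could take it from INCR on a per-user key.
const scores = new Map(); // userId -> { score, version }

// Applies an update only if it is newer than what we already hold.
// Returns false for stale or duplicate updates.
function applyScoreUpdate({ userId, score, version }) {
  const current = scores.get(userId);
  if (current !== undefined && current.version >= version) {
    return false;
  }
  scores.set(userId, { score, version });
  return true;
}

// Version 2 arrives before version 1; the late, older update is ignored,
// and a repeated delivery of version 2 changes nothing.
const results = [
  applyScoreUpdate({ userId: 'A', score: 105, version: 2 }),
  applyScoreUpdate({ userId: 'A', score: 95, version: 1 }),
  applyScoreUpdate({ userId: 'A', score: 105, version: 2 }),
];
console.log('applied:', results.join(', '), '| final score:', scores.get('A').score);
```

In a shared database, the comparison and the write must happen atomically, for example as one conditional update that only matches rows holding an older version. A separate read followed by a write reopens the race.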

Idempotency as a Safety Net

Even with perfect ordering, you should design your message handlers to be idempotent. If a message is delivered twice (for example, when a publisher retries a PUBLISH that actually succeeded, or when a Streams consumer re-reads unacknowledged entries after a crash), processing it again should leave your data unchanged.

Include a unique message ID in every payload. On the subscriber side, check if you’ve already processed that ID before applying the update. This doesn’t fix ordering, but it prevents duplicate processing from compounding the problem.
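The ID check can be sketched as a bounded set of recently seen IDs. This assumes the publisher attaches a hypothetical `id` field (for example, from `crypto.randomUUID()`) and that forgetting very old IDs is acceptable; `SeenIds` is an illustrative name, not a library class.

```javascript
// Sketch of a duplicate filter keyed by a hypothetical `id` field.
// Memory is bounded by forgetting the oldest IDs once `maxIds` is exceeded.
class SeenIds {
  constructor(maxIds = 10000) {
    this.maxIds = maxIds;
    this.ids = new Set(); // a Set iterates in insertion order
  }

  // Returns true the first time an ID is seen, false for repeats.
  markFirstSeen(id) {
    if (this.ids.has(id)) return false;
    this.ids.add(id);
    if (this.ids.size > this.maxIds) {
      const oldest = this.ids.values().next().value;
      this.ids.delete(oldest);
    }
    return true;
  }
}

// Demo: 'a' is delivered twice; the second copy is skipped.
const seen = new SeenIds(2);
let applied = 0;
for (const msg of [{ id: 'a' }, { id: 'b' }, { id: 'a' }, { id: 'c' }]) {
  if (seen.markFirstSeen(msg.id)) applied += 1; // apply the update only once
}
console.log('applied', applied, 'of 4 messages');
```

An in-memory set only protects a single process. If several subscriber processes apply the same updates, one option is a shared marker in Redis, such as `SET processed:<id> 1 NX EX 3600` (the key name is illustrative), which succeeds only for the first caller.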

When to Accept Out-of-Order Delivery

Not every feature needs strict ordering. If you’re broadcasting a live feed of stock prices where the latest price is all that matters, out-of-order messages are fine—just discard any update with an older timestamp. If you’re sending chat messages that are displayed in a timeline, a few messages out of order might be acceptable if you sort them client-side by timestamp.
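Client-side sorting can be as simple as inserting each chat message at its position by timestamp, so a late arrival slots into place instead of landing at the bottom of the thread. A minimal sketch, assuming each message carries hypothetical `ts` (milliseconds) and `id` fields, with the ID breaking ties between equal timestamps:

```javascript
// Sketch of client-side timeline ordering: insert each message at its
// position by timestamp, breaking ties by id, instead of appending it.
function insertByTimestamp(timeline, msg) {
  let lo = 0;
  let hi = timeline.length;
  while (lo < hi) {
    // Binary search for the first message that belongs after `msg`.
    const mid = (lo + hi) >> 1;
    const m = timeline[mid];
    const after = m.ts > msg.ts || (m.ts === msg.ts && m.id > msg.id);
    if (after) hi = mid;
    else lo = mid + 1;
  }
  timeline.splice(lo, 0, msg);
  return timeline;
}

// Demo: the reply arrives before the message it answers.
const timeline = [];
insertByTimestamp(timeline, { id: 'm2', ts: 1002, text: 'Sounds good!' });
insertByTimestamp(timeline, { id: 'm1', ts: 1001, text: 'Lunch at noon?' });
insertByTimestamp(timeline, { id: 'm3', ts: 1003, text: 'See you there' });
console.log(timeline.map((m) => m.text).join(' | '));
```

This only works as well as the timestamps do. Clocks on different machines drift, so stamping messages with one clock, such as the server's at publish time, gives a more consistent order than trusting each sender's device.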

The key is to understand the business cost of misordering. For a live sports scoreboard, showing the wrong score for two seconds is annoying. For a financial trading platform, it’s a regulatory violation. Design your architecture around the strictest requirement, and use simpler patterns for everything else.

A Practical Decision Matrix

  • Chat (non-critical): Accept out-of-order, sort client-side by timestamp. Use Pub/Sub.
  • Leaderboard (critical): Use sequence numbers or single publisher. Pub/Sub with ordering layer.
  • Financial transactions: Use Redis Streams or a dedicated message queue with strict FIFO.
  • Notifications (fire-and-forget): Pub/Sub is fine. Order doesn’t matter.

What’s Coming Next in Real-Time Messaging

The Redis ecosystem is evolving. Redis 7 introduced sharded Pub/Sub (SSUBSCRIBE and SPUBLISH), which assigns each channel to a single shard instead of broadcasting every message to the whole cluster. That improves scalability, but it still doesn’t order messages across channels that live on different shards. Meanwhile, some third-party libraries (like BullMQ) build ordered job queues on top of Redis.

But the real shift is toward partitioned architectures. Instead of centralizing all messages through a single Redis instance, systems are moving to per-user or per-room channels with localized ordering. For example, a gaming platform might create a separate Redis channel for each game session, ensuring that all messages within that session are published by a single process. This isolates ordering problems to a single session and makes them far easier to solve.

The future is not about making Redis Pub/Sub ordered—it’s about architecting your system so that ordering is naturally enforced by the topology. If each publisher owns a unique channel or stream partition, ordering is guaranteed within that partition. The challenge becomes managing thousands of partitions, which is exactly what Redis Cluster and Streams are designed to do.
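The per-session idea only works if every message for a session goes through the same publisher. A minimal sketch, assuming a fixed pool of publisher workers and simple hash-modulo routing; the `fnv1a`, `ownerFor`, and `channelFor` helpers and the `game:<id>:events` naming are hypothetical, not a Redis API.

```javascript
// Sketch: route every message for a session to one owning publisher, so a
// single process (and a single connection) publishes that session's channel.

// 32-bit FNV-1a over UTF-16 code units (same as byte-wise for ASCII IDs).
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime
  }
  return hash >>> 0;
}

// Which worker publishes for this session (stable while workerCount is fixed).
function ownerFor(sessionId, workerCount) {
  return fnv1a(sessionId) % workerCount;
}

// One channel per session keeps ordering concerns local to that session.
function channelFor(sessionId) {
  return `game:${sessionId}:events`;
}

const workers = 4;
for (const id of ['session-17', 'session-42', 'session-17']) {
  console.log(id, '->', channelFor(id), 'published by worker', ownerFor(id, workers));
}
```

Modulo routing reassigns most sessions when the worker count changes, so resizing the pool needs care, for example draining a session before its owner moves; consistent hashing reduces how many sessions move.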

Your Next Move

Before you add another line of code, audit your current Pub/Sub usage. Ask yourself: what happens if two messages arrive in the wrong order? If the answer is “nothing good,” you need a plan.

Start with the simplest fix that matches your scale. For most indie devs and small studios, the single-publisher approach is the easiest to implement and debug. It’s not the most scalable, but it will carry you to tens of thousands of concurrent users without pain. When you outgrow it, migrate to Redis Streams with consumer groups, keeping one active consumer per stream wherever order matters: you’ll get ordering, persistence, and a clear upgrade path.

Don’t wait until a production incident forces your hand. The cost of out-of-order messages isn’t just a bad user experience; it’s lost trust. And in real-time applications, trust is the only currency that matters.