Why Your Node.js Memory Cache Evicts Hot Data First
You refresh a player’s session token, store it in your in-memory cache with a 30-minute TTL, and move on. Five minutes later, the user tries to place a bet, and the lookup misses: the cache has already evicted their token, while a leaderboard snapshot that nobody has queried since it was cached is still sitting there. You just watched your Node.js process throw away the data it needed most, and it did so by design.
This isn’t a bug. It’s the default behavior of a naive Map-based cache that evicts in insertion order, and V8’s garbage collector can’t rescue you because it knows nothing about which entries are hot. It’s costing you real money if you’re running anything latency-sensitive: session management, rate-limit counters, or game-state lookups in a real-time casino backend. Understanding why your cache treats hot keys like yesterday’s trash is the difference between a system that scales and one that melts under load.
The Map‑Based Cache Trap
Most indie developers start with the simplest possible cache: a plain JavaScript Map. It’s built into the language, you can store any key-value pair, and it feels fast. But when you set a TTL and let the Map grow unbounded, you’re setting a trap.
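To make the trap concrete, here is a minimal sketch of the kind of cache this describes. The TtlCache class and its injectable clock are assumptions for illustration (the clock only exists so the example runs without waiting 30 minutes). Expiry is lazy: an entry is only checked when someone reads it.

```javascript
// Hypothetical helper: a Map-backed cache with per-entry TTL and lazy expiry.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, an assumption to keep the sketch testable
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entries are only removed when touched
      return undefined;
    }
    return entry.value;
  }

  get size() {
    return this.entries.size;
  }
}

let clock = 0;
const ttlCache = new TtlCache(30 * 60 * 1000, () => clock);
ttlCache.set('session:alice', 'token-a');
ttlCache.set('leaderboard:daily', 'snapshot');

clock += 31 * 60 * 1000; // 31 minutes later, both entries are past their TTL
const sizeBeforeRead = ttlCache.size; // still 2: nothing has been purged
const expiredRead = ttlCache.get('session:alice'); // undefined, entry deleted on read
const sizeAfterRead = ttlCache.size; // 1: the untouched snapshot is still held
```

A TTL alone decides when an entry stops being valid, not when it stops occupying memory. Anything that is never read again stays in the Map until something else removes it.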
Why Maps Work Until They Don’t
A Map stores references in insertion order. When you iterate over its entries, you get them back in the order you inserted them. That sounds good for a cache — you can delete the oldest entries when the Map gets too large.
The problem is that “oldest” and “least useful” are not the same thing. A player’s session token inserted an hour ago and read on every bet is far more valuable than an API response cached one minute ago that will never be requested again. But a naive cache evicts in insertion order, so the session token goes first simply because it was inserted earlier, and the cold response outlives it. Reads never change a Map’s order, and neither does overwriting an existing key with set(), so no amount of traffic can save a hot entry.
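The failure is easy to reproduce. The sketch below assumes a hypothetical FifoCache class with a capacity of three and made-up key names; it is not taken from any real codebase.

```javascript
// Hypothetical FIFO cache: evicts whatever was inserted first, regardless of use.
class FifoCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map();
  }

  get(key) {
    return this.entries.get(key); // reads do not change Map order
  }

  set(key, value) {
    if (!this.entries.has(key) && this.entries.size >= this.capacity) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    this.entries.set(key, value); // overwriting keeps the key's original position
  }
}

const fifo = new FifoCache(3);
fifo.set('session:alice', 'token-1');
fifo.set('api:promo-banner', 'cold response');
fifo.set('leaderboard:daily', 'snapshot');

// The session is read constantly and even refreshed, but its position never moves.
for (let i = 0; i < 100; i++) fifo.get('session:alice');
fifo.set('session:alice', 'token-2');

fifo.set('session:bob', 'token-3'); // cache is full, so the first-inserted key goes
const aliceAfter = fifo.get('session:alice'); // undefined: the hot key was evicted
const bannerAfter = fifo.get('api:promo-banner'); // 'cold response' survives
```

A hundred reads and a token refresh made no difference: the key that was inserted first is the key that leaves first.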
The Real‑World Cost of FIFO Eviction
I once consulted on a small live-dealer game platform that used a FIFO (first-in, first-out) cache for WebSocket session state. Every time a player placed a bet, the backend had to look up their session metadata: balance, active game ID, authentication token. The cache held 10,000 entries, and the platform had about 8,000 concurrent players.
On the surface, that’s plenty of room. But during a tournament event, traffic spiked to 12,000 concurrent players. The cache started evicting the oldest 2,000 entries. Those oldest entries were the players who had been logged in the longest — the most active, most valuable users. Every time one of them placed a bet, the cache missed, the backend hit the database, and response time jumped from 2 milliseconds to 80 milliseconds. The players saw a spinner. Some rage-quit. The operator lost real money.
How V8’s Generational GC Makes It Worse
Your Node.js process runs on V8, which uses a generational garbage collector. Young objects (short-lived allocations) are collected quickly in the “nursery.” Old objects that survive a few GC cycles get promoted to the “old generation,” where collection is more expensive and runs less frequently.
New Entries Start in the Young Generation
Here’s the catch: when you insert a new entry into a Map, the key and value objects are young. They start in the nursery. A scavenge doesn’t rescan the whole Map; V8 records pointers from old-generation objects into the young generation and follows only those. But everything the Map references counts as live, so every cached entry survives the scavenge, gets copied, and is later promoted. A cache is the opposite of the short-lived allocation pattern the nursery is built for.
V8’s GC doesn’t know about your cache’s eviction policy. It only knows about reachability. Every entry in the Map is reachable (you’re still holding a reference to the Map), so none of them get collected by the GC. Your cache grows until it hits your manual size limit, at which point you evict the oldest entry, which is often a hot entry that was inserted early in the process’s lifetime.
The Promotion Problem
When an entry survives a couple of scavenges, V8 promotes it to the old generation. From then on, evicting it doesn’t free memory right away: the entry becomes old-generation garbage that is only reclaimed by a major GC, which is more expensive and runs less often. And under FIFO you can’t evict it until it becomes the oldest entry in the Map. By the time it reaches that position, it may still be hot, but your eviction policy doesn’t care.
This creates a feedback loop: cold entries sit in the cache until they age out, while hot entries get evicted on schedule and are immediately re-fetched and re-inserted. Every one of those round trips allocates new objects and leaves promoted garbage behind. The cache churns, the old generation fills with dead entries, and major GC pauses land on the same requests that are already paying for cache misses.
LRU Is the Minimum Viable Fix
The standard answer to FIFO eviction is LRU: Least Recently Used. Instead of evicting the oldest entry by insertion time, you evict the entry that was accessed the longest time ago. This directly solves the problem of hot data getting evicted — because if data is hot, it’s being accessed frequently, and its “last used” timestamp stays fresh.
Implementing a Simple LRU in Node.js
You don’t need a third-party library to get started. The textbook LRU pairs a hash map for O(1) lookups with a doubly linked list ordered by access time. In JavaScript there’s a simpler approach: a Map already iterates in insertion order, so a single Map can play both roles.
Here’s a minimal implementation that fits in well under 50 lines:
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.cache = new Map();
  }

  get(key) {
    if (!this.cache.has(key)) return undefined;
    const value = this.cache.get(key);
    // Delete and re-insert to update order
    this.cache.delete(key);
    this.cache.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.cache.has(key)) {
      this.cache.delete(key);
    } else if (this.cache.size >= this.capacity) {
      // Map iterates in insertion order, so .keys().next() gives oldest
      const oldestKey = this.cache.keys().next().value;
      this.cache.delete(oldestKey);
    }
    this.cache.set(key, value);
  }
}
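This leverages the Map’s insertion-order iteration. When you get a key, you delete and re-insert it, moving it to the “newest” position. When the cache is full, you evict the first key the iterator yields, which is the least recently used. Every get now pays for a delete and an insert, and the engine still has to clean up deleted slots internally, so treat it as amortized O(1) rather than strict O(1). For caches up to a few hundred thousand entries, it’s typically fast enough. One caveat: get() returns undefined on a miss, so you can’t tell a missing key from a stored undefined value.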
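To see the ordering trick in isolation, here is a short trace on a bare Map. The keys and values are placeholders.

```javascript
const order = new Map([['a', 1], ['b', 2], ['c', 3]]);

// "Touch" b: delete and re-insert moves it to the newest position.
const touched = order.get('b');
order.delete('b');
order.set('b', touched);
const afterTouch = [...order.keys()]; // ['a', 'c', 'b']

// Evict the least recently used key: the first one the iterator yields.
const lruKey = order.keys().next().value; // 'a'
order.delete(lruKey);
const afterEvict = [...order.keys()]; // ['c', 'b']
```

The Map’s iteration order is the recency list, so there is no second data structure to keep in sync.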
Why LRU Isn’t Perfect for All Workloads
LRU works great when access patterns are consistent. If a user logs in and makes 50 requests in five minutes, their session stays hot. But LRU can fail under scan-heavy workloads. Imagine a bot that iterates through 10,000 player IDs one by one. Each access bumps that player’s entry to the front of the LRU list. By the time the bot finishes, the entire cache has been “refreshed” with cold data, and all the real hot data — active player sessions — has been evicted.
In an iGaming context, this is dangerous. Attackers can trivially flush your cache by sending requests with a sequence of fake user IDs. Your real players get cache misses, and your database takes the hit.
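A quick simulation shows how little it takes. This is a sketch under stated assumptions: the lruGet, lruSet, and lookup helpers are hypothetical, the capacity of 100 is arbitrary, and the “database read” is a stand-in object.

```javascript
// Minimal LRU helpers over a plain Map (the same delete-and-reinsert trick).
function lruGet(map, key) {
  if (!map.has(key)) return undefined;
  const value = map.get(key);
  map.delete(key);
  map.set(key, value);
  return value;
}

function lruSet(map, key, value, capacity) {
  if (map.has(key)) {
    map.delete(key);
  } else if (map.size >= capacity) {
    map.delete(map.keys().next().value);
  }
  map.set(key, value);
}

const CAPACITY = 100;
const sessions = new Map();

// 100 real players, each with a cached session.
for (let i = 0; i < CAPACITY; i++) {
  lruSet(sessions, `player:${i}`, { balance: 50 }, CAPACITY);
}

// Cache-aside lookup: on a miss, load the data and cache it.
function lookup(key) {
  const hit = lruGet(sessions, key);
  if (hit !== undefined) return hit;
  const loaded = { balance: 0 }; // stand-in for a database read
  lruSet(sessions, key, loaded, CAPACITY);
  return loaded;
}

// A scan over 100 made-up IDs: every miss inserts a cold entry and evicts a real one.
for (let i = 0; i < CAPACITY; i++) lookup(`fake:${i}`);

const realSessionsLeft = [...sessions.keys()]
  .filter((k) => k.startsWith('player:')).length;
// realSessionsLeft is 0: one pass of the scan flushed every real session
```

The scan never had to touch a real key. Each fake ID was “most recently used” at the moment it was inserted, and that was enough.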
LFU and Hybrid Approaches for Real‑Time Systems
For workloads where access frequency matters more than recency — like session stores, rate-limit counters, or game-state lookups — LFU (Least Frequently Used) eviction is a better fit. LFU tracks how often each entry is accessed and evicts the one with the lowest access count.
Implementing a Frequency‑Aware Cache
LFU is more complex than LRU because you need to maintain frequency counters and handle the case where multiple entries have the same count. Common approaches are a min-heap keyed on access count, or per-frequency buckets that let you find the least-used key in O(1).
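As a rough illustration of the bookkeeping involved, here is a sketch of the bucket variant rather than the heap: a Map from access count to a Set of keys. The LfuCache name and its internals are assumptions for this example. Ties are broken by evicting the key that entered its bucket earliest.

```javascript
// Hypothetical LFU cache: evicts the key with the lowest access count.
class LfuCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.values = new Map();  // key -> value
    this.counts = new Map();  // key -> access count
    this.buckets = new Map(); // access count -> Set of keys (insertion-ordered)
    this.minCount = 0;        // lowest count that currently has a bucket
  }

  // Move a key from its current frequency bucket to the next one up.
  _bump(key) {
    const count = this.counts.get(key);
    const bucket = this.buckets.get(count);
    bucket.delete(key);
    if (bucket.size === 0) {
      this.buckets.delete(count);
      if (this.minCount === count) this.minCount = count + 1;
    }
    this.counts.set(key, count + 1);
    if (!this.buckets.has(count + 1)) this.buckets.set(count + 1, new Set());
    this.buckets.get(count + 1).add(key);
  }

  get(key) {
    if (!this.values.has(key)) return undefined;
    this._bump(key);
    return this.values.get(key);
  }

  set(key, value) {
    if (this.capacity <= 0) return;
    if (this.values.has(key)) {
      this.values.set(key, value);
      this._bump(key);
      return;
    }
    if (this.values.size >= this.capacity) {
      // Evict the earliest key in the lowest-frequency bucket.
      const bucket = this.buckets.get(this.minCount);
      const victim = bucket.values().next().value;
      bucket.delete(victim);
      if (bucket.size === 0) this.buckets.delete(this.minCount);
      this.values.delete(victim);
      this.counts.delete(victim);
    }
    this.values.set(key, value);
    this.counts.set(key, 1);
    if (!this.buckets.has(1)) this.buckets.set(1, new Set());
    this.buckets.get(1).add(key);
    this.minCount = 1; // a brand-new key always has the lowest count
  }
}

const lfu = new LfuCache(2);
lfu.set('session:alice', 'token-a');
lfu.set('leaderboard', 'snapshot');
lfu.get('session:alice');
lfu.get('session:alice');
lfu.set('session:bob', 'token-b'); // evicts 'leaderboard', the least-used key
const aliceKept = lfu.get('session:alice'); // 'token-a'
const leaderboardGone = lfu.get('leaderboard'); // undefined
```

Three maps to keep consistent for every read and write is the price of frequency tracking.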
But for most indie backends, a simpler hybrid works: use LRU as your base, but add a “frequency boost” that prevents an entry from being evicted once its access count reaches a threshold. This protects hot data from being flushed by a scan attack.
class FrequencyBoostedLRU extends LRUCache {
  constructor(capacity, boostThreshold = 10) {
    super(capacity);
    this.frequencies = new Map(); // key -> number of cache hits
    this.boostThreshold = boostThreshold;
  }

  get(key) {
    const value = super.get(key);
    if (value !== undefined) {
      const freq = (this.frequencies.get(key) || 0) + 1;
      this.frequencies.set(key, freq);
    }
    return value;
  }

  set(key, value) {
    if (this.cache.has(key)) {
      this.cache.delete(key);
    } else if (this.cache.size >= this.capacity) {
      // Find the oldest key that is below the frequency boost threshold
      for (const [k] of this.cache) {
        if ((this.frequencies.get(k) || 0) < this.boostThreshold) {
          this.cache.delete(k);
          this.frequencies.delete(k);
          break;
        }
      }
      // If all entries are boosted, evict the oldest anyway
      if (this.cache.size >= this.capacity) {
        const oldestKey = this.cache.keys().next().value;
        this.cache.delete(oldestKey);
        this.frequencies.delete(oldestKey);
      }
    }
    this.cache.set(key, value);
    // Preserve the count when overwriting, so refreshing a hot key
    // (for example, rotating a session token) doesn't reset it to cold
    this.frequencies.set(key, this.frequencies.get(key) || 0);
  }
}
This isn’t production-grade, but it illustrates the principle: you can layer frequency awareness on top of LRU to protect hot data from both FIFO eviction and scan-based cache flushing.
When to Use LFU Over LRU
Use LFU or a frequency-boosted LRU when your data has a power-law access distribution — a small number of keys get the majority of requests. That’s exactly the pattern you see in session stores (a few hundred active users at any time) and rate-limit counters (a handful of API keys make most requests).
Avoid LFU when access patterns are uniform or when you need to react quickly to changes in popularity. LFU has “memory” — a key that was hot an hour ago but is now cold will keep its high frequency count and stay in the cache, blocking newer hot keys.
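One common way to soften that memory is to age the counters periodically. The sketch below assumes a hypothetical decayFrequencies helper that halves every count; the halving factor and the idea of running it on a timer are illustrative choices, not a recommendation from any particular library.

```javascript
// Hypothetical aging step: halve every counter so old popularity fades over time.
function decayFrequencies(frequencies) {
  for (const [key, count] of frequencies) {
    const aged = Math.floor(count / 2);
    if (aged === 0) {
      frequencies.delete(key); // fully cooled off: treat as never accessed
    } else {
      frequencies.set(key, aged);
    }
  }
}

const frequencies = new Map([
  ['session:alice', 40], // was hot an hour ago
  ['session:bob', 3],
  ['leaderboard', 1],
]);

decayFrequencies(frequencies);
// session:alice -> 20, session:bob -> 1, leaderboard removed
```

With decay in place, a key has to keep earning its count. A formerly hot key that stops receiving traffic drops below any fixed threshold after a few rounds.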
Practical Patterns for Node.js Production Caches
No single eviction policy works for every workload. The key is to understand your access pattern and choose accordingly. Here are three patterns I’ve used in production iGaming backends.
Pattern 1: Two‑Tier Caching for Sessions
Keep a small, fast LRU cache (1,000 entries) for the hottest sessions, and a larger frequency-boosted LRU (10,000 entries) for everything else. The LRU catches rapid re-accesses during a single user session; the frequency-boosted tier protects against scan attacks.
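class TwoTierCache {
  constructor() {
    this.hot = new LRUCache(1000);
    this.warm = new FrequencyBoostedLRU(10000);
  }

  get(key) {
    let value = this.hot.get(key);
    if (value !== undefined) {
      // Record the hit in the warm tier too; otherwise keys served from the
      // hot tier never build up the frequency that protects them from scans
      this.warm.get(key);
      return value;
    }
    value = this.warm.get(key);
    if (value !== undefined) {
      this.hot.set(key, value); // promote on warm hit
    }
    return value;
  }

  set(key, value) {
    this.hot.set(key, value);
    this.warm.set(key, value);
  }
}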
The warm cache uses frequency boosting so a scan attack can’t evict genuine hot data. The hot cache is small enough that eviction is cheap, and it catches the bursty access pattern of a single user making many requests in quick succession.
Pattern 2: Memory-Based Limits Over Entry Counts
Instead of capping the cache by entry count, cap it by memory usage. Use process.memoryUsage() to check heapUsed, and evict entries that are both old and cold when the heap exceeds a threshold. Keep in mind that heapUsed covers the whole process, not just the cache, so the threshold is only a proxy for cache size. The goal is to keep your cache from causing an OOM crash.
class MemoryAwareCache {
  constructor(maxHeapMB = 200) {
    this.maxHeap = maxHeapMB * 1024 * 1024; // threshold in bytes
    this.cache = new Map();
    this.accessTimes = new Map(); // key -> timestamp of last get/set
  }

  set(key, value) {
    this.cache.set(key, value);
    this.accessTimes.set(key, Date.now());
    this.evictIfNeeded();
  }

  get(key) {
    if (this.cache.has(key)) {
      this.accessTimes.set(key, Date.now());
      return this.cache.get(key);
    }
    return undefined;
  }

  evictIfNeeded() {
    // heapUsed is process-wide and only drops after a GC run
    const used = process.memoryUsage().heapUsed;
    if (used < this.maxHeap) return;
    // Evict up to 10% of entries, oldest first, among those
    // that haven't been accessed in 5 minutes
    const cutoff = Date.now() - 300000;
    const entries = [...this.accessTimes.entries()]
      .filter(([, time]) => time < cutoff)
      .sort((a, b) => a[1] - b[1]);
    const toEvict = entries.slice(0, Math.ceil(this.cache.size * 0.1));
    for (const [key] of toEvict) {
      this.cache.delete(key);
      this.accessTimes.delete(key);
    }
  }
}
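This is more complex, and it’s still a heuristic: heapUsed only drops after the garbage collector runs, the sort runs on every set() once you’re over the threshold, and nothing is evicted if every entry was touched in the last five minutes. But it ties eviction to the resource you actually run out of. In a real-time platform handling thousands of concurrent WebSocket connections, memory pressure is a common cause of unplanned downtime.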
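An alternative that avoids polling the process heap is to give the cache its own byte budget. The sketch below is one way to do that, assuming a hypothetical ByteBudgetCache class, JSON-serializable values, and serialized length as a rough stand-in for real heap cost (it is not what V8 actually allocates).

```javascript
// Hypothetical size-budgeted cache: tracks an estimated byte cost per entry
// and evicts least recently used entries until the total fits the budget.
class ByteBudgetCache {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.totalBytes = 0;
    this.entries = new Map(); // key -> { value, bytes }, kept in LRU order
  }

  // Rough estimate only: serialized JSON length, not actual V8 heap usage.
  static estimateBytes(key, value) {
    return Buffer.byteLength(String(key)) + Buffer.byteLength(JSON.stringify(value));
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.totalBytes -= existing.bytes;
      this.entries.delete(key);
    }
    const bytes = ByteBudgetCache.estimateBytes(key, value);
    if (bytes > this.maxBytes) return false; // too large to ever fit
    this.entries.set(key, { value, bytes });
    this.totalBytes += bytes;
    // Evict from the least recently used end until we are back under budget.
    while (this.totalBytes > this.maxBytes) {
      const [oldestKey, oldest] = this.entries.entries().next().value;
      this.entries.delete(oldestKey);
      this.totalBytes -= oldest.bytes;
    }
    return true;
  }
}

const budget = new ByteBudgetCache(60);
budget.set('a', 'x'.repeat(20)); // estimated 23 bytes: 1 for the key, 22 for the JSON string
budget.set('b', 'y'.repeat(20)); // total is now 46
budget.get('a');                 // 'a' becomes most recently used
budget.set('c', 'z'.repeat(20)); // 69 exceeds 60, so 'b' is evicted
```

Because the budget belongs to the cache, eviction decisions don’t depend on what the rest of the process happens to be allocating.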
Pattern 3: Use a Proven Library for Production
Don’t write your own cache for production. Libraries like lru-cache (by Isaac Z. Schlueter, the creator of npm) have been battle-tested for years, handle edge cases around TTLs, max sizes, and memory limits, and are optimized for V8’s GC behavior.
// lru-cache v10+ uses a named export; v7 to v9 used: import LRU from 'lru-cache'
import { LRUCache } from 'lru-cache';

const cache = new LRUCache({
  max: 50000,
  ttl: 1000 * 60 * 30, // 30 minutes
  dispose: (value, key) => {
    // Clean up WebSocket connections or DB handles
    value.close?.();
  },
});
Recent versions of the library keep the recency list in preallocated arrays rather than per-entry node objects, which keeps GC pressure low. It also supports noDisposeOnSet, fetchMethod for the cache-aside pattern, and sizeCalculation (together with maxSize) for size-based limits. There’s no reason to roll your own unless you’re doing something very unusual.
The Forward‑Looking Note
The Node.js ecosystem hasn’t standardized a caching primitive: as of Node 22 there is no built-in cache module, so eviction policy remains a userland decision. On the engine side, V8’s pointer compression can shrink the per-entry overhead of large Maps, but it is a build-time option that stock Node.js binaries don’t enable.
But the real shift is toward off-process caching. Redis, KeyDB, and Dragonfly are purpose-built for this workload and handle eviction policies at the database level. If your Node.js process is holding tens of thousands of entries in memory, or several processes need to see the same data, you’re probably better off moving that responsibility to a dedicated cache server. Your in-process cache should only be a small, fast L1 layer in front of it.
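Here is a rough sketch of what that L1 layer might look like. The L1Cache class and the fetchRemote callback are assumptions for illustration: fetchRemote stands for whatever async call reads from your cache server, and no particular client library is implied. Concurrent misses for the same key share one remote request.

```javascript
// Hypothetical read-through wrapper: a small in-process L1 in front of a remote cache.
// `fetchRemote` is an assumed async function: (key) => value, or undefined on a miss.
class L1Cache {
  constructor(capacity, fetchRemote) {
    this.capacity = capacity;
    this.fetchRemote = fetchRemote;
    this.entries = new Map();  // LRU order via delete-and-reinsert
    this.inFlight = new Map(); // key -> pending Promise, to coalesce concurrent misses
  }

  async get(key) {
    if (this.entries.has(key)) {
      const value = this.entries.get(key);
      this.entries.delete(key);
      this.entries.set(key, value);
      return value;
    }
    if (this.inFlight.has(key)) return this.inFlight.get(key);

    const pending = this.fetchRemote(key)
      .then((value) => {
        if (value !== undefined) this._store(key, value);
        return value;
      })
      .finally(() => this.inFlight.delete(key));
    this.inFlight.set(key, pending);
    return pending;
  }

  _store(key, value) {
    if (!this.entries.has(key) && this.entries.size >= this.capacity) {
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.delete(key);
    this.entries.set(key, value);
  }
}

// Usage with a stubbed remote; in practice this would wrap a real client call.
const sessionsL1 = new L1Cache(500, async (key) => `remote:${key}`);
sessionsL1.get('session:alice').then((value) => {
  // value is 'remote:session:alice'; the next get() is served from memory
});
```

The L1 stays small on purpose: it absorbs bursts of repeated reads, while the remote cache owns capacity and eviction policy.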
The next time you see a cache miss on a hot key, don’t blame the GC. Blame your eviction policy. Then fix it.