Unbounded Queue: Memory Death

The system is green. All health checks pass. CPU is idling at 30%. Your on-call engineer is halfway through a coffee. Then the OOM killer wakes up, picks your most critical process, and silently executes it. No warning. No graceful shutdown. Just a dead PID and a postmortem nobody wanted to write. Welcome to the unbounded queue disaster — the failure mode that hides in plain sight behind "it works fine under normal load."

The Fallacy of Infinite Buffers

Here's the mental model most developers carry without realizing it: a queue is a safety net. You put work in when the system is busy, it drains when things calm down. Reasonable, right? Wrong — and the wrongness is subtle enough to kill a production system three years after the code was written. The safety-net metaphor only holds when two conditions are met simultaneously: the consumer can eventually outpace the producer, and "eventually" actually arrives. Strip either condition away and the buffer isn't a net. It's a pit.

The math is brutally simple. If your arrival rate λ exceeds your service rate μ — even by a small margin, even intermittently — and your buffer has no upper bound, queue depth grows without limit. Not might grow. Grows. It's not a probability, it's a theorem. Little's Law doesn't care about your SLA or your architecture diagram. It just counts.
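A back-of-the-envelope simulation makes the theorem concrete. The rates and tick counts below are illustrative, not measurements — the point is that a sustained 5% overload turns into tens of thousands of buffered items within an hour:

```python
# Toy discrete-time model: `lam` units of work arrive per tick,
# the consumer drains at most `mu` units per tick.
def simulate_backlog(lam: float, mu: float, steps: int) -> list[float]:
    depth = 0.0
    history = []
    for _ in range(steps):
        depth += lam                   # work arriving this tick
        depth = max(0.0, depth - mu)   # work drained this tick
        history.append(depth)
    return history

# 5% overload: 105 arrivals/s against 100 processed/s.
overloaded = simulate_backlog(lam=105, mu=100, steps=3600)
healthy = simulate_backlog(lam=95, mu=100, steps=3600)

print(overloaded[-1])  # 18000.0 — backlog after one simulated hour
print(healthy[-1])     # 0.0 — the queue drains every tick
```

The overloaded run grows linearly and never stops; the healthy run never accumulates anything. There is no middle ground once λ stays above μ.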

# Python. Looks innocent. Ships to prod every day.
import asyncio

queue = asyncio.Queue()  # maxsize=0 — default, unbounded

async def producer():
    while True:
        await queue.put(await fetch_next_event())  # never blocks

async def consumer():
    while True:
        item = await queue.get()
        await process(item)  # what if this gets slow?

What This Code Is Actually Saying

The producer above will never pause. It doesn't know how full the queue is — it doesn't ask. The consumer processes one item at a time, and if process() hits an external API that starts responding in 800ms instead of 80ms, the queue depth climbs silently. No exception. No alarm. The heap just starts filling up. By the time your memory alert fires at 85% usage, you're already in damage control mode, not prevention mode. This isn't a bug in the code — it's a bug in the assumption that the consumer will always keep up.

Node.js: Where the Event Loop Becomes the Crime Scene

Node.js deserves its own paragraph here because its single-threaded, non-blocking architecture creates a specific flavor of this disaster. Developers reach for in-memory arrays or EventEmitter-based queues precisely because they're fast and zero-friction. And they are fast — right up until the moment they aren't. A webhook receiver that pushes events into an array for background processing has no natural pressure valve. The event loop keeps accepting connections, keeps pushing to the array, and the consumer — some setInterval loop or a chain of async callbacks — falls behind without any mechanism to signal the upstream to slow down.

What makes this insidious in Node specifically: you won't see thread contention, you won't see lock timeouts. The process just gets slower, then slower, then the V8 heap hits its limit and you get a clean "FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory". Postmortem logs show nothing unusual until five minutes before the crash.

// Node.js. Classic pattern. Classic trap.
const pendingJobs = [];  // unbounded. always.

app.post('/webhook', (req, res) => {
  pendingJobs.push(req.body);  // O(1) push, feels great
  res.sendStatus(202);
});

setInterval(() => {
  const job = pendingJobs.shift();  // what if push() >> shift()?
  if (job) processJob(job);
}, 100);

The Asymmetry Nobody Draws on the Whiteboard

Push is O(1) and always succeeds. Processing is variable and can slow down for a hundred external reasons — downstream latency, GC pressure, a slow database query. The interval stays at 100ms regardless of how deep the backlog runs. You've built a system where ingress is unconditionally fast and egress is conditionally slow, with no feedback between them. That asymmetry is the entire problem, drawn in ten lines of JavaScript.

When Default Settings Ship to Production

Every standard library has opinions baked in. Some of those opinions are quietly catastrophic. Java's LinkedBlockingQueue has two constructors — one takes a capacity argument, one doesn't. The one without capacity defaults to Integer.MAX_VALUE. That's 2,147,483,647 slots. Nobody is filling 2 billion queue entries, sure — but the JVM will happily allocate heap for every single object you push long before that limit, and your GC will start choking well before you hit it. The "safe" default is a slow bleed dressed up as convenience.

This isn't a Java problem specifically. It's a defaults problem. Defaults optimize for "works on first run", not "survives a traffic spike at 2am". The library author can't know your load profile. You can. But the path of least resistance is always to leave the constructor argument empty and move on to the next ticket.

// Java. Two constructors. One disaster.

// This one will eat your heap quietly:
BlockingQueue<Runnable> unbounded = new LinkedBlockingQueue<>();

// This one will fail loudly — which is what you want:
BlockingQueue<Runnable> bounded = new LinkedBlockingQueue<>(5_000);

// Loud failure > silent memory death. Every time.
// ThreadPoolExecutor takes its work queue at construction time:
ExecutorService executor = new ThreadPoolExecutor(
    4, 4, 0L, TimeUnit.MILLISECONDS, bounded);
// submit() throws RejectedExecutionException when the queue is full.
// That's not a bug. That's the feature.

Rejection Is a Signal, Not a Failure

When a bounded queue throws RejectedExecutionException, engineers instinctively want to fix it by increasing the limit. That's the wrong instinct. The rejection is the system telling you something true: your consumer can't keep up with your producer right now. Increasing the buffer silences the signal without fixing the underlying imbalance. You've just bought yourself more time before the same crash, with a bigger heap dump to analyze when it arrives.

The Webhook Storm: A Case Study in Cascading Assumptions

Third-party webhook delivery is one of the most reliable ways to trigger an unbounded queue disaster in production, because the traffic pattern is adversarial by design. Your integration with a payment processor, a CRM, or a CI platform works fine for months under normal load — maybe 50 events per minute, your internal queue drains in seconds. Then the third party has an outage. It recovers. And now it's replaying six hours of backed-up events at full throttle directly at your webhook endpoint, which is still configured to accept everything and ask questions later.

Your queue depth goes from 200 to 80,000 in under four minutes. The consumer pool — sized for steady-state load — is suddenly processing at maximum throughput and still falling behind. Heap climbs. GC cycles get longer. Longer GC cycles mean slower processing. Slower processing means the queue grows faster. You're now in a feedback loop that only ends one way.

# The webhook receiver that started the fire.
# Stripped down, but this is the pattern.

@app.route('/webhook/events', methods=['POST'])
def receive_event():
    payload = request.get_json()
    internal_queue.put(payload)  # no size check. no backpressure.
    return '', 202              # always 202. always "I got it."
                                # even when you absolutely cannot handle it.

# The third party sees 202.
# It sends the next one immediately.
# And the next. And the next.

The 202 Lie and What to Send Instead

Returning HTTP 202 unconditionally is a promise you can't always keep. A 202 tells the sender "I have accepted this, it will be processed" — but if your internal queue is already saturated, that's not true. The correct response when your queue is full is 429 Too Many Requests with a Retry-After header. Most well-behaved webhook senders will back off and retry. You've now implemented backpressure at the HTTP boundary — the simplest, most effective pressure valve available, and the one most systems never bother to wire up.
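A minimal, framework-agnostic sketch of that boundary. The handler name, queue size, and Retry-After value are illustrative, and the route decorator is omitted — the logic is what matters: buffer if you can, refuse honestly if you can't:

```python
import queue

internal_queue = queue.Queue(maxsize=10_000)  # bounded — the whole point

def receive_event(payload) -> tuple[str, int, dict]:
    """Return (body, status, headers) for an incoming webhook POST.

    202 only when we can actually buffer the event; 429 with
    Retry-After when the queue is saturated, so well-behaved
    senders back off instead of hammering us.
    """
    try:
        internal_queue.put_nowait(payload)
        return '', 202, {}
    except queue.Full:
        # The honest answer: we cannot handle this right now.
        return '', 429, {'Retry-After': '30'}
```

Wired into any web framework, this one try/except is the difference between a replay storm filling your heap and a replay storm pacing itself against your real capacity.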

Backpressure: The Three Choices and Why Two of Them Are Wrong

When your queue starts filling faster than it drains, you have exactly three architectural options. Engineers instinctively reach for the wrong one first, then the second wrong one, and only arrive at the right answer after a production incident or two. Let's skip the tour.

Option one is buffering — just let the queue grow and trust the consumer will eventually catch up. We've covered why this ends badly. Option two is load shedding — drop incoming work when the system is under pressure. This is sometimes correct, but it requires you to be honest about which data is disposable. Dropping a metrics event is fine. Dropping a payment confirmation is not. Most systems that implement load shedding don't make this distinction explicitly, which means they shed the wrong load at the wrong time.

// Backpressure at the queue boundary.
// The producer blocks instead of overflowing.
// (BoundedQueue is a stand-in for your bounded queue of choice.)

const queue = new BoundedQueue({ maxSize: 5000 });

async function handleIncoming(event) {
  if (queue.isFull()) {
    // Option A — shed load (only if the data is disposable):
    // metrics.increment('events.dropped');
    // return;

    // Option B — propagate pressure (block the producer):
    await queue.waitForSpace();  // producer slows down naturally
  }
  queue.push(event);
}

Propagating Pressure Is the Only Honest Answer

True backpressure means the producer learns about the consumer's state and adjusts its rate accordingly. Not through a dropped packet or a crashed thread — through an explicit signal that says "slow down, I'm not ready." TCP has done this since 1981 with window sizing. Reactive Streams standardized backpressure for async pipelines in 2014. The concept is not new. What's new is admitting that your application layer needs the same discipline that the transport layer figured out decades ago.
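In asyncio, the honest version falls out of a single change: bound the queue and await the put. A minimal sketch — the queue size, event count, and simulated processing delay are illustrative:

```python
import asyncio

async def producer(q: asyncio.Queue, events):
    for event in events:
        # With a bounded queue, put() suspends when the queue is full —
        # the producer's rate automatically drops to the consumer's rate.
        await q.put(event)

async def consumer(q: asyncio.Queue, processed: list):
    while True:
        item = await q.get()
        await asyncio.sleep(0.001)  # simulated slow processing
        processed.append(item)
        q.task_done()

async def main():
    q = asyncio.Queue(maxsize=10)  # small bound to force backpressure
    processed = []
    worker = asyncio.create_task(consumer(q, processed))
    await producer(q, range(100))
    await q.join()   # wait until every buffered item is processed
    worker.cancel()
    return processed

results = asyncio.run(main())
print(len(results))  # 100 — nothing dropped, the producer just slowed down
```

No item is lost and no heap grows: the queue never holds more than ten items, because the producer is forced to run at the consumer's pace.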

Implementation: Stop Trusting, Start Bounding

Every queue that accepts work from an external source needs a hard capacity limit. The number doesn't have to be perfect. It has to exist. A queue that rejects work at capacity is honest. A queue with no limit is lying to itself and will eventually pay for it in heap space.

# One argument. That's the entire fix.

import asyncio

QUEUE_CAPACITY = 10_000

queue = asyncio.Queue(maxsize=QUEUE_CAPACITY)

async def producer(event):
    try:
        queue.put_nowait(event)
    except asyncio.QueueFull:
        metrics.increment('queue.rejected')            # meter every rejection
        raise BackpressureSignal("Queue at capacity")  # app-defined; caller backs off

QueueFull Is a Signal — Not an Error to Suppress

Never swallow QueueFull with an empty except block. It's your infrastructure telling you it hit its designed limit. Log it, meter it, alert on its rate. The moment you silence it, the feedback loop breaks — your producer keeps pushing, metrics stay clean, and the only thing left to tell you something's wrong is the OOM killer.

Circuit Breakers: Cut the Queue Off From the Blast Radius

A saturated queue and a crashed downstream are different failures that produce the same symptom — work piling up with nowhere to go. A circuit breaker separates these cases. When downstream is unhealthy, the breaker opens: new work is rejected at the boundary instead of filling a queue with tasks that will fail on delivery anyway.

class CircuitBreaker {
  constructor(threshold, resetTimeout) {
    this.state = 'CLOSED'; // CLOSED → OPEN → HALF_OPEN
    this.failures = 0;
    this.threshold = threshold;
    this.resetTimeout = resetTimeout;
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt >= this.resetTimeout) {
        this.state = 'HALF_OPEN'; // let one probe request through
      } else {
        throw new Error('Circuit open — fast fail');
      }
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }
}

An Open Breaker Is Operational Clarity

When the breaker opens, your downstream gets room to recover without a retry storm. Your queue stops climbing because work that would fail immediately never enters the buffer. And your ops team gets a clean, specific signal instead of "something somewhere is broken." That's not just resource management — that's the difference between a 10-minute incident and a 3-hour war room.

The One Metric That Actually Tells the Truth

Queue size is a vanity metric. A queue at 8,000 items is either perfectly healthy or a catastrophe — depending entirely on how fast the consumer is moving. Size without velocity is noise. Queue age — the timestamp on the oldest unprocessed message — is the metric that doesn't lie.

import time

def check_queue_health(queue, max_age=30):
    oldest = queue.peek_oldest()
    if not oldest:
        return

    age = time.time() - oldest.enqueued_at
    metrics.gauge('queue.oldest_message_age', age)

    if age > max_age:
        alerting.fire(
            severity='critical',
            message=f'Queue age {age:.0f}s — consumer is losing the race',
            context={'depth': queue.size()}
        )

Age Trends Up. Size Lies.

Queue size spikes and recovers. Queue age trending upward is structural — your consumer throughput is below producer rate as a sustained condition, not a blip. By the time size looks alarming, age has been signaling for minutes. Instrument both. Alert on age. Treat size as context.

The System That Knows How to Say No

Every buffer has a number. Every queue has a rejection policy. Every service has a point where it stops accepting work and says — explicitly — not right now. That refusal is the most honest thing a distributed system can do. A system that accepts everything makes promises it can't keep, and the bill arrives at 3am.

Bound your buffers. Instrument queue age. Wire up backpressure before the incident that proves you needed it. The OOM killer doesn't send a warning — it just picks whoever left their queue uncapped.
