Why your Node.js memory keeps climbing in production (and how to find the leak)

The 3 AM page nobody wants

Last month I got woken up by PagerDuty at some ungodly hour because one of our Node services was eating memory like it owed us money. Restart, it goes back to 200MB. Two hours later, 1.4GB. Restart. Repeat. You know the drill.

I've debugged this exact pattern across maybe a dozen Node projects now, and the root cause is almost never what you'd expect. So let's walk through how to actually find the leak instead of just bumping your container memory limit and calling it a day (which, yes, I have absolutely done before).

What a real memory leak looks like

First, a quick sanity check. Memory growing isn't always a leak. V8 is lazy about garbage collection — it'll happily let your heap grow if there's no pressure. A real leak looks like this:

  • Memory grows steadily under load
  • It doesn't drop after GC pauses
  • It survives traffic dips
  • Restarting fixes it (temporarily)

If your memory just plateaus high but stops climbing, you probably don't have a leak — you have a hungry workload. Different problem.

Step 1: Confirm it's actually leaking

Before you go hunting, get hard numbers. Drop this into a route or a setInterval:

// simple memory probe — log every 30s
setInterval(() => {
  const m = process.memoryUsage();
  console.log({
    rss: (m.rss / 1024 / 1024).toFixed(1) + 'MB',      // total process memory
    heapUsed: (m.heapUsed / 1024 / 1024).toFixed(1) + 'MB',
    heapTotal: (m.heapTotal / 1024 / 1024).toFixed(1) + 'MB',
    external: (m.external / 1024 / 1024).toFixed(1) + 'MB' // Buffers, native bindings
  });
}, 30000);

Let it run for an hour under realistic traffic. If heapUsed keeps climbing without ever dropping back near baseline, congrats, you have a leak.

Pay attention to external too. I once spent two days hunting a JS heap leak that was actually a Buffer leak in a streaming library — heapUsed looked fine, but rss was through the roof.

Step 2: Take heap snapshots (the right way)

The single most useful tool here is the V8 heap snapshot. Most people grab one snapshot, stare at it, and learn nothing. The trick is to take three:

  1. After warmup (baseline)
  2. After significant load (intermediate)
  3. After more load (final)

Then compare them. Objects that exist in all three and keep growing are your leak.

You can trigger snapshots from inside the app without restarting:

const v8 = require('v8');

function takeSnapshot(label) {
  // writeHeapSnapshot blocks the event loop while it writes, then returns the path
  const filename = `heap-${label}-${Date.now()}.heapsnapshot`;
  v8.writeHeapSnapshot(filename);
  return filename;
}

// expose behind an internal-only route
app.post('/debug/heap', (req, res) => {
  const file = takeSnapshot(req.query.label || 'manual');
  res.send({ file });
});

Load those .heapsnapshot files into Chrome DevTools (Memory tab → Load). Switch the view to Comparison and pick your baseline snapshot as the reference. Sort by # Delta. The stuff at the top with massive positive deltas is what's leaking.
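
If you'd rather not ship a debug route at all, Node (12 and newer) can write a snapshot when the process receives a signal. A minimal sketch, assuming you can send signals to the running process:

# assuming Node 12+: write a heap snapshot whenever the process gets SIGUSR2
node --heapsnapshot-signal=SIGUSR2 server.js
# then, from another shell:
kill -USR2 <pid>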

Step 3: The usual suspects

After doing this enough times, I've seen the same handful of culprits over and over:

Closures holding references

This one bit me hard last year:

// looks innocent. is not.
function createHandler(bigConfig) {
  const cache = new Map();
  return function handler(req) {
    cache.set(req.id, bigConfig); // bigConfig captured forever
    return cache.get(req.id);
  };
}

The cache Map grows unbounded, and every entry pins bigConfig in memory. If createHandler runs per-connection, you're cooked.
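
The fix is to bound the cache (or stop capturing bigConfig entirely). A minimal sketch; the 1000-entry cap is an arbitrary number for illustration, size it for your workload:

function createHandler(bigConfig) {
  const cache = new Map();
  const MAX_ENTRIES = 1000; // arbitrary cap, tune for your workload

  return function handler(req) {
    if (!cache.has(req.id)) {
      // evict the oldest entry once we hit the cap;
      // Map iterates in insertion order, so the first key is the oldest
      if (cache.size >= MAX_ENTRIES) {
        cache.delete(cache.keys().next().value);
      }
      cache.set(req.id, bigConfig);
    }
    return cache.get(req.id);
  };
}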

Event emitters with no cleanup

// every request adds a listener — never removed
emitter.on('data', (chunk) => processChunk(req, chunk));

Node will actually warn you (MaxListenersExceededWarning) once a single event collects more than 10 listeners by default, but a lot of folks just bump setMaxListeners and move on. Don't. Either use once() or explicitly call removeListener in your cleanup path.
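
Here's what that cleanup path can look like. A sketch assuming a shared emitter and a req object that emits 'close' when the request ends (like http.IncomingMessage does):

function handleRequest(req) {
  // keep a reference to the exact function so we can remove it later
  const onData = (chunk) => processChunk(req, chunk);
  emitter.on('data', onData);

  // when this request is done, drop the listener (and the req it captured)
  req.on('close', () => emitter.removeListener('data', onData));
}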

Timers and intervals

// started, never cleared
setInterval(() => pollSomething(userId), 1000);

If the surrounding scope captures anything heavy and the interval is never cleared, that scope lives forever. Always store the handle and clearInterval it on disconnect/shutdown.
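
In practice that looks something like this; the socket and startPolling names are made up for illustration:

function startPolling(socket, userId) {
  const handle = setInterval(() => pollSomething(userId), 1000);

  // clearing the interval releases the closure and everything it captured
  socket.on('close', () => clearInterval(handle));
}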

Global caches without bounds

The classic. Someone writes const cache = {} at the top of a module, starts stuffing things into it, and forgets that the module is a singleton. Use an LRU cache with a real size limit — lru-cache on npm is the standard choice and handles eviction for you.
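
For reference, here's roughly what that looks like with lru-cache. One caveat: the import shape changed across major versions; this matches v10+, so check what you have installed:

const { LRUCache } = require('lru-cache'); // v10+ import shape

const cache = new LRUCache({
  max: 500,            // hard cap on entry count; oldest entries get evicted
  ttl: 1000 * 60 * 5,  // optional: also expire entries after 5 minutes
});

cache.set('user:42', { plan: 'pro' });
cache.get('user:42'); // undefined once evicted or expired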

Step 4: Reproduce locally with a tight loop

Once you suspect a specific code path, isolate it. I usually write a tiny script that hammers the suspect function in a loop and watches heap growth:

const { suspectFunction } = require('./src/lib/whatever');

async function main() {
  for (let i = 0; i < 100000; i++) {
    await suspectFunction({ id: i, payload: 'x'.repeat(1000) });
    if (i % 10000 === 0) {
      // force GC if running with --expose-gc
      if (global.gc) global.gc();
      const used = process.memoryUsage().heapUsed / 1024 / 1024;
      console.log(`iter=${i} heap=${used.toFixed(1)}MB`);
    }
  }
}
main();

Run it with node --expose-gc leak-test.js. If heap keeps climbing even after forced GC, you've reproduced the leak in isolation. Now you can iterate fast.

Prevention: habits that save you later

A few things I do on every Node service now, more or less reflexively:

  • Bound every cache. If it's a Map or object used as a cache, it needs an eviction policy. No exceptions.
  • Pair every addListener with a removeListener. Same with setInterval/clearInterval. Cleanup is part of the feature, not an afterthought.
  • Emit memory metrics to your observability stack. process.memoryUsage() should be a regular metric, not something you only look at during incidents.
  • Set --max-old-space-size explicitly. Don't rely on defaults — pick a number based on your container limit so V8 actually tries to GC under pressure (see the example after this list).
  • Load-test before shipping. A 30-minute soak test under realistic traffic catches most leaks before they catch you.
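
For example, with a 1GB container I'd cap the heap below the limit so there's headroom for buffers, stack, and native memory; the 768 here is a judgment call, not a magic number:

# ~75% of a 1GB container limit, leaving room for non-heap memory
node --max-old-space-size=768 server.js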

None of this is glamorous. But the next time you get paged at 3 AM, you'll have a playbook instead of a guess. And honestly, once you've done the heap-snapshot-comparison dance a few times, it stops feeling like dark magic and starts feeling like just another debugging step.
