Temitope
Node.js Performance at the Limit: Profiling, Fixing, and Proving It with Real Numbers

Most Node.js performance content teaches you to avoid eval, use streams instead of buffers, and "don't block the event loop." That's fine advice — and it won't help you when your p99 latency is 2.3 seconds and your CTO is in your Slack DMs.

This is a different kind of article. We start with a realistic API that has real problems, profile it properly, fix each bottleneck with actual code, and measure the delta at every step. No platitudes. Numbers or it didn't happen.


The Benchmark Harness First

Before touching a single line of application code, establish your measurement baseline. Every optimization you make needs a before and after. Without this, you're just guessing with extra steps.

We'll use autocannon for HTTP benchmarking and Node's built-in --prof flag plus Chrome DevTools for CPU profiling.

npm install -g autocannon clinic

The baseline test we'll run throughout:

# 10 seconds, 50 concurrent connections, pipe results to JSON
autocannon -c 50 -d 10 -j http://localhost:3000/api/reports > baseline.json

A helper script to diff two runs:

// scripts/compare.js
const before = require('./baseline.json');
const after  = require('./optimized.json');

const metrics = ['requests', 'latency', 'throughput'];

for (const m of metrics) {
  const b = before[m];
  const a = after[m];
  const deltaAvg = (((a.average - b.average) / b.average) * 100).toFixed(1);
  console.log(`${m}.average: ${b.average} → ${a.average} (${deltaAvg}%)`);
}

Run this after every change. Keep all your JSON files. You'll need receipts.


The Patient: A Realistic Slow API

Here's the kind of endpoint that exists in every codebase that's survived long enough. It generates a report — fetches some data, processes it, formats it, returns JSON.

// src/routes/reports.js  (the before — intentionally broken)
const express = require('express');
const db      = require('../db');       // Postgres via pg
const crypto  = require('crypto');
const router  = express.Router();

router.get('/api/reports', async (req, res) => {
  const { org_id, start, end } = req.query;

  // Fetch orders
  const orders = await db.query(
    `SELECT * FROM orders WHERE org_id = $1
     AND created_at BETWEEN $2 AND $3`,
    [org_id, start, end]
  );

  // For each order, fetch its line items separately
  const enriched = [];
  for (const order of orders.rows) {
    const items = await db.query(
      `SELECT * FROM line_items WHERE order_id = $1`,
      [order.id]
    );

    const total = items.rows.reduce((sum, i) => sum + i.price * i.qty, 0);

    // Compute a "fingerprint" for cache busting downstream
    const fingerprint = crypto
      .createHash('sha256')
      .update(JSON.stringify(order))
      .digest('hex');

    enriched.push({ ...order, items: items.rows, total, fingerprint });
  }

  // Sort by total descending
  enriched.sort((a, b) => b.total - a.total);

  res.json({ data: enriched, count: enriched.length });
});

If you've been around long enough, you felt something in your chest reading that. Let's quantify the pain.

Baseline numbers (50 concurrent, 10s, 200 orders in the result set):

Requests/sec:  47.3
Latency avg:   1,041ms
Latency p99:   2,380ms
Throughput:    1.1 MB/s

Four things are wrong. We'll fix them in order of impact.


Problem 1: The N+1 Query

The for...of loop that fires a db.query per order is the worst offender. With 200 orders, that's 201 round trips to Postgres. Each one waits for the previous to complete because await inside a for loop is sequential.
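Before reaching for the real fix, it's worth naming the tempting half-measure. A sketch, assuming a `db` client exposing `query(text, params) -> { rows }` and the orders already fetched:

```javascript
// Half-measure: fire the per-order queries in parallel with Promise.all.
// Still 201 queries (the N+1 remains), but the round trips overlap, so
// wall time drops from the sum of the latencies toward the max of them.
async function enrichOrders(db, orders) {
  return Promise.all(
    orders.map(async (order) => {
      const items = await db.query(
        'SELECT * FROM line_items WHERE order_id = $1',
        [order.id]
      );
      const total = items.rows.reduce((sum, i) => sum + i.price * i.qty, 0);
      return { ...order, items: items.rows, total };
    })
  );
}
```

Treat this as triage only: 200 parallel queries per request will hammer the connection pool under load. The JOIN is the real fix.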

Proof First

node --require ./src/db-logger.js src/index.js &
autocannon -c 1 -d 3 http://localhost:3000/api/reports 2>/dev/null
// src/db-logger.js  — count queries per request
let count = 0;
const { Pool } = require('pg');
const originalQuery = Pool.prototype.query;
Pool.prototype.query = function(...args) {
  count++;
  process.stdout.write(`\rQueries this process: ${count}`);
  return originalQuery.apply(this, args);
};

Output confirms: 201 queries per request. At 5ms average round-trip, that's 1,005ms of pure waiting before any processing begins.

The Fix: JOIN Everything

// One query, zero loops
const result = await db.query(
  `SELECT
     o.*,
     json_agg(
       json_build_object(
         'id',    li.id,
         'price', li.price,
         'qty',   li.qty,
         'sku',   li.sku
       )
     ) AS items,
     SUM(li.price * li.qty) AS total
   FROM orders o
   JOIN line_items li ON li.order_id = o.id
   WHERE o.org_id = $1
     AND o.created_at BETWEEN $2 AND $3
   GROUP BY o.id
   ORDER BY total DESC`,
  [org_id, start, end]
);

json_agg builds the nested items array directly in Postgres. The SUM computes the total in SQL, skipping the JS reduce entirely. One round trip.
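One caveat: the inner JOIN silently drops orders that have no line items. If those should appear with an empty array instead, a LEFT JOIN variant works — a sketch (same table and column names assumed as above):

```javascript
// Sketch: LEFT JOIN variant that keeps orders with zero line items.
// The FILTER clause stops json_agg from emitting [null] for them, and
// COALESCE turns the missing aggregate into an empty array / zero total.
const reportSql = `
  SELECT
    o.*,
    COALESCE(
      json_agg(
        json_build_object('id', li.id, 'price', li.price, 'qty', li.qty, 'sku', li.sku)
      ) FILTER (WHERE li.id IS NOT NULL),
      '[]'::json
    ) AS items,
    COALESCE(SUM(li.price * li.qty), 0) AS total
  FROM orders o
  LEFT JOIN line_items li ON li.order_id = o.id
  WHERE o.org_id = $1
    AND o.created_at BETWEEN $2 AND $3
  GROUP BY o.id
  ORDER BY total DESC`;
```

Whether empty orders belong in the report is a product question; just make the choice deliberately rather than letting the JOIN type decide for you.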

After fix 1:

Requests/sec:  312.4   (+560%)
Latency avg:   158ms   (-85%)
Latency p99:   401ms   (-83%)

That's your N+1. Find it, kill it, collect your 5x improvement.


Problem 2: CPU Blocking — The Fingerprint Loop

With the database bottleneck gone, the CPU profile becomes readable. Let's generate one:

node --prof src/index.js &
autocannon -c 50 -d 10 http://localhost:3000/api/reports
kill %1
node --prof-process isolate-*.log > profile.txt

Look for the hot functions at the top of profile.txt:

 [JavaScript]:
   ticks  total  nonlib   name
   2,847   31.2%   34.1%  crypto.Hash.update
   1,203   13.2%   14.4%  JSON.stringify
     891    9.8%   10.7%  Array.prototype.sort

crypto.Hash.update eating 31% of CPU time for a "fingerprint" that's used for... cache busting? This needs scrutiny.

The Analysis

// The original — called 200 times per request
const fingerprint = crypto
  .createHash('sha256')
  .update(JSON.stringify(order))  // stringify a full order object, 200x
  .digest('hex');

Two problems:

  1. JSON.stringify on a full order object runs 200 times per request; at the roughly 50 requests/second this endpoint serves under load, that's ~10,000 stringifies per second, each allocating a new string.
  2. SHA-256 is cryptographically secure. We don't need that for a cache-busting fingerprint. We need fast and unique, not secure.

If the fingerprint is truly needed, use a cheaper hash and stop serializing the whole object:

// Option A: Hash only the fields that actually affect cache validity
const fingerprint = crypto
  .createHash('md5')              // much cheaper than sha256; fine when security isn't the goal
  .update(`${order.id}:${order.updated_at.getTime()}`)
  .digest('hex');

// Option B: If you only need uniqueness, not a hash
// updated_at is already a change signal — use it directly
const fingerprint = `${order.id}-${order.updated_at.getTime().toString(36)}`;

Option B is what you almost certainly actually want. It's a string concat, not a hash. It's unique per order-version. It takes microseconds.
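Pulled out as a standalone helper (hypothetical name), the whole "fingerprint" collapses to this:

```javascript
// Option B as a helper: no hashing, just id + version, encoded compactly.
// toString(36) shortens the millisecond timestamp versus base-10 digits.
function fingerprint(order) {
  return `${order.id}-${order.updated_at.getTime().toString(36)}`;
}

// Same order, new updated_at => new fingerprint, so downstream caches bust.
fingerprint({ id: 42, updated_at: new Date() });
```

If two writes can land in the same millisecond and that distinction matters, fall back to a version column instead of the timestamp.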

After fix 2:

Requests/sec:  489.1   (+57% on top of fix 1)
Latency avg:   101ms   (-36%)
Latency p99:   229ms   (-43%)
CPU idle:      ~62%    (was ~21%)

Problem 3: Memory Pressure and GC Pauses

Run the Clinic.js heap profiler to see allocation patterns:

clinic heapprofiler -- node src/index.js &
autocannon -c 50 -d 10 http://localhost:3000/api/reports

Clinic will generate an HTML flamegraph. The allocation spike you'll see is from building the enriched array: 200 objects, each with a spread copy of the order, plus items array, plus computed fields. Under 50 concurrent connections, that's up to 10,000 object allocations per second, many of them large.

The V8 GC handles this, but not for free. You'll see GC pauses in the p99 latency as the minor GC sweeps short-lived allocations from the new-space.

The Fix: Return the Postgres Result Directly

The JOIN query already gives us the shape we need. Stop copying:

// Before: building enriched[] with spreads and mutations
const enriched = [];
for (const row of result.rows) {
  enriched.push({ ...row, items: row.items, total: row.total, fingerprint });
}

// After: the DB result IS the response — transform in place minimally
const data = result.rows.map(row => ({
  id:          row.id,
  org_id:      row.org_id,
  created_at:  row.created_at,
  items:       row.items,         // already json_agg'd by Postgres
  total:       parseFloat(row.total),
  fingerprint: `${row.id}-${new Date(row.created_at).getTime().toString(36)}`,
}));

Explicit field selection instead of spreading also avoids accidentally sending internal fields (internal_notes, cost_price, etc.) to the client — a common security issue hiding inside performance code.

After fix 3:

Requests/sec:  541.8   (+11%)
Latency avg:   91ms    (-10%)
Latency p99:   198ms   (-14%)
GC pause max:  4ms     (was 23ms)

The absolute numbers are a modest improvement, but GC max pause dropping from 23ms to 4ms matters — that's what was spiking your p99.


Problem 4: The Event Loop — Blocking JSON Serialization

res.json() calls JSON.stringify() synchronously on the main thread. For small responses this doesn't matter. For a response that's 200 orders × 10 line items each, you're stringifying a 400KB+ object on the event loop, blocking all other requests during that serialization.

Let's prove it with a flame chart:

clinic flame -- node src/index.js &
autocannon -c 50 -d 10 http://localhost:3000/api/reports

You'll see JSON.stringify as a wide horizontal band — it's synchronous time on the main thread. For a 50-concurrent test, this means requests queuing behind each other's serialization.

Fix A: Streaming JSON with fast-json-stringify

npm install fast-json-stringify
const fastJson = require('fast-json-stringify');

// Define the shape of your response once — compile it
const stringify = fastJson({
  type: 'object',
  properties: {
    count: { type: 'integer' },
    data: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          id:          { type: 'integer' },
          org_id:      { type: 'integer' },
          created_at:  { type: 'string' },
          total:       { type: 'number' },
          fingerprint: { type: 'string' },
          items: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                id:    { type: 'integer' },
                price: { type: 'number' },
                qty:   { type: 'integer' },
                sku:   { type: 'string' },
              }
            }
          }
        }
      }
    }
  }
});

// In the route handler
const payload = stringify({ data, count: data.length });
res.setHeader('Content-Type', 'application/json');
res.end(payload);

fast-json-stringify generates a schema-specific serializer at startup — no runtime type-checking, no property iteration. For a known schema it's typically 2–5x faster than JSON.stringify.

Fix B: For Very Large Responses — JSONStream

If your response can be megabytes, don't serialize it all before sending. Stream it:

npm install JSONStream
const JSONStream = require('JSONStream');

router.get('/api/reports', async (req, res) => {
  // ... query ...

  res.setHeader('Content-Type', 'application/json');

  const stream = JSONStream.stringify('{"data":[', ',', ']}');
  stream.pipe(res);

  for (const row of result.rows) {
    stream.write(transformRow(row));
  }

  stream.end();
});

This writes the response incrementally — the client starts receiving bytes before you've processed the last row. Critical for very large datasets.

After fix 4 (fast-json-stringify):

Requests/sec:  618.3   (+14%)
Latency avg:   80ms    (-12%)
Latency p99:   171ms   (-14%)

Problem 5: Connection Pool Starvation

Under sustained 50-connection load, you'll hit a subtler problem: connection pool exhaustion. The default pg Pool size is 10. With 50 concurrent requests each needing a connection, 40 of them are waiting in queue.

// The invisible default that's killing your concurrency
const pool = new Pool({
  // max: 10  ← this is the default you never set
});

Tuning the Pool

const { Pool } = require('pg');

const pool = new Pool({
  host:     process.env.PGHOST,
  database: process.env.PGDATABASE,
  user:     process.env.PGUSER,
  password: process.env.PGPASSWORD,
  port:     5432,

  // Tune these to your Postgres max_connections and node count
  max:             25,    // per Node process; multiply by process count
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 2_000,

  // Log pool events in development — essential for diagnosing starvation
  ...(process.env.NODE_ENV === 'development' && {
    log: (...args) => console.log('[pool]', ...args),
  }),
});

// Monitor pool health — `metrics` is a placeholder for your metrics
// client (StatsD, Prometheus, etc.); wire these to real gauges
pool.on('connect',  () => metrics.gauge('pg.pool.size', pool.totalCount));
pool.on('acquire',  () => metrics.gauge('pg.pool.waiting', pool.waitingCount));
pool.on('remove',   () => metrics.gauge('pg.pool.idle', pool.idleCount));

The right max value:

max per process = floor(postgres_max_connections / node_process_count) - headroom

If Postgres is configured for 100 connections and you run 4 Node processes:
floor(100 / 4) - 5 = 20 — leave 5 for admin connections, migrations, etc.
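That arithmetic is worth encoding so nobody recomputes it by hand at 2am. A tiny helper (hypothetical name):

```javascript
// The sizing formula as code: divide Postgres's connection budget
// across Node processes, then reserve headroom for admin/migrations.
function poolMaxPerProcess(pgMaxConnections, nodeProcessCount, headroom = 5) {
  const max = Math.floor(pgMaxConnections / nodeProcessCount) - headroom;
  if (max < 1) {
    throw new Error('Connection budget too small: fewer processes or a bigger max_connections needed');
  }
  return max;
}

poolMaxPerProcess(100, 4); // → 20, matching the worked example above
```

Remember to recompute when you change the process count — scaling from 4 to 8 workers without shrinking `max` doubles your demand on Postgres.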

After fix 5:

Requests/sec:  791.2   (+28%)
Latency avg:   62ms    (-23%)
Latency p99:   134ms   (-22%)

Connection pool sizing is pure configuration — no code to write, enormous impact.


The Complete Optimized Handler

Here's the final version — everything applied:

// src/routes/reports.js  (the after)
const express    = require('express');
const db         = require('../db');
const fastJson   = require('fast-json-stringify');
const router     = express.Router();

const stringify = fastJson({
  type: 'object',
  properties: {
    count: { type: 'integer' },
    data: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          id:          { type: 'integer' },
          org_id:      { type: 'integer' },
          created_at:  { type: 'string'  },
          total:       { type: 'number'  },
          fingerprint: { type: 'string'  },
          items: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                id:    { type: 'integer' },
                price: { type: 'number'  },
                qty:   { type: 'integer' },
                sku:   { type: 'string'  },
              }
            }
          }
        }
      }
    }
  }
});

router.get('/api/reports', async (req, res) => {
  const { org_id, start, end } = req.query;

  if (!org_id || !start || !end) {
    return res.status(400).json({ error: 'org_id, start, end are required' });
  }

  const result = await db.query(
    `SELECT
       o.id, o.org_id, o.created_at,
       json_agg(
         json_build_object('id', li.id, 'price', li.price, 'qty', li.qty, 'sku', li.sku)
         ORDER BY li.id
       ) AS items,
       SUM(li.price * li.qty) AS total
     FROM orders o
     JOIN line_items li ON li.order_id = o.id
     WHERE o.org_id = $1
       AND o.created_at BETWEEN $2 AND $3
     GROUP BY o.id
     ORDER BY total DESC`,
    [org_id, start, end]
  );

  const data = result.rows.map(row => ({
    id:          row.id,
    org_id:      row.org_id,
    created_at:  row.created_at.toISOString(),
    items:       row.items,
    total:       parseFloat(row.total),
    fingerprint: `${row.id}-${row.created_at.getTime().toString(36)}`,
  }));

  const payload = stringify({ data, count: data.length });
  res.setHeader('Content-Type', 'application/json');
  res.end(payload);
});

module.exports = router;

Full Benchmark Summary

Every fix measured, no cherry-picking:

Fix                       Req/sec   Avg latency   p99 latency   Delta req/sec
Baseline                     47.3       1,041ms       2,380ms               —
1. Eliminate N+1            312.4         158ms         401ms           +560%
2. Cheaper fingerprint      489.1         101ms         229ms            +57%
3. Reduce allocations       541.8          91ms         198ms            +11%
4. fast-json-stringify      618.3          80ms         171ms            +14%
5. Pool tuning              791.2          62ms         134ms            +28%
Total                       791.2          62ms         134ms         +1,573%

The N+1 was worth 5x on its own. Everything else stacked another 2.5x on top. That's the real distribution of performance work — one structural problem and a handful of incremental improvements.


What to Do When the Low-Hanging Fruit Is Gone

After these fixes, you've addressed the common offenders. Further gains require different tools:

Worker threads for CPU-heavy work. If you have actual computation (image processing, cryptography on large data, PDF generation), offload it:

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// main thread
function runInWorker(data) {
  return new Promise((resolve, reject) => {
    const w = new Worker(__filename, { workerData: data });
    w.on('message', resolve);
    w.on('error',   reject);
  });
}

// worker thread
if (!isMainThread) {
  // expensiveComputation is a stand-in for your actual CPU-heavy function
  const result = expensiveComputation(workerData);
  parentPort.postMessage(result);
}

Caching at the right layer. If the same org_id + date range is queried repeatedly, cache at the route level — but measure hit rate before adding cache complexity. A cache that misses 80% of the time adds latency, not removes it.

const cache = new Map();          // Replace with Redis in production

router.get('/api/reports', async (req, res) => {
  const key = `${req.query.org_id}:${req.query.start}:${req.query.end}`;
  const cached = cache.get(key);

  if (cached) {
    res.setHeader('X-Cache', 'HIT');
    res.setHeader('Content-Type', 'application/json');
    return res.end(cached);
  }

  // ... query and process ...

  const payload = stringify({ data, count: data.length });
  cache.set(key, payload);
  setTimeout(() => cache.delete(key), 30_000);  // 30s TTL

  res.setHeader('X-Cache', 'MISS');
  res.setHeader('Content-Type', 'application/json');
  res.end(payload);
});
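To act on the "measure hit rate first" rule, instrument the cache before committing to it. A minimal sketch (the counter names are illustrative):

```javascript
// Sketch: count hits and misses so the cache has to earn its keep.
// A low hit rate means the Map lookup and memory are pure overhead.
const cacheStats = { hits: 0, misses: 0 };

function recordLookup(hit) {
  if (hit) cacheStats.hits++;
  else cacheStats.misses++;
}

function hitRate() {
  const total = cacheStats.hits + cacheStats.misses;
  return total === 0 ? 0 : cacheStats.hits / total;
}
```

Call `recordLookup(Boolean(cached))` in the handler and log or export `hitRate()` periodically. Below roughly 50% on a 30s TTL, rethink the cache key or drop the cache.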

Horizontal scaling. Node is single-threaded per process. Use the cluster module or PM2 to run one process per CPU core. This is orthogonal to the optimizations above — do both.

// cluster.js
const cluster = require('cluster');
const os      = require('os');

if (cluster.isPrimary) {
  const cpus = os.cpus().length;
  console.log(`Forking ${cpus} workers`);
  for (let i = 0; i < cpus; i++) cluster.fork();
  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died, restarting`);
    cluster.fork();
  });
} else {
  require('./src/index.js');
}

The Discipline

Performance work without measurement is superstition. The discipline is:

  1. Baseline before you touch anything. No exceptions.
  2. Change one thing at a time. If you fix three things together, you don't know which one mattered.
  3. Profile before you optimize. The bottleneck is almost never where you think it is until you look.
  4. Keep all your benchmark JSON files. You'll need to explain the improvement to someone who wasn't there.
  5. Test under realistic concurrency. A benchmark with -c 1 will not find pool exhaustion or GC pressure.

The 16x improvement above came from five targeted fixes to one endpoint. That's not unusual. Most production APIs have an N+1 they've lived with for years, a hash function nobody remembers adding, and a connection pool set to its default. Profile, fix, measure, repeat.
