Your nightly billing sync ran at 2am. Sidekiq shows it completed. No exceptions, no retries, no dead queue entries. Your app looks healthy.
It processed zero invoices.
It's been doing this for eleven days.
This happens more than people admit. Sidekiq is excellent at handling failed jobs — its retry mechanism and dead queue are genuinely well designed. But "failed" in Sidekiq means "raised an exception." A job that connects to the database, queries 0 rows, and exits cleanly isn't a failed job. It's a successful job that did nothing. Sidekiq has no opinion on the difference.
This article covers how to close that gap.
Why Sidekiq's built-in monitoring isn't enough for scheduled jobs
Sidekiq ships with a web UI that shows queue depths, processed counts, failed jobs, and scheduled jobs. For a queue-based system, this is useful. But for scheduled jobs — the kind you run with sidekiq-cron or sidekiq-scheduler — you need something different.
The questions that matter for scheduled jobs are:
- Did it run on schedule? (Not just "has it ever run?")
- Did it actually process anything?
- Is it taking longer than usual?
Sidekiq's web UI answers none of these. It shows you the last enqueued time and whether the job class exists in the schedule. That's not the same as knowing whether it ran at 2am last Tuesday, and whether it exported 1,400 rows like it should have.
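For contrast, here's roughly what you can pull out of Sidekiq's own API from a console. This is a sketch assuming sidekiq-cron is installed and the job is named daily_export; note that everything available is either a global lifetime counter or an enqueue timestamp:

```ruby
require 'sidekiq/api'
require 'sidekiq/cron/job'

# Lifetime counters across every job class (not per-job, not per-run)
stats = Sidekiq::Stats.new
stats.processed  # jobs that finished without raising, ever
stats.failed     # jobs that raised, ever

# sidekiq-cron at least records when a job was last enqueued
job = Sidekiq::Cron::Job.find('daily_export')
job.last_enqueue_time  # enqueued, not ran; says nothing about output
```

Nothing in that API tells you the job completed on schedule or produced the output it was supposed to.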
The dead man's switch pattern
The fix is to invert the monitoring model. Instead of your monitoring system polling Sidekiq to check if jobs ran, you make your jobs proactively check in with an external service. If the external service stops receiving check-ins, it alerts you.
This is called a dead man's switch (or heartbeat monitoring). The idea: if the job dies or goes silent, the external service notices — because it's looking for a regular ping that never came.
Here's the three-signal implementation: start, success, fail.
```ruby
# app/workers/daily_export_worker.rb
require 'net/http'
require 'json'

class DailyExportWorker
  include Sidekiq::Job

  TOKEN = ENV['DEADMANCHECK_TOKEN']
  BASE  = "https://deadmancheck.io/ping/#{TOKEN}"

  def perform
    dmc_start              # begins the duration timer
    rows = run_export      # your actual work; returns rows processed
    dmc_success(rows)      # signals completion + row count
  rescue StandardError
    dmc_fail
    raise                  # re-raise so Sidekiq handles retries normally
  end

  private

  def dmc_start
    Net::HTTP.get(URI("#{BASE}/start"))
  rescue StandardError
    # a monitoring failure must never take down the job itself
  end

  def dmc_success(count)
    uri = URI(BASE)
    req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
    req.body = { count: count }.to_json
    Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |h| h.request(req) }
  rescue StandardError
  end

  def dmc_fail
    Net::HTTP.get(URI("#{BASE}/fail"))
  rescue StandardError
  end
end
```
A few things worth noting:
- Each ping helper rescues its own errors silently (`rescue StandardError`). A monitoring outage should never kill a production job; the monitoring is less important than the job it watches.
- The `raise` after `dmc_fail` is intentional. Let Sidekiq handle its own retry logic; don't swallow the error just because you've notified the external service.
- It uses Ruby's stdlib `Net::HTTP`, so there's no extra gem to add to your Gemfile.
Works the same with sidekiq-cron or sidekiq-scheduler
If you're using sidekiq-cron or sidekiq-scheduler to run workers on a cron schedule, the `perform` method is already the right integration point. Your schedule config stays the same:
```yaml
# config/schedule.yml (sidekiq-scheduler)
daily_export:
  cron: "0 2 * * *"
  class: DailyExportWorker
  queue: default
```
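With sidekiq-cron the schedule format is nearly identical; the main difference is that you load the hash yourself when the server boots. A minimal sketch, assuming a Rails app and the same schedule.yml (`load_from_hash` is sidekiq-cron's public API):

```ruby
# config/initializers/sidekiq_cron.rb (sidekiq-cron)
require 'sidekiq/cron/job'

if Sidekiq.server?
  schedule = YAML.load_file(Rails.root.join('config', 'schedule.yml'))
  Sidekiq::Cron::Job.load_from_hash(schedule)
end
```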
Create one monitor per scheduled job and set its interval to your schedule length plus a buffer. For a daily job: 25 hours. For an hourly job: 70 minutes. The buffer prevents false alerts from minor timing drift.
Output assertions: the part most tutorials skip
Here's the thing about "job ran successfully": Sidekiq marks a job successful when it completes without an exception. That tells you about the job's execution. It tells you nothing about whether the job's output was valid.
If your export job queries a table that returns 0 rows (because an upstream pipeline broke two days ago), Sidekiq marks it done. Your success rate metrics stay green. You find out eleven days later when someone asks why their data is stale.
DeadManCheck lets you configure an output assertion: alert if the count in the ping is below a threshold. You set it to `count > 0`. Now a job that exports zero rows triggers an alert, even though Sidekiq considers it a success.
This is done through the POST body:
```ruby
# In dmc_success, POST the row count the job actually processed
req.body = { count: count }.to_json
```
Then in the monitor settings, configure: "alert if count is 0 or less."
The other cron monitoring tools — Cronitor, Healthchecks.io, Better Stack — check whether the ping arrived. They don't check what the ping reported. Output assertions are the difference between knowing your job ran and knowing your job worked.
Duration monitoring
The start ping does double duty: it starts a duration timer. When the success ping arrives, DeadManCheck records the elapsed time.
After 5 or more runs, it builds a rolling average. If a run takes significantly longer than the baseline — say, your 30-second export starts taking 8 minutes — it flags the anomaly.
This is a useful leading indicator. A slow job often means:
- A query that's hitting an un-indexed table after a data volume threshold was crossed
- A downstream API starting to time out
- A Redis or database connection pool under pressure
You find out before users notice latency in the actual product.
The full setup takes about 10 minutes
- Create a free account — no credit card needed, free for 5 monitors
- Add a new monitor, set the interval to match your schedule + buffer
- Copy the token into your environment as `DEADMANCHECK_TOKEN`
- Add the three helper methods to your worker (or a shared concern; a sketch follows this list)
- Set the output assertion threshold if your job processes records
- Deploy, trigger the job manually once, confirm the ping arrives in the dashboard
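If several workers need the same pings, extracting the helpers into a shared module keeps each worker thin. A minimal sketch; the module name `DeadManCheckPings` and the `perform_with_heartbeat` wrapper are illustrative conventions, not part of any gem:

```ruby
# app/workers/concerns/dead_man_check_pings.rb (hypothetical shared module)
require 'net/http'
require 'json'

module DeadManCheckPings
  BASE = "https://deadmancheck.io/ping/#{ENV['DEADMANCHECK_TOKEN']}"

  # Wraps the real work in start/success/fail pings.
  # The block must return the number of records processed.
  def perform_with_heartbeat
    dmc_start
    count = yield
    dmc_success(count)
  rescue StandardError
    dmc_fail
    raise # keep Sidekiq's retry behavior intact
  end

  private

  def dmc_start
    Net::HTTP.get(URI("#{BASE}/start"))
  rescue StandardError
  end

  def dmc_success(count)
    uri = URI(BASE)
    req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
    req.body = { count: count }.to_json
    Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |h| h.request(req) }
  rescue StandardError
  end

  def dmc_fail
    Net::HTTP.get(URI("#{BASE}/fail"))
  rescue StandardError
  end
end

# A worker then shrinks to:
class DailyExportWorker
  include Sidekiq::Job
  include DeadManCheckPings

  def perform
    perform_with_heartbeat { run_export } # block returns the row count
  end
end
```

The block-return convention keeps the row count flowing into the success ping without each worker re-implementing the rescue-and-reraise logic.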
After that, you'll get an alert if:
- The job doesn't run on schedule (missed ping)
- The job raises an exception (fail ping)
- The job runs but processes nothing (output assertion)
- The job takes significantly longer than usual (duration anomaly)
That's the full set of failure modes — including the silent ones that Sidekiq alone won't catch.
DeadManCheck is open source and self-hostable. If you'd rather run the monitoring infrastructure yourself: GitHub →