quietpulse

Posted on • Originally published at quietpulse.xyz

Rails Scheduled Job Monitoring: How to Catch Missed Jobs Before They Break Production

Rails scheduled job monitoring is easy to forget because scheduled work usually lives in the background. Your web app is up, requests are fine, the database is responding, and dashboards look green. Meanwhile, a nightly billing sync, cleanup task, email digest, or data import may have stopped running three days ago.

That is the dangerous part: scheduled jobs often fail quietly.

A Rails app can look completely healthy while important recurring work is missing. Users may not notice right away. You may not notice right away. Then suddenly invoices are wrong, trial expirations did not happen, reports are stale, or a queue is full of old data.

This guide covers how Rails scheduled jobs fail, why logs are not enough, and how to use heartbeat monitoring to catch missed executions before they become production incidents.

The problem

Rails apps often rely on scheduled background work for things that are not directly tied to a web request.

Common examples include:

  • sending daily or weekly email digests
  • charging subscriptions
  • syncing data from third-party APIs
  • expiring trials or temporary records
  • cleaning old sessions, uploads, or audit logs
  • generating reports
  • enqueueing recurring jobs
  • refreshing cached data
  • retrying failed external operations

These tasks may be implemented with different tools:

  • plain cron
  • whenever
  • sidekiq-cron
  • sidekiq-scheduler
  • good_job
  • solid_queue
  • delayed_job
  • Heroku Scheduler
  • Kubernetes CronJobs
  • systemd timers
  • custom Rake tasks

The implementation changes, but the monitoring problem stays the same.

A scheduled job can fail in several ways:

  • it never starts
  • it starts but crashes
  • it hangs forever
  • it runs on the wrong schedule
  • it runs on one environment but not another
  • it queues work but workers are down
  • it silently skips important records
  • it completes locally but fails in production

The most frustrating failure mode is the missing run. Nothing explodes. No exception is raised. No user request fails. The scheduled job simply does not happen.

That is exactly the kind of issue normal Rails monitoring often misses.

Why it happens

Rails scheduled job failures usually come from small operational details rather than dramatic bugs.

One common cause is a broken cron environment. Cron does not load the same shell profile as your interactive terminal. Environment variables may be missing. Ruby, Bundler, or Rails paths may be different. A command that works perfectly over SSH may fail when cron runs it.

For example:

bundle exec rails runner "Billing::SyncJob.perform_now"

might work in your shell, while cron fails because it cannot find bundle, does not have RAILS_ENV=production, or runs from the wrong directory.
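A common fix is to make the crontab entry explicit about its environment instead of relying on a login shell. A minimal sketch, assuming a Capistrano-style deploy layout (the paths and log file are illustrative):

```shell
# crontab -e -- declare the environment explicitly; cron will not load
# your shell profile, so PATH and RAILS_ENV must be set here.
PATH=/usr/local/bin:/usr/bin:/bin
RAILS_ENV=production

# Run the nightly sync at 02:00 from the release directory and capture output.
0 2 * * * cd /var/www/myapp/current && bundle exec rails runner "Billing::SyncJob.perform_now" >> log/cron.log 2>&1
```

Redirecting stdout and stderr to a log file also preserves the error message when the command fails, which cron would otherwise discard or mail nowhere.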

Another common issue is deployment drift. A scheduled task may be configured on an old server, a staging box, or a container that no longer exists. After an infrastructure migration, the app is still online, but the scheduler was never recreated.

Queue-backed scheduling adds another layer. With Sidekiq, GoodJob, Solid Queue, or Delayed Job, there are two separate things to monitor:

  1. Did the scheduler enqueue the job?
  2. Did a worker actually execute it?

If the scheduler runs but workers are stopped, jobs pile up. If workers run but the scheduler is broken, nothing gets enqueued. Looking at only one side can give you a false sense of safety.
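One way to see why each side needs its own signal is a stripped-down sketch. Here `Enqueuer` stands in for the scheduler process and `Worker` for a queue worker; the queue, job name, and ping callables are hypothetical stand-ins, not a real Sidekiq API:

```ruby
# Sketch: separate heartbeats for "job was enqueued" and "job was executed".
class Enqueuer
  def initialize(queue, ping:)
    @queue = queue
    @ping = ping
  end

  def run
    @queue << :nightly_sync # stand-in for SomeJob.perform_later
    @ping.call(:enqueued)   # scheduler-side heartbeat
  end
end

class Worker
  def initialize(queue, ping:)
    @queue = queue
    @ping = ping
  end

  def run
    _job = @queue.shift or return # nothing to do
    # ... perform the job's real work here ...
    @ping.call(:executed)         # worker-side heartbeat
  end
end
```

If workers are down, only the `:enqueued` heartbeat arrives; if the scheduler is broken, neither does. Either gap tells you which half of the pipeline to look at first.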

Rails deployments also make scheduled work easy to accidentally duplicate or disable. You may have multiple app servers, multiple containers, or multiple release directories. If every instance runs the scheduler, the job may execute many times. If none of them run it, the job disappears completely.

There are also application-level causes:

  • feature flags disable part of the job
  • a database query becomes too slow and times out
  • an API token expires
  • a lock never releases
  • a migration changes a column the job depends on
  • the job rescues exceptions too broadly
  • a retry loop hides the real failure

In all of these cases, the Rails app can still serve web traffic normally.

That is why Rails scheduled job monitoring needs to focus on the scheduled work itself, not just the app process.

Why it's dangerous

Silent scheduled job failures can be expensive because they often affect delayed, accumulated, or business-critical work.

If a cleanup job stops running, the impact may start small. A few old records remain. Disk usage grows a little. Queries become slightly slower. Then, weeks later, storage fills up or a table becomes painfully large.

If a billing job stops running, the damage is more direct. Customers may not be charged, invoices may not be sent, subscription states may drift, or payment retries may never happen.

If a sync job stops running, your app may show stale data. Users may make decisions based on old information. Support tickets appear, but the root cause is not obvious.

If an email digest job stops running, engagement drops quietly. Nobody gets paged. The app is up. But an important product loop is broken.

The same pattern appears across many Rails systems:

  • failed nightly reports
  • missed customer notifications
  • stuck import pipelines
  • stale search indexes
  • broken cache refreshes
  • abandoned trial expiration tasks
  • missed webhook retry jobs
  • incomplete analytics rollups

Traditional monitoring often does not catch these failures.

Uptime checks only confirm that an HTTP endpoint responds. Error tracking catches exceptions only if the job raises and reports them. Logs help only if someone searches them or has log-based alerts configured correctly. Queue dashboards show queue state, but not always whether a recurring job was expected and missed.

The dangerous question is not just “did something fail?”

It is also:

Did the job run when it was supposed to?

That is the core question Rails scheduled job monitoring should answer.

How to detect it

The simplest reliable pattern is heartbeat monitoring.

A heartbeat is a small signal sent by your scheduled job when it runs successfully. An external monitor expects that signal on a schedule. If the signal does not arrive within the expected time window, it alerts you.

Instead of only watching for errors, you watch for proof of success.

For example, if a Rails job should run every night at 02:00, the monitor expects one successful ping every 24 hours. If no ping arrives by 02:30, something is wrong:

  • cron did not run
  • the scheduler is misconfigured
  • the Rails command crashed
  • the job hung before completion
  • the worker never processed it
  • the server was down
  • the deploy broke the task

The monitor does not need to know which failure happened first. It knows the important outcome: the scheduled job did not complete successfully on time.

That is the key advantage.

For Rails apps, a heartbeat should usually be sent at the end of the job, after the important work is complete. This avoids false success signals.

Bad pattern:

class NightlyBillingJob < ApplicationJob
  def perform
    ping_monitor
    Billing::RunNightlySync.call
  end
end

If billing fails after the ping, the monitor still sees success.

Better pattern:

class NightlyBillingJob < ApplicationJob
  def perform
    Billing::RunNightlySync.call
    ping_monitor
  end
end

Now the heartbeat means the job actually reached the end.

For jobs with multiple critical steps, you can ping only after all required steps finish. If the job partially completes and then fails, the missing heartbeat tells you something needs attention.
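A framework-free sketch of that pattern, with hypothetical step names and injected callables so the shape is easy to see: any step that raises skips the heartbeat entirely.

```ruby
# Sketch: a multi-step job that only signals success after every critical
# step has finished. Step names are illustrative.
class NightlyPipeline
  STEPS = %i[import reconcile report].freeze

  def initialize(runner:, heartbeat:)
    @runner = runner       # callable that executes one step
    @heartbeat = heartbeat # callable that sends the success ping
  end

  def run
    STEPS.each { |step| @runner.call(step) } # a raise here skips the ping
    @heartbeat.call
    true
  end
end
```

If `reconcile` fails after `import` succeeded, no heartbeat is sent, and the missing ping is what alerts you to the partial run.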

This is different from logging. Logs describe what happened inside your system. Heartbeats prove that an expected scheduled outcome happened from the outside.

A good Rails scheduled job monitoring setup usually tracks:

  • expected frequency
  • grace period
  • last successful run
  • missed runs
  • alert channel
  • job identity
  • environment scoping (for example, production only)

The grace period matters. If a job runs every hour, you may allow 10 or 15 extra minutes before alerting. If a nightly job usually takes 20 minutes, do not alert after 2 minutes. Monitor the real expected completion window.
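The monitor-side decision reduces to a small predicate. A sketch, with interval and grace expressed in seconds (the values below are illustrative):

```ruby
# Sketch: is a heartbeat overdue, given its expected interval and a grace
# period? All durations are in seconds.
def overdue?(last_ping_at:, interval:, grace:, now: Time.now)
  now > last_ping_at + interval + grace
end
```

For an hourly job with a 15-minute grace period, a ping that is 70 minutes old is still inside the window; at 76 minutes the check fires.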

Simple solution (with example)

Here is a simple Rails example using a heartbeat ping at the end of a scheduled job.

Imagine you have a job that runs every night and syncs subscription states:

# app/jobs/nightly_subscription_sync_job.rb
require "net/http"

class NightlySubscriptionSyncJob < ApplicationJob
  queue_as :default

  def perform
    SubscriptionSync.run!
    ping_monitor
  end

  private

  def ping_monitor
    # Only signal success in production, and never let a ping failure
    # fail the job itself.
    return unless Rails.env.production?

    uri = URI("https://quietpulse.xyz/ping/YOUR_TOKEN")

    Net::HTTP.start(uri.host, uri.port, use_ssl: true, open_timeout: 5, read_timeout: 5) do |http|
      http.request(Net::HTTP::Get.new(uri))
    end
  rescue StandardError => e
    Rails.logger.warn("Heartbeat ping failed: #{e.class}: #{e.message}")
  end
end

The important details:

  • the ping happens after SubscriptionSync.run!
  • it only runs in production
  • it has a short timeout
  • ping failure is logged but does not break the job
  • the URL uses a simple success ping endpoint

You can schedule this job with whichever tool your Rails app already uses.

With sidekiq-cron, the schedule might look like this:

nightly_subscription_sync:
  cron: "0 2 * * *"
  class: "NightlySubscriptionSyncJob"
  queue: default

With whenever, you may schedule a Rails runner or Rake task:

# config/schedule.rb
set :environment, "production"

every 1.day, at: "2:00 am" do
  runner "NightlySubscriptionSyncJob.perform_later"
end

With a Rake task:

# lib/tasks/subscriptions.rake
require "net/http"

namespace :subscriptions do
  desc "Run nightly subscription sync"
  task nightly_sync: :environment do
    SubscriptionSync.run!

    if Rails.env.production?
      uri = URI("https://quietpulse.xyz/ping/YOUR_TOKEN")
      Net::HTTP.get_response(uri)
    end
  end
end

Then cron could run:

cd /var/www/myapp/current && RAILS_ENV=production bundle exec rake subscriptions:nightly_sync

A more defensive shell version can make sure the heartbeat only fires after the Rails task succeeds:

cd /var/www/myapp/current &&
RAILS_ENV=production bundle exec rake subscriptions:nightly_sync &&
curl -fsS --max-time 10 https://quietpulse.xyz/ping/YOUR_TOKEN

The && matters. It means the ping only runs if the previous command exits successfully.

If you use a heartbeat monitoring tool like QuietPulse, you create a check with the expected interval, add the generated ping URL to the end of your job, and receive an alert if the job misses its window. You can build something similar yourself, but using a small external monitor is usually simpler and more reliable than having the app monitor its own missing work.

The main idea is not tool-specific: every important scheduled job should produce an external success signal.

Common mistakes

1. Pinging at the start of the job

This is the most common mistake.

If you ping at the start, you only prove that the job began. You do not prove that it completed.

For short, simple jobs, that may feel good enough. But for billing, syncs, reports, imports, and cleanup tasks, completion matters much more than startup.

Ping after the critical work finishes.

2. Monitoring only the queue

Queue dashboards are useful, but they are not the same as scheduled job monitoring.

A queue may look healthy while a recurring job is never enqueued. Or the scheduler may enqueue the job successfully while workers are stuck. You need to monitor the expected completion of the scheduled task, not just the presence of a worker process.

3. Using one heartbeat for many jobs

One generic “daily jobs ran” heartbeat is tempting, but it hides which job failed.

If you have separate billing, cleanup, report, and sync jobs, give important jobs their own checks. That way, the alert tells you exactly what is missing.

4. Ignoring time zones

Rails, cron, Sidekiq, Kubernetes, and hosting platforms may use different time zones.

A job scheduled for “2 AM” may not mean what you think it means. Daylight saving time can also surprise you.

Use UTC where possible, document the expected schedule, and set heartbeat grace periods based on real execution times.
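A quick way to see the gap: the same "2 AM" wall-clock time maps to a different UTC instant depending on the zone offset. The offset below is illustrative.

```ruby
require "time"

# "2 AM" in a UTC-4 zone is 06:00 UTC; the same wall-clock time in UTC
# would be four hours earlier. A monitor expecting a 02:00 UTC ping would
# call this run four hours late.
local = Time.parse("2024-11-02 02:00:00 -0400")
puts local.getutc.strftime("%Y-%m-%d %H:%M UTC") # prints "2024-11-02 06:00 UTC"
```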

5. Swallowing exceptions too broadly

Some Rails jobs rescue everything to avoid retry storms:

def perform
  risky_work
rescue StandardError
  nil # every failure vanishes here
end

That pattern can hide real failures. If the job also sends a heartbeat after the rescue, monitoring becomes misleading.

Log exceptions clearly, report them to error tracking, and only send the heartbeat after the required work actually succeeded.

Alternative approaches

Heartbeat monitoring is the most direct way to detect missed scheduled jobs, but it works best alongside other signals.

Logs are still useful. Rails logs can show job start times, durations, record counts, API failures, and SQL issues. Structured logs make debugging much easier after an alert fires.

Error tracking is also important. Tools like Sentry, Honeybadger, Rollbar, or AppSignal can catch exceptions inside jobs. They answer a different question: “Did the job crash with an error?” Heartbeats answer: “Did the job complete on time?”

Queue monitoring helps too. For Sidekiq, GoodJob, Solid Queue, or Delayed Job, you should watch queue latency, retries, dead jobs, and worker availability. If a scheduled job misses its heartbeat, queue metrics often help explain why.

Database checks can catch business-level symptoms. For example:

  • no invoices created in 24 hours
  • no imports completed today
  • no reports generated this week
  • no webhook retries processed recently

These checks are powerful, but they are usually more custom. A heartbeat is easier to add first.
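The core of such a check is a staleness predicate over the newest relevant record. A framework-free sketch (the 24-hour threshold is illustrative; in a real app `newest_created_at` would come from something like `Invoice.maximum(:created_at)`):

```ruby
# Sketch: business-level staleness check. Given the timestamp of the newest
# record, decide whether to raise an alert. No records at all also counts
# as stale.
def stale?(newest_created_at, max_age_seconds: 24 * 3600, now: Time.now)
  newest_created_at.nil? || (now - newest_created_at) > max_age_seconds
end
```

Run from its own scheduled check (which should have its own heartbeat), this catches the case where the job "ran" but produced nothing.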

Uptime checks are useful for the Rails web app itself, but they are not enough for scheduled work. Your homepage or health endpoint can return 200 OK while every recurring job is broken.

The best setup is layered:

  • uptime monitoring for the web app
  • error tracking for exceptions
  • queue monitoring for background workers
  • logs for debugging
  • heartbeat monitoring for scheduled job completion
  • business checks for critical outcomes

Each signal catches a different class of failure.

FAQ

What is Rails scheduled job monitoring?

Rails scheduled job monitoring means tracking whether recurring Rails tasks run successfully on their expected schedule. These tasks may be cron jobs, Rake tasks, Active Job jobs, Sidekiq jobs, GoodJob jobs, or scheduler-triggered background work.

The goal is to detect missed, failed, delayed, or silently broken jobs before they cause production problems.

How do I monitor Rails cron jobs?

The simplest approach is to send a heartbeat ping at the end of each important cron job. An external monitor expects that ping based on the job schedule and alerts you if it does not arrive.

For example, if a Rails Rake task runs every night, add a success ping after the task completes. If cron fails, Rails crashes, or the job hangs, the ping will be missing.

Is Sidekiq monitoring enough for scheduled jobs?

Sidekiq monitoring is useful, but it is not always enough. It can show retries, dead jobs, queue latency, and worker status. But scheduled job monitoring should also confirm that each expected recurring job completed on time.

A Sidekiq dashboard may not alert you when a scheduler stops enqueueing a job entirely. Heartbeat monitoring closes that gap.

Should I ping before or after a Rails job runs?

Usually after.

A heartbeat should represent successful completion, not just startup. If you ping before the job runs and then the job fails halfway through, your monitor will show a false success.

Ping only after the critical work finishes.

What Rails jobs should have heartbeat monitoring?

Start with jobs where a missed run would hurt users, revenue, data quality, or operations.

Good candidates include billing syncs, subscription updates, imports, exports, email digests, cleanup tasks, report generation, webhook retries, search indexing, and analytics rollups.

Not every tiny maintenance task needs its own alert, but important scheduled jobs should be visible.

Conclusion

Rails scheduled job monitoring is about proving that important background work actually happened.

Your Rails app can be online while scheduled jobs are broken. Cron can miss runs. Schedulers can stop. Workers can fail. Environment variables can disappear. Jobs can hang or silently skip work.

Logs, error tracking, and queue dashboards all help, but they do not fully answer the most important question:

Did this scheduled job complete when expected?

Heartbeat monitoring gives you that answer. Add a success ping at the end of each critical Rails scheduled job, set the expected interval, and alert when the signal goes missing.

That small pattern can save you from discovering a broken billing sync, stale report, or missing cleanup task days too late.


Originally published at https://quietpulse.xyz/blog/rails-scheduled-job-monitoring
