I only found out that one of my background jobs had stopped running when the data looked wrong the next day.
There was no dramatic crash. No big incident. The job just quietly failed, and I only noticed because something downstream looked stale.
That is the annoying part about cron jobs and scheduled scripts. Most of the time they run in the background, write some logs, and nobody thinks about them until something is missing.
I have a few jobs like this:
- data updates
- cleanup scripts
- small imports
- external API calls
- recurring background tasks
None of them are very exciting. But when one of them does not run, or starts and never finishes, it can create a surprisingly annoying problem.
That is the kind of failure I wanted to make more visible.
I also built a small V1 of this idea: MissedRun. There is also a self-hosted version for people who prefer to run this kind of monitoring on their own infrastructure.

Hosted version: https://missedrun.com
Self-hosted version: https://github.com/missedrun/missedrun-selfhosted
MissedRun monitors recurring jobs such as cron scripts, backups, imports, ETL pipelines, billing syncs, cleanup tasks, and scheduled reports.
It works by giving each monitor a unique ping URL. Your job calls that URL when it runs. If the job does not check in within the expected interval plus grace period, MissedRun marks it as missing and can send an alert.
Why MissedRun?
Some production failures are not loud.
A job can stop running without throwing an exception:
- cron did not run
- a server was down
- a Docker container stopped
- credentials expired
- a backup script never started
- an ETL job stopped updating data
- a scheduled report was not generated
- a background worker silently stopped
MissedRun is built to detect this kind of silent failure.
This is not a big launch. I am mostly trying to understand if this is a real enough problem for other developers who run cron jobs, ETL jobs, backups, imports, cleanup scripts, or other scheduled tasks.
The problem
Cron jobs are easy to forget about.
They usually do not have a UI. They run somewhere on a server, maybe write logs, and then disappear into the background.
A job can fail because:
- an API token expired
- an environment variable is missing
- a database connection failed
- the server restarted
- the script crashed
- the job started but never finished
- the cron entry was changed or removed
Logs are useful, but only if you go and check them.
In practice, I usually only check logs after I already suspect something is broken.
For recurring jobs, I often want a much simpler answer:
- did it start?
- did it finish?
- did it fail?
- did it miss the expected time?
The ping approach
One simple way to monitor this is to make the job report its own status.
The basic pattern is:
- send a start ping when the job begins
- send a success ping when it finishes
- send a failure ping if it crashes
- mark it as late or missed if the expected ping does not arrive
It is not a complicated idea, but I have found it very useful in practice.
Instead of checking logs manually, the job tells you whether it is still alive.
For example:
- if the start ping arrives, the job is running
- if the success ping arrives, the job finished
- if the fail ping arrives, the job crashed
- if nothing arrives when expected, the job is late or missed
That last case is the important one for me.
A lot of failures are not loud. The job does not always send an error. Sometimes it just does not run.
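On the monitoring side, the "missed" state is just a timestamp comparison. Here is a minimal sketch of the interval-plus-grace check, assuming (hypothetically) that each monitor stores the Unix timestamp of its last ping in a file; the file, interval, grace value, and simulated timestamp are all illustrative, not part of any real tool's API.

```shell
#!/bin/sh
# Minimal sketch of the "interval + grace period" check.
# Assumes the monitor's last ping time is stored as a Unix timestamp
# in a file; everything here is illustrative.

LAST_PING_FILE="$(mktemp)"
INTERVAL=3600   # expected run interval in seconds (an hourly job)
GRACE=300       # extra slack before declaring the job missing

# Simulate a ping that arrived two hours ago.
echo "$(( $(date +%s) - 7200 ))" > "$LAST_PING_FILE"

now=$(date +%s)
last=$(cat "$LAST_PING_FILE")
age=$(( now - last ))

if [ "$age" -gt $(( INTERVAL + GRACE )) ]; then
  echo "MISSING: last ping was ${age}s ago"
else
  echo "OK: last ping was ${age}s ago"
fi

rm -f "$LAST_PING_FILE"
```

With these numbers the check takes the MISSING branch, since a two-hour-old ping is well past interval plus grace for an hourly job.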
Bash example
Here is a simple shell wrapper.
This uses placeholder URLs. In a real setup, these would be the ping URLs generated by your monitoring tool.
```bash
#!/bin/bash

# Placeholder ping URLs; in a real setup these are generated
# by your monitoring tool.
START_URL="https://example.com/ping/YOUR_TOKEN/start"
SUCCESS_URL="https://example.com/ping/YOUR_TOKEN"
FAIL_URL="https://example.com/ping/YOUR_TOKEN/fail"

# Tell the monitor the job has started. "|| true" keeps a failed
# ping from aborting the job itself.
curl -fsS -X POST --max-time 5 "$START_URL" >/dev/null || true

# Run the actual job and capture its exit code.
your-real-command-here
EXIT_CODE=$?

# Report success or failure based on the exit code.
if [ "$EXIT_CODE" -eq 0 ]; then
  curl -fsS -X POST --max-time 5 "$SUCCESS_URL" >/dev/null || true
else
  curl -fsS -X POST --max-time 5 "$FAIL_URL" >/dev/null || true
fi

# Preserve the job's exit code for cron / the caller.
exit "$EXIT_CODE"
```
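To actually schedule the wrapper, cron just points at the script. The entries below are a sketch; the paths, schedule, and URL are placeholders I made up, not anything from a real setup.

```shell
# Illustrative crontab entries (edit with `crontab -e`); all paths,
# times, and URLs are placeholders.

# Run the wrapper script nightly at 02:15, keeping its output in a log:
# 15 2 * * * /usr/local/bin/backup-with-ping.sh >> /var/log/backup-ping.log 2>&1

# For a very simple job, a success-only ping can be chained inline, so
# the ping fires only when the command exits 0:
# 15 2 * * * /usr/local/bin/cleanup.sh && curl -fsS --max-time 5 "https://example.com/ping/YOUR_TOKEN" >/dev/null
```

The inline form is less work but only covers the success case; the wrapper is what gives you start, success, and fail signals separately.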
Curious how others think about this:
What’s worse in your setup — a cron job that fails loudly, or one that never runs at all?
For me, the missed case is usually worse because there’s no visible crash. I just notice later that some data is stale.