DEV Community

Fatih İlhan
I built a personal job alert pipeline for the UN. Here's the stack and what I learned.

I work at the UN. UN job hunting is a particular kind of suffering: openings are scattered across careers.un.org, reliefweb.int, agency-specific portals, and aggregators of varying quality. None of them knows my profile. None lets me say "only show me P-2 and P-3 protection roles in family duty stations outside Türkiye." So I built my own.

This is the story of how I built that — and what the build itself taught me about respecting the tools you scrape from.

The bad idea I started with

My first instinct was simple: unjobs.org lists everything in one place, so write a requests + BeautifulSoup scraper and be done. Maybe a Cloudflare Worker, maybe a small Next.js dashboard, ship by Sunday.

I asked Claude to help me scope it. The first thing it did was fetch the site to see the structure.

ROBOTS_DISALLOWED: URL is disallowed by robots.txt rules

Right. unjobs.org explicitly forbids scraping. And the more I read into it, the worse the idea got:

  • The site is itself an aggregator — it scrapes from official sources. So I'd be scraping a scraper, two layers downstream from truth.
  • Their Terms of Service prohibit automated access. Ignoring that is not just rude; it's legally risky in some jurisdictions (hiQ v. LinkedIn territory).
  • I work at UNHCR. Scraping the ecosystem I work in, against ToS, is not a career-positive move.

So I asked the obvious follow-up: where does unjobs.org get its data?

The answer: from the same sources I could go to directly. ReliefWeb has a free, documented public API. careers.un.org has an RSS feed. The "hard problem" of aggregating UN jobs was already solved upstream — I just had to be willing to put in more wiring than pip install beautifulsoup4.

This is the first lesson and I want to flag it loudly: before you scrape, check whether the data has a front door. Aggregators exist because real APIs are obscure or undocumented, not because they don't exist. Five minutes of looking saved me a brittle pipeline, legal exposure, and a worse data quality outcome.

The actual architecture

ReliefWeb API (JSON)           careers.un.org (RSS + JSON)
        │                                │
        ▼                                ▼
  sources/reliefweb.py           sources/careers_un.py
        │                                │
        └──────────► Job (normalized) ◄──┘
                       │
                       ▼
                  scraper.py
                  - dedupe (title + source)
                  - score 0–100 (keyword + grade + location)
                  - filter by MIN_SCORE
                       │
                       ▼
                   store.py        ← SQLite, "have I already emailed this?"
                       │
                       ▼
                  notifier.py      ← Resend, HTML email
                       │
                       ▼
               systemd timer       ← Mon + Thu, 09:00 Istanbul time

Adapter pattern: each source normalizes to one Job dataclass. Adding a new source (untalent.org, UNDP, ICRC) is a 50-line file in sources/. Everything downstream — scoring, dedup, store, notifier — never changes.
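A minimal sketch of that contract, assuming field names I made up for illustration (the real `Job` dataclass almost certainly differs in detail):

```python
from dataclasses import dataclass

# Illustrative sketch of the normalized Job contract; field names
# are my guesses, not the actual dataclass from the project.
@dataclass(frozen=True)
class Job:
    title: str
    agency: str
    duty_station: str
    grade: str          # e.g. "P-2", "P-3"
    url: str
    source: str         # "reliefweb", "un_careers", ...
    description: str = ""

    def dedupe_key(self) -> tuple[str, str]:
        # Mirrors the "dedupe (title + source)" step in the diagram
        return (self.title.lower().strip(), self.source)

# Each adapter module only needs to expose: fetch() -> list[Job]
def fetch() -> list[Job]:
    raise NotImplementedError  # implemented per source
```

The point of freezing the dataclass and keying dedup off (title, source) is that everything downstream can treat jobs as values; no adapter ever needs to know another adapter exists.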

Stack:

  • Python 3.12, requests, xml.etree, sqlite3. No frameworks. The whole thing is ~600 lines.
  • Hetzner VPS (€4/mo Ubuntu box I already had)
  • systemd timer instead of cron — better logs, easier debugging, Persistent=true re-runs if the box was off at scheduled time
  • Resend for transactional email (already had a domain via Nokta Studio)
  • SQLite as the "seen" store — a single file, no daemon, no migration framework

No Docker. No Kubernetes. No cloud functions. A venv, a .env, two systemd units. The whole thing deploys with one shell script.
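For reference, the timer half of that setup can look roughly like this. The unit name, description, and paths are illustrative; the directives (`OnCalendar`, `Persistent`) are real systemd options:

```ini
# deploy/jobalert.timer -- illustrative sketch, not the project's actual unit
[Unit]
Description=UN job alert run (Mon + Thu, 09:00 Istanbul time)

[Timer]
OnCalendar=Mon,Thu 09:00 Europe/Istanbul
# Catch up at next boot if the box was off at the scheduled time
Persistent=true

[Install]
WantedBy=timers.target
```

`Persistent=true` is the concrete advantage over cron mentioned above: a missed run fires as soon as the machine is back, instead of silently skipping until the next slot.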

Discovering the API that wasn't documented

ReliefWeb has a great public API. The careers.un.org side was less obvious.

The RSS feed is fine for discovery — title, level, duty station, deadline — but the body of each posting isn't there. So a "Programme Management Officer" at UNCTAD reads identically to one at UNHCR until you read the full description. With only the RSS metadata, scoring was guesswork.
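Parsing that discovery-level metadata is a few lines of stdlib `xml.etree`. A sketch, with a made-up sample feed — the real careers.un.org feed's tag names and URLs may differ:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS sample; the real feed has more fields.
SAMPLE = """<rss><channel>
  <item>
    <title>Programme Management Officer, P-3</title>
    <link>https://careers.un.org/jobSearchDescription/example</link>
    <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

def parse_items(xml_text: str) -> list[dict]:
    # Pull the discovery fields out of every <item> in the feed
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", ""),
            "link": item.findtext("link", ""),
            "pub_date": item.findtext("pubDate", ""),
        }
        for item in root.iter("item")
    ]
```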

I knew the site had to have an internal API somewhere because it's an Angular SPA — view-source: returns an empty shell, then JavaScript fetches and renders content. So the data is being pulled from something.

Browser DevTools, Network tab, filter to Fetch/XHR, reload the page. After clicking through a few requests, there it was:

GET https://careers.un.org/api/public/opening/joV2/<jobId>/en

Returns a clean JSON with jobDescription, jobLevel, dutyStation, jobFamily, the whole structured record. No auth required. The aggregators I was about to scrape? They almost certainly use exactly this endpoint.

This kept being the pattern. Every "we don't have an API" turned out to be "we don't have a documented API, but our frontend reads JSON from somewhere and that somewhere is reachable." DevTools is, frankly, the most underrated tool in a builder's stack.

A small caveat: when you find an endpoint this way, you should still ask whether you're respecting reasonable use. I added 400ms between requests, a custom User-Agent identifying the project, and capped the run at 200 details/run. The whole pipeline makes about 400 requests per week. That's negligible.
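The polite-client wiring is small. A sketch using the numbers above — the endpoint shape is the one found in DevTools, but the function names and User-Agent string are mine:

```python
import time
import requests

# Endpoint discovered via DevTools (see above); job_id is filled per job.
DETAIL_URL = "https://careers.un.org/api/public/opening/joV2/{job_id}/en"
DELAY_S = 0.4               # 400 ms between requests
MAX_DETAILS_PER_RUN = 200   # cap per run

def make_session() -> requests.Session:
    s = requests.Session()
    # Identify the project instead of hiding behind a default UA
    s.headers["User-Agent"] = "jobalert/0.1 (personal job alert pipeline)"
    return s

def fetch_details(session: requests.Session, job_ids: list[str]) -> list[dict]:
    out = []
    for job_id in job_ids[:MAX_DETAILS_PER_RUN]:
        resp = session.get(DETAIL_URL.format(job_id=job_id), timeout=30)
        resp.raise_for_status()
        out.append(resp.json())
        time.sleep(DELAY_S)  # throttle between detail fetches
    return out
```

The delay and cap live next to the fetch loop on purpose: rate limiting that is someone else's middleware tends to get lost the first time the code is refactored.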

The scoring engine, and why it matters more than the scraping

The scraper is the boring part. The interesting part is: given 400 fetched jobs, which 5–10 should land in my inbox?

My answer was a weighted keyword scorer that knows my CV:

CORE_KEYWORDS = {
    "refugee", "asylum", "migration", "protection", "unhcr",
    "resettlement", "statelessness", "psea",
    # Fraud/Integrity — my current role
    "fraud", "integrity", "investigation", "misconduct",
    # Cash-Based Interventions — trained
    "cbi", "cash-based", "cash assistance",
    # ...
}

SOFT_KEYWORDS = {"humanitarian", "human rights"}  # weak signals

NEGATIVE_KEYWORDS = {
    "intern", "internship", "volunteer", "driver",
    # IT roles — wrong profile
    "information systems officer", "cloud engineer", "service desk",
    # Senior leadership — out of grade band
    "director,", "deputy director", "chief of mission",
    # Wrong functions
    "legal officer", "auditor", "procurement officer",
    "counter-terrorism", "finance and budget", "economic affairs",
}

FAMILY_DUTY_STATIONS = {  # whitelist
    "geneva", "vienna", "rome", "copenhagen", "new york",
    "bangkok", "amman", "nairobi", "panama city", # ...
}

NON_FAMILY_STATIONS = {  # hardship — hard reject
    "kabul", "mogadishu", "juba", "kyiv", "damascus", # ...
}

Then score in layers:

  • +30 Türkiye-based (or, in my case, a reject, since I'm specifically looking to go abroad)
  • +20 Family duty station
  • +10 per core keyword (capped at +30)
  • +5 soft keyword (only if no core hit, to avoid double-counting "humanitarian")
  • +5 per domain keyword (capped)
  • +25 if P-2 ("next-step fit"), +15 if P-3 ("stretch")
  • +10 priority agency (UNHCR/IOM/OCHA/OHCHR)
  • Hard reject if grade is outside P-2/P-3, or if any negative keyword hits

The first version was too generous. "humanitarian" matched almost every UN posting and pumped them all to 30+. The fix: demote it to soft, only count if no core keyword fired.

The second version was bug-ridden. A naive substring match for intern flagged the title INTERNATIONAL CONSULTANT as an internship; the fix was a proper whole-word regex (\bintern\b). A few false-positive iterations later, scoring stabilized. Watching real output and iterating is the only way to get this right — you can't reason your way to good filters from a CV alone.
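The boundary fix in two lines, for the curious:

```python
import re

# Whole-word matching: "intern" as a substring is inside "INTERNATIONAL",
# but a \b-bounded regex only matches it as a standalone word.
def has_word(word: str, text: str) -> bool:
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None
```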

The thing I'm most happy about: the system handles its own ignorance gracefully. When body content is sparse (RSS-only fallback), scores are lower. When duty station isn't in either whitelist or blacklist, it gets a 📍 unverified tag with a slight penalty, so it surfaces but isn't claimed to be a perfect match. Calibrated uncertainty.
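Putting the layers together, the scorer can be sketched like this. The weights are the ones listed above; the keyword sets are trimmed stand-ins, and the function shape is my guess at the structure, not the actual scraper.py:

```python
import re

# Trimmed, illustrative keyword sets (the real ones are much longer)
CORE_KEYWORDS = {"refugee", "protection", "investigation"}
SOFT_KEYWORDS = {"humanitarian", "human rights"}
NEGATIVE_KEYWORDS = {"intern", "driver"}
FAMILY_DUTY_STATIONS = {"geneva", "nairobi"}
NON_FAMILY_STATIONS = {"kabul", "juba"}

def _has_word(needle: str, haystack: str) -> bool:
    return re.search(rf"\b{re.escape(needle)}\b", haystack) is not None

def score(title: str, body: str, grade: str, station: str) -> tuple[int, list[str]]:
    text = f"{title} {body}".lower()
    st = station.lower()

    # Hard rejects first: grade band, negative keywords, hardship stations
    if grade not in {"P-2", "P-3"}:
        return 0, ["rejected: grade outside P-2/P-3"]
    if any(_has_word(n, text) for n in NEGATIVE_KEYWORDS):
        return 0, ["rejected: negative keyword"]
    if st in NON_FAMILY_STATIONS:
        return 0, ["rejected: non-family duty station"]

    pts, reasons = 0, []
    core_hits = sorted(k for k in CORE_KEYWORDS if _has_word(k, text))
    if core_hits:
        pts += min(10 * len(core_hits), 30)      # +10 each, capped at +30
        reasons.append("profile match: " + ", ".join(core_hits))
    elif any(_has_word(k, text) for k in SOFT_KEYWORDS):
        pts += 5                                 # soft only when no core hit
        reasons.append("soft signal only")

    if st in FAMILY_DUTY_STATIONS:
        pts += 20
        reasons.append(f"family duty station: {station}")
    elif st:
        pts -= 5                                 # unknown station: surface, but flag it
        reasons.append(f"unverified duty station: {station}")

    pts += 25 if grade == "P-2" else 15
    reasons.append("P-2 next-step fit" if grade == "P-2" else "P-3 stretch")
    return max(pts, 0), reasons
```

Returning the reasons list alongside the number is what makes the dry-run loop below workable: every score explains itself.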

The dry-run loop

The single most useful thing I added was a dry-run mode:

DRY_RUN=1 MIN_SCORE=0 python -m main

This fetches everything, scores everything, prints two tables — top matches and top near-misses — but doesn't write to the DB and doesn't send email. Tuning loop becomes:

  1. Run dry-run
  2. See which obviously-wrong job got 60 points
  3. Read the reasons it gave that score
  4. Add/remove a keyword
  5. Repeat

I went through maybe 15 iterations of this in an afternoon. The "reasons" output, where each scored job lists the signals that built up its score, was crucial. Without it I'd have been blindly editing keyword sets.

[ 70] HUMAN RIGHTS OFFICER / OPEN-SOURCE INVESTIGATOR (TJO), P3
      📍 OHCHR / Nairobi
      🏡 Family duty station: Nairobi
      ✅ Profile match: integrity, investigation
      📊 Data/M&E/IM: reporting
      🎯 P3 (stretch but doable)
      🤝 Priority agency: OHCHR
      📰 via UN Careers

When a score is interpretable, the calibration loop is fast. When it's a black box, you're guessing.
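The dry-run gate itself is tiny. A sketch of how the env-driven switch might be wired — the default MIN_SCORE of 40 and the function shape are assumptions, not the project's actual main loop:

```python
import os

# Env-driven config, as in: DRY_RUN=1 MIN_SCORE=0 python -m main
DRY_RUN = os.environ.get("DRY_RUN") == "1"
MIN_SCORE = int(os.environ.get("MIN_SCORE", "40"))  # default is a guess

def run(scored_jobs: list[tuple[int, str]]) -> list[tuple[int, str]]:
    matches = [(s, j) for s, j in scored_jobs if s >= MIN_SCORE]
    near_misses = sorted(
        ((s, j) for s, j in scored_jobs if s < MIN_SCORE), reverse=True
    )[:10]
    if DRY_RUN:
        # Print both tables; touch neither the DB nor email
        for s, j in sorted(matches, reverse=True):
            print(f"[{s:3d}] {j}")
        print("--- near misses ---")
        for s, j in near_misses:
            print(f"[{s:3d}] {j}")
        return matches
    # normal path: this is where store.mark_seen(...) and
    # notifier.send(...) would run
    return matches
```

Printing near-misses matters as much as printing matches: the jobs that scored 35 against a threshold of 40 are exactly where keyword tuning happens.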

The boring stuff that mattered

A few choices that felt over-engineered at the time but I'm glad I made:

Adapter pattern. Each source is a module exposing fetch() -> list[Job]. Two sources today, three or four next month. Adding a source doesn't touch scoring, store, notifier, or the main loop. The Job dataclass is the contract.

SQLite for "seen" state. I considered keeping last-run state in a JSON file. SQLite is one extra import sqlite3 and gets me indexed lookups, transactions, and longevity. Total overhead: maybe 30 lines.
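Those ~30 lines reduce to roughly this. A sketch, with made-up table and column names:

```python
import sqlite3

# Illustrative "have I already emailed this?" store; schema names are mine.
def open_store(path: str = ":memory:") -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "  key TEXT PRIMARY KEY,"
        "  first_seen TEXT DEFAULT CURRENT_TIMESTAMP"
        ")"
    )
    return con

def is_new(con: sqlite3.Connection, key: str) -> bool:
    # INSERT OR IGNORE + rowcount doubles as an atomic seen-check:
    # rowcount is 1 if the row was inserted, 0 if it already existed
    cur = con.execute("INSERT OR IGNORE INTO seen (key) VALUES (?)", (key,))
    con.commit()
    return cur.rowcount == 1
```

Versus a JSON file, the primary key gives you dedup for free and the insert is atomic; there is no read-modify-write race if two runs ever overlap.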

Separate jobalert system user. The service doesn't run as root. The .env (with API key) is chmod 600. The unit has NoNewPrivileges, ProtectSystem=strict, PrivateTmp. If something gets compromised, the blast radius is tiny.
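The hardening directives translate into a service unit along these lines. Directive names are real systemd options; the paths and unit layout are illustrative guesses:

```ini
# deploy/jobalert.service -- illustrative sketch; paths are assumptions
[Service]
Type=oneshot
User=jobalert
WorkingDirectory=/opt/jobalert
EnvironmentFile=/opt/jobalert/.env
ExecStart=/opt/jobalert/venv/bin/python -m main
NoNewPrivileges=true
ProtectSystem=strict
PrivateTmp=true
# ProtectSystem=strict makes the filesystem read-only, so the
# SQLite database directory needs an explicit carve-out:
ReadWritePaths=/var/lib/jobalert
```

`Type=oneshot` fits a timer-driven job: the service runs, exits, and the timer owns the schedule.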

Boilerplate stripping. UN job descriptions all end with the same ~2000 words of legal boilerplate ("No Fee", "United Nations Considerations", "Special Notice"). Without stripping this, scoring would pick up keywords from boilerplate that have nothing to do with the actual role. One regex cut, big quality improvement.
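The cut can be a single alternation over known boilerplate headings. A sketch — the real marker list is longer, and where exactly each heading sits in a posting varies:

```python
import re

# Headings that start the shared legal boilerplate (trimmed list)
BOILERPLATE_MARKERS = [
    "United Nations Considerations",
    "No Fee",
    "Special Notice",
]
_CUT = re.compile(
    "|".join(re.escape(m) for m in BOILERPLATE_MARKERS), re.IGNORECASE
)

def strip_boilerplate(description: str) -> str:
    # Keep everything before the first boilerplate heading
    m = _CUT.search(description)
    return description[: m.start()].rstrip() if m else description
```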

Idempotent install script. A single bash deploy/install.sh does everything: user creation, code sync, venv setup, dependency install, .env templating, systemd registration. Safe to re-run. The first time I had to redeploy to debug something at midnight, this saved my sanity.

What's next

The system is running. It mailed me 7 P-3 roles on the first production run, all genuinely worth a look. Twice a week from now on.

But there's a clear progression:

  1. ReliefWeb appname is pending approval (the appname is the identifier you register to use their public API). When it lands, the agency mix changes drastically: UNHCR, IOM, and NGO postings flood in. Scoring will need recalibration.
  2. LLM-based scoring. Right now keyword matching is the engine. A small Claude API call per job — "given this CV and this job description, output a 0-100 fit score and a one-line reasoning" — would be dramatically more accurate. About $1/month at my scale. The keyword scorer becomes a pre-filter.
  3. A web dashboard. Next.js + the SQLite database, so I can see what got filtered and why, mark jobs as "applied," track outcomes over time.
  4. More sources. untalent.org has a JSON API. UNDP has its own. ICRC, MSF, and others have feeds I haven't touched yet.

Takeaways

If you're building a personal automation pipeline, three things to actually internalize:

Front doors first, side doors second. Before you reach for a scraper, look for an API, an RSS feed, a JSON endpoint behind the Angular SPA. The 30 minutes you spend looking is repaid 100x in maintenance.

Boring stack, weird scoring. The infrastructure should be unremarkable — Python, SQLite, systemd, one VPS. The interesting work is the domain modeling: "what does a good match look like for me?" That's where the value compounds, not in your choice of framework.

Dry-run mode is non-negotiable. Any system that filters or scores or notifies needs a way to see what would happen without committing. The tuning loop dies without it.

The code is ~600 lines. The build took a weekend with Claude as a pairing partner — I'd built the core in an evening, then spent a longer second session calibrating the scorer against real data. Going to open-source it once I've added LLM scoring and a dashboard; will link from @fatihbuilds when it's up.

If you're in the humanitarian sector and want this, message me. If you're not but want to adapt the pattern for your own job search — the keyword-based scoring layer ports to any domain. Same code, different CORE_KEYWORDS.
