Teaching an AI Agent to Talk to a Light Bulb (and Why I Had to Get Up From My Desk)

#agents #ai #iot #showdev

I just shipped a PR that made a pile of stock-firmware Sengled W12-N15 bulbs reliably re-bind to a self-hosted MQTT broker for Home Assistant with no firmware flashing, no soldering, no physical mods. The fix itself is small (one file, ~120 lines of Python in SengledTools). The interesting part is how it got debugged: a back-and-forth between a coding agent and a human who kept having to leave the chair to walk over and power-cycle a light bulb.

This post is part technical write-up, part field notes on what it's actually like to pair-program with an AI on hardware that doesn't live inside the computer.

The technical problem in one paragraph

Sengled's W12-N15 series ESP8266-based bulbs run the AWS IoT C SDK on top of mbedtls. After Sengled's cloud went dark, the community-maintained SengledTools project stood up a local MQTT broker (amqtt) and HTTP endpoint server so the bulb thinks it's still talking to the mothership. It mostly worked, except for two recurring bugs (issues #60 and #62): pairing would stall at "waiting for bulb to verify setup endpoints," and even when it succeeded, the bulb wouldn't persist the Wi-Fi credentials across a power cycle. Every reboot effectively factory-reset the bulb.

The root causes turned out to be three TLS issues stacked on top of each other:

Missing ALPN. AWS IoT clients advertise x-amzn-mqtt-ca as the ALPN protocol on TCP/443. The embedded broker wasn't offering it, so mbedtls aborted the handshake silently before any MQTT bytes flew.
Cipher / version mismatch. ESP8266 mbedtls only does TLS 1.2 and a narrow cipher set (ECDHE-RSA-AES128/256-GCM-SHA256/384, etc.). Modern OpenSSL's default SECLEVEL=2 rejects most of them outright.
Stale cert SAN. When the certs are generated, your LAN IP gets baked into the server cert's SAN. If the host machine's IP changes (laptop on Wi-Fi today, desktop on Ethernet tomorrow), the existing server cert no longer covers the IP the bulb is trying to reach, and generate_certificates() was skipping regeneration because the cert files already existed.
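The staleness check in that third item boils down to "is the current LAN IP one of the cert's IP SANs?" Here is a minimal sketch, assuming the SAN entries have already been decoded into the (type, value) tuples that Python's ssl module uses for getpeercert()["subjectAltName"]. needs_server_reissue is a hypothetical helper, not the function from the PR, and reading the SAN out of the PEM file is left to whatever cert tooling you use.

```python
import ipaddress
from typing import Iterable, Tuple

# e.g. ("IP Address", "192.168.1.10") or ("DNS", "broker.lan")
SanEntry = Tuple[str, str]


def needs_server_reissue(san_entries: Iterable[SanEntry], lan_ip: str) -> bool:
    """Return True if lan_ip is not covered by any IP SAN (hypothetical helper)."""
    wanted = ipaddress.ip_address(lan_ip)
    for kind, value in san_entries:
        if kind != "IP Address":
            continue  # DNS names don't help a bulb that dials a raw IP
        try:
            if ipaddress.ip_address(value.strip()) == wanted:
                return False  # current IP is already in the SAN
        except ValueError:
            continue  # skip malformed entries instead of crashing
    return True
```

Comparing ipaddress objects rather than raw strings avoids false mismatches from formatting differences such as IPv6 zero compression. On a True result, the fix described in this post reissues only the server cert and keeps the CA.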

The fix forces TLS 1.2, sets DEFAULT:@SECLEVEL=0 plus the explicit AWS IoT cipher list, sets ALPN, and auto-reissues the server cert (preserving the CA so already-paired bulbs still trust the broker) whenever the current LAN IP isn't in the SAN.
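For readers who want to see roughly how those knobs map onto Python's standard ssl module, here is a sketch. It is not the code from the PR: the cipher list contains only the two suites named above (the real list may be longer), make_bulb_tls_context is a hypothetical helper, and how the context gets handed to amqtt is not shown.

```python
import ssl
from typing import Optional

# ALPN protocol id that AWS IoT MQTT clients advertise on TCP/443.
AWS_IOT_ALPN = "x-amzn-mqtt-ca"

# Illustrative subset: the two ECDHE-RSA GCM suites mentioned above,
# in OpenSSL naming. Not necessarily the PR's full list.
BULB_CIPHERS = [
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES256-GCM-SHA384",
]


def make_bulb_tls_context(certfile: Optional[str] = None,
                          keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context aimed at ESP8266/mbedtls AWS IoT clients (sketch)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Pin the handshake to TLS 1.2, the only version the bulb speaks.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    # Explicit suites first, then OpenSSL's defaults, with the security level dropped to 0.
    ctx.set_ciphers(":".join(BULB_CIPHERS + ["DEFAULT", "@SECLEVEL=0"]))
    # Offer the AWS IoT ALPN id so the client's ALPN check succeeds.
    ctx.set_alpn_protocols([AWS_IOT_ALPN])
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

@SECLEVEL=0 weakens what this listener will accept, so it is a trade-off that only makes sense on a LAN-only socket dedicated to devices like these bulbs.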

That's the whole PR. The interesting story is everything that happened to find it.

Why an AI agent can't solve this alone

A typical "AI helps you fix code" loop looks like this:

Read code → propose change → run tests → read failures → iterate.

That loop assumes the system under test is software. For an IoT debugging session, the loop is:

Read code → propose change → ask human to walk to a closet, find a tiny lamp, screw it into a socket, flip the switch on/off five times in a precise rhythm to factory-reset it, walk back, reconnect their laptop's Wi-Fi to a transient AP whose SSID changes per bulb, hope the AP doesn't drop, run the script, watch packets, iterate.

Every iteration has a meatspace component the agent can't perform. The agent can read TLS captures and propose cipher strings all day, but when the bulb's AP refuses to issue a DHCP lease for the third time in a row, the only fix is for me to physically pull the bulb out of the lamp, count to ten, and screw it back in. There's no API for that.

This created an interesting dynamic. The agent (Claude, in VS Code, with shell + file-edit tools) was great at:

Reading firmware strings dumped from a .bin and recognizing AWS IoT SDK signatures.
Proposing TLS context configurations and explaining the tradeoffs.
Diffing logs and spotting when an accessCloud.json POST never fired.
Making the actual code changes and keeping the PR scope tight.

But it was bad at (actually, just incapable of) the parts that mattered most:

Knowing whether the bulb in the next room was actually flashing in the "ready to pair" pattern or the "I gave up" pattern.
Telling whether the Wi-Fi adapter had genuinely associated with the bulb's AP or was lying about it (Windows does this, a lot).
Noticing that the bulb in question is physically the one I think it is. (When you've got four bulbs labeled by the last four hex of their MAC, it's surprisingly easy to grab the wrong one.)

So we found a rhythm. The agent would propose, I would execute the physical bits, then dump the resulting log file back into the chat. Roughly 80% of the wall-clock time was me walking around with a screwdriver and a tiny LED bulb, not the agent thinking.

The dual-NIC trick

Here's the practical thing I want more people to know about, because it changed this project from "frustrating" to "actually tractable":

Your computer can be on two networks at the same time.

I know, obvious in hindsight. But the default Windows experience strongly suggests you have one network, the one Wi-Fi is currently connected to, and that's it.

The pairing flow for these bulbs requires you to:

Connect to the bulb's transient AP (Sengled_Wi-Fi Bulb_XXXX, which gives you a 192.168.8.x lease).
Push the home Wi-Fi credentials over UDP.
Re-connect to your home network so the bulb can find your local MQTT broker on it.
Verify the bulb shows up.

If you only have one NIC, step 3 disconnects you from the bulb mid-conversation, so anything that needs to talk to both the bulb's onboard AP and your home LAN at the same time is impossible. On top of that, you lose access to your cloud-hosted coding agent!

The fix: plug an Ethernet cable into your home network and leave Wi-Fi free to roam.

Ethernet stays pinned to 192.168.1.x — your MQTT broker, your Home Assistant, your DNS hijack, all reachable continuously.
Wi-Fi is free to associate with each bulb's AP in turn, push credentials, then drop.
Windows' routing table figures out the rest. The bulb can reach the broker on the Ethernet-side IP; you can reach the bulb on the Wi-Fi-side IP. Both at once.
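One way to confirm the routing table is doing what you expect is to ask the OS which local address it would use for each destination. Connecting a UDP socket sends no packets; it just makes the kernel pick a route and bind a matching source address. The broker and bulb-AP addresses below are hypothetical placeholders, not values from the project.

```python
import socket


def local_addr_for(dest_ip: str, port: int = 9) -> str:
    """Return the local IP the OS would use as source address to reach dest_ip.

    connect() on a UDP socket transmits nothing; it only selects a route
    and binds the matching local address, which getsockname() reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest_ip, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    # Hypothetical addresses: broker on the Ethernet side, bulb AP gateway on Wi-Fi.
    for label, ip in [("broker (Ethernet)", "192.168.1.10"),
                      ("bulb AP (Wi-Fi)", "192.168.8.1")]:
        try:
            print(f"{label:>18}: via {local_addr_for(ip)}")
        except OSError as exc:  # no route, e.g. Wi-Fi not associated yet
            print(f"{label:>18}: unreachable ({exc})")
```

With both NICs up, the two lines should report different local addresses: the Ethernet-side one for the broker and the Wi-Fi-side one for the bulb.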

This also unlocked another sub-trick: I could pre-generate netsh wlan add profile XML for each bulb's AP SSID before the bulb was even powered on, and the orchestration script would netsh wlan connect to each one in round-robin. So pairing four bulbs back-to-back became a single command that hopped Wi-Fi networks while the broker stayed pinned on Ethernet.
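A sketch of that profile pre-generation, assuming each bulb's AP is open (no passphrase); if yours isn't, the authEncryption block would need to change. The helper names are hypothetical, and the XML follows Microsoft's WLAN profile schema for an open network.

```python
import xml.etree.ElementTree as ET
from typing import List

WLAN_NS = "http://www.microsoft.com/networking/WLAN/profile/v1"


def _q(tag: str) -> str:
    """Qualify a tag with the WLAN profile namespace."""
    return f"{{{WLAN_NS}}}{tag}"


def open_wlan_profile_xml(ssid: str) -> str:
    """Build a netsh WLAN profile for an open AP (assumption: no passphrase)."""
    ET.register_namespace("", WLAN_NS)  # emit xmlns="..." rather than ns0:
    root = ET.Element(_q("WLANProfile"))
    ET.SubElement(root, _q("name")).text = ssid
    ssid_cfg = ET.SubElement(ET.SubElement(root, _q("SSIDConfig")), _q("SSID"))
    ET.SubElement(ssid_cfg, _q("name")).text = ssid
    ET.SubElement(root, _q("connectionType")).text = "ESS"
    ET.SubElement(root, _q("connectionMode")).text = "manual"
    security = ET.SubElement(ET.SubElement(root, _q("MSM")), _q("security"))
    auth = ET.SubElement(security, _q("authEncryption"))
    ET.SubElement(auth, _q("authentication")).text = "open"
    ET.SubElement(auth, _q("encryption")).text = "none"
    ET.SubElement(auth, _q("useOneX")).text = "false"
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


def netsh_commands(ssid: str, profile_path: str) -> List[str]:
    """Command lines that register the profile, then join the bulb's AP."""
    return [
        f'netsh wlan add profile filename="{profile_path}"',
        f'netsh wlan connect name="{ssid}" ssid="{ssid}"',
    ]
```

A round-robin driver would write each profile to disk, then pass these strings to subprocess.run(cmd, check=True) on Windows, one bulb at a time. ElementTree handles escaping, so SSIDs with spaces or ampersands still produce valid XML.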

If you're doing IoT work on a laptop with no Ethernet jack: a $15 USB-C dongle solves it. Worth it.

What an "agent-friendly" hardware project looks like

Things that made this collaboration work, in roughly the order I'd recommend:

Fast, scriptable physical reset. The Sengled reset dance is "power-cycle five times in two-second intervals." That's annoying but deterministic, which means after a few iterations I had it muscle-memorized and could do a full reset in 15 seconds. Devices that require holding a tiny button with a paperclip while typing a 16-character pairing code on a phone are an order of magnitude harder to debug with an agent.
Verbose firmware logs that survive a dump. A previous contributor had pulled the W31-115 stock firmware binary and run strings on it. That artifact (sitting in the repo as firmware/w31-115-stock_firmware_strings.txt) was gold. It let the agent confirm which AWS IoT SDK was baked in, which cert validation paths existed, and which error strings to grep for in the (silent) failure modes.
Logs from both sides. Anything you can capture (Wireshark on the broker host, Python ssl debug callbacks, bulb LED behavior) is an extra signal. The thing the agent kept asking for and I kept forgetting to provide was the bulb's blink pattern. It turns out "rapid flash for 60 seconds then steady" means something very different from "slow flash forever," and only one of those two states means "I successfully bound, please reboot me." The agent can't see that. You have to type it.
A cheap "do nothing dangerous" mode. Early on we set up a flag that ran the broker without the firmware-flashing pieces. That meant we could iterate on TLS for hours without ever risking a brick, and since the stated goal was "make HA happy without flashing," it became the default.
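The firmware-strings artifact is easy to regenerate and search even without a Unix toolchain. Here is a rough Python stand-in for strings (printable ASCII runs of four or more bytes) plus a case-insensitive grep. The needle values in the usage block are hypothetical examples, not a list taken from the repo.

```python
import re
import sys
from typing import Iterable, Iterator, List


def printable_strings(blob: bytes, min_len: int = 4) -> Iterator[str]:
    """Yield runs of printable ASCII, roughly like the `strings` tool's default."""
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, blob):
        yield match.group().decode("ascii")


def grep_firmware(blob: bytes, needles: Iterable[str]) -> List[str]:
    """Return extracted strings that contain any needle (case-insensitive)."""
    lowered = [n.lower() for n in needles]
    return [s for s in printable_strings(blob)
            if any(n in s.lower() for n in lowered)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        firmware = fh.read()
    # Hypothetical needles: SDK prefixes and the ALPN id discussed above.
    for hit in grep_firmware(firmware, ["aws_iot", "x-amzn-mqtt-ca", "mbedtls"]):
        print(hit)
```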

Things that made it hard:

Windows network stack quirks. Wi-Fi profile management via netsh has at least three different output formats depending on locale and Windows version, which broke regex parsing twice. PowerShell strict mode is unforgiving when one of those parses returns $null. The agent was the right tool to hammer those out, but it took a couple of cycles per quirk.
Asymmetric information. I knew things the agent didn't (which bulb was which, which physical reset had actually worked, what the LED was doing right now). The agent knew things I didn't (what every byte of the cipher string actually meant, what amqtt's internal state machine looked like). Much of the session was spent translating between those two pools.
Tool prompts that hang on interactive input. A non-trivial chunk of the session was lost to an agent terminal stuck waiting for me to press Enter on a gh auth login prompt, while the agent waited for the terminal to produce output. Async tooling around long-lived interactive shells is still rough.
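The lesson from those netsh regex breaks carries over to any language: parse the output loosely and treat "field not found" as a normal return value rather than an error. The original tooling hit this in PowerShell; below is a Python sketch of the same idea. The sample layout assumes English-locale netsh wlan show interfaces output, and any localized key names you pass to first_field are assumptions to verify on your own machine.

```python
from typing import Dict, Optional


def parse_netsh_fields(output: str) -> Dict[str, str]:
    """Parse 'Key : value' lines from netsh-style output for one interface.

    Splits on the first colon only, so values such as BSSIDs keep their
    colons. Keys are whitespace-normalized; lines without a colon are skipped.
    """
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue
        fields[" ".join(key.split())] = value.strip()
    return fields


def first_field(fields: Dict[str, str], *candidates: str) -> Optional[str]:
    """Return the value of the first key present among candidates, else None."""
    for name in candidates:
        if name in fields:
            return fields[name]
    return None
```

Returning None instead of raising pushes the "what if this locale names it differently" decision to the caller, which is where a strict-mode script needs it.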

The unglamorous outcome

The PR is one file. Three TLS knobs and a SAN check. If you'd shown me that diff up front and asked "could this take a week of evenings to find?" I would have laughed. But the diff is not the work. The work is:

Knowing that the diff is the right diff.
Having the confidence that the system actually behaves as expected after the change, on real hardware, across reboots and IP changes and accidentally shorted contacts and dead bulbs you didn't know were dead.

The agent and I split that work in a way that I think is roughly the long-term shape of this kind of collaboration: the agent owns symbolic reasoning and code mechanics; the human owns the meatspace ground truth and the long-tail of "is this thing actually doing what we think."

For now, that means I still keep a USB-C Ethernet adapter, four light bulbs in various states of reset, and a screwdriver on my desk. The agent has not yet figured out how to use any of those.

The PR: HamzaETTH/SengledTools#63. Fixes #60, #62.