DEV Community

stderr

Posted on • Originally published at blog.ardis.dev

I accidentally made the fastest event system in the world

Top comments (1)

mote

The trampoline pattern is one of those things that's obvious in hindsight but painful to discover. Nice writeup.

One thing that caught my eye: the 48% hot-path cost from Box<dyn Fn> vtable indirection is exactly the kind of overhead that shows up in embedded robotics work too. When you're dispatching sensor events at 1kHz+ on a Cortex-A (like a Raspberry Pi Compute Module), every nanosecond of vtable lookup compounds across thousands of subscribers per frame.

I'm curious about the multi-event-type scenario where entt pulls ahead (9ns vs 24ns). Since TypeId::of::<E>() compiles down to loading a compile-time constant, the bottleneck is presumably the HashMap lookup itself: hashing the key, probing, or cache misses on the table. Have you tried replacing the HashMap with a Vec<(TypeId, Vec<Subscriber>)> and doing a linear scan? For the typical case of 5-15 event types, a linear scan over contiguous memory might actually beat the hash lookup due to prefetcher friendliness.
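
To make the linear-scan idea concrete, here is a minimal sketch. It is not the article's implementation: `LinearRegistry` and `Handler` are hypothetical names, and the handler is a boxed closure purely to keep the example safe and short (the real Subscriber layout would go in its place). Whether it actually beats the HashMap would have to be benchmarked.

```rust
use std::any::{Any, TypeId};

/// Hypothetical type-erased handler; a stand-in for the article's Subscriber.
type Handler = Box<dyn Fn(&dyn Any)>;

/// Registry that keeps one contiguous slot per event type instead of a HashMap.
#[derive(Default)]
struct LinearRegistry {
    slots: Vec<(TypeId, Vec<Handler>)>,
}

impl LinearRegistry {
    /// Register a handler for events of type `E`.
    fn subscribe<E: Any>(&mut self, f: impl Fn(&E) + 'static) {
        let id = TypeId::of::<E>();
        // Erase the event type; the downcast succeeds because `emit`
        // only routes events whose TypeId matches this slot.
        let handler: Handler = Box::new(move |any: &dyn Any| {
            if let Some(event) = any.downcast_ref::<E>() {
                f(event);
            }
        });
        let pos = self.slots.iter().position(|(t, _)| *t == id);
        match pos {
            Some(i) => self.slots[i].1.push(handler),
            None => self.slots.push((id, vec![handler])),
        }
    }

    /// Linear scan over the slots; returns how many handlers were invoked.
    fn emit<E: Any>(&self, event: &E) -> usize {
        let id = TypeId::of::<E>();
        let any: &dyn Any = event;
        for (t, handlers) in &self.slots {
            if *t == id {
                for h in handlers {
                    h(any);
                }
                return handlers.len();
            }
        }
        0
    }
}
```

With only a handful of event types, the scan touches a few adjacent `(TypeId, Vec)` entries and never hashes anything, which is the whole bet here.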

Also, the 6-instruction hot loop is beautiful. But I wonder: since you're storing call as a function pointer in the Subscriber struct, that call *(%r15) is still one level of indirection. If all subscribers for the same event type share the same trampoline (same E but different F), you could batch them: sort by call pointer, then invoke each trampoline once for its contiguous run of subscribers. That would amortize the indirect call cost. The sort stays cheap as long as it happens once at subscribe time rather than on every emit.
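
A rough sketch of the batching idea, under some loud assumptions: `BatchFn`, `Run`, and `Batched` are hypothetical names, the per-subscriber state is reduced to an opaque `usize` context word, and subscribers are grouped by pointer when they subscribe (so no sort is needed). Note that grouping changes delivery order across different trampolines, and that function pointer addresses are not guaranteed to be unique, so in the worst case this just batches less.

```rust
/// Hypothetical batch trampoline: one indirect call handles a whole run
/// of subscriber contexts.
type BatchFn<E> = fn(&E, &[usize]);

/// All subscribers that share one trampoline, stored contiguously.
struct Run<E> {
    call: BatchFn<E>,
    ctxs: Vec<usize>,
}

struct Batched<E> {
    runs: Vec<Run<E>>,
}

impl<E> Batched<E> {
    fn new() -> Self {
        Batched { runs: Vec::new() }
    }

    /// Group by trampoline address at subscribe time, off the hot path.
    fn subscribe(&mut self, call: BatchFn<E>, ctx: usize) {
        let key = call as usize;
        let pos = self.runs.iter().position(|r| (r.call as usize) == key);
        match pos {
            Some(i) => self.runs[i].ctxs.push(ctx),
            None => self.runs.push(Run { call, ctxs: vec![ctx] }),
        }
    }

    /// One indirect call per run instead of one per subscriber.
    /// Returns the number of indirect calls made.
    fn emit(&self, event: &E) -> usize {
        for run in &self.runs {
            (run.call)(event, &run.ctxs);
        }
        self.runs.len()
    }
}
```

The inner loop over `ctxs` then lives inside the trampoline, where the compiler can see the concrete handler and inline it.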

As for the nerd-snipe challenge — the 1,289ns for 1000 subscribers is already impressively tight. My gut says the next win would come from SIMDeez-style packing if the event payloads are small enough, but that would break the zero-dependency constraint. Looking forward to seeing if someone cracks it.