DEV Community

# llm

Posts

- Reducing AI Latency Through Smarter Model Routing and Token Optimization (3 min read)
- Agentic Tools, Rust LangFlow, and AI Pharma Breakthroughs (2 min read)
- Llama-Server Router Mode - Dynamic Model Switching Without Restarts (9 min read)
- I Built a GPU Dataset for LLM Inference — Here’s What I Learned (2 min read)
- Your LLM Bill Is Too High. Here's How to Fix It (Part 1) (3 min read)
- I built an LLM eval rig in a weekend. Most of it was wrong. (4 min read)
- How to Detect Prompt Injection in Your LLM Agent — Python, 5 Minutes (5 min read)
- Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls (13 min read)
- Harness Engineering with Nothing but Markdown (10 min read)
- GPT-5 vs Claude Sonnet 4: real per-task cost and benchmark comparison for production workloads (7 min read)
- Skills for eval-driven agent optimization (1 min read)
- 62.2% on Aider Polyglot from a MacBook Pro. Then the other model we tried scored 4%. Here's what actually happened, with a working cost loop attached. (16 min read)
- DeepSeek-V4 Changes the Context Game for Agents — And Your Memory Architecture Should Adapt (3 min read)
- What If You Compressed Your Prompts Into Chinese Emoji? (A Token-Saving Thought Experiment) (3 min read)
- The 3 Alerts Every LLM Team Should Have Set Up by Tomorrow (7 min read)