DEV Community

# llm

Posts

Parametric Hubris: Empirical Evidence That Tool Availability Does Not Equal Tool Usage in Frontier Language Models

Comments
53 min read
AWS Speed Boosts, Agentic Limits, and Clinical AI Advances

Comments
2 min read
5 Hidden Failure Modes When Routing Between 10+ LLM Providers in 2026

Comments 1
6 min read
LightESB AiAgentDemoSrv v1.0.0: Let LLM Operate Order Data with Camel Tools

Comments
4 min read
Your AI Stack Is Too Big

Comments
1 min read
Why Audit Is the Missing Layer in Every Healthcare RAG System

Comments
6 min read
Six Reliability Primitives for LLM Agents

Comments 2
3 min read
There Are Too Many RAG Optimization Techniques, So I Organized Them — and the Big Picture Finally Made Sense

Comments
8 min read
I Stopped Treating AI Spend Like a Monthly Bill

Comments
1 min read
Building a Voice-Controlled Local AI Agent with Whisper, LLaMA, and LangGraph

Comments
4 min read
Building a Voice-Controlled Local AI Agent using Speech-to-Text and LLMs

Comments
2 min read
The gap between detecting hallucinations and handling them

Comments
2 min read
From 66% to 96%: How I Fixed a Drive-Thru Voice Agent Before It Took a Single Real Call

Comments
4 min read
Sonnet 4.6 vs Haiku 4.5 vs Opus 4.6: I Tested 3 Claude Models on 10 Real Tasks

Comments
3 min read
Model Showdown: Benchmarking Local vs Cloud LLMs on a Real Coding Task

Comments
14 min read