What Everyone Gets Wrong About "Agents" — Even Heavy Claude Users Can't Define One

This is where I start on Agentic AI: before grading projects, talking architecture, or shipping anything, get clear on one word. What is an agent? Chinese version: Even People Who Use Claude Every Day Can't Say What an Agent Is.
This year, the question I get asked most in DMs comes from a group you wouldn't expect.
Not beginners. The strong ones — people who write code and build products with Claude and Codex every day, some out of big tech, some running their own startups. The people who should be the least confused. And yet they keep asking me the same thing: "I've honestly still never quite figured out what actually counts as an agent."
It's not that they don't understand the tech. They've been dizzied by it. Open any group chat, any launch event, any fundraising deck — anyone who so much as touches an LLM says they're "building an agent," "architecting an agent." Do a summary, it's an agent; wrap a prompt, it's an agent; call an API, it's an agent too. The word has been stretched so hard it now points at everything, and therefore at nothing.
If even they're confused, I'll say it plainly: this isn't your knowledge failing to keep up. The word itself has collapsed.
I know this confusion well. Building a customer-service agent at my company, the first thing that stalled me wasn't technical; it was my boss. He'd say "we're going to build an Agent," but his "agent," the vendor's "agent," and my engineering "agent" were not the same thing. Four people in one meeting, each sure they'd been clear, everyone confused by the end. The misspent money, the mis-set KPIs, the acceptance review that turned into a fight: all of it traced back to this one word.
Later I wrote an L0-L3 grading scale to fix it — but that's the next step, "how to grade." Before grading, you need something plainer: one judgment that separates a real agent from an LLM with a skin on it, on the spot. That's all this post is about.
Why the word "agent" collapses — four people, four meanings
The same word, four people at one table, means four things:
- Sales / vendors: anything that sells an LLM capability, from Q&A to a multimodal assistant, is an agent.
- The business boss: something "smarter than a bot" — it solves problems, runs the business on its own.
- Engineers: strictly, an LLM system that can plan autonomously, call tools, and loop.
- Media / blogs: any LLM + a business scenario writes up as "an X Agent."
Each side is sure it was clear. Then the boss says "we're doing agents this year," sales brings a quote for an "agent solution," the engineer thinks "isn't this just intent classification + RAG," and at delivery the business side finds "it's no different from the old bot."
The word got captured by marketing — because "agent" sells better, raises better, headlines better than "an LLM feature." So it inflated, until it lost its edges. A word with no edges can't be used to make decisions.
One question that settles it: is the LLM deciding, or filling in a blank?
You don't need a four-level framework or an architecture background. To judge whether something is an agent, ask one thing:
In your system, is the LLM deciding "where to go next," or just generating content in a box someone already drew?
One line to remember: is it deciding for you, or typing for you? Two examples make it obvious:
- Smart meeting notes: record → transcribe → LLM summarizes → push to attendees. The flow is fixed, drawn in advance; the LLM only generates content in the "summarize" box. It types for you, it doesn't decide for you — not an agent.
- Customer service: the customer says "I want to return this," and the system has to judge intent, check the order, check logistics, check policy, then decide refund, exchange, or handoff — the LLM decides "where to go next" at several nodes. It decides for you — an agent.
So: if it decides for you, it's an agent; if it types for you, it's "automation with an LLM in it." Both have value, but they aren't the same thing, don't cost the same, and shouldn't be judged by the same KPI.
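If you think in code, the two examples above can be sketched side by side. This is a minimal illustration, not how any real product is built. Every name in it (`meeting_notes_pipeline`, `customer_service_agent`, `scripted_llm`, `TOOLS`, the action names) is hypothetical, and the "model" is a scripted stand-in so the sketch runs without an API. The thing to look at is who owns the control flow: in the first function the code does, in the second the LLM's output does.

```python
from typing import Callable

# Hypothetical stand-in for a real model call: prompt in, text out.
LLM = Callable[[str], str]


def meeting_notes_pipeline(transcript: str, llm: LLM) -> str:
    """Fixed flow: the code decides every step; the LLM only fills the 'summarize' box."""
    summary = llm(f"Summarize this meeting:\n{transcript}")
    return f"To attendees: {summary}"


def customer_service_agent(
    message: str,
    llm: LLM,
    tools: dict[str, Callable[[], str]],
    max_steps: int = 5,
) -> list[str]:
    """Decision loop: at each step the LLM picks the next action from what it has seen."""
    terminal = {"refund", "exchange", "handoff"}
    observations = [f"customer: {message}"]
    path: list[str] = []
    for _ in range(max_steps):
        choice = llm("Pick next action.\n" + "\n".join(observations)).strip()
        path.append(choice)
        if choice in terminal:
            return path  # the LLM decided how the case ends
        if choice not in tools:
            path.append("handoff")  # unknown action: fall back to a human
            return path
        observations.append(f"{choice}: {tools[choice]()}")
    path.append("handoff")  # step budget exhausted: fall back to a human
    return path


def scripted_llm(prompt: str) -> str:
    """Toy deterministic 'model' (an assumption for the demo, not a real LLM)."""
    if prompt.startswith("Summarize"):
        return "Launch moved to Friday."
    if "check_policy:" in prompt:
        return "refund"
    if "check_order:" in prompt:
        return "check_policy"
    return "check_order"


# Hypothetical tools with canned results.
TOOLS = {
    "check_order": lambda: "delivered 3 days ago",
    "check_policy": lambda: "returns allowed within 7 days",
}

if __name__ == "__main__":
    print(meeting_notes_pipeline("we agreed to ship Friday", scripted_llm))
    print(customer_service_agent("I want to return this", scripted_llm, TOOLS))
```

Swap in a different model and the first function still runs the same steps in the same order; only the wording of the summary changes. Do the same to the second and the path itself can change. That difference is the whole test.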
Conflation isn't a harmless slip — it makes you misspend money, KPIs, and expectations
The cost of the wrong name is real:
- Money misspent: you pay "deciding"-type agent prices for a "fill-in-the-blank" automation — an order of magnitude apart in build effort and budget.
- KPI mis-set: you hang an "autonomous decision rate" on a Q&A bot; the vendor says it hit target, the business says it feels off, and neither can convince the other.
- Expectations misaligned: the boss waits for something that "runs the business on its own" and gets templated text that is merely more conversational.
Everyone wastes money, and no one knows how the money got wasted. Because what each of them calls "done" isn't the same thing. This isn't a tech problem, it's the lack of a shared language — and shared language starts by separating "deciding" from "filling in a blank."
Once you can tell real from fake, where to go
Telling real from fake is only step one. Once it's clear, there's a path:
- A pile of "smart-X" projects to grade (not just yes/no, but L0-L3, how much to invest, which KPI) → I Audited 28 AI Projects, Only 5 Were Real Agents — the L0-L3 grading framework
- The few real agents, to build and ship → the Agentic AI in Practice series (L2 architecture, Critic, intent classification, testing, launch gates)
- After launch, to keep it right and stop it quietly getting dumber → the After Launch series (sampling, labeling, evaluation, drift, the data flywheel)
But the first brick under all of it is this one word. In an era where everyone says they're building an agent, being able to say clearly what isn't an agent is itself an edge. Next time someone tells you "we're building an agent," just ask one thing — is it deciding for the user, or typing for the user?
If this put the "deciding vs filling in a blank" ruler in your hand, send me the keyword "WHAT IS AN AGENT" and I'll send this judgment plus the full L0-L3 grading scale together.
