Part 1 of 9 — what we are actually building, and why the framing matters more than the technology.
Most people meet a coding agent as a conversational tool: you ask it a question, it answers; you ask it to write code, it writes code. Useful, but it keeps the agent on the far side of a glass wall. Nothing it says touches the real world unless you copy, paste, and click.
This guide is about removing the glass wall in a controlled way. The reframing is simple:
The agent is not a chatbot you consult. It is an operator you delegate to — a junior colleague who sits at your workstation, signed in as you, and does real work on real systems by running the same kinds of commands you would.
That word — operator — is doing a lot of work. An operator in the industrial sense is someone trusted to run a machine: they have a console, a set of levers, a manual, and standing orders about what they may and may not do without checking with a supervisor. We are going to build exactly that, where the levers are command-line tools and the manual is a file the agent reads before it starts.
Every operator setup in this guide is made of four things. The rest of the series is just these four, elaborated:
| Ingredient | What it is | Who owns it |
|---|---|---|
| External system(s) | The thing with the work in it — a marketplace, a CRM, a ticketing system, an ERP. It exposes an API. | A third party (or your own org). |
| Tool belt | A set of small, single-purpose CLI scripts wrapping that API. Each is documented and runs as you. | You / your team. |
| The agent | Claude Code, Codex, Pi, or similar — the reasoning layer that decides which tools to run, in what order, and reads their output. | You (running on your workstation). |
| soul.md + skills | Persistent instructions that give the agent its role, judgment, guardrails, and reusable procedures. | You / your team, shared and versioned. |
Notice where the intelligence lives. The tools are dumb and deterministic: they do one thing and report what happened. The agent is smart but forgetful and non-deterministic. The soul.md is the durable judgment that survives between sessions and gets better over time. Keeping these three concerns separate (execution in the tools, reasoning in the agent, lasting judgment in soul.md) is the single most important design decision in the whole pattern, and we return to it constantly.
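To make "dumb and deterministic" concrete, here is a minimal sketch of what one tool in the belt might look like. Everything in it is an assumption for illustration: the `fetch_orders` helper, the sample orders, and the JSON output shape are hypothetical, and the stub stands in for an authenticated API call that a real tool would make with your credentials.

```python
#!/usr/bin/env python3
"""list-orders: print orders in a given status as JSON (illustrative sketch)."""
import argparse
import json
import sys


def fetch_orders(status):
    """Hypothetical stand-in for the real API call.

    A real tool would make an authenticated HTTP request here; this stub
    filters hard-coded sample data so the sketch runs on its own.
    """
    sample = [
        {"id": "A-1001", "status": "awaiting-shipment", "total": 42.50},
        {"id": "A-1002", "status": "shipped", "total": 18.00},
        {"id": "A-1003", "status": "awaiting-shipment", "total": 180.00},
    ]
    return [order for order in sample if order["status"] == status]


def main(argv=None):
    parser = argparse.ArgumentParser(description="List orders by status.")
    parser.add_argument("--status", default="awaiting-shipment",
                        help="order status to filter on")
    args = parser.parse_args(argv)

    orders = fetch_orders(args.status)
    # One job, machine-readable output: the tool reports what it found and
    # leaves every decision about what to do next to the agent.
    json.dump({"status": args.status, "count": len(orders), "orders": orders},
              sys.stdout, indent=2)
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    main()
```

The tool contains no judgment at all: no retries based on content, no decisions about which orders matter. That is deliberate in this sketch, since the same input should always produce the same report.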
The four ingredients: you delegate to the agent; the agent reads its judgment from soul.md and acts only through the tool belt, which talks to the external system as you.
Imagine you run a small eBay store. Every morning there is a predictable pile of work: new orders to acknowledge, buyer questions to answer, a few "where is my item?" messages, the occasional return request, and listings whose prices should track the competition. None of it is hard. All of it is tedious, and all of it is done through eBay's seller tools — which have an API.
With the operator pattern, your morning looks like this instead:
$ claude
> Good morning. Run the morning routine for the store.
[agent reads soul.md and the "morning-routine" skill, then:]
• ran ebay-list-orders --status awaiting-shipment
• ran ebay-list-messages --unread
• drafted 3 buyer replies, showed them to me, sent after I approved
• flagged 1 return request as "needs your decision" (over the auto-approve threshold in soul.md)
• ran ebay-reprice --strategy match-lowest --dry-run: proposed 4 price changes; I approved 3, skipped 1
Done. One item needs you: return request #4471 (buyer claims "not as described", $180, which is above the $100 auto-approve line).
The agent did not "scrape eBay" or do anything mysterious. It ran a handful of tools you built, each of which called eBay's API with your credentials, and it made small judgment calls within limits you set in writing. You stayed in the loop exactly where it mattered and nowhere else.
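The dry-run step in that transcript is worth a closer look, because it is what keeps you in the loop for writes. The following is a rough sketch of how a repricing tool could separate proposing from applying. It assumes one reading of "match-lowest" (set the price to the lowest competitor price when the two differ), and the function names, the `Proposal` type, and the in-memory dictionaries are all hypothetical stand-ins for real API reads and writes.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    sku: str
    old_price: float
    new_price: float


def propose_match_lowest(listings, competitor_prices):
    """Propose matching the lowest competitor price wherever it differs."""
    proposals = []
    for sku, price in listings.items():
        rivals = competitor_prices.get(sku, [])
        if rivals and min(rivals) != price:
            proposals.append(Proposal(sku, price, min(rivals)))
    return proposals


def reprice(listings, competitor_prices, approved=(), dry_run=True):
    """Compute proposals; write only approved ones, and only if not a dry run."""
    proposals = propose_match_lowest(listings, competitor_prices)
    applied = []
    if not dry_run:
        for p in proposals:
            if p.sku in approved:
                # Stand-in for the real API write.
                listings[p.sku] = p.new_price
                applied.append(p.sku)
    return proposals, applied


# Hypothetical sample data.
listings = {"mug": 12.0, "lamp": 30.0, "vase": 20.0}
competitors = {"mug": [11.5, 13.0], "lamp": [30.0], "vase": []}

# First pass: dry run. Nothing is written; the agent shows you the proposals.
proposals, applied = reprice(listings, competitors)

# Second pass: apply only what the human approved.
proposals, applied = reprice(listings, competitors,
                             approved={"mug"}, dry_run=False)
```

Defaulting `dry_run` to `True` is a design choice in this sketch: a caller has to opt in to a write explicitly, so a forgotten flag fails safe.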
Plausible: yes, unequivocally. Every piece already exists and is in daily use. Agents call command-line tools well; APIs are everywhere; OAuth lets software act on a user's behalf with scoped, revocable access instead of a shared password. Nothing here requires a research breakthrough.
New: the assembly is what's worth writing down. The industry has spent two years teaching agents to write code. Far less attention has gone to the more mundane and more valuable case: using that same competence to operate the systems where ordinary work actually happens. The contribution of this pattern is not a technology; it is a discipline — small tools, explicit identity, written judgment, shared skills — that makes delegation to an agent safe enough to be useful.
In Part 2 we make the first real engineering decision: why the agent should talk to the system through small CLI tools rather than calling the API directly.