Tutorial // Agents // 2026-06-21 // 13 min read

Build a Coding Agent with the Claude Agent SDK

The Claude Agent SDK is how Anthropic builds agents like Claude Code. Here's how to use it to build your own file-editing coding agent.

Varun Raj Manoharan, Founder & Principal Engineer
Tags: Claude Agent SDK, Agents, Claude, Tutorial

Key takeaways

  • The Claude Agent SDK packages the same agent loop, tool execution, and context management that power Claude Code, so you do not write the loop yourself.
  • The single query() function returns an async generator of messages, and built-in tools like Read, Write, Edit, Bash, Grep, and Glob run automatically.
  • Define custom in-process tools with tool() and createSdkMcpServer(), then reference them in allowedTools using the mcp__<server>__<tool> namespace.
  • Start with permissionMode default and a maxTurns cap, since an agent with Bash and bypassed permissions can run any command and rack up cost.

There is a useful distinction hiding inside the word "agent." Most of what gets called an agent is really a wrapper around a single model call, plus some glue code that parses tool requests and feeds results back. That works, but you end up rebuilding the same loop every time: detect a tool call, run it, append the result, call the model again, repeat until it stops. The Claude Agent SDK exists so you do not have to write that loop. It is the same agent loop, tool execution, and context management that power Claude Code, packaged as a library you can import.
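For contrast, here is roughly what that hand-rolled loop looks like. This is only a sketch: ModelReply, fakeModel, and runTool are hypothetical stand-ins invented for illustration, not SDK or Messages API types, and the stub model is synchronous so the control flow stays easy to follow.

```typescript
// A hand-written agent loop: the plumbing the SDK saves you from writing.
// Every name here is a hypothetical stand-in, not a real SDK or API type.

type ToolCall = { name: string; input: string };
type ModelReply =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };
type Turn = { role: "user" | "assistant" | "tool"; content: string };

// Stub model: requests one tool call, then answers once it sees a tool result.
function fakeModel(history: Turn[]): ModelReply {
  const toolTurn = history.find((t) => t.role === "tool");
  if (!toolTurn) {
    return { kind: "tool_call", call: { name: "list_files", input: "." } };
  }
  return { kind: "final", text: `I found: ${toolTurn.content}` };
}

// Stub tool runner: a real one would touch the filesystem or a shell.
function runTool(call: ToolCall): string {
  if (call.name === "list_files") return "a.ts, b.ts";
  return `unknown tool ${call.name}`;
}

function runLoop(prompt: string, maxTurns = 10): { text: string; turns: number } {
  const history: Turn[] = [{ role: "user", content: prompt }];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = fakeModel(history); // call the model
    if (reply.kind === "final") {
      // The model stopped asking for tools, so we are done.
      return { text: reply.text, turns: turn };
    }
    // Detect the tool call, run it, append the result, and go around again.
    history.push({ role: "assistant", content: `call ${reply.call.name}` });
    history.push({ role: "tool", content: runTool(reply.call) });
  }
  throw new Error(`No final answer after ${maxTurns} turns`);
}

console.log(runLoop("What files are in this directory?"));
```

Even in toy form, the loop needs a turn cap, a history format, and a dispatch step. Those are the pieces that tend to get rebuilt slightly differently in every project.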

That framing matters right now because the SDK is in active, daily use. As of mid-June 2026 it draws on a separate Agent SDK credit pool, distinct from your regular Messages API spend, which is worth knowing before you point it at a large repository overnight (more on that in the gotchas). The short version: this is not a toy. It is the production substrate Anthropic uses to ship its own coding agent, and you can build on it directly.

In this tutorial I want to build something small but real: a coding agent that reads a task in plain English, looks at files, and edits them. No bespoke tool loop. We will lean on the built-in tools (Read, Write, Edit, Bash, Grep, Glob) and then add one custom tool of our own so you can see how that extension point works. By the end you will have a script you can run against an actual project directory.

What you'll need

  • Node.js 18 or newer. The TypeScript SDK ships a native Claude Code binary for your platform as an optional dependency, so you do not need to install Claude Code separately.
  • An Anthropic API key from the Console, exported as ANTHROPIC_API_KEY.
  • A project directory you are comfortable letting the agent edit. I would suggest a scratch repo or a fresh git branch the first few times, for reasons that will become obvious.
  • About ten minutes.

I am using TypeScript here because the SDK bundles the runtime for you, which makes the setup shorter. The Python package (claude-agent-sdk) mirrors the same API if you prefer that side of the fence.

Set up the SDK

Install the package:

Shell
npm install @anthropic-ai/claude-agent-sdk zod

We are pulling in zod as well, because we will use it later to describe the input schema for our custom tool.

Set your key:

Shell
export ANTHROPIC_API_KEY=your-api-key

The entry point is a single function, query(). You give it a prompt and some options, and it returns an async generator. You iterate over it, and each value is a message: an assistant turn, a tool result, a status update, or the final result. Here is the smallest thing that does real work:

TypeScript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "What files are in this directory?",
  options: {
    allowedTools: ["Bash", "Glob"],
  },
})) {
  if ("result" in message) console.log(message.result);
}

Run that, and Claude will call Glob or Bash on its own, read the output, and hand you a summary in the final result message. You did not write a tool loop. That is the whole pitch.

A note on allowedTools: it is an allowlist of tools the agent may use without asking for permission. Anything not on the list either gets blocked or triggers a permission prompt, depending on your permissionMode. We will tighten this up shortly, because for a coding agent the permission story is the difference between "helpful" and "deleted my src folder."

Pick the tools the agent can use

The SDK gives you a set of built-in tools out of the box. For a file-editing coding agent, the ones that matter are:

  • Read reads any file in the working directory.
  • Glob finds files by pattern, like src/**/*.ts.
  • Grep searches file contents with regex.
  • Edit makes precise edits to existing files.
  • Write creates new files.
  • Bash runs terminal commands, including git and your test runner.

You do not implement any of these. They come with the SDK. Your job is to decide which ones to hand over. A read-only reviewer agent might get Read, Glob, and Grep and nothing else. A coding agent that actually changes things needs Edit, Write, and usually Bash so it can run the tests after it edits.
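One way to keep that decision explicit is a small table of named tool sets. The profile names and the toolsFor helper below are hypothetical. Only the tool names themselves come from the list above.

```typescript
// Hypothetical helper: named tool sets for different kinds of agents.
const TOOL_PROFILES = {
  // Can look but not touch.
  reviewer: ["Read", "Glob", "Grep"],
  // Can edit files and run commands such as the test suite.
  coder: ["Read", "Glob", "Grep", "Edit", "Write", "Bash"],
} as const;

type Profile = keyof typeof TOOL_PROFILES;

// Returns a fresh mutable array so callers can append extra tool names safely.
function toolsFor(profile: Profile, extra: string[] = []): string[] {
  return [...TOOL_PROFILES[profile], ...extra];
}

console.log(toolsFor("reviewer"));
```

The returned array is meant to be passed as allowedTools, so switching an agent from reviewer to coder becomes a one-word change you can see in a diff.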

Here is the same query() call shaped into a coding agent, with a system prompt that sets expectations and a model chosen for the work:

TypeScript
import { query } from "@anthropic-ai/claude-agent-sdk";

const SYSTEM_PROMPT = `You are a focused coding agent working inside a single repository.
Read the relevant files before editing. Make the smallest change that
satisfies the task. After editing, run the project's tests if a test
command exists, and report whether they pass.`;

for await (const message of query({
  prompt: "Fix the off-by-one bug in src/pagination.ts",
  options: {
    model: "claude-sonnet-4-6",
    systemPrompt: SYSTEM_PROMPT,
    allowedTools: ["Read", "Glob", "Grep", "Edit", "Write", "Bash"],
    permissionMode: "default",
    cwd: process.cwd(),
  },
})) {
  if ("result" in message) console.log(message.result);
}

A few things worth pausing on.

model takes a model alias or full name. I am using claude-sonnet-4-6 here, which is a good balance of speed and cost for routine edits. For a genuinely hard refactor that spans many files, I would reach for claude-opus-4-8 instead and accept the slower, pricier run. You can swap the string, or change it mid-run with the query object's setModel() method if you want to escalate only when a task turns out to be harder than expected.

systemPrompt accepts a plain string (as above) or a preset object if you want Claude Code's own system prompt as a base. The plain string is the right call when you want full control over the agent's behavior.

cwd sets the working directory. The built-in file tools operate relative to it, so point this at the repo you want the agent to work in.

Write the agent: read a task, edit files, watch it work

So far we have been throwing away most of the messages and only printing the final result. That is fine for a quick script, but for a coding agent you usually want to see what it is doing as it goes, both for trust and for debugging. The generator gives you every step. Let me wrap this into a small reusable function that takes a task and streams the agent's progress.

TypeScript
import { query } from "@anthropic-ai/claude-agent-sdk";

const SYSTEM_PROMPT = `You are a focused coding agent working inside a single repository.
Read the relevant files before editing. Make the smallest change that
satisfies the task. After editing, run the project's tests if a test
command exists, and report whether they pass.`;

async function runCodingAgent(task: string, projectDir: string) {
  for await (const message of query({
    prompt: task,
    options: {
      model: "claude-sonnet-4-6",
      systemPrompt: SYSTEM_PROMPT,
      allowedTools: ["Read", "Glob", "Grep", "Edit", "Write", "Bash"],
      permissionMode: "default",
      cwd: projectDir,
    },
  })) {
    // Assistant turns: Claude's reasoning and tool calls.
    if (message.type === "assistant") {
      for (const block of message.message.content) {
        if (block.type === "text") {
          console.log(block.text);
        } else if (block.type === "tool_use") {
          console.log(`[tool] ${block.name}`);
        }
      }
    }

    // The final result, success or failure.
    if (message.type === "result") {
      console.log(`\n--- ${message.subtype} ---`);
      if (message.subtype === "success") console.log(message.result);
      console.log(`Cost: $${message.total_cost_usd.toFixed(4)} over ${message.num_turns} turns`);
    }
  }
}

const [, , ...taskParts] = process.argv;
const task = taskParts.join(" ") || "Summarize what this project does.";
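// Surface failures (bad API key, aborted run) instead of an unhandled rejection.
runCodingAgent(task, process.cwd()).catch((err) => {
  console.error(err);
  process.exit(1);
});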

Save that as agent.ts and run it against a project:

Shell
npx tsx agent.ts "Add input validation to the createUser function in src/users.ts"

What you will see is the agent narrating its own work: it reads the file, maybe greps for related call sites, makes an edit, runs the tests, and reports back. The result message at the end carries the summary plus total_cost_usd and num_turns, which I print because cost visibility per run is genuinely useful when you are iterating. (That cost figure is also where the separate Agent SDK credit comes in, which I will get to.)

Notice what the inner loop is doing. The assistant message contains a content array of blocks. A text block is Claude talking. A tool_use block is Claude deciding to call a tool. We do not act on the tool_use ourselves. The SDK runs the built-in tool and feeds the result back automatically; we are just reading over its shoulder.
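If you want a little more detail than the tool name, the block handling can be pulled into a pure function, which also makes it testable without an API call. This is a sketch under assumptions: the ContentBlock type is a simplified, hypothetical stand-in covering only the two block shapes discussed here, it assumes a tool_use block carries an input object next to its name, and the file_path field in the sample data is illustrative.

```typescript
// Simplified stand-in for the content blocks discussed above. Assumption:
// real messages carry more fields and more block variants than these two.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: Record<string, unknown> };

// Turn one assistant message's blocks into printable log lines.
function describeBlocks(content: ContentBlock[], maxInputChars = 80): string[] {
  const lines: string[] = [];
  for (const block of content) {
    if (block.type === "text") {
      lines.push(block.text);
    } else if (block.type === "tool_use") {
      // Truncate the serialized input so one large edit does not flood the log.
      const raw = JSON.stringify(block.input);
      const preview = raw.length > maxInputChars ? raw.slice(0, maxInputChars) + "..." : raw;
      lines.push(`[tool] ${block.name} ${preview}`);
    }
  }
  return lines;
}

const demoLines = describeBlocks([
  { type: "text", text: "Reading the file first." },
  { type: "tool_use", name: "Read", input: { file_path: "src/users.ts" } },
]);
console.log(demoLines.join("\n"));
```

Seeing the tool input next to the tool name is often what tells you whether the agent is looking at the right file.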

Add a custom tool of your own

The built-in tools cover most of what a coding agent needs, but eventually you will want to give it something specific to your world: a function that queries your internal API, runs a proprietary linter, or looks something up in a service the agent cannot reach with Bash. The SDK lets you define in-process tools with tool(), group them into a lightweight MCP server with createSdkMcpServer(), and hand that to query().

"MCP server" sounds heavier than it is here. Because the tool runs in the same process as your script, there is no subprocess, no transport, no separate deployment. It is a function with a schema attached.

Here is a custom tool that reports the current git branch and whether the working tree is clean, so the agent can decide whether it is safe to commit:

TypeScript
import { query, tool, createSdkMcpServer } from "@anthropic-ai/claude-agent-sdk";
import { execSync } from "node:child_process";
import { z } from "zod"; // unused by this no-input tool; needed once a tool takes inputs, e.g. { path: z.string() }

const gitStatusTool = tool(
  "git_status",
  "Report the current git branch and whether the working tree is clean.",
  {}, // no inputs needed
  async () => {
    const branch = execSync("git rev-parse --abbrev-ref HEAD").toString().trim();
    const dirty = execSync("git status --porcelain").toString().trim();
    const summary = dirty
      ? `Branch ${branch}, working tree has uncommitted changes.`
      : `Branch ${branch}, working tree is clean.`;
    return { content: [{ type: "text", text: summary }] };
  },
  { annotations: { readOnlyHint: true } }
);

const gitServer = createSdkMcpServer({
  name: "git-helpers",
  version: "1.0.0",
  tools: [gitStatusTool],
});

The four positional arguments to tool() are the name, a description (Claude reads this to decide when to call it, so make it accurate), a zod schema for the inputs, and an async handler that returns content blocks. The optional fifth argument carries annotations. readOnlyHint: true tells the SDK this tool does not change anything, which is relevant for permission handling.
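The git_status tool takes no inputs, so its schema is empty. For a tool that does take inputs, it can help to keep the handler's logic in a plain function and test it before wiring it into tool(). The sketch below is hypothetical throughout: parsePorcelain and changedFilesResult are invented names, and the commented-out wiring assumes the tool() argument order described above with a zod shape such as { stagedOnly: z.boolean() }.

```typescript
// Hypothetical handler logic for a "changed_files" tool, kept pure so it can
// be tested without git or the SDK.

type ToolResult = { content: { type: "text"; text: string }[] };
type Change = { path: string; staged: boolean };

// Parse `git status --porcelain` output: two status characters, a space, then
// the path. The first character describes the index (staged) state.
// Do not trim the raw output first: a leading space is meaningful.
function parsePorcelain(output: string): Change[] {
  return output
    .split("\n")
    .filter((line) => line.length > 3)
    .map((line) => {
      const indexState = line.charAt(0);
      return {
        path: line.slice(3),
        staged: indexState !== " " && indexState !== "?",
      };
    });
}

// Build the same content-block result shape the git_status handler returns.
function changedFilesResult(output: string, stagedOnly: boolean): ToolResult {
  const changes = parsePorcelain(output).filter((c) => !stagedOnly || c.staged);
  const text = changes.length
    ? changes.map((c) => `${c.staged ? "staged" : "unstaged"}: ${c.path}`).join("\n")
    : "No matching changes.";
  return { content: [{ type: "text", text }] };
}

// Possible wiring (an assumption, not checked against the SDK's types):
// tool("changed_files", "List changed files, optionally staged ones only.",
//   { stagedOnly: z.boolean() },
//   async ({ stagedOnly }) =>
//     changedFilesResult(execSync("git status --porcelain").toString(), stagedOnly));

const sampleStatus = "M  src/a.ts\n M src/b.ts\n?? notes.txt\n";
console.log(changedFilesResult(sampleStatus, true).content.map((b) => b.text).join("\n"));
```

Keeping the parsing separate from the execSync call means the part most likely to have bugs can be exercised with plain strings.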

Wire the server into query() through the mcpServers option:

TypeScript
for await (const message of query({
  prompt: "Check whether it is safe to commit, then commit the staged changes with a sensible message.",
  options: {
    model: "claude-sonnet-4-6",
    systemPrompt: SYSTEM_PROMPT,
    allowedTools: ["Read", "Bash", "mcp__git-helpers__git_status"],
    mcpServers: {
      // createSdkMcpServer() already returns the { type: "sdk", name, instance }
      // config object, so pass it directly under the server's key.
      "git-helpers": gitServer,
    },
    cwd: process.cwd(),
  },
})) {
  if ("result" in message) console.log(message.result);
}

Two details to copy carefully. The server is registered under a key (git-helpers), and the value is the object createSdkMcpServer() returned. That object already carries type: "sdk" and the server instance, so you do not wrap it again. And the tool's name in allowedTools is namespaced: mcp__<server-name>__<tool-name>, which is why it reads mcp__git-helpers__git_status. If you leave the custom tool off allowedTools, the agent can still try to call it, but it will hit a permission gate instead of running automatically.
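Because the namespaced string is easy to mistype, a tiny helper can build it for you. This is a sketch: mcpToolName is a hypothetical helper, and the double-underscore check is a defensive assumption of this example rather than a documented SDK rule.

```typescript
// Build the namespaced name for an in-process MCP tool, following the
// mcp__<server-name>__<tool-name> pattern described above.
function mcpToolName(server: string, tool: string): string {
  // Defensive assumption: an empty part, or a double underscore inside a
  // part, would make the combined name ambiguous, so reject both.
  for (const part of [server, tool]) {
    if (part.length === 0 || part.includes("__")) {
      throw new Error(`Invalid name part: "${part}"`);
    }
  }
  return `mcp__${server}__${tool}`;
}

const allowedToolNames = ["Read", "Bash", mcpToolName("git-helpers", "git_status")];
console.log(allowedToolNames);
```

If the server key and the allowedTools entry are both derived from the same constants, a rename cannot leave them out of sync.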

Now the agent has a clean, typed way to ask "what branch am I on?" instead of parsing raw git output, and you have a pattern you can repeat for any internal capability.

Gotchas

A few things that will bite you if nobody warns you first.

Permissions are the whole game. permissionMode: "default" will pause and ask before destructive actions, which is what you want while you are still learning the agent's behavior. There are looser modes (acceptEdits auto-approves file edits, bypassPermissions skips the gate entirely), and they are tempting because they make runs smoother. Resist that on anything pointed at real code until you trust the setup. An agent with Bash and bypassed permissions can run any command. Start strict, loosen deliberately. And keep that scratch branch I mentioned, so a bad run is a git checkout . away from undone (plus a git clean -fd for any new files the agent created, since checkout only restores tracked files).
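If you want a programmatic seatbelt on top of the permission mode, the decision logic can live in a small pure function. Treat this as a sketch under assumptions: decide and BLOCKED_PATTERNS are hypothetical names, the return shape is modeled on what I believe the SDK's canUseTool permission callback expects, and the Bash tool's input is assumed to carry a command string. Check all three against the SDK's type definitions before relying on them.

```typescript
// Hypothetical guard that decides whether a tool call should run.
// Assumption: the allow/deny shape resembles the SDK's permission result type.
type Decision =
  | { behavior: "allow"; updatedInput: Record<string, unknown> }
  | { behavior: "deny"; message: string };

// Commands that should never run unattended. Illustrative, not exhaustive.
const BLOCKED_PATTERNS: RegExp[] = [
  /\brm\s+-(rf|fr)\b/,          // recursive force delete
  /\bgit\s+push\b.*--force\b/,  // force push
  /\bgit\s+reset\s+--hard\b/,   // discard local work
];

function decide(toolName: string, input: Record<string, unknown>): Decision {
  if (toolName === "Bash") {
    // Assumption: the Bash tool's input has a string field named "command".
    const raw = input["command"];
    const command = typeof raw === "string" ? raw : "";
    const hit = BLOCKED_PATTERNS.find((pattern) => pattern.test(command));
    if (hit) {
      return { behavior: "deny", message: `Blocked by pattern ${hit}` };
    }
  }
  // Everything else passes through with its input unchanged.
  return { behavior: "allow", updatedInput: input };
}

console.log(decide("Bash", { command: "rm -rf src" }).behavior);
console.log(decide("Bash", { command: "npm test" }).behavior);
```

Pattern matching on shell commands is easy to sidestep, so this is a backstop for obvious mistakes and not a sandbox. The scratch branch is still your real safety net.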

Runaway loops cost real money. An agent can get stuck: it edits a file, the tests fail, it edits again, they fail differently, and it keeps going. Set maxTurns to cap the number of agentic turns (something like 15 to 25 is reasonable for a focused task), so a confused agent stops instead of grinding. The result message tells you num_turns and total_cost_usd per run, so watch those numbers across a few runs to calibrate.
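To calibrate the cap, it helps to collect the numbers from a handful of runs and look at them together. The sketch below assumes you have saved num_turns, total_cost_usd, and subtype from each result message. RunStats and summarizeRuns are hypothetical names, and the "error_max_turns" subtype string is my assumption about how a capped run is reported, so verify it against the SDK's types.

```typescript
// Pass this as options.maxTurns in the query() call.
const MAX_TURNS = 20;

// Minimal shape of the fields read from each result message. Assumption: the
// real result message has more fields than these three.
type RunStats = { subtype: string; num_turns: number; total_cost_usd: number };

// Summarize several runs to see how close they come to the cap and what they cost.
function summarizeRuns(runs: RunStats[]) {
  const totalCost = runs.reduce((sum, r) => sum + r.total_cost_usd, 0);
  const maxTurnsSeen = runs.reduce((max, r) => Math.max(max, r.num_turns), 0);
  // Assumed subtype for a run that stopped because it hit the turn cap.
  const capped = runs.filter((r) => r.subtype === "error_max_turns").length;
  return {
    runs: runs.length,
    totalCost,
    avgCost: runs.length ? totalCost / runs.length : 0,
    maxTurnsSeen,
    capped,
  };
}

// Made-up numbers, purely for illustration.
const sampleRuns: RunStats[] = [
  { subtype: "success", num_turns: 6, total_cost_usd: 0.25 },
  { subtype: "success", num_turns: 11, total_cost_usd: 0.5 },
  { subtype: "error_max_turns", num_turns: MAX_TURNS, total_cost_usd: 0.75 },
];
console.log(summarizeRuns(sampleRuns));
```

If most runs finish well under the cap and only the confused ones hit it, the cap is doing its job. If healthy runs keep hitting it, raise it or narrow the task.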

The Agent SDK credit is separate. As of mid-June 2026, Agent SDK usage draws on a separate Agent SDK credit pool rather than your general Messages API balance. This is easy to overlook until a run fails or stalls and you are staring at your regular API dashboard wondering why everything looks fine. If something stops working unexpectedly, check the Agent SDK credit specifically, not just your overall account balance. Because these runs are agentic (many turns, lots of file reads), spend adds up faster than with a single model call, so the per-run cost printout is more than a nicety: it is how you keep the bill honest.

Cost scales with how much it reads. A coding agent on a large repo will read a lot of files to orient itself. That is the context management working as intended, but it means a vague task ("clean up the codebase") costs far more than a specific one ("rename getUserData to fetchUser across src/"). Scope your prompts. Specific tasks are cheaper and produce better results, partly because the agent does not have to guess what you meant.

Wrapping up

We went from a one-line query() call to a coding agent that reads a task, edits files with the built-in tools, runs the tests, and exposes one custom tool of our own through an in-process MCP server. The thing I keep coming back to is how little of this was loop plumbing. The SDK owns the agent loop, so the code you write is about intent (which tools, which model, which permission posture) rather than mechanics.

From here, the natural next steps are subagents (delegating a focused subtask to a specialized agent), hooks (running your own code at lifecycle points like PreToolUse to log or block actions), and sessions (carrying context across multiple turns so a follow-up like "now find all the callers" knows what "it" refers to). Each of those is another option on the same query() call you already understand.

If you build something with it, I would genuinely like to hear how it went, especially the parts where the agent surprised you. That is usually where the interesting design decisions hide.