Tutorial // DevTools | 2026-06-14 | 13 min read

Build an AI Code Review Bot for Your Pull Requests

A TypeScript tutorial: a GitHub Action that sends your PR diff to Claude and posts a focused, useful review comment, without the noise.

Varun Raj Manoharan, Founder & Principal Engineer

Tags: Claude, GitHub Actions, Code Review, TypeScript, Tutorial

Key takeaways

A useful PR bot stays quiet on clean diffs and only flags real bugs, security issues, and correctness risks, leaving formatting to the linter.
Scope the GitHub Action to contents:read and pull-requests:write, and use a concurrency group so rapid pushes cancel stale review runs.
A hidden HTML marker lets the bot update one comment in place instead of piling up duplicate reviews on every push.
Filter lock files, generated code, and vendor directories out of the diff before sending it to Claude to avoid token waste.

Most AI code reviewers I've tried are noise machines. They leave fourteen comments on a three-line change, half of them about formatting that the linter already caught, and the other half some variation of "consider adding error handling here" on a function that cannot fail. After a week the team learns to scroll past the bot, which means it's now worse than nothing: it's a thing everyone ignores that still costs API credits.

So before writing a single line, it's worth being honest about what a good reviewer actually does. A good reviewer catches the bug you'd be embarrassed to ship: the off-by-one, the unawaited promise, the secret that snuck into a config file, the early return that skips the cleanup. It questions a decision when the decision looks wrong. And crucially, it stays quiet when there's nothing useful to say. Prettier handles spacing. ESLint handles unused variables. Your bot's job is the stuff that needs a brain.

That's the whole design philosophy here, and it shapes every choice below. We're going to build a GitHub Action that fires on every pull request, pulls the diff, hands it to Claude with a prompt tuned for signal over volume, and posts a single review comment. One comment, focused, skippable in ten seconds if the PR is clean. TypeScript throughout, because the GitHub and Anthropic SDKs both have good TypeScript support and the types catch a lot of mistakes before they ship.

What you'll need

A GitHub repository where you can add workflows and secrets.
An Anthropic API key. Store it as a repository secret named ANTHROPIC_API_KEY (Settings → Secrets and variables → Actions).
Node 20 or newer. The Action runner provides this; you only need it locally if you want to test the script outside CI.
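Two npm packages: @anthropic-ai/sdk and @octokit/rest. The install command below also pulls in @actions/github and @actions/core. The script in this tutorial doesn't import them, but they're handy if you later want typed access to the event payload or job outputs. None of these are preinstalled on GitHub-hosted runners; npm ci installs them from your lockfile on every run.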

Shell

npm install @anthropic-ai/sdk @octokit/rest @actions/github @actions/core
npm install -D typescript tsx @types/node

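tsx lets us run the TypeScript directly in the workflow without a separate build step, which keeps the whole thing to one file. It strips types without checking them, so run tsc --noEmit separately if you want type errors to fail the job.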

The workflow

Start with the GitHub Actions side, because it defines the contract: when does the bot run, and what does it have access to? Create .github/workflows/ai-review.yml:

YAML

name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

# Least privilege. The job can read the repo and write PR comments —
# nothing else. Don't widen this.
permissions:
  contents: read
  pull-requests: write

# One review per PR at a time. A new push cancels the in-flight run
# instead of stacking comments.
concurrency:
  group: ai-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the code
        uses: actions/checkout@v4

      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - name: Install dependencies
        run: npm ci

      - name: Run the reviewer
        run: npx tsx scripts/review.ts
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}

A few things in here matter more than they look.

The types filter on pull_request means the bot runs on open, on every subsequent push (synchronize), and when someone reopens a closed PR. It does not run on label changes, comment edits, or the dozen other PR events you don't care about. Each run costs an API call, so be deliberate about triggers.

The permissions block is the one I'd push back on if I saw it set to write-all in a review. The default GITHUB_TOKEN is scoped to whatever you declare here, and a code-review bot needs exactly two things: read the repo, write a comment. Granting more is how a compromised dependency turns into a compromised repo.

concurrency with cancel-in-progress is the difference between "one review per PR" and "five stale reviews from five rapid pushes." When you push three commits in a minute, the first two runs get cancelled and only the last one, reviewing the latest state, survives.

Note that GITHUB_TOKEN is mapped into env explicitly. GitHub creates the token automatically for every run, but a run step only sees it as an environment variable if you pass it in, and being explicit keeps the script's dependencies obvious.

Fetching the diff

Now the script. Create scripts/review.ts. The first job is getting the diff, and the GitHub API has a nice trick for this: ask for the PR with a diff media type and it returns the raw unified diff as a string instead of the usual JSON.

TypeScript

import { Octokit } from "@octokit/rest";

const [owner, repo] = (process.env.REPO ?? "").split("/");
const pull_number = Number(process.env.PR_NUMBER);

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function getDiff(): Promise<string> {
  const response = await octokit.rest.pulls.get({
    owner,
    repo,
    pull_number,
    mediaType: { format: "diff" },
  });
  // With the diff media type, `data` is the raw diff string,
  // not the usual PR object. The types don't know that, so cast.
  return response.data as unknown as string;
}

That as unknown as string is ugly but honest. The Octokit types describe the JSON response shape; the mediaType override changes what comes back at runtime, and TypeScript can't follow that. A comment explaining why is better than pretending the cast isn't there.
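
The snippet above trusts REPO and PR_NUMBER to be well-formed. That holds inside the workflow, but a local test run with a missing variable would send owner as an empty string and pull_number as NaN to the API. A minimal sketch of a validation step, assuming a hypothetical parseTarget helper that is not part of the original script:

```typescript
interface PrTarget {
  owner: string;
  repo: string;
  pull_number: number;
}

// Hypothetical helper: validates the two values the workflow passes in
// and fails with a readable message instead of a confusing API error.
function parseTarget(env: Record<string, string | undefined>): PrTarget {
  const [owner, repo, ...rest] = (env.REPO ?? "").split("/");
  if (!owner || !repo || rest.length > 0) {
    throw new Error(`REPO must look like "owner/name", got "${env.REPO ?? ""}"`);
  }

  const pull_number = Number(env.PR_NUMBER);
  // Number(undefined) is NaN and Number("") is 0, so reject both.
  if (!Number.isInteger(pull_number) || pull_number <= 0) {
    throw new Error(`PR_NUMBER must be a positive integer, got "${env.PR_NUMBER ?? ""}"`);
  }

  return { owner, repo, pull_number };
}
```

In the script this would replace the two top-level constants with something like const { owner, repo, pull_number } = parseTarget(process.env).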

A unified diff is exactly what we want to send to Claude. It already contains only the changed lines plus a few lines of surrounding context, with +/- markers, diff --git headers naming each file, and @@ hunk headers giving the line ranges each change touches. We don't have to reconstruct any of that.
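
To make that structure concrete, here is a rough sketch that splits a unified diff into per-file sections and counts added and removed lines. The splitDiffByFile name and the FileDiff shape are illustrative assumptions, and the header regex is deliberately simple: it would misread a path that itself contains " b/".

```typescript
interface FileDiff {
  path: string;      // path after the change (the "b/" side of the header)
  text: string;      // this file's full section of the diff, header included
  additions: number; // "+" lines inside hunks
  deletions: number; // "-" lines inside hunks
}

// Hypothetical helper: walks the diff line by line and starts a new
// section at every "diff --git" header.
function splitDiffByFile(diff: string): FileDiff[] {
  const files: FileDiff[] = [];
  let lines: string[] = [];
  let current: FileDiff | null = null;
  let inHunk = false;

  const flush = () => {
    if (current) files.push({ ...current, text: lines.join("\n") });
  };

  for (const line of diff.split("\n")) {
    const header = /^diff --git a\/(.+) b\/(.+)$/.exec(line);
    if (header) {
      flush();
      current = { path: header[2] ?? "", text: "", additions: 0, deletions: 0 };
      lines = [line];
      inHunk = false;
      continue;
    }
    if (!current) continue; // ignore anything before the first file header
    lines.push(line);
    // Only count +/- lines after the first "@@", so the "---" and "+++"
    // file header lines are not mistaken for changes.
    if (line.startsWith("@@")) inHunk = true;
    else if (inHunk && line.startsWith("+")) current.additions += 1;
    else if (inHunk && line.startsWith("-")) current.deletions += 1;
  }
  flush();
  return files;
}
```

The script doesn't need this on the happy path, since the raw diff goes to the model as-is. It becomes useful once you want to filter, measure, or log what you're sending.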

Asking Claude for a review

Here's the part people get wrong. The instinct is to write a short prompt, "review this code", and let the model figure out the rest. What you get back is the kind of generic, hedge-everything review that makes the bot useless. The prompt is where you encode the taste you want.

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const SYSTEM_PROMPT = `You are reviewing a pull request diff. You are a careful, senior engineer, not a linter.

Report only things that matter:
- Bugs: logic errors, off-by-one, unhandled errors, race conditions, null/undefined access, resource leaks.
- Security: injection, secrets committed to the repo, missing authorization, unsafe deserialization.
- Correctness risks in the change itself — not pre-existing issues in unchanged code.

Do NOT comment on:
- Formatting, whitespace, import order, or anything a linter or formatter handles.
- Style preferences or naming, unless a name is actively misleading.
- Theoretical improvements that don't fix a real problem in this diff.

If the diff is clean, say so in one sentence. Do not invent issues to seem useful.

Write your review as GitHub-flavored Markdown. For each finding, give the file and a short explanation of the risk and the fix. Be concise. A reviewer reads this in under a minute.`;

async function reviewDiff(diff: string): Promise<string> {
  const message = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    system: SYSTEM_PROMPT,
    messages: [
      {
        role: "user",
        content: `Review this pull request diff:\n\n\`\`\`diff\n${diff}\n\`\`\``,
      },
    ],
  });

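  // Explicit type guard so `block.text` type-checks regardless of
  // TypeScript version; the runtime behavior is unchanged.
  const text = message.content
    .filter((block): block is Anthropic.TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");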

  return text.trim();
}

new Anthropic() with no arguments reads the key from ANTHROPIC_API_KEY, which is exactly the env var the workflow sets. No need to pass it explicitly.

The model choice is a real tradeoff. claude-sonnet-4-6 is fast, cheap, and genuinely good at this; it's my default for review. For a critical repo where a missed bug is expensive, or for PRs that touch security-sensitive code, swap in claude-opus-4-8: it reasons harder and catches subtler things, at higher cost and latency. You could even branch on the diff size or the files touched and route big or sensitive changes to Opus. Start with Sonnet and upgrade if you find the reviews shallow.
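
Routing by size or by files touched could look roughly like this. The threshold and the path patterns are placeholder assumptions to tune for your own repo, and pickModel is a hypothetical helper, not something the script above defines.

```typescript
const DEFAULT_MODEL = "claude-sonnet-4-6";
const CAREFUL_MODEL = "claude-opus-4-8";

// Assumed values for illustration only.
const LARGE_DIFF_CHARS = 40_000;
const SENSITIVE_PATHS = [/(^|\/)auth\//, /(^|\/)migrations\//, /^\.github\/workflows\//];

// Hypothetical helper: use the slower, more careful model when the diff
// is large or touches a path we consider security-sensitive.
function pickModel(diff: string, changedPaths: string[]): string {
  const touchesSensitive = changedPaths.some((path) =>
    SENSITIVE_PATHS.some((re) => re.test(path)),
  );
  return touchesSensitive || diff.length > LARGE_DIFF_CHARS ? CAREFUL_MODEL : DEFAULT_MODEL;
}
```

The result would go into the model field of the messages.create call. The list of changed paths can come from the pulls.listFiles endpoint or from the diff headers.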

The system prompt does the heavy lifting. Notice how much of it is a list of things not to comment on. That negative space is what keeps the bot quiet. The instruction "If the diff is clean, say so in one sentence" matters too: without it, models feel obligated to find something, and you get manufactured concerns. Telling it explicitly that an empty review is a valid review changes the behavior.

content comes back as an array of blocks. We filter for text blocks and join them. There won't be tool-use or thinking blocks in this setup, but filtering by type is the correct habit and costs nothing.
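
One thing the extraction above doesn't check is whether the response was cut off. The Messages API reports a stop_reason of "max_tokens" when the output hit the ceiling, and a review that stops mid-sentence is worth flagging. A small sketch, using a minimal structural type rather than the SDK's own Message type (that type, the extractReview name, and the wording of the note are all assumptions):

```typescript
// Minimal view of the response fields this helper reads.
interface ReviewMessage {
  stop_reason: string | null;
  content: Array<{ type: string; text?: string }>;
}

// Hypothetical helper: joins the text blocks and appends a visible note
// when the model ran out of output tokens.
function extractReview(message: ReviewMessage): string {
  const text = message.content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text as string)
    .join("\n")
    .trim();

  if (message.stop_reason === "max_tokens") {
    return `${text}\n\n_Note: this review hit the output token limit and may be incomplete._`;
  }
  return text;
}
```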

Posting the comment

The last step is putting the review on the PR. A pull request is an issue under the hood as far as comments go, so the call lives on the issues namespace:

TypeScript

const MARKER = "<!-- ai-code-review -->";

async function postReview(body: string): Promise<void> {
  const fullBody = `${MARKER}\n## AI code review\n\n${body}`;

  // Find a comment we left on a previous run and update it,
  // so repeated pushes don't pile up duplicate reviews.
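  // paginate() walks every page. A single listComments call returns only
  // the first 30 comments, which would miss the marker on a busy PR.
  const existing = await octokit.paginate(octokit.rest.issues.listComments, {
    owner,
    repo,
    issue_number: pull_number,
    per_page: 100,
  });

  const previous = existing.find((c) => c.body?.includes(MARKER));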

  if (previous) {
    await octokit.rest.issues.updateComment({
      owner,
      repo,
      comment_id: previous.id,
      body: fullBody,
    });
  } else {
    await octokit.rest.issues.createComment({
      owner,
      repo,
      issue_number: pull_number,
      body: fullBody,
    });
  }
}

The hidden HTML marker is the trick that keeps a PR's conversation tab from filling with bot comments. On the first run we create a comment; on every run after that we find the previous one by its marker and edit it in place. The reviewer's verdict always reflects the current state of the PR, and there's exactly one comment to read. This single change does more for the "not annoying" goal than anything in the prompt.
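
The create-or-update decision is easy to pull out into a pure function, which makes it testable without touching the GitHub API. The sketch below is one way to do that, not the article's code: planComment and the CommentAction type are hypothetical, the match is tightened to comments that start with the marker, and it adds a skip case for when the new review is identical to the posted one.

```typescript
interface ExistingComment {
  id: number;
  body?: string | null;
}

type CommentAction =
  | { kind: "create"; body: string }
  | { kind: "update"; comment_id: number; body: string }
  | { kind: "skip" };

// Hypothetical helper: decides what to do with the review comment given
// the comments already on the PR.
function planComment(
  comments: ExistingComment[],
  marker: string,
  review: string,
): CommentAction {
  const body = `${marker}\n## AI code review\n\n${review}`;
  // Our own comments always begin with the marker.
  const previous = comments.find((c) => c.body?.startsWith(marker));

  if (!previous) return { kind: "create", body };
  if (previous.body === body) return { kind: "skip" }; // nothing changed
  return { kind: "update", comment_id: previous.id, body };
}
```

postReview would then switch on kind and call createComment or updateComment accordingly.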

Wire the three functions together at the bottom of the file:

TypeScript

async function main() {
  const diff = await getDiff();

  if (!diff.trim()) {
    console.log("Empty diff, nothing to review.");
    return;
  }

  const review = await reviewDiff(diff);
  await postReview(review);
  console.log("Review posted.");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Open a pull request and the Action runs, leaves one comment, and updates it as you push. That's the whole thing.

Gotchas

The happy path above works on a normal-sized PR. Here's what breaks at the edges, and what I'd do about each.

Huge diffs and token limits

A diff is just text, and Claude has a generous context window, but a PR that regenerates a lock file or vendors a dependency can produce a diff that's hundreds of thousands of lines. You don't want to pay to review a package-lock.json churn, and you don't want to blow your token budget on it either.

The cheap, effective fix is to filter the diff before it ever reaches the model. Drop lock files, generated code, and anything in a vendor directory. You can do this against the file list:

TypeScript

const IGNORE = [/package-lock\.json$/, /\.min\.(js|css)$/, /^vendor\//, /^dist\//];

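// paginate() follows every page; a single listFiles call returns
// only the first 30 files by default.
const files = await octokit.paginate(octokit.rest.pulls.listFiles, {
  owner,
  repo,
  pull_number,
  per_page: 100,
});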

const reviewable = files.filter((f) => !IGNORE.some((re) => re.test(f.filename)));
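
The file list still has to become text for the model. Each entry returned by listFiles carries a patch field holding that file's hunks, so one option is to build the review input from those patches instead of the whole-PR diff. This is a sketch under assumptions: the ChangedFile type is a trimmed-down stand-in for the API's file object, buildReviewDiff is a hypothetical helper, and the patterns are copied from the snippet above so the block stands alone.

```typescript
// Trimmed-down view of the objects pulls.listFiles returns.
interface ChangedFile {
  filename: string;
  status: string;
  patch?: string; // absent for binary files and very large changes
}

const IGNORE = [/package-lock\.json$/, /\.min\.(js|css)$/, /^vendor\//, /^dist\//];

// Hypothetical helper: concatenates the patches of reviewable files and
// reports which files were left out, so the caller can say so.
function buildReviewDiff(
  files: ChangedFile[],
  ignore: RegExp[] = IGNORE,
): { diff: string; skipped: string[] } {
  const sections: string[] = [];
  const skipped: string[] = [];

  for (const file of files) {
    const ignored = ignore.some((re) => re.test(file.filename));
    if (ignored || !file.patch) {
      skipped.push(file.filename);
      continue;
    }
    // The patch has hunks but no file header, so add a simple one.
    sections.push(`--- ${file.filename} (${file.status})\n${file.patch}`);
  }

  return { diff: sections.join("\n\n"), skipped };
}
```

The diff value would be passed to reviewDiff in place of the raw diff, and skipped can be listed at the bottom of the comment so readers know what the bot did not look at.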

For the rare PR that's still enormous after filtering, set a character budget and either truncate with a clear note in the comment ("diff too large, reviewed the first N files") or skip review and say so. Silently truncating is the one thing not to do: a review that quietly ignored half the changes is worse than no review, because people will trust it.
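
A character budget with an explicit note might be sketched like this. It assumes the diff has already been split into per-file sections; applyBudget, the two types, and the note wording are all hypothetical. The budget counts section text only, and it cuts at the first file that doesn't fit so that "the first N files" stays literally true.

```typescript
interface DiffSection {
  path: string;
  text: string;
}

interface BudgetedDiff {
  diff: string;
  note: string | null; // null when nothing was dropped
}

// Hypothetical helper: keeps whole files, in order, until the budget is
// exhausted, and describes what was left out.
function applyBudget(sections: DiffSection[], maxChars: number): BudgetedDiff {
  const kept: DiffSection[] = [];
  const dropped: string[] = [];
  let used = 0;

  for (const section of sections) {
    const fits = dropped.length === 0 && used + section.text.length <= maxChars;
    if (fits) {
      kept.push(section);
      used += section.text.length;
    } else {
      dropped.push(section.path);
    }
  }

  const note =
    dropped.length === 0
      ? null
      : `Diff too large: reviewed the first ${kept.length} of ${sections.length} files. Not reviewed: ${dropped.join(", ")}.`;

  return { diff: kept.map((s) => s.text).join("\n"), note };
}
```

When note is not null, append it to the posted comment. If kept comes back empty, skipping the model call and posting only the note is the honest option.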

Secrets and what the token can see

Two things to keep straight. First, ANTHROPIC_API_KEY lives in repository secrets and is exposed to the job only through the env block. It never appears in logs unless you print it, so don't print it. Second, and this is the GitHub Actions gotcha that bites people: on pull_request triggers from forked repositories, GITHUB_TOKEN is read-only by design and repository secrets are not passed to the run, so the bot can neither call Claude nor post a comment on a PR from an outside contributor's fork. That's a security feature: it stops a fork from running code that abuses your token. If you need to review fork PRs, that's the pull_request_target trigger, and you should read GitHub's security guidance carefully before reaching for it, because it runs with access to secrets against untrusted code. For internal repos and same-repo branches, the default behavior is what you want.
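
Rather than letting a fork PR fail with an authentication error, the script can detect the situation and exit cleanly. A minimal sketch, assuming the webhook payload shape for pull_request events; the isForkPr helper and the trimmed-down event type are hypothetical:

```typescript
// Only the fields this check reads from the pull_request event payload.
interface PullRequestEvent {
  pull_request: {
    head: { repo: { full_name: string } | null };
    base: { repo: { full_name: string } };
  };
}

// Hypothetical helper: a PR comes from a fork when its head repository
// differs from the base repository.
function isForkPr(event: PullRequestEvent): boolean {
  const head = event.pull_request.head.repo;
  // Assumption: a null head repo means the fork was deleted, so treat it
  // as a fork and skip.
  return head === null || head.full_name !== event.pull_request.base.repo.full_name;
}
```

On the runner, the payload is the JSON file whose path is in the GITHUB_EVENT_PATH environment variable. Read and parse it at the top of main, and return early with a log line when isForkPr is true.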

Avoiding noisy comments

Most of the anti-noise work is in the prompt and the update-in-place logic, but two more things help. Keep max_tokens modest: 2048 is plenty for a focused review. The model doesn't see that number, so it won't write more tightly because of it, but the ceiling bounds what a rambling response can cost; the brevity itself comes from the prompt. And resist the urge to also post inline line-by-line comments via the review-comments API. Inline comments feel sophisticated, but they multiply the surface area for noise and they're harder to dismiss as a batch. One summary comment that a human can read and act on beats twenty inline nags. If you outgrow that, add inline comments later, deliberately, not as the default.

Only reviewing what changed

The prompt tells Claude to comment on the change, not on pre-existing code, and the diff format helps enforce that: the model mostly sees changed lines plus a little context. But it's a soft boundary. A model will sometimes notice a problem in a context line and flag it. That's not always wrong (you did, technically, touch that area), but if it's annoying your team, reinforce it in the prompt: "Only flag issues on lines the diff adds or modifies, marked with +." You can also strip unchanged context lines from the diff before sending, though I'd keep them: context helps the model understand the change, and a little drift is a fair price.
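
For completeness, stripping context is a one-liner, because context lines in a unified diff are exactly the ones that begin with a single space. This is a sketch of the option, not a recommendation; stripContextLines is a hypothetical helper.

```typescript
// Hypothetical helper: drops unchanged context lines and keeps file
// headers, hunk headers, and the +/- lines.
function stripContextLines(diff: string): string {
  return diff
    .split("\n")
    .filter((line) => !line.startsWith(" "))
    .join("\n");
}
```

The line counts in the @@ headers no longer match what follows once context is removed. That's harmless for a model reading the diff, but the result is not a diff you could apply.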

Rate limits

Two separate limits are in play. The Anthropic API has its own rate limits; the SDK retries 429 and 5xx responses automatically with backoff, so a single review almost never hits them, but a monorepo merging dozens of PRs at once could. The concurrency block in the workflow already caps you at one run per PR. The GitHub API has its own limits too, and the handful of calls per review is nowhere near them. If you do start seeing 429s from Anthropic across a busy org, the lever is fewer reviews (tighten the trigger types, or skip draft PRs) rather than faster retries.
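
Skipping drafts can be done inside the script with a check on the event payload, which carries a draft flag on the pull request. A small sketch, with a hypothetical shouldReview helper and a trimmed-down payload type:

```typescript
// Only the field this check reads from the pull_request event payload.
interface DraftAwareEvent {
  pull_request?: { draft?: boolean };
}

// Hypothetical helper: review everything except PRs explicitly marked
// as drafts. A missing flag is treated as "not a draft".
function shouldReview(event: DraftAwareEvent): boolean {
  return event.pull_request?.draft !== true;
}
```

If you skip drafts, consider adding ready_for_review to the workflow's types list as well. Otherwise a PR that leaves draft without a new push never gets its first review.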

Wrapping up

The bot is maybe a hundred lines of TypeScript and a short YAML file, and the code is the easy part. The hard part is the judgment baked into the prompt: review the change, catch real problems, and shut up otherwise. That's also the part you'll keep tuning. Watch the comments it leaves for a week. Every time it says something useless, that's a line for the "do not comment on" list. Every time it misses something a human caught, that's a hint to bump the model or sharpen an instruction.

Done right, you end up with a reviewer that the team actually reads, one that occasionally catches the thing everyone else missed at 6pm on a Friday, and stays quiet the rest of the time. That second half is what makes the first half worth anything.