
Build a Multi-Agent Workflow with LangGraph
A Python tutorial: model a researcher, writer, and reviewer as a LangGraph state machine, with shared state, conditional edges, and a review loop.
Key takeaways
- Reach for a LangGraph state machine only when control flow has branches or loops; a linear pipeline is clearer as a plain script.
- A shared TypedDict state lets each node return only the fields it changed, and LangGraph merges those updates automatically.
- Guard every review loop with a monotonic revision counter checked in the routing function so a picky reviewer cannot burn tokens forever.
- Run cheaper models on the researcher and writer nodes and reserve Opus for the reviewer to cut the cost of multiple LLM calls per run.
Most "multi-agent" tutorials are really prompt chains in a trench coat. You call one LLM, feed its output to the next, feed that to a third, and call it a pipeline. That works fine right up until the moment the work needs to flow backward.
Say you have a reviewer that reads a draft and decides it isn't good enough. In a straight chain, you're stuck. There's no clean way to say "send this back to the writer and try again." You end up bolting a while loop around the whole thing, threading state through by hand, and writing a small pile of if statements to decide what runs next. It gets messy fast, and the control flow lives nowhere in particular; it's smeared across your glue code.
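To make that concrete, here is a minimal sketch of the hand-rolled version. The three agent functions are stubs invented for illustration (no model calls), and every name in it is an assumption rather than part of any library.

```python
# Hypothetical stubs standing in for three LLM calls.
def research(topic: str) -> str:
    return f"notes on {topic}"

def write(notes: str, feedback: str = "") -> str:
    suffix = f" fixing [{feedback}]" if feedback else ""
    return f"draft from [{notes}]{suffix}"

def review(draft: str) -> tuple[str, str]:
    # Pretend the reviewer rejects any draft that has not been revised yet.
    if "fixing" in draft:
        return "approve", "reads well"
    return "revise", "tighten the intro"

def run_pipeline(topic: str, max_attempts: int = 3) -> str:
    notes = research(topic)
    draft = write(notes)
    attempts = 0
    # Control flow and state threading are tangled together in this loop.
    while attempts < max_attempts:
        verdict, feedback = review(draft)
        attempts += 1
        if verdict == "approve":
            break
        draft = write(notes, feedback)
    return draft

final_draft = run_pipeline("graphs")
print(final_draft)
```

It works, but the routing decision, the retry limit, and the state passing all live in one loop body, and each new branch makes that body harder to read.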
A graph fixes this. Once your agents need branches ("is this good enough, or does it go back?") and loops ("revise until it passes, but not forever"), the natural shape is a state machine: nodes that do work, edges that decide what happens next, and one shared blob of state everyone reads and writes. That's exactly what LangGraph gives you.
In this tutorial we'll build a small team of three agents:
- a researcher that gathers notes on a topic,
- a writer that turns those notes into a draft,
- a reviewer that judges the draft and either approves it or kicks it back to the writer.
The interesting part is that last arrow: the reviewer can send work back. We'll wire that as a conditional edge with a guard so it can't loop forever. By the end you'll have a working graph you can run with app.invoke(...), and a mental model for when this structure is worth the trouble (and when it absolutely isn't).
What you'll need
- Python 3.10 or newer.
- An Anthropic API key, set as the ANTHROPIC_API_KEY environment variable.
- Two packages:
pip install langgraph anthropic
That's the whole dependency list. We're using the Anthropic SDK directly for the LLM calls rather than a framework wrapper, because I want you to see exactly what each agent sends and receives. There's no hidden magic in the model calls, just client.messages.create(...). LangGraph handles the orchestration; Anthropic handles the thinking.
A quick note on the model: every call below uses claude-opus-4-8. For a researcher/writer/reviewer loop, Opus is honestly more than you need on the writer node, but it keeps the example uniform. I'll point out where you'd realistically drop to a cheaper model later.
Step 1: define the shared state
In LangGraph, every node reads from and writes to one shared state object. Think of it as the team's whiteboard. The researcher writes notes on it, the writer reads the notes and writes a draft, the reviewer reads the draft and writes a verdict.
You describe that whiteboard as a TypedDict. LangGraph uses it to know what fields exist and how to merge updates between nodes.
from typing import TypedDict, Literal
class WorkflowState(TypedDict):
    topic: str          # the thing we're writing about
    notes: str          # filled in by the researcher
    draft: str          # filled in (and refilled) by the writer
    feedback: str       # the reviewer's last critique
    verdict: Literal["approve", "revise", ""]  # the reviewer's decision
    revisions: int      # review passes so far (the reviewer increments it)
    max_revisions: int  # ceiling on review passes: the guard rail
Two fields here are doing quiet but important work. revisions counts how many times the reviewer has run (it goes up on every review pass, including an approving one), and max_revisions is the ceiling on that count. Without these two, a stubborn reviewer and a writer that can't satisfy it will ping-pong forever, burning tokens until you notice the bill. We'll check them in the routing function later.
When a node returns a dict, LangGraph merges those keys into the state. You only return the fields you changed; you don't have to pass the whole state back around. That's the part the straight-chain approach makes you do by hand, and it's the first thing that stops being annoying once a graph owns it.
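As a rough mental model (a plain-Python sketch, not LangGraph's actual implementation), the default merge for a key with no custom reducer behaves like a dictionary update: keys the node returns overwrite the old values, and everything else is carried forward. The apply_update helper below is hypothetical.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with a node's partial update merged in."""
    return {**state, **update}

state = {"topic": "graphs", "notes": "", "draft": "", "revisions": 0}

# A node that only touched `notes` returns only `notes`.
state = apply_update(state, {"notes": "- point one\n- point two"})

print(state["topic"])  # untouched keys are carried forward
print(state["notes"])  # the returned key replaced the old value
```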
Step 2: a shared Anthropic client and a small helper
Each agent is a function that takes the current state and returns an update. All three call an LLM, so let's set up the client once and write a thin helper so the node functions stay readable.
import os
import anthropic
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
def call_claude(system: str, user: str) -> str:
    """Single-turn call to Claude. Returns the text of the response."""
    message = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=4096,
        thinking={"type": "adaptive"},
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    # The response content is a list of blocks. With adaptive thinking on,
    # the first block(s) may be `thinking`; grab the text block(s).
    return "".join(
        block.text for block in message.content if block.type == "text"
    )
A couple of things worth flagging in that call.
thinking={"type": "adaptive"} lets Claude decide how much to reason before answering. The reviewer node especially benefits: judging whether a draft is good enough is exactly the kind of "think a bit first" task adaptive thinking is built for. If you were calling an older model you'd reach for a fixed budget_tokens here, but on current Opus that parameter is gone and adaptive is the way.
The response isn't a plain string. message.content is a list of content blocks, and with thinking enabled some of them are thinking blocks rather than text. That's why the helper filters for block.type == "text" instead of grabbing message.content[0].text. If you skip that filter, the first block will sometimes be a thinking block, which carries no text field, so the call fails intermittently and is confusing to debug. Pull the text out properly once, here, and the node functions never have to think about it.
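The filtering can be tried without an API call. The sketch below uses SimpleNamespace objects as stand-ins for the SDK's content blocks; the field names on the stand-ins (type, text, thinking) are assumptions modelled on the helper above, and extract_text is a hypothetical name for the same join.

```python
from types import SimpleNamespace

# Stand-ins for SDK content blocks (assumed shape: a `type` field plus a payload).
fake_content = [
    SimpleNamespace(type="thinking", thinking="Let me weigh the draft..."),
    SimpleNamespace(type="text", text="VERDICT: approve\n"),
    SimpleNamespace(type="text", text="FEEDBACK: Clear and well paced."),
]

def extract_text(content) -> str:
    """Join every text block and skip everything else."""
    return "".join(block.text for block in content if block.type == "text")

text = extract_text(fake_content)
print(text)

# The naive version breaks when the first block is not a text block.
try:
    fake_content[0].text
except AttributeError as exc:
    print(f"naive access failed: {exc}")
```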
Step 3: write the node functions
Now the agents themselves. Each one is just a function: state in, partial state out. This is where the "agent" framing is honestly a bit grand (these are functions with good prompts), but that's the point. The intelligence is in the model and the orchestration is in the graph, and neither needs a class hierarchy.
The researcher
def researcher(state: WorkflowState) -> dict:
    notes = call_claude(
        system=(
            "You are a research assistant. Given a topic, produce concise, "
            "factual notes: key points, relevant context, and anything a "
            "writer would need. Use short bullet points. Do not write prose."
        ),
        user=f"Topic: {state['topic']}",
    )
    return {"notes": notes}
It reads topic from the state and returns notes. Nothing else. LangGraph merges that single key in.
The writer
def writer(state: WorkflowState) -> dict:
    # On the first pass there's no feedback. On a revision there is,
    # and we want the writer to actually use it.
    if state.get("feedback"):
        instruction = (
            f"Here is your previous draft:\n\n{state['draft']}\n\n"
            f"A reviewer gave this feedback:\n\n{state['feedback']}\n\n"
            "Rewrite the draft to address the feedback. Keep what worked."
        )
    else:
        instruction = (
            f"Write a clear, engaging ~300-word article using these notes:\n\n"
            f"{state['notes']}"
        )
    draft = call_claude(
        system="You are a skilled writer. Produce a clean, finished draft.",
        user=instruction,
    )
    return {"draft": draft}
The writer is the node the loop comes back to, so it has two modes. First time through, it has notes and no feedback, so it just writes. On a revision pass, feedback is populated, and the prompt changes to show the writer its own previous draft plus the critique. This is the whole reason the feedback lives in shared state: so the writer can see it on the way back around.
If you forget to feed the feedback back into the prompt, the loop still runs, but the writer produces a fresh draft that ignores the reviewer entirely, and you'll wonder why the same complaint keeps coming back. The state is there; you have to actually use it.
The reviewer
def reviewer(state: WorkflowState) -> dict:
    response = call_claude(
        system=(
            "You are a critical editor. Read the draft and decide whether it "
            "is publishable. Respond in exactly this format:\n"
            "VERDICT: approve OR revise\n"
            "FEEDBACK: <one paragraph; if approving, say what works>"
        ),
        user=f"Topic: {state['topic']}\n\nDraft:\n\n{state['draft']}",
    )
    # Parse the structured response. Keep it forgiving.
    verdict = "approve"
    feedback = response
    for line in response.splitlines():
        if line.upper().startswith("VERDICT:"):
            verdict = "revise" if "revise" in line.lower() else "approve"
        elif line.upper().startswith("FEEDBACK:"):
            feedback = line.split(":", 1)[1].strip()
    return {
        "verdict": verdict,
        "feedback": feedback,
        "revisions": state["revisions"] + 1,
    }
The reviewer does two jobs: it returns a verdict ("approve" or "revise") and it bumps revisions by one. That increment is what eventually trips the guard rail: even a reviewer that never approves will run out of attempts.
I'm parsing the verdict out of the text by hand here, which is fine for a tutorial but a little fragile. If you want this to be bulletproof, define a tool with a strict schema (an enum field for the verdict, a string for the feedback) and read the structured result instead of string-matching. For now, the format prompt plus a forgiving parser gets the idea across without extra ceremony.
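A minimal sketch of that structured route follows. The tool name submit_review, the REVIEW_TOOL constant, and the parse_review helper are all hypothetical; the dict follows the name / description / input_schema shape that the Anthropic Messages API accepts in its tools parameter, and the content list is a stand-in for a real response so the parsing can be exercised offline.

```python
from types import SimpleNamespace

# Hypothetical tool definition: an enum for the verdict, a string for the feedback.
REVIEW_TOOL = {
    "name": "submit_review",
    "description": "Record the editorial verdict on a draft.",
    "input_schema": {
        "type": "object",
        "properties": {
            "verdict": {"type": "string", "enum": ["approve", "revise"]},
            "feedback": {"type": "string"},
        },
        "required": ["verdict", "feedback"],
    },
}

def parse_review(content) -> dict:
    """Pull verdict and feedback out of the first matching tool_use block."""
    for block in content:
        if block.type == "tool_use" and block.name == REVIEW_TOOL["name"]:
            verdict = block.input.get("verdict")
            if verdict not in ("approve", "revise"):
                raise ValueError(f"unexpected verdict: {verdict!r}")
            return {"verdict": verdict, "feedback": block.input.get("feedback", "")}
    raise ValueError("model did not call the review tool")

# Stand-in for message.content from a call made with tools=[REVIEW_TOOL].
fake_content = [
    SimpleNamespace(type="text", text="Recording my review."),
    SimpleNamespace(
        type="tool_use",
        name="submit_review",
        input={"verdict": "revise", "feedback": "Intro buries the point."},
    ),
]

review = parse_review(fake_content)
print(review)
```

In a real reviewer node you would pass tools=[REVIEW_TOOL] to client.messages.create and hand message.content to parse_review. How a forced tool_choice interacts with thinking is worth checking in the API documentation before relying on it.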
Step 4: build the graph and add nodes
With the nodes written, we assemble the graph. This is the part that replaces all the if/while glue you'd otherwise write.
from langgraph.graph import StateGraph, START, END
graph = StateGraph(WorkflowState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("reviewer", reviewer)
StateGraph(WorkflowState) creates a graph typed to our state. Each add_node registers a function under a name. The name is what the edges refer to; the functions themselves never reference each other, which is what keeps them independent and testable.
Step 5: add the straight edges
Now we connect the nodes. The simple, always-the-same transitions are plain edges:
graph.add_edge(START, "researcher") # entry point
graph.add_edge("researcher", "writer") # research always flows to writing
START is a built-in marker for "where execution begins." So: kick off at the researcher, and once research is done, always go to the writer. No decision is involved: these always happen in this order, so they're unconditional edges.
The writer-to-reviewer hop is also unconditional (every draft gets reviewed), so:
graph.add_edge("writer", "reviewer")
The only decision in the whole workflow is what happens after the reviewer. That's the conditional edge, and it's next.
Step 6: the conditional edge and the review loop
Here's the part that makes this a graph rather than a chain. After the reviewer runs, we need to decide:
- approved? → we're done, go to END.
- needs revision, and we still have attempts left? → back to writer.
- needs revision, but we're out of attempts? → stop anyway, go to END.
The decision lives in a small routing function. It reads the state and returns the name of the next node (or END).
def route_after_review(state: WorkflowState) -> Literal["writer", "__end__"]:
    if state["verdict"] == "approve":
        return END
    if state["revisions"] >= state["max_revisions"]:
        # Out of attempts. Ship what we have rather than loop forever.
        return END
    return "writer"
Then we register it as a conditional edge, mapping each possible return value to a destination:
graph.add_conditional_edges(
    "reviewer",
    route_after_review,
    {
        "writer": "writer",  # revise → loop back
        END: END,            # approve or exhausted → finish
    },
)
add_conditional_edges takes the source node, the routing function, and a dict that maps the function's return values to actual node names. When the reviewer finishes, LangGraph calls route_after_review, looks up the return value in the map, and sends execution there.
That revisions >= max_revisions check is the single most important line in this file. It's the difference between "a workflow that converges" and "a workflow that empties your account." The reviewer increments revisions every time it runs; the router enforces the ceiling. Belt and braces. I've watched a too-picky reviewer and an underpowered writer disagree forever in testing; the guard is what turns that from an incident into a slightly-worse-than-ideal final draft.
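Because the router is a pure function of the state, it can be checked without running any models. The sketch below repeats the routing logic with END defined as a local stand-in string (an assumption, so the snippet runs without LangGraph installed) and then simulates the worst case: a reviewer that never approves. count_calls is a hypothetical helper.

```python
END = "__end__"  # local stand-in for langgraph.graph.END

def route_after_review(state: dict) -> str:
    if state["verdict"] == "approve":
        return END
    if state["revisions"] >= state["max_revisions"]:
        return END
    return "writer"

# The three routing cases: approved, revise with attempts left, revise but exhausted.
approved = route_after_review({"verdict": "approve", "revisions": 1, "max_revisions": 2})
retry = route_after_review({"verdict": "revise", "revisions": 1, "max_revisions": 2})
exhausted = route_after_review({"verdict": "revise", "revisions": 2, "max_revisions": 2})
print(approved, retry, exhausted)

def count_calls(max_revisions: int) -> int:
    """Count model calls in one run when the reviewer always says 'revise'."""
    state = {"verdict": "", "revisions": 0, "max_revisions": max_revisions}
    calls = 1  # the researcher runs once
    next_node = "writer"
    while next_node != END:
        calls += 2  # one writer call, then one reviewer call
        state["verdict"] = "revise"
        state["revisions"] += 1
        next_node = route_after_review(state)
    return calls

worst_case = count_calls(2)
print(worst_case)
```

The simulation terminates for any ceiling because revisions rises on every pass, which is exactly the property the guard depends on.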
Step 7: compile and run
A graph is a description. To actually execute it, you compile it into a runnable app:
app = graph.compile()
Then invoke it with an initial state. Every field in the TypedDict should be present; the empty strings and the zero are the starting values the nodes will fill in:
initial_state = {
    "topic": "Why small teams ship faster than large ones",
    "notes": "",
    "draft": "",
    "feedback": "",
    "verdict": "",
    "revisions": 0,
    "max_revisions": 2,
}
result = app.invoke(initial_state)
print(result["draft"])
print(f"\n--- Finished after {result['revisions']} review pass(es) ---")
print(f"Final verdict: {result['verdict']}")
app.invoke runs the whole graph to completion and hands back the final state. The flow is: researcher → writer → reviewer → (maybe back to writer → reviewer again) → END. The draft field holds whatever the writer produced on its last pass, and revisions tells you how many rounds it took.
If you want to watch it run rather than just see the final state, swap invoke for stream:
for step in app.stream(initial_state):
    print(step)  # one dict per node as it completes
That prints each node's output as it happens, which is the fastest way to see the loop actually loop.
Gotchas
A few things that bite people, in rough order of how often I've seen them.
Infinite review loops. This is the big one, and it's why max_revisions exists. A conditional edge that can route back to an earlier node is a cycle, and cycles without a counter are loops without an exit. Always base the guard on something that changes monotonically: here, revisions only ever goes up, so the ceiling is guaranteed to be hit. LangGraph also has a global recursion_limit you can set in the config passed to invoke (for example config={"recursion_limit": 10}) as a last-resort safety net, but don't rely on it for control flow; it raises an exception rather than ending cleanly, which is a worse experience than just shipping the best draft you have.
State bloat. It's tempting to stuff everything into the shared state: every intermediate draft, every reviewer comment, full conversation histories. Resist it. The state travels with the workflow, and on a long-running or deeply-looped graph it grows. Keep only what downstream nodes actually read. The writer needs the last feedback, not all of it. If you find yourself appending to a list in state and never reading the old entries, that's bloat; drop it.
Cost of many LLM calls. This is the honest tradeoff of multi-agent setups: three agents and a review loop means a single run can be three, five, or seven model calls (one researcher call, plus one writer call and one reviewer call per pass). At max_revisions=2 you could make up to five Opus calls for one short article. That adds up. Two levers. First, don't run every node on your most expensive model: the writer and researcher are fine on a cheaper, faster model, and you can reserve Opus for the reviewer's judgment call (give call_claude a model parameter and pass a different model= string per node). Second, be honest about whether you need the loop at all; see "When this is overkill" below.
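One way to set up the first lever is a per-node model table. This is only a sketch: MODEL_BY_NODE and build_request are hypothetical names, and CHEAPER_MODEL is a placeholder string that you would replace with a real model ID from your account. The thinking setting is left out here because whether a given cheaper model accepts it is something to confirm in the API documentation.

```python
REVIEWER_MODEL = "claude-opus-4-8"       # the model used throughout this tutorial
CHEAPER_MODEL = "your-cheaper-model-id"  # placeholder: substitute a real model ID

# Hypothetical routing table: which model each node should call.
MODEL_BY_NODE = {
    "researcher": CHEAPER_MODEL,
    "writer": CHEAPER_MODEL,
    "reviewer": REVIEWER_MODEL,
}

def build_request(node: str, system: str, user: str) -> dict:
    """Assemble keyword arguments for client.messages.create for one node."""
    return {
        "model": MODEL_BY_NODE[node],
        "max_tokens": 4096,
        "system": system,
        "messages": [{"role": "user", "content": user}],
    }

request = build_request("writer", "You are a skilled writer.", "Write the draft.")
print(request["model"])
```

A node would then call client.messages.create(**build_request("writer", system, user)), so the model choice lives in one table instead of being scattered across node functions.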
Debugging with tracing. When a graph misbehaves, the question is always "which node did what, and what did the state look like going in?" Print-debugging across nodes is painful because the state is the only thing connecting them. Use app.stream(...) to see each node's output in sequence; it's the quickest local diagnostic. For anything beyond toy size, wire up LangSmith (set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY) and you get a visual trace of every node and every state transition. Because the model calls here go through the Anthropic SDK directly, expect to see them only through each node's inputs and outputs unless you also instrument call_claude (for example with LangSmith's traceable decorator). The first time you debug a loop that went one round too many, the trace pays for itself.
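For a purely local answer to "which node did what, and what went in?", a small wrapper around each node function is often enough. The traced helper and the TRACE list below are hypothetical, standard-library-only conveniences, not part of LangGraph or LangSmith.

```python
import functools

TRACE: list[dict] = []  # hypothetical in-memory log of node executions

def traced(name, node_fn):
    """Wrap a node so each run records its input state and its update."""
    @functools.wraps(node_fn)
    def wrapper(state):
        update = node_fn(state)
        TRACE.append({"node": name, "state_in": dict(state), "update": update})
        return update
    return wrapper

# Stub node for demonstration; a real node would call the model.
def stub_reviewer(state):
    return {"verdict": "revise", "revisions": state["revisions"] + 1}

node = traced("reviewer", stub_reviewer)
node({"draft": "v1", "revisions": 0})

for entry in TRACE:
    print(entry["node"], entry["state_in"], "->", entry["update"])
```

To use it in the graph, you would register the wrapped function instead of the bare one, for example graph.add_node("reviewer", traced("reviewer", reviewer)).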
When this is overkill. I'll say it plainly: most tasks do not need a multi-agent graph. If your work is linear (research, then write, then done, with no path back), a graph is ceremony you don't need; a function that calls Claude three times in a row is clearer and cheaper. The graph earns its keep specifically when you have branches or loops: a reviewer that can reject, a router that picks between specialists, a retry that depends on what failed. If you can't point to an edge that bends backward or forks, you probably want a plain script. Reach for LangGraph when the control flow itself is the hard part, not before.
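For comparison, the linear version can be as small as the sketch below. linear_pipeline is a hypothetical function that takes the model call as a parameter (any callable with the same system/user signature as call_claude), and fake_llm is a stub so the snippet runs offline; the third "copy editor" step is an invented example of a final pass with no path back.

```python
from typing import Callable

def linear_pipeline(topic: str, llm: Callable[[str, str], str]) -> str:
    """Research, write, polish: three calls in a fixed order, no loop."""
    notes = llm(
        "You are a research assistant. Use short bullet points.",
        f"Topic: {topic}",
    )
    draft = llm(
        "You are a skilled writer.",
        f"Write a ~300-word article from these notes:\n\n{notes}",
    )
    return llm("You are a copy editor. Fix errors and keep the structure.", draft)

calls: list[str] = []

def fake_llm(system: str, user: str) -> str:
    # Stub: record the call and return a recognizable placeholder.
    calls.append(system)
    return f"output of call {len(calls)}"

result = linear_pipeline("small teams", fake_llm)
print(result)
print(len(calls))
```

There is no state object, no router, and nothing to compile, which is the right amount of machinery when nothing ever flows backward.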
Wrapping up
We built a three-agent workflow as a state machine: a shared TypedDict everyone reads and writes, three node functions that each do one job, plain edges for the fixed steps, and a conditional edge that lets the reviewer send work back to the writer, with a revision counter so it can't loop forever.
The shift in thinking is the takeaway. Once you stop picturing agents as a pipeline and start picturing them as nodes with edges, the hard cases (the rejections, the retries, the "it depends" routing) become edges you draw instead of glue code you debug. Start with this researcher/writer/reviewer skeleton, add a node or an edge when the work actually needs one, and keep the guard rails in. The graph will stay readable a lot longer than the while loop would have.