Tutorial // Frontend AI · 2026-06-21 · 12 min read

Build a Streaming AI Chat UI with the Vercel AI SDK

A hands-on Next.js tutorial: stream tokens from an LLM to a polished chat UI using the Vercel AI SDK, the App Router, and a single route handler.

Varun Raj Manoharan, Founder & Principal Engineer
Vercel AI SDK · Next.js · Streaming · Frontend · Tutorial

Key takeaways

  • A streaming chat UI needs only two files: one App Router route handler calling streamText and one client component using the useChat hook.
  • The route returns result.toUIMessageStreamResponse() and the client renders message.parts by type, which keeps working when you add tool calls or attachments.
  • Vercel AI SDK version 6 no longer manages the input field, so you hold the input in your own React state and call sendMessage yourself.
  • Protect the open POST endpoint with rate limiting, and cap or summarize history because every message is resent to the model each turn.

A chat box that sits there spinning for eight seconds before dumping a wall of text feels broken, even when it isn't. The model was working the whole time. The user just couldn't tell. Streaming fixes that: tokens land on screen as they're generated, so the first words show up in a few hundred milliseconds and the rest follow at reading speed. Nothing about the model got faster. The wait just stopped feeling like a wait.

That perceived-latency win is most of why streaming chat became the default interaction pattern for LLM apps. The good news is that the plumbing to get it working is far less hairy than it used to be. In this post we'll build a streaming chat interface end to end with the Vercel AI SDK and the Next.js App Router: one route handler on the server, one client component, and the SDK doing the awkward parts in between.

By the end you'll have a working chat UI that streams responses token by token, handles user input, and renders an ongoing conversation. I'll also be honest about the rough edges, because there are a few.

What you'll need

Nothing exotic:

  • A Next.js project on the App Router (Next 14 or 15 is fine). If you don't have one, run npx create-next-app@latest and accept the App Router default.
  • The Vercel AI SDK, version 6 or later. The APIs in this post changed meaningfully across major versions, so if you're following along with an older install some names won't match.
  • Access to a model provider. We'll route through the Vercel AI Gateway, which lets you name a model with a plain "provider/model" string instead of wiring up a provider package by hand.

For the model I'm using anthropic/claude-sonnet-4-6, a good balance of speed and quality for chat. Swap in whatever you have credits for; the gateway string is the only line that changes.

Install the packages:

Shell
npm install ai @ai-sdk/react

ai is the core SDK and runs on the server. @ai-sdk/react gives you the React hooks for the client. You'll also need a gateway credential in your environment: set AI_GATEWAY_API_KEY in .env.local. If you'd rather call a provider directly, install that provider's package instead and pass a model instance; everything downstream stays the same.

Step 1: the route handler

The server side is a single App Router route handler. Create app/api/chat/route.ts:

TypeScript
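import { streamText, convertToModelMessages, type UIMessage } from "ai";

// Let streamed responses run for up to 30 seconds once deployed.
export const maxDuration = 30;

export async function POST(req: Request) {
  // useChat posts the running conversation as JSON.
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Start the generation. streamText returns immediately, so it is not awaited.
  const result = streamText({
    model: "anthropic/claude-sonnet-4-6",
    // Convert UI messages into the shape the model expects. The await is
    // harmless if your SDK version returns the messages synchronously.
    messages: await convertToModelMessages(messages),
  });

  // Respond with the structured stream that useChat knows how to parse.
  return result.toUIMessageStreamResponse();
}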

That's the whole server. A few things worth pointing out.

The handler exports POST. The useChat hook on the client posts to this route by default, sending the running conversation as JSON, so the method has to match.

streamText kicks off the generation and returns immediately; it does not wait for the model to finish. The returned result is a handle to a stream that's already in flight. Note there's no await on the streamText call itself; you await the pieces you actually consume.

convertToModelMessages translates between two shapes that are easy to confuse. The client speaks in UIMessage objects, which carry rendering details like message parts and IDs. The model wants plain model messages. This helper does the conversion so you don't hand-map fields and quietly drop something.
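To make the two shapes concrete, here is a minimal sketch of the kind of mapping involved, for text-only messages. SketchUIMessage, SketchModelMessage, and toModelMessagesSketch are simplified stand-ins I made up for illustration; they are not the SDK's real types or its implementation, and the real helper handles many more part types.

```typescript
// Simplified stand-ins for illustration only; the SDK's real types are richer.
interface SketchUIPart {
  type: string;
  text?: string;
}

interface SketchUIMessage {
  id: string; // UI-only detail, used for React keys
  role: "user" | "assistant";
  parts: SketchUIPart[];
}

interface SketchModelMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical text-only conversion: keep the role, join the text parts,
// and drop everything that only matters for rendering (IDs, non-text parts).
function toModelMessagesSketch(
  messages: SketchUIMessage[],
): SketchModelMessage[] {
  return messages.map((message) => ({
    role: message.role,
    content: message.parts
      .filter((part) => part.type === "text" && typeof part.text === "string")
      .map((part) => part.text as string)
      .join(""),
  }));
}

const converted = toModelMessagesSketch([
  { id: "m1", role: "user", parts: [{ type: "text", text: "Hello" }] },
]);
```

Even in this toy version there are several places to lose data silently, which is the argument for letting the SDK's helper do the work.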

Finally, result.toUIMessageStreamResponse() wraps the stream in a Response whose body is the protocol the useChat hook knows how to read. This is the line that makes the client-side hook "just work": it emits the structured event stream the hook parses into message parts, not a raw text dump.

The maxDuration export raises the function's time limit to 30 seconds, which matters once you deploy. Default serverless timeouts are short, and a thoughtful model answering a real question can run past them. Set this to something sane for your traffic.

Step 2: the client component

Now the part the user sees. Create a client component, app/page.tsx or wherever your chat lives:

TSX
"use client";

import { useChat } from "@ai-sdk/react";
import { useState } from "react";

export default function Chat() {
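  // messages updates live as tokens stream in; status tracks the request lifecycle.
  const { messages, sendMessage, status } = useChat();
  // The input field is plain local state; the hook does not manage it.
  const [input, setInput] = useState("");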

  return (
    <div className="mx-auto flex h-screen max-w-2xl flex-col p-4">
      <div className="flex-1 space-y-4 overflow-y-auto">
        {messages.map((message) => (
          <div
            key={message.id}
            className={message.role === "user" ? "text-right" : "text-left"}
          >
            <span className="inline-block rounded-lg bg-neutral-100 px-3 py-2">
              {message.parts.map((part, i) =>
                part.type === "text" ? <span key={i}>{part.text}</span> : null,
              )}
            </span>
          </div>
        ))}
      </div>

      <form
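        onSubmit={(e) => {
          e.preventDefault();
          // Ignore empty or whitespace-only submissions.
          if (!input.trim()) return;
          sendMessage({ text: input });
          // Clear the field by hand; the hook does not do it for us.
          setInput("");
        }}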
        className="mt-4 flex gap-2"
      >
        <input
          className="flex-1 rounded border px-3 py-2"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Say something..."
          disabled={status !== "ready"}
        />
        <button
          type="submit"
          className="rounded bg-black px-4 py-2 text-white disabled:opacity-50"
          disabled={status !== "ready"}
        >
          Send
        </button>
      </form>
    </div>
  );
}

The "use client" directive at the top is not optional. useChat holds state and runs effects, so the component has to be a Client Component. Leave the directive off and the build will tell you, though not always in the clearest terms.

useChat hands back a few things. messages is the full conversation, user and assistant turns together, updated live as tokens stream in. sendMessage appends a user message and fires the request to your route handler. status reports where the request is in its lifecycle ("ready", "submitted", "streaming", or "error"), which is what I'm using to disable the input while a response is in flight.

Notice the input itself is plain React state, not something the hook owns. Earlier versions of the SDK managed the input field for you. Version 6 doesn't, and honestly it's cleaner this way: you control the field, you decide when to clear it, and sendMessage is just a function you call. When the form submits, I send the text and reset the local state by hand.

Rendering message parts

The bit that trips people up is message.parts. A message is not a single string. It's an array of typed parts, and text is only one of the types; others show up when you use tool calls, reasoning, or attachments. That's why the render loop checks part.type === "text" before reading part.text instead of dropping {message.content} straight into the JSX.

It looks like ceremony when all you have is text. It stops looking like ceremony the first time you add a tool that returns a chart or a file, because the same loop already knows to skip the parts it can't render. Writing it this way from the start saves a rewrite later.
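Here is the same idea pulled out of the JSX as a small pure function, so the skipping behavior is easy to see. DisplayPart is a simplified, hypothetical part shape rather than the SDK's real type, and the "file" part in the example is only there to stand in for something a text-only UI can't draw.

```typescript
// Hypothetical, simplified part shape; real SDK parts carry more fields.
interface DisplayPart {
  type: string;
  text?: string;
}

// Decide what one part contributes to a text-only UI.
function renderPart(part: DisplayPart): string | null {
  switch (part.type) {
    case "text":
      return part.text ?? "";
    default:
      // Unknown or not-yet-supported part types are skipped, not crashed on.
      return null;
  }
}

// Render a whole message by keeping only the parts we know how to show.
function renderMessage(parts: DisplayPart[]): string {
  return parts
    .map((part) => renderPart(part))
    .filter((piece): piece is string => piece !== null)
    .join("");
}

const visible = renderMessage([
  { type: "text", text: "Here is the chart:" },
  { type: "file" }, // no renderer yet, so it is ignored
  { type: "text", text: " Done." },
]);
```

Supporting a new part type later means adding a case to the switch; nothing else in the loop has to change.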

When the assistant is mid-stream, this component re-renders on each chunk and the text grows in place. You don't write any of that loop yourself. The hook updates messages, React re-renders, and the words appear.
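If you're curious what "the hook updates messages" amounts to, conceptually it is a fold over incoming events. The sketch below uses an event shape I invented for illustration (SketchEvent, with a text delta and a finish marker); the SDK's actual stream protocol has more event types and its field names may differ, so read this as a mental model, not a parser for the real wire format.

```typescript
// Hypothetical event shape for illustration only.
type SketchEvent =
  | { type: "text-delta"; delta: string }
  | { type: "finish" };

interface SketchAssistantMessage {
  id: string;
  text: string;
  done: boolean;
}

// Pure reducer: current message + one event -> next message.
function applyEvent(
  message: SketchAssistantMessage,
  event: SketchEvent,
): SketchAssistantMessage {
  if (event.type === "text-delta") {
    // Append the new chunk to the text accumulated so far.
    return { ...message, text: message.text + event.delta };
  }
  // "finish": the text is complete.
  return { ...message, done: true };
}

const incoming: SketchEvent[] = [
  { type: "text-delta", delta: "Hel" },
  { type: "text-delta", delta: "lo" },
  { type: "finish" },
];

// Record what the UI would show after each event arrives.
const snapshots: string[] = [];
let current: SketchAssistantMessage = { id: "a1", text: "", done: false };
for (const event of incoming) {
  current = applyEvent(current, event);
  snapshots.push(current.text);
}
```

Each new state object is what triggers a React re-render in the real hook, which is why the text appears to grow in place.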

Step 3: run it

Start the dev server, open the page, and type a question. The first tokens should land almost immediately and the rest stream in behind them. If the whole reply arrives at once after a pause, the stream isn't actually streaming. Jump to the gotchas below, because that's usually a buffering proxy or a missing toUIMessageStreamResponse().

That's the entire build. One route handler, one component. Most of what makes streaming chat feel hard is the stuff the SDK is now quietly handling: the wire protocol, incremental parsing on the client, and stitching partial chunks into coherent message state.

Gotchas

The happy path is short. Here's what bites once you go past it.

Edge vs Node runtime

You can run the route handler on the Edge runtime (export const runtime = "edge") or the default Node runtime. Edge gives you faster cold starts and streams nicely, which suits a chat endpoint. But Edge exposes a restricted set of APIs (no native Node modules, no filesystem), so if your handler reaches for a database driver or anything Node-specific before calling the model, Edge will break in ways that don't always show up locally. My default is to leave it on Node until I have a concrete reason to switch, then test the streaming behavior on Edge specifically before shipping it. Don't assume a handler that streams on Node streams identically on Edge.

Aborting a stream

Users change their minds. They ask a question, realize it's wrong, and want to stop a half-finished answer. useChat exposes a stop function for exactly this: call it and the in-flight request is cancelled and the connection closed. Wire it to a stop button that shows while status === "streaming". Without it, a user who fires off a long generation has no way out but to wait or reload, and reloading mid-stream tends to leave the conversation in a confusing state.
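One way to keep the stop button and the disabled input from drifting apart is to derive both from status in one place. controlsFor and ChatControls are hypothetical names for this sketch; the four status strings are the ones the hook reports.

```typescript
type ChatStatus = "ready" | "submitted" | "streaming" | "error";

// Hypothetical shape describing which controls the UI should offer.
interface ChatControls {
  inputDisabled: boolean;
  showStop: boolean;
}

// Derive UI flags from the hook's status so the rules live in one function.
function controlsFor(status: ChatStatus): ChatControls {
  return {
    // Same rule as the form above: only accept input when idle.
    inputDisabled: status !== "ready",
    // Offer a way out only while tokens are actually arriving.
    showStop: status === "streaming",
  };
}

const whileStreaming = controlsFor("streaming");
const whileReady = controlsFor("ready");
```

In the component you would destructure stop from useChat alongside status, then render something like {showStop && <button type="button" onClick={() => stop()}>Stop</button>}. You might also choose to show the button during "submitted" so users can cancel before the first token arrives; that's a product call.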

Error handling

When the model call fails (a bad key, a rate limit, the provider having a bad day), useChat surfaces an error object and flips status to "error". Read it and show something. The mistake I see most is leaving the UI stuck in a fake "thinking" state because nobody checked the error, so the user stares at a spinner that will never resolve. At minimum, render the error and offer a retry via the regenerate function the hook provides. Also be careful about what reaches the client: errors can carry provider details you'd rather not expose, so on the server you may want to map them to a generic message before the stream sends them along.
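For the server-side mapping, a small function that turns any thrown value into a safe string is usually enough. toClientErrorMessage is a hypothetical helper, and the "429 or rate limit in the message" check is an assumption for illustration; real provider errors vary, so match on whatever your provider actually returns.

```typescript
// Hypothetical helper: map any thrown value to text that is safe to show users.
function toClientErrorMessage(error: unknown): string {
  // Keep the full detail in server logs, where only you can read it.
  console.error("chat route error:", error);

  const detail = error instanceof Error ? error.message : String(error);

  // Assumption for illustration: throttling errors mention 429 or "rate limit".
  if (/429|rate limit/i.test(detail)) {
    return "Too many requests right now. Please try again in a moment.";
  }

  // Everything else collapses to one generic message, so keys, URLs, and
  // provider internals never reach the browser.
  return "Something went wrong. Please try again.";
}

const throttled = toClientErrorMessage(new Error("429 Too Many Requests"));
const generic = toClientErrorMessage(new Error("Invalid API key sk-secret"));
```

I believe toUIMessageStreamResponse accepts an onError callback where a function like this can be plugged in, but treat that as an assumption and check the docs for your SDK version.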

Rate limits and cost

This one isn't about the SDK; it's about the open POST endpoint you just created. app/api/chat will happily forward anything to a paid model, including a script someone points at it. Put rate limiting in front of it (by IP, by session, or by user) before it sees real traffic. And remember every message in messages is sent to the model on every turn, so a long conversation costs more per request than a short one. For long-lived chats, cap the history you forward or summarize older turns. The bill scales with the context you keep resending, which is easy to forget as a conversation grows.
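As a starting point, both ideas fit in a few lines. The sketch below is a fixed-window, in-memory limiter plus a history cap; createRateLimiter and capHistory are hypothetical helpers, and the in-memory Map assumes a single long-lived server process. On serverless or multi-instance deployments each instance would count separately, so you'd want a shared store instead.

```typescript
interface WindowState {
  windowStart: number; // timestamp (ms) when the current window began
  count: number; // requests seen in the current window
}

// Fixed-window limiter: at most `limit` requests per `windowMs` for each key.
function createRateLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, WindowState>();

  // Returns true if the request identified by `key` is allowed at time `now`.
  return function allow(key: string, now: number = Date.now()): boolean {
    const state = windows.get(key);
    if (!state || now - state.windowStart >= windowMs) {
      // First request from this key, or the previous window has expired.
      windows.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (state.count >= limit) return false;
    state.count += 1;
    return true;
  };
}

// Keep only the most recent `maxMessages` messages before calling the model.
function capHistory<T>(messages: T[], maxMessages: number): T[] {
  if (maxMessages <= 0) return [];
  return messages.length <= maxMessages
    ? messages
    : messages.slice(-maxMessages);
}

// Two requests per minute for one key, with explicit timestamps for clarity.
const allow = createRateLimiter(2, 60000);
const decisions = [
  allow("1.2.3.4", 0),
  allow("1.2.3.4", 10),
  allow("1.2.3.4", 20),
  allow("1.2.3.4", 60000),
];
const recent = capHistory(["m1", "m2", "m3", "m4"], 2);
```

In the route handler you would call allow with whatever identifies the caller and return a 429 Response when it says no, then pass capHistory(messages, n) into the conversion step. How you get a trustworthy client IP or user ID depends on your hosting and auth setup, and a plain cap drops older context outright, so summarizing is the gentler option when earlier turns matter.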

Wrapping up

The shape of this is deliberately small: a POST handler that calls streamText and returns result.toUIMessageStreamResponse(), and a client component built on useChat that maps over messages and renders their parts. That's the core, and you can extend it a long way before it stops fitting on one screen: tool calls, persistence, auth, and multi-model routing all bolt onto these same two files.

The unglamorous parts (aborting cleanly, surfacing errors, keeping a public endpoint from becoming a bill) are where a demo turns into something you'd actually put in front of users. Get the stream working first, then spend your time there.